Fossil Concepts
1.0 Introduction
[./index.wiki | Fossil] is a
[http://en.wikipedia.org/wiki/Software_configuration_management | software configuration management] system.
Fossil is software that is designed to control and track the
development of a software project and to record the history
of the project.
There are many such systems in use today. Fossil strives to
distinguish itself from the others by being extremely simple
to set up and operate.
This document is intended as a quick introduction to the concepts
behind fossil.
2.0 Composition Of A Project
A software project normally consists of a "source tree".
A source tree is a hierarchy of files that are used to generate
the end product. The source tree changes over time as the
software grows and expands and as features are added and bugs
are fixed. A snapshot of the source tree at any point in time
is called a "version" or "revision" or a "baseline" of the product.
In fossil, we use the name "check-in".
A "repository" is a database that contains copies of all historical
check-ins for a project. Check-ins are normally stored in the
repository in a highly space-efficient compressed format (delta encoding).
But that is an implementation detail that you, the user, need not worry about.
Think of the repository as a safe place where all your old check-ins are
securely stored away and available for retrieval whenever you need
them.
A repository in fossil is a single file on your disk. This file
might be rather large (dozens or hundreds of megabytes for a large
or long running project) but it is nevertheless just a file. You
can move it around, rename it, write it out to a memory stick, or
do anything else you normally do with files.
Each source tree that is controlled by fossil is associated with
a single repository on the local disk drive. You can tie two or more
source trees to a single repository if you want (though one
tree per repository is the most common configuration). So a
single repository can be associated with many source trees, but
each source tree is associated with only one repository.
Fossil source trees may not overlap. A fossil source tree is identified
by a file named "_FOSSIL_" in the root directory of the source tree. Every
file that is a sibling of _FOSSIL_ and every file in every subfolder is
considered potentially a part of the source tree. The _FOSSIL_ file
contains (among other things) the pathname of the repository with which
the source tree is associated. On the other hand, the repository has
no record of its source trees. So you are free to delete a source tree
or move it around without consequence. But if you move or rename or
delete a repository, then any source trees associated with that repository
will no longer be able to locate their repository and will stop working.
When multiple developers are working on the same project, each
developer typically has his or her own local repository and an associated
source tree in which to work. Developers share their work by
"syncing" the content of their local repositories either directly
or through a central server. Changes can be "pushed" from the local
repository into a remote repository, or "pulled" from a
remote repository into a local repository. A "sync"
is a shortcut for doing both a push and a pull at the same time.
Fossil also has the concept of "cloning". A "clone" is like a "pull",
except that instead of beginning with an existing local repository,
a clone begins with nothing and creates a new local repository that
is a duplicate of a remote repository.
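As a rough mental model of push, pull, sync, and clone, a repository can be pictured as nothing more than a set of check-ins. The sketch below is only an illustration under that assumption. The Repo class and its methods are hypothetical names and do not correspond to anything inside fossil.

```python
class Repo:
    """Toy model: a repository is just a set of check-in names (hypothetical)."""

    def __init__(self, checkins=()):
        self.checkins = set(checkins)

    def push(self, remote: "Repo") -> None:
        """Send every local check-in to the remote repository."""
        remote.checkins |= self.checkins

    def pull(self, remote: "Repo") -> None:
        """Fetch every remote check-in into the local repository."""
        self.checkins |= remote.checkins

    def sync(self, remote: "Repo") -> None:
        """A sync is a push and a pull done together."""
        self.push(remote)
        self.pull(remote)

    @classmethod
    def clone(cls, remote: "Repo") -> "Repo":
        """Start from nothing and create a duplicate of the remote."""
        return cls(remote.checkins)


if __name__ == "__main__":
    server = Repo({"checkin-1", "checkin-2"})
    mine = Repo.clone(server)
    mine.checkins.add("checkin-3")   # local work
    mine.sync(server)
    print(sorted(server.checkins))
```

In this model none of the operations ever removes anything. Each one only fills in what the other side is missing.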
Communication between repositories is via HTTP. Remote
repositories are identified by URL. You can also point a web browser
at a repository and get human-readable status, history, and tracking
information about the project.
2.1 Identification Of Artifacts
A particular version of a particular file is called an "artifact".
Each artifact has a universally unique name which is the
SHA1 hash of the content
of that file expressed as 40 characters of lower-case hexadecimal. Such
a hash is referred to as the Artifact Identifier or Artifact ID
for the artifact. The SHA1 algorithm was created with the purpose of
providing a highly forgery-resistant identifier for a file. Given any
file it is simple to find the artifact ID for that file. But given an
artifact ID it is computationally intractable to generate a file that will
have that artifact ID.
Artifact IDs look something like this:
6089f0b563a9db0a6d90682fe47fd7161ff867c8
59712614a1b3ccfd84078a37fa5b606e28434326
19dbf73078be9779edd6a0156195e610f81c94f9
b4104959a67175f02d6b415480be22a239f1f077
997c9d6ae03ad114b2b57f04e9eeef17dcb82788
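Computing such an identifier takes only a few lines. The following is a minimal sketch using Python's standard hashlib module. The function name artifact_id is ours and is used only for illustration.

```python
import hashlib


def artifact_id(content: bytes) -> str:
    """Return the SHA1 hash of the content as 40 lower-case hex characters."""
    return hashlib.sha1(content).hexdigest()


if __name__ == "__main__":
    # The three bytes "abc" are a standard SHA1 test vector.
    print(artifact_id(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Going from content to ID is trivial, as shown. The reverse direction, finding content that matches a given ID, is the part that is intractable.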
When referring to an artifact using fossil, you can use a unique
prefix of the artifact ID that is four characters or longer. This saves
a lot of typing. When displaying artifact IDs, fossil will usually only
show the first 10 digits since that is normally enough to uniquely
identify a file.
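Prefix matching can be pictured as follows. This is only a sketch of the idea, not fossil's own lookup code. The helper resolve_prefix is hypothetical, and the list of known IDs simply reuses the sample IDs shown earlier.

```python
KNOWN_IDS = [
    "6089f0b563a9db0a6d90682fe47fd7161ff867c8",
    "59712614a1b3ccfd84078a37fa5b606e28434326",
    "19dbf73078be9779edd6a0156195e610f81c94f9",
    "b4104959a67175f02d6b415480be22a239f1f077",
    "997c9d6ae03ad114b2b57f04e9eeef17dcb82788",
]


def resolve_prefix(prefix: str, known_ids: list[str]) -> str:
    """Return the single known artifact ID that starts with the prefix.

    Raises ValueError if the prefix is shorter than four characters,
    matches nothing, or matches more than one artifact.
    """
    if len(prefix) < 4:
        raise ValueError("prefix must be at least four characters")
    wanted = prefix.lower()
    matches = [aid for aid in known_ids if aid.startswith(wanted)]
    if len(matches) != 1:
        raise ValueError(f"{len(matches)} artifacts match {prefix!r}")
    return matches[0]


if __name__ == "__main__":
    full = resolve_prefix("6089f0", KNOWN_IDS)
    print(full[:10])  # abbreviated form, first 10 digits
```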
Changing (or adding or removing) a single byte in a file results
in a completely different artifact ID. And since the artifact ID is the name of
the artifact, making any change to a file results in a new artifact.
In this way, artifacts are immutable.
A repository is really just an unordered collection of
artifacts. New artifacts can be added to the repository, but
existing artifacts can never be removed. (Well, almost never. There
is a [./shunning.wiki | "shunning"] mechanism that allows spam or other
inappropriate content to be removed if absolutely necessary, but such
removal is discouraged.)
Fossil is designed in
such a way that it can be handed a set of artifacts in any
order and it can figure out the relationship between those
artifacts and reconstruct the complete development history of
a software project.
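The idea of an unordered, add-only collection of artifacts named by their own hashes can be sketched as a small content-addressed store. This is a toy model under our own assumptions. The ArtifactStore class is hypothetical and says nothing about how fossil actually lays out its repository file.

```python
import hashlib


class ArtifactStore:
    """Toy add-only store that names each artifact by its SHA1 hash."""

    def __init__(self):
        self._by_id = {}

    def add(self, content: bytes) -> str:
        """Store the content (if new) and return its artifact ID."""
        aid = hashlib.sha1(content).hexdigest()
        # Adding the same content twice is harmless: same bytes, same name.
        self._by_id.setdefault(aid, content)
        return aid

    def get(self, aid: str) -> bytes:
        """Return the content stored under the given artifact ID."""
        return self._by_id[aid]

    def ids(self) -> set:
        """Return the set of all artifact IDs in the store."""
        return set(self._by_id)


if __name__ == "__main__":
    first, second = ArtifactStore(), ArtifactStore()
    artifacts = [b"one\n", b"two\n", b"three\n"]
    for content in artifacts:
        first.add(content)
    for content in reversed(artifacts):
        second.add(content)
    # The order in which artifacts arrive makes no difference.
    print(first.ids() == second.ids())  # True
```

Note that the class deliberately has no method for changing or removing an artifact.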
2.2 Manifests
At the root of a source tree is a special file called the
"manifest". The manifest is a listing of all other files in
that source tree. The manifest contains the (complete) artifact ID
of the file and the name of the file as it appears on disk,
and thus serves as a mapping from artifact ID to disk name. The artifact ID
of the manifest is the identifier for the entire check-in. When
you look at a "timeline" of changes in fossil, the ID associated
with each check-in or commit is really just the artifact ID of the
manifest for that check-in.
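To make the relationship between files, manifest, and check-in ID concrete, here is a minimal sketch. It uses a simplified manifest layout that we made up for illustration (one line per file holding the artifact ID and the file name) and is not fossil's actual manifest format. The names build_manifest and checkin_id are hypothetical.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 hash of the data as 40 lower-case hex characters."""
    return hashlib.sha1(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> str:
    """List every file as '<artifact ID> <file name>', sorted by name."""
    lines = [f"{sha1_hex(content)} {name}"
             for name, content in sorted(files.items())]
    return "\n".join(lines) + "\n"


def checkin_id(files: dict[str, bytes]) -> str:
    """The check-in ID is the artifact ID of the manifest itself."""
    return sha1_hex(build_manifest(files).encode("utf-8"))


if __name__ == "__main__":
    tree = {"README": b"hello\n", "src/main.c": b"int main(void){return 0;}\n"}
    print(build_manifest(tree), end="")
    print(checkin_id(tree))
```

Because the manifest embeds the artifact ID of every file, editing any file changes the manifest text and therefore the check-in ID as well.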
Fossil automatically generates a manifest whenever you "commit"
a new check-in. So this is not something that you, the developer,
need to worry about. The format of a manifest is intentionally
designed to be simple to parse, so that if
you want to read and interpret a manifest, either by hand or
with a script, that is easy to do. But you will probably never
need to do so.
In addition to identifying all files in the check-in, a
manifest also contains a check-in comment, the date and time
when the check-in was established, who created the check-in,
and links to other check-ins from which the current check-in
is derived. There are also a couple of checksums used to verify
the integrity of the check-in. And the whole manifest might
be PGP clearsigned.
2.3 Key concepts
- A check-in is a set of files arranged
in a hierarchy.
- A repository keeps a record of historical check-ins.
- Repositories share their changes using push, pull,
sync, and clone.
- A particular version of a particular file is an artifact
that is identified by an artifact ID.
- Artifacts tracked by fossil are inherently immutable.
- Fossil automatically generates a manifest file that identifies
every artifact in a check-in.
- The artifact ID of the manifest is the identifier of the check-in.
3.0 Fossil - The Program
Fossil is software. The implementation of fossil is in the form
of a single executable named "fossil" (or "fossil.exe" on Windows).
To install fossil on your system,
all you have to do is obtain a copy of this one executable file (either
by downloading a
pre-compiled version
or [./build.wiki | compiling it yourself]) and then
put that file somewhere on your PATH.
Fossil is completely self-contained. It is not necessary to
install any other software in order to use fossil. You do not need
CVS, gzip, diff, rsync, Python, Perl, Tcl, Java, apache, PostgreSQL, MySQL,
SQLite, patch, or any similar software on your system in order to use
fossil effectively. You will want to have some kind of text editor
for entering check-in comments. Fossil will use whatever text editor
is identified by your VISUAL environment variable. Fossil will also
use GPG to clearsign your manifests if you happen to have it installed,
but fossil will skip that step if GPG is missing from your system.
You can optionally set up fossil to use external "diff" programs,
though fossil has an excellent built-in "diff" algorithm that works
fine for most people.
To uninstall fossil, simply delete the executable.
To upgrade an older version of fossil to a newer version, just
replace the old executable with the new one. You might need to
run "fossil all rebuild" to restructure your repositories after
an upgrade. Running "all rebuild" never hurts, so when upgrading it
is a good policy to run it even if it is not strictly necessary.
To use fossil, simply type the name of the executable in your
shell, followed by one of the various built-in commands and
arguments appropriate for that command. For example:
fossil help
In the next section, when we say things like "use the help
command" we mean to use the command name "help" as the first
token after the name of the fossil executable, as shown above.
4.0 Workflow
Fossil has two modes of operation: "autosync" and
"manual-merge".
Autosync mode is reminiscent of CVS or SVN in that it automatically
keeps your changes in synchronization with your co-workers through
the use of a central server. The manual-merge mode is the standard workflow
for Git or Mercurial in that your local repository develops
independently of your coworkers and you share and merge your changes manually.
An interesting feature of fossil is that it supports both autosync
and manual-merge workflows.
The default setting for fossil is to be in autosync mode. You
can change the autosync setting or check the current autosync
setting using commands like:
fossil setting autosync on
fossil setting autosync off
fossil settings
The author finds that projects run more smoothly in autosync mode since
autosync helps to prevent pointless forking and merging and helps keep
all collaborators working on exactly the same code rather than on their
own personal forks of the code. In the author's view, manual-merge mode
should be reserved for disconnected operation.
4.1 Autosync Workflow
-
Establish a local repository using either the new command
to start a new project, or the clone command to make a clone
of a repository for an existing project.
-
Establish one or more source trees using
the open command with the name of the repository file as its
argument.
-
The open command in the previous step populates your local source
tree with a copy of the latest check-in. Usually this is what you want.
In the rare cases where it is not, use the update command to
switch to a different check-in.
Use the timeline or leaves commands
to identify alternative check-ins to switch to.
-
Edit the code. Add new files to the source tree using the add
command. Omit files from future check-ins using the rm command.
(Even when you remove files from future check-ins, those files continue
to exist in historical check-ins.) Test your changes.
-
Create a new check-in using the commit command. You will be prompted
for a check-in comment and also for your GPG key if you have GPG installed.
The commit copies the edits you have made in your local source
tree into your local repository. After your commit completes, fossil will
automatically push your changes back to the server
you cloned from or whatever server you most recently synced with.
-
When your coworkers make their own changes, you can merge those changes
into your local source tree using the update command.
In autosync mode, update will first go back to the server you
cloned from or with which you most recently synced, and pull down all
recent changes into your local repository. Then it will merge recent
changes into your local source tree. If you do an update and
find that it messes something up in your source tree (perhaps a co-worker
checked in incompatible changes) you can use the undo command
to back out the changes.
-
Repeat all of the above until you have generated great software.
4.2 Manual-Merge Workflow
When autosync is disabled, the commit command is decoupled from
push and the update command is decoupled from pull.
That means you have to do a few extra steps in order to accomplish the
push and pull tasks manually.
-
Establish a local repository using either the new command
to start a new project, or the clone command to make a clone
of a repository for an existing project. The default setting for
a new repository is with autosync on, so you will need to turn it off
using the setting autosync off command with a -R option
to specify the repository.
-
Establish one or more source trees by changing your working directory
to where you want the root of the source tree to be, then issuing
the open command with the name of the repository file as its
argument.
-
The open command in the previous step populates your local source
tree with a copy of the latest check-in. Usually this is what you want.
In the rare cases where it is not, use the update command to
switch to a different check-in.
Use the timeline or leaves commands
to identify alternative check-ins to switch to.
-
Edit the code. Add new files to the source tree using the add
command. Omit files from future check-ins using the rm command.
(Even when you remove files from future check-ins, those files continue
to exist in historical check-ins.) Test your changes.
-
Create a new check-in using the commit command. You will be prompted
for a check-in comment and also for your GPG key if you have GPG installed.
The commit copies the edits you have made in your local source
tree into your local repository.
-
Use the push command to push your changes out to a server
where your co-workers can access them.
-
When co-workers make their own changes, use the pull command
to pull those changes into your local repository. Note that pull
does not move the changes into your local source tree, only into your
local repository.
-
Once changes are in your local repository,
use the update command to merge them into your local source tree.
If you merge in some changes and find that the changes do not work out
or are not to your liking, you can back out the changes using the
undo command.
-
If two or more people ran "commit" against the same check-in, this will
result in a [./branching.wiki | fork] which you may want to resolve by
running merge followed by another commit.
-
Repeat all of the above until you have generated great software.
5.0 Setting Up A Fossil Server
With other configuration management software, setting up a server is
a lot of work and normally takes time, patience, and a lot of system
knowledge. Fossil is designed to avoid this frustration. Setting up
a server with fossil is ridiculously easy. You have three options:
- Setting up a stand-alone server
From within your source tree just use the server command and
fossil will start listening for incoming requests on TCP port 8080.
You can point your web browser at
http://localhost:8080/ and begin exploring. Or your coworkers
can do pushes or pulls against your server. Use the --port
option to the server command to specify a different TCP port. If
you do not have a local source tree, use the -R command-line
option to specify the repository file.
A stand-alone server is a great way to set up transient connections
between coworkers for doing quick pushes or pulls. But you can also
set up a permanent stand-alone server if you prefer. Just make
arrangements for fossil to be launched with appropriate arguments
after every reboot.
If you just want a server to browse the built-in fossil website
locally, use the ui command in place of server. The
ui command starts up a local server too, but it also takes
the additional step of automatically launching your web browser and
pointing it at the new server.
- Setting up a CGI server
If you have a web-server running on your machine already, you can
set up fossil to be run from CGI. Simply create an executable script
that looks something like this:
#!/usr/local/bin/fossil
repository: /home/me/bigproject.fossil
Edit this script to use whatever pathnames are appropriate for
your project. Then point your web browser at the script and off you
go. The [./selfhost.wiki | self-hosting fossil repositories] are
all set up this way.
- Setting up an inetd server
If you have inetd or xinetd running on your system, you can set
those services up to launch fossil to deal with inbound TCP/IP connections
on whatever port you want. Set up inetd or xinetd to launch fossil
like this:
/usr/local/bin/fossil http /home/me/bigproject.fossil
As before, change the filenames to whatever is appropriate for
your system. You can have fossil run as any user that has write
permission on the repository and on the directory that contains the
repository. But it is safer to run fossil as root. When fossil
sees that it is running as root, it automatically puts itself into
a chroot jail and
drops all privileges prior to reading any information from the client.
Since fossil is a stand-alone program, you do not need to put anything
in the chroot jail with fossil in order for it to do its job.
6.0 Review Of Key Concepts
- The fossil program is a self-contained stand-alone executable.
Just put it somewhere on your PATH to install it.
- Use the clone or new commands to create a new repository.
- Use the open command to create a new source tree.
- Use the add and rm or delete commands to add and
remove files from the local source tree.
- Use the commit command to create a new check-in.
- Use the update command to merge in changes from others.
- The push and pull commands can be used to share changes
manually, but these things happen automatically in the default
autosync mode.