When programming, writing system configurations, or even editing word documents, we want to be able to move back and forth between previous revisions, compare changes, tag certain versions, and so on. Similarly, we want to keep additional metadata that tells us when, why, and what was changed, and who made the changes.
Detailed tracking of changes in code has been a standard requirement since the early days of computing. One of the first tools for this purpose, called the Source Code Control System (SCCS), was developed in 1972 by the early developers working on Unix.
Being familiar with the concept of version control and a particular tool called Git (which is in widespread use today) is a must for any person working in IT.
Using a version control system typically provides the following benefits (listed in no particular order):
- Serves as a form of simple, immediate file backups, if we incorrectly edit files or mistakenly delete them
- Provides access to all current and earlier revisions of files in a repository, be it text or binary
- Allows moving back and forth between versions, comparing changes, etc.
- If needed, makes it easier to work with multiple versions of files concurrently
- Keeps track of metadata - e.g. when, why, and who changed particular files
- Enables collaboration between individuals and groups working on the same project
Brief Historical Overview
Source Code Control System (SCCS)
SCCS was one of the first tools developed to track changes in files during software development. It allowed users to retrieve any previous versions of the files it managed. It was developed at Bell Labs beginning in late 1972 by Marc Rochkind for an IBM System/370 computer running OS/360.
It kept changes between versions in so-called “interleaved deltas”. This removed the need to store a complete copy of the file on every change, which would have taken up a lot of disk space.
Another feature of SCCS, which was inherited by its immediate successors, was that it supported the so-called sccsid strings. Those strings could be embedded in source files and were then automatically updated by SCCS. They contained the name, date, and optional comment (e.g. author) of the current version of the file. For example:
static char sccsid[] = "@(#)ls.c 8.1 (Berkeley) 6/11/93";
As the example shows, the string was stored in a variable (typically in the C programming language), so it remained present in the compiled versions of files (object files). The string “@(#)” was chosen by convention as a sequence rarely found in normal code, making it easy to find SCCS strings in binary files and determine the exact versions of files that were used during compilation.
Although sccsid is an archaic feature, searching for sccsid strings in a relatively modern /bin directory still shows some results:
strings -f -- /bin/* | grep '@(#)'
bash: @(#)Bash version 5.0.3(1) release GNU
crontab: @(#) Copyright 1988,1989,1990,1993,1994 by Paul Vixie
file: @(#)$File: file.c,v 1.178 2018/10/01 18:50:31 christos Exp $
file: @(#)$File: seccomp.c,v 1.7 2018/09/09 20:33:28 christos Exp $
gprof: @(#) Copyright (c) 1983 Regents of the University of California.
lorder: # @(#)lorder.sh 8.1 (Berkeley) 6/6/93
pidof: @(#)killall5 2.86 31-Jul-2004 firstname.lastname@example.org
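The same search can be sketched in Python, which makes the mechanism explicit: scan the raw bytes for the “@(#)” marker and collect what follows. This is only an illustration. The function names are made up for this example, and the choice of terminating characters (NUL, newline, double quote, “>”, backslash) is an assumption modelled on how the traditional what utility is usually described.

```python
import re
from pathlib import Path

# The marker that SCCS-style identification strings start with.
MARKER = b"@(#)"

# Assumption: an ID string runs from the marker up to the first NUL,
# newline, double quote, '>' or backslash.
ID_PATTERN = re.compile(re.escape(MARKER) + rb'([^\x00\n">\\]*)')


def find_id_strings(data: bytes) -> list[str]:
    """Return every identification string embedded in a blob of bytes."""
    return [
        match.group(1).decode("latin-1").strip()
        for match in ID_PATTERN.finditer(data)
    ]


def scan_file(path: Path) -> list[str]:
    """Read a file in binary mode and extract its identification strings."""
    return find_id_strings(path.read_bytes())


if __name__ == "__main__":
    # Simulated object-file content: binary noise around one sccsid string.
    blob = b"\x7fELF\x00\x01" + b"@(#)ls.c 8.1 (Berkeley) 6/11/93" + b"\x00\x02\x03"
    for id_string in find_id_strings(blob):
        print(id_string)
```

Calling scan_file(Path("/bin/bash")) on a real system would be the rough equivalent of one line of the shell pipeline above, although the results depend on what the binaries actually contain.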
Revision Control System (RCS)
RCS was first released in 1982 by Walter F. Tichy at Purdue University.
An improvement of RCS over SCCS was that it did not store every revision as an “interleaved delta”. Instead, it stored a set of edit instructions that produced an earlier or later version of the file. Supposedly this was faster in most cases.
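To make the idea of “edit instructions” more concrete, here is a minimal sketch in Python. It is not the RCS file format or algorithm. It only illustrates, under simplified assumptions, how one full revision plus a small list of instructions is enough to reproduce another revision. The function names and the instruction layout are invented for this example.

```python
from difflib import SequenceMatcher


def make_delta(source: list[str], target: list[str]) -> list[tuple]:
    """Describe how to turn `source` into `target` as edit instructions.

    Unchanged runs become ("copy", start, end) references into `source`;
    only lines that are new in `target` are stored in full.
    """
    delta = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", target[j1:j2]))
        # "delete" needs no instruction: those source lines are just not copied.
    return delta


def apply_delta(source: list[str], delta: list[tuple]) -> list[str]:
    """Rebuild the target revision from `source` and the edit instructions."""
    result = []
    for instruction in delta:
        if instruction[0] == "copy":
            _, start, end = instruction
            result.extend(source[start:end])
        else:
            result.extend(instruction[1])
    return result


if __name__ == "__main__":
    older = ["int main() {", "  return 0;", "}"]
    newer = ["#include <stdio.h>", "int main() {", '  puts("hi");', "  return 0;", "}"]

    # Keep the newer revision in full and only instructions for the older one.
    backward = make_delta(newer, older)
    print(backward)
    assert apply_delta(newer, backward) == older
```

The same two functions work in either direction, so a delta can lead to an earlier or a later version depending on which revision is kept in full.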
The program still had critical drawbacks by today’s standards:
- Only one user could edit a given file at a time
- The tool only worked on individual files, not repositories of files
- Thus, there was no concept of “projects” containing multiple files
- Thus, it was not possible to modify multiple files as part of a single change
In addition, it relied on naming files in a particular way. It did not have a database, so the revision history of files known to the system was stored in files with “,v” at the end of their names. When a file was added to RCS (i.e. when it was “checked-in”), RCS stored it under a name ending in “,v”, modified its content to include metadata at the top, and deleted the original file.
The commands for working with files in RCS were named ci and co (for “check-in” and “check-out”). The ci command, which was easily produced as a misspelling of vi, combined with the consequent renaming and mangling of file content, often resulted in poor user experience.
Concurrent Versions System (CVS)
CVS was started as a project in 1986 by Dick Grune. It was reworked and released as Free Software in 1990 by Brian Berliner.
CVS was developed as a frontend for RCS and it brought numerous improvements. It was the first widely-deployed VCS, used notably for development of much of Free Software.
CVS used delta compression for storing different versions of textual files, while binary files were kept on the server in full for each version.
It expanded the concept of version control by introducing the idea of “branches”, “modules” (groups of files, as opposed to single files in RCS), and “repositories” (groups of modules).
It also enabled the client-server model, kept its internal files in a dedicated subdirectory named CVS, and exposed user functionality behind the main command cvs and its subcommands (e.g. cvs ci or cvs co).
Finally, it also enabled concurrent editing by multiple users. The server accepted a commit only if it was based on the most recent version of the file, so developers were expected to keep their working copies up to date with other people’s changes. Merging of non-conflicting differences between versions was done automatically, requiring manual intervention only on conflicts.
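The rule of accepting changes only against the latest version can be sketched as a toy model. The class and method names below are hypothetical and have nothing to do with the actual CVS implementation or protocol. The sketch only shows why a developer with a stale working copy has to update (and possibly merge) before committing.

```python
class StaleWorkingCopy(Exception):
    """Raised when a commit is based on a revision that is no longer the latest."""


class ToyRepository:
    """Hypothetical in-memory stand-in for a central server holding one file."""

    def __init__(self, content: str):
        self.revisions = [content]  # list index 0 holds revision 1

    @property
    def head(self) -> int:
        """Number of the most recent revision."""
        return len(self.revisions)

    def checkout(self) -> tuple[int, str]:
        """Return the latest revision number together with its content."""
        return self.head, self.revisions[-1]

    def commit(self, base_revision: int, content: str) -> int:
        """Accept new content only if it was derived from the latest revision."""
        if base_revision != self.head:
            raise StaleWorkingCopy(
                f"working copy is at r{base_revision}, repository is at r{self.head}"
            )
        self.revisions.append(content)
        return self.head


if __name__ == "__main__":
    repo = ToyRepository("hello\n")
    alice_base, _ = repo.checkout()
    bob_base, _ = repo.checkout()

    repo.commit(alice_base, "hello alice\n")  # accepted, repository moves on

    try:
        repo.commit(bob_base, "hello bob\n")  # Bob's copy is now out of date
    except StaleWorkingCopy as error:
        print("rejected:", error)
        # Bob updates first; a real tool would merge his edits at this point.
        bob_base, latest = repo.checkout()
        repo.commit(bob_base, latest + "hello bob\n")

    print(repo.revisions[-1], end="")
```

In this model the merge step is reduced to appending Bob’s line to the updated content; real tools perform a line-based merge and stop only on conflicting edits.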
CVS still had critical drawbacks and inconveniences:
- While the addition of the client-server model was necessary and welcome, most operations required an online connection to the server, making them slow, or impossible to use when offline.
- Checkouts of modules from the repository were intended for development only. They weren’t complete copies of the upstream modules, so it was not possible to make further checkouts from them. All checkouts had to be done from the central repository directly.
- Like RCS, CVS did not support making atomic changes to multiple files. Even though files were grouped, each file had its own version and was managed individually.
- The commands were sometimes dangerously careless about preserving the content of files. For example, if a module update was attempted after access was removed on the server side, CVS destroyed the local repository, regardless of local changes to files.
As Linus Torvalds said in a 2007 interview:
Take the Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision. — Linus Torvalds.
Notably, however, CVS was used to power SourceForge, one of the first web services to offer a free, centralized online location for the development of open-source software.
Subversion (SVN)
Subversion was released in 2000 by CollabNet. Although it was a completely separate system that brought numerous implementation improvements over CVS, it remained logically compatible with it.
It kept its internal files in a dedicated subdirectory named .svn. It exposed user functionality through the svn command and its subcommands (e.g. svn commit | svn ci or svn checkout | svn co), and through other commands prefixed with svn.
It removed support for modules, keeping files directly in repositories. But otherwise it shared the same problem as CVS — most operations required on-line access to the central repository, making them slow or impossible to use when offline.
BitKeeper
BitKeeper was a proprietary, distributed revision control system, first released in 2000 by BitMover Inc. It was designed by Larry McVoy, who had previously worked at Silicon Graphics and Sun Microsystems.
Development of the Linux kernel was initially done without version control software, and the first mentions of possibly using the upcoming BitKeeper product were voiced at the end of 1998.
BitMover provided free versions of BitKeeper to some Free Software projects, including the Linux kernel. In 2002 a decision was made to use BitKeeper for Linux kernel development.
Numerous problems accumulated over time: the terms of the license, incomplete functionality in the free version of BitKeeper, attempts by the open source community to produce enhancements, meta-information stored on BitMover’s servers, commercial considerations, and so on. In 2005 BitMover announced the end of free availability of BitKeeper, which ended its use for Linux kernel development.
Git
Git was developed by Linus Torvalds in 2005 to support the development of the Linux kernel.
The implementation was incredibly quick — it took only 2.5 months from the start of development to Git managing the release of the next Linux kernel version, 2.6.12.
Although (or because) the tool was developed quickly, in the beginning it was not user friendly. User-friendly wrappers were often used to manage Git repositories instead of using Git directly.
Overall, Git is fast, maintains data integrity, supports distributed non-linear workflows (thousands of parallel branches on different computers), and in general represents modern development in version control systems.
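One way to see what “maintains data integrity” means in practice is content addressing: Git names every stored object by a hash of its content, so any corruption changes the name and is detected. The sketch below computes the identifier of a file’s content (a “blob”) the way Git’s classic SHA-1 object format does, namely over a short header followed by the bytes themselves. The function name is made up for this example, and repositories using the newer SHA-256 object format would produce different identifiers.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    # The hashed data is the header "blob <size>" plus a NUL byte,
    # followed by the raw content.
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The ID of an empty file is a well-known constant.
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

    # Changing a single byte yields a completely different ID.
    print(git_blob_id(b"version 1\n"))
    print(git_blob_id(b"version 2\n"))
```

On a machine with Git installed, the result for a file can be compared against the output of git hash-object for that file.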
Alternatives - Darcs and Mercurial
Funny Story - Atria ClearCase (CC)
In 2001 I had to use a proprietary product Atria ClearCase for version control on a project.
The software was a behemoth. As incredible as it sounds, it required a kernel module to run, and the module for Linux was only available for version 2.2.16. My workstation, running Debian GNU/Linux at the time, had Linux 2.2.19 (I did make it work, though).
One day I went to the Internet in search of other people’s comments on ClearCase. I happened to find a page written by someone frustrated by various “enterprise software” that was popular at the time.
The page was red with big yellow text, titled similar to “Top 10 worst enterprise software in the universe” in uppercase. The author provided a numbered list, mentioning software name and a summary of his experience with each.
Atria ClearCase was featured in the list. His comment was something like:
Last week we had to install Atria ClearCase on two computers in the office. On the first computer, total disaster — nothing worked. On the second computer, even bigger disaster — IT WORKED!!!
A similar testament to Atria ClearCase can be seen in an article on ClearCase by Erik Dietrich.
I can confirm that in Atria ClearCase it was possible to lose files, which in some cases you could recover from /lost+found/ (the only software I’ve seen doing that to date, other than the filesystem utilities).
Atria ClearCase was later sold to Rational Software. Strange, as its case was neither clear nor rational.
- I don’t recall, did Subversion remove support for “modules” and keep files directly in “repositories”?
- Similarly, is Subversion’s commit hash global per repository, or per directory?
If you can answer some of the questions above or would like to suggest updates to the article, please comment below.
This article is part of the following series:
- Part 1: Git - Introduction and Historical Overview (this article)