Git

Introduction to file versioning. History of version control tools SCCS, RCS, CVS, Subversion, BitKeeper, and Git. Funny story.

Article Collection

This article is part of the following series:

1. Git

Introduction

When programming, writing system configurations, or even editing word documents, we want to be able to move back and forth between revisions, compare changes, save certain versions, and so on. Similarly, we often want to keep additional metadata that tells us when, why, and what was changed, and who made the changes.

Detailed tracking of changes in source code has been a common need and practice since the early days of computing. One of the first tools for this purpose, the Source Code Control System (SCCS), was developed at Bell Labs in 1972.

Using a version control system typically provides the following benefits, listed in no particular order:

Serves as a simple, immediate form of file backup (deleted or incorrectly edited files can be restored)
Provides access to all current and earlier revisions of files
Allows moving back and forth between versions, comparing changes, etc.
Makes it easier to work with multiple versions of files concurrently, if needed
Keeps track of metadata, e.g. when, why, and by whom particular files were changed
Enables collaboration between individuals and groups working on the same project

Being familiar with the concept and practice of version control is a must for any person working in IT.

In this article we are going to describe the evolution of tools that were historically used for this purpose – SCCS, RCS, CVS, Subversion, BitKeeper, and finally Git.

Git was developed by Linus Torvalds in 2005 and has since become the de facto standard for version control.

Brief Historical Overview

Source Code Control System (SCCS)

SCCS was one of the first tools developed to track changes in files during software development. It allowed users to retrieve any previous version of the files it managed. It was developed at Bell Labs beginning in late 1972 by Marc Rochkind for an IBM System/370 computer running OS/360.

It kept changes between versions in so-called “interleaved deltas”, removing the need to store complete files on every change.

Another feature of SCCS, also inherited by its immediate successors, was its support for so-called sccsid strings. An sccsid string, introduced by the marker @(#), could be placed anywhere in a source file, and SCCS would then automatically manage and update it, exposing the revision information as text in the file. The information contained the name, date, and an optional comment (usually the author) of the current version of the file. The marker @(#) was chosen because this sequence of characters almost never appears in normal code.

The sccsid strings could be included anywhere, be it in comments or variables. In compiled programs it was customary to store them in variables so that they would remain present in the compiled object files. Here is an example from the C program ls:

static char sccsid[] = "@(#)ls.c        8.1 (Berkeley) 6/11/93";

With sccsid strings preserved in object files, it was easy to quickly grep and discover the exact versions of files included in compilation.

Although sccsids are an archaic feature, searching for sccsid strings in a relatively modern /bin/ directory still shows some results:

strings -f -- /bin/* | grep '@(#)'

bash: @(#)Bash version 5.0.3(1) release GNU
crontab: @(#) Copyright 1988,1989,1990,1993,1994 by Paul Vixie
file: @(#)$File: file.c,v 1.178 2018/10/01 18:50:31 christos Exp $
file: @(#)$File: seccomp.c,v 1.7 2018/09/09 20:33:28 christos Exp $
gprof: @(#) Copyright (c) 1983 Regents of the University of California.
lorder: #       @(#)lorder.sh   8.1 (Berkeley) 6/6/93
pidof: @(#)killall5 2.86 31-Jul-2004 miquels@cistron.nl
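
The same lookup can be sketched as a small shell function. This is a minimal illustration, assuming a POSIX shell and a grep that supports -a and -o (as GNU grep does); the function name find_sccsid and the sample file are made up for this example. It prints whatever follows each @(#) marker up to the next double quote, >, backslash, NUL byte, or end of line, which are the terminators that the POSIX what utility uses.

```shell
#!/bin/sh
# find_sccsid FILE (hypothetical helper): print the identification
# strings that follow each @(#) marker in FILE.
find_sccsid() {
    # Turn NUL bytes into newlines so C string terminators end a match,
    # then keep only the text from @(#) up to a quote, '>' or backslash.
    # Assumes grep supports -a (treat binary as text) and -o (print matches).
    tr '\000' '\n' < "$1" |
        LC_ALL=C grep -a -o '@(#)[^">\\]*' |
        sed 's/^@(#)//'
}

# Demo: a sample line of C source carrying an sccsid string.
sample=$(mktemp)
printf 'static char sccsid[] = "@(#)ls.c\t8.1 (Berkeley) 6/11/93";\n' > "$sample"
find_sccsid "$sample"
rm -f "$sample"
```

Running find_sccsid on a compiled binary instead of the sample file should list the same kind of strings as the strings and grep pipeline above, without the file name prefix.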

Revision Control System (RCS)

RCS was first released in 1982 by Walter F. Tichy at Purdue University.

RCS improved on SCCS by not storing revisions as “interleaved deltas”. Instead, it stored sets of edit instructions that reproduced an earlier or later version of a file from a stored one. This was reportedly faster in most cases.

The program still had critical drawbacks by today’s standards:

Only one user could edit a given file at a time
It only worked on individual files; there was no concept of a “project” with files that are related
It was not possible to modify multiple files as part of a single change; every modified file was an isolated change

In addition, RCS did not have a database. It worked by renaming files and modifying their content. The revision history of each file was kept in a file of the same name with suffix ,v. So when a file was added to RCS (i.e. when it was “checked-in”), RCS renamed it to “filename,v”, modified its content to include metadata at the top, and deleted the original file.

The commands for working with files in RCS were named ci and co (for “check-in” and “check-out”). Because ci was an easy mistyping of vi, and because RCS renamed files and mangled their content, this often resulted in a poor user experience.

Concurrent Versions System (CVS)

CVS was created in 1986 by Dick Grune. It was then reworked and released as Free Software in 1990 by Brian Berliner and became the first widely deployed VCS, used notably for the development of much Free Software.

CVS was developed as a frontend for RCS and brought numerous improvements:

It used delta compression for storing different versions of text files, while binary files were kept on the server in full for each version.

It expanded the concept of version control by introducing the idea of “branches”, “modules” (groups of files, as opposed to single files in RCS), and “repositories” (groups of modules).

It enabled the client-server model. In each working directory it kept metadata in a dedicated subdirectory named CVS, and it exposed user functionality through the main command cvs and its subcommands (e.g. cvs ci or cvs co).

It also enabled concurrent editing by multiple users in a simple yet effective way: the server accepted commits only against the most recent version of each file, forcing developers to bring their working copies up to date before committing. Merging the differences between versions was done automatically and required manual intervention only on conflicts.
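
The rule described above can be illustrated with a toy model. This is only a sketch of the idea, not of how CVS is implemented: a single revision counter stands in for the repository, and the names head_rev and try_commit are hypothetical.

```shell
#!/bin/sh
# Toy model of an "up-to-date check"; NOT how CVS is implemented.
# All names (head_rev, try_commit) are hypothetical.

head_rev=1   # latest revision known to the central repository

# try_commit BASE_REV: accept the commit only if the working copy
# was based on the latest revision; otherwise ask for an update first.
try_commit() {
    if [ "$1" -ne "$head_rev" ]; then
        echo "rejected: working copy at r$1, repository at r$head_rev; update first"
        return 1
    fi
    head_rev=$((head_rev + 1))
    echo "accepted: new revision r$head_rev"
}

# Alice and Bob both check out revision 1.
alice_base=1
bob_base=1

try_commit "$alice_base"          # accepted, repository moves to r2
try_commit "$bob_base" || true    # rejected, Bob is still at r1
bob_base=$head_rev                # Bob updates (merging would happen here)
try_commit "$bob_base"            # accepted, repository moves to r3
```

In this model, whoever commits second is the one who has to update and merge, which matches the behavior described above.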

CVS still had critical drawbacks and inconveniences:

While the addition of the client-server model was necessary and welcome, most operations required an online connection to the server, making CVS slow or impossible to use when offline
Checkouts of modules from the repository were intended for end-user consumption. They weren’t complete copies of the upstream modules, so it was not possible to make further checkouts from them. All checkouts had to be done from the central repository directly
Like RCS, CVS did not support making atomic changes to multiple files. Even though files were grouped, each file was managed individually with its own version and history
The commands were sometimes dangerously careless with regard to preserving the content of files. For example, if a module update was attempted after access was removed on the server side, CVS destroyed the local checkout along with any local changes

As Linus Torvalds said in a 2007 interview:

Take the Concurrent Versions System (CVS) as an example of what not to do; if in doubt, make the exact opposite decision. — Linus Torvalds

Notably, however, CVS was used to power SourceForge, one of the first code hosting platforms to offer a free, centralized online location for the development of free and open source software.

Subversion (SVN)

Subversion was released in 2000 by CollabNet. Although it was a completely separate system that brought numerous improvements over CVS, it deliberately kept a similar command set and workflow.

It kept its internal files in a dedicated subdirectory named .svn. It exposed user functionality through commands svnadmin, svn (e.g. svn commit | svn ci or svn checkout | svn co), and others prefixed with svn.

It removed support for modules, keeping files directly in repositories. But otherwise it shared the same problem as CVS — most operations required on-line access to the central repository, making them slow or impossible to use when offline.

BitKeeper (BK)

BitKeeper was a proprietary, distributed revision control system first released in 2000 by BitMover Inc. It was designed by Larry McVoy, who had previously worked at Silicon Graphics and Sun Microsystems.

Development of the Linux kernel was initially done without version control software, and the first mentions of possibly using the upcoming BitKeeper product were voiced at the end of 1998.

BitMover provided free versions of BitKeeper to some Free Software projects, including the Linux kernel. In 2002 a decision was made to use BitKeeper for Linux kernel development.

In 2005 BitMover announced the end of free availability of BitKeeper, which ended its use for Linux kernel development. The decision followed numerous problems: the terms of the license, incomplete functionality in the free version of BitKeeper, attempts by the open source community to produce enhancements (which irritated BitMover), meta-information stored on BitMover’s servers, commercial considerations, and others.

Git

Git was developed by Linus Torvalds in 2005 to support the development of the Linux kernel.

The implementation was incredibly quick — it took only 2.5 months from the start of development to Git managing the release of the next Linux kernel version, 2.6.12.

Because the tool was developed quickly, it was not user-friendly in the beginning. Separate wrappers were often used to manage Git repositories instead of using Git directly. That has since changed, and Git is now used directly through its top-level command git.

Overall, Git is fast, maintains data integrity, supports distributed non-linear workflows (thousands of parallel branches on different computers), and in general represents modern development in version control systems.
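
One mechanism behind the data integrity mentioned above is that Git names every stored object by a hash of its content, so any corruption changes the name. The following is a rough sketch for file contents (“blobs”), assuming the default SHA-1 object format and the sha1sum tool from GNU coreutils; the function name git_blob_id is hypothetical.

```shell
#!/bin/sh
# git_blob_id FILE (hypothetical helper): compute the ID Git would
# assign to FILE's content under the default SHA-1 object format.
git_blob_id() {
    # Size of the content in bytes (tr strips padding some wc versions add).
    size=$(wc -c < "$1" | tr -d ' ')
    # Git hashes the header "blob <size>" plus a NUL byte, then the content.
    { printf 'blob %s\000' "$size"; cat -- "$1"; } | sha1sum | cut -d' ' -f1
}

# Demo: hash a small file.
demo=$(mktemp)
printf 'hello world\n' > "$demo"
git_blob_id "$demo"
rm -f "$demo"
```

Where Git is installed, the printed value should match the output of git hash-object for the same file.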

Git revolutionized the approach to version control and has become the de facto standard across the IT industry.

Alternatives - Darcs and Mercurial

Darcs and Mercurial have often been cited as viable alternatives to Git, although they did not gain notable market share.

Funny Story - Atria ClearCase (CC)

Historically, in addition to free or open source choices, “enterprise” environments also had their own revision control software products.

In 2001 I had to use a proprietary product, Atria ClearCase, for version control on a project.

The software was a behemoth and often got in the way. It required a kernel module to run, and the module for Linux was only available for version 2.2.16.

One day I searched the Internet for other people’s comments on ClearCase. I happened to find a page written by someone frustrated by various “enterprise software” that was popular at the time. The page was red with big yellow text, titled something like “Top 10 worst enterprise software in the universe” in uppercase. The author provided a numbered list, mentioning each piece of software by name and describing his experience with it.

Atria ClearCase was featured. His comment was something like:

Last week we had to install Atria ClearCase on two computers in the office. On the first computer it was a total disaster — nothing worked. On the second computer it was an even bigger disaster — IT WORKED!!!

I also found a more recent, similar testament to Atria ClearCase in an article on ClearCase by Erik Dietrich.

I can confirm that in Atria ClearCase it was possible to lose files, which in some cases could be recovered to /lost+found/ (the only software I’ve seen do that to date, other than filesystem utilities).

Atria ClearCase was later sold to Rational Software. Strange, as its case was neither clear nor rational.
