This chapter introduces the concept of version control and explains why it is indispensable for anybody working with files, regardless of the project. It covers the core features of version control in general, and the basics of and key differences between centralized and distributed version control. Finally, we will get Bazaar up and running on your system, learn the very basics of the command-line and graphical interfaces, and see how to get help using the built-in documentation.
The following topics will be covered in this chapter:
What is a version control system and why you should care
What is centralized version control
What is distributed version control
What is Bazaar
How to install Bazaar and its plugins
How to interact with Bazaar using the command-line interface
How to interact with Bazaar using the graphical interface
How to upgrade Bazaar
How to uninstall Bazaar
How to get help
A version control system (VCS) is essentially a tool to organize and track the history of changes to files in a project. This is more than just good book-keeping. A version control system can change the way you work and make you more productive. How, exactly? This will become clearer after considering the core features of a version control system and their implications.
A version control system enables you to record your changes to the files in a project, effectively building up a history of revisions. Having a complete history of changes in your project enables you to switch back and forth between revisions if you need to. For example:
Restoring a file to a previous state; for example, to the point right before you deleted something important from it
Restoring files or directories that you deleted at some point in the past
Undoing changes introduced by specific revisions, affecting one or more files
These are the most obvious benefits of keeping the history. However, there is a very powerful hidden benefit too—knowing that you can easily switch back to any previous state liberates your mind from worries that you might break something. Being able to return to any previous state means that you cannot really break anything. Once a revision is recorded in the history, you can always return to that state. Revisions are like snapshots, or milestones that you can return to anytime.
As a consequence, you can go ahead and make even drastic changes with bold confidence. This is a crucial point. This key feature enables you to focus on the real work itself, without the fear of losing anything.
Have you ever made a copy of a file or a directory and added a timestamp to the original one, so that you could make experimental changes? With a version control system, you can stop making copies and avoid getting lost in the sea of timestamped files and directories. You are free to experiment, knowing that you can return to any previous state at any time.
Having a full history of revisions is one thing. It is also important to have a simple way of viewing the history of changes; for example, an overview of what has changed from revision to revision, as follows:
This way, in case you need to retrieve something from a past revision, the log messages help to identify the exact point to jump to in the history. In a version control system, this typically works by entering a brief summary when recording a new revision. Often, the easiest way to find a particular past revision is by reading or searching the log of these summary messages, which should serve as a readable timeline or "changelog" of the project.
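As a rough illustration of the ideas so far, the following sketch records two revisions of a file, prints the log, and then restores the file from the first revision. It assumes that bzr is already installed and on your PATH (installation is covered later in this chapter). The file name, the commit messages, and the identity in BZR_EMAIL are made up for the example, and the commands themselves are explained properly in the next chapter.

```shell
# Sketch: record two revisions, view the log, and restore an old state.
# Assumes bzr is on PATH; names, messages, and the identity are illustrative.
history_demo() {
    if ! command -v bzr >/dev/null 2>&1; then
        echo "bzr not found; skipping the demo"
        return 0
    fi
    demo_dir=$(mktemp -d) || return 1
    (
        cd "$demo_dir" &&
        export BZR_EMAIL="Demo User <demo@example.com>" &&
        bzr init &&                                    # start tracking this directory
        echo "important line" > notes.txt &&
        bzr add notes.txt &&                           # put the file under version control
        bzr commit -m "Add notes" &&                   # revision 1
        : > notes.txt &&                               # oops: the content is gone
        bzr commit -m "Empty the notes by mistake" &&  # revision 2
        bzr log --line &&                              # one line per revision
        bzr revert -r 1 notes.txt &&                   # restore the revision 1 content
        cat notes.txt
    )
    rc=$?
    rm -rf "$demo_dir"
    return "$rc"
}

history_demo
```

The log messages passed with -m are the brief summaries mentioned above; the clearer they are, the easier it is to find the revision you want to return to.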
Being able to view files at any past state is great, but often what is even more interesting is the difference between two states. With a version control system, it is possible to make comparisons between any two states of specific files, directories, or the entire project. For example, the difference between two revisions of a text file can be displayed as follows:
Let's call the compared revisions base and target. The left-hand side shows the file as it was at the base revision, while the right-hand side is at the target revision. The coloring indicates what has changed, going from the base state to the target state:
Lines with the red background in the left panel have been deleted
Lines with the green background in the right panel have been added
Lines with the blue background in both the panels have been changed; the changed part is highlighted with a deeper shade of blue
However, this kind of detailed view of the differences is only possible for text files. In the case of binary files, such as images, Word, or Excel files, the differences are binary and therefore not human readable. For these and other binary formats, the only way to see the differences is to open both revisions of the file and compare them side by side.
Viewing the differences is most useful in projects with mostly plaintext files, such as software source code, system administration scripts, or other plaintext documents.
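The same idea can be tried without any version control system at all. The sketch below uses the standard diff tool to compare two states of a small text file in the unified format, where lines starting with - exist only in the base state and lines starting with + exist only in the target state. The file names and contents are invented for illustration; the bzr diff command prints differences between revisions in the same textual format.

```shell
# Sketch: compare two states of a small text file with diff -u.
# The file names and contents are invented for illustration.
diff_demo() {
    demo_dir=$(mktemp -d) || return 1
    printf 'eggs\nmilk\nbread\n' > "$demo_dir/base.txt"
    printf 'eggs\noat milk\nbread\nbutter\n' > "$demo_dir/target.txt"
    # diff exits with status 1 when the files differ; that is not an error here.
    diff -u "$demo_dir/base.txt" "$demo_dir/target.txt"
    rc=$?
    rm -rf "$demo_dir"
    [ "$rc" -le 1 ]
}

diff_demo
```

A changed line shows up as a removed line followed by an added line, which is how a plain-text diff expresses what a graphical viewer would highlight as a modification.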
Being able to revert a project's files to any previous state gives you the freedom to make bold changes. What is even better, though, is if instead of completely undoing a set of experimental changes, you can work on multiple experimental improvements or ideas in parallel and switch between them easily.
Take, for example, a software project that is stable and works well at revision X. After revision X, you can start working on a new feature. As you progress, you can record a few revisions, but the feature is not complete yet. In fact, the software will not be stable again until you finish the feature. At this point, the revision history may look something similar to the following:
During this time, users use the stable version of the software based on revision X, and discover a serious problem that had been overlooked. Your current version of the project is incomplete, but you must fix the problem urgently and release a new stable version of the software. What can you do?
One solution is to revert to revision X, fix the problem, release the fixed version for the users, restore your work on the new improvement, and continue. While this is possible and the version control system helps by minimizing your effort, this solution is tedious and makes the revision history confusing to follow:
Effectively, we have confined ourselves to a linear history. Although it works, the result is awkward. Also, keep in mind that at some point you will want to reach a state that includes both the completed new improvement and the bugfix you did in revision Y, further confounding the revision history.
A much better and more natural solution is to break the linearity of the history and introduce a new branch, as follows:
That is, instead of reverting your ongoing work on the new feature, create a new branch that is isolated from your current work and fix the problem of the stable version in that branch. A version control system can do this efficiently, using minimal additional disk space in the process.
Now, you have two parallel versions of the project—one that is stable and another that is a work in progress. The version control system makes it easy to switch between the two. You could have even more branches if needed. In reality, it is all too common that your current work must be interrupted for some reason, and branching is a practical solution in many situations. For example:
You realize that you need more input from a colleague or another department to complete the current improvement you are working on
A high priority task has come up that you have to switch to immediately
You realize that your current approach might not be the best solution and you would like to try another method without throwing away what you've done so far, reserving the possibility to return later if needed
Our work is interrupted every day. Being able to work on multiple branches and switch between them easily can help a lot, minimizing the impact of interruptions and thereby saving us time and increasing our productivity.
Although being able to work on branches is great, what is even more important is bringing the various branches together, which is called merging. In the preceding examples and in most practical situations, having multiple branches is not the end goal, and most of the time, branches are temporary and short-lived. The end goal is to have all the improvements done on a project, unified in a single place, on a single branch, as follows:
Revision Z is the result of merging the two branches—the stable branch and the branch of the completed new improvement, and it should include all the work done in these branches.
Merging is a complicated and error-prone operation. It is an important job of a version control system to make merging as painless as possible, and to intelligently apply the changes that were recorded in the branches you are trying to merge. However, when there are conflicting changes in two branches (for example, one branch modified a file and another branch deleted the same file), the version control system cannot possibly figure out the right thing to do. In such relatively rare cases, a user must manually resolve the conflict.
Branching and merging do not have to be advanced operations reserved for power users. A version control system can make them relatively easy and natural. Once you become comfortable with this feature, it will boost your productivity, allowing you to work on multiple ideas in parallel in an organized way. Branching and merging are especially crucial in collaboration. Without branching and merging, it is not possible to work in parallel; collaborators will have to work in lockstep, with only one person recording new revisions at a time, which can be inefficient and unnatural.
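The branching scenario described earlier can be sketched with a handful of commands. The example below assumes bzr is installed and on your PATH; the directory names (work, bugfix), the file contents, and the commit messages are hypothetical. It records a stable revision X, starts a feature, creates a separate branch from revision X for the urgent fix (revision Y), and finally merges the fix back (revision Z).

```shell
# Sketch: fix a bug on a separate branch, then merge it into ongoing work.
# Assumes bzr is on PATH; directory names and messages are illustrative.
merge_demo() {
    if ! command -v bzr >/dev/null 2>&1; then
        echo "bzr not found; skipping the demo"
        return 0
    fi
    demo_dir=$(mktemp -d) || return 1
    (
        cd "$demo_dir" &&
        export BZR_EMAIL="Demo User <demo@example.com>" &&
        mkdir work && cd work &&
        bzr init &&
        echo "version 1" > app.txt &&
        bzr add app.txt &&
        bzr commit -m "Stable release (revision X)" &&
        echo "half-finished feature" > feature.txt &&
        bzr add feature.txt &&
        bzr commit -m "Start the new feature" &&
        cd .. &&
        bzr branch -r 1 work bugfix &&       # new branch starting at revision X
        cd bugfix &&
        echo "version 1, bug fixed" > app.txt &&
        bzr commit -m "Fix the serious problem (revision Y)" &&
        cd ../work &&
        bzr merge ../bugfix &&               # bring the fix into the feature work
        bzr commit -m "Merge the bugfix (revision Z)" &&
        bzr log --line
    )
    rc=$?
    rm -rf "$demo_dir"
    return "$rc"
}

merge_demo
```

Because the two branches touched different files, the merge applies cleanly. If both had changed the same lines of app.txt, bzr would report a conflict to be resolved by hand before committing.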
There are many acronyms and names related to version control that can be confusing sometimes, so it's probably worth clarifying them here:
Centralized version control systems were created to make it possible for multiple collaborators to work on projects together. In these systems, the history of revisions is stored on a central server, and all the version control operations by all collaborators must go through this server. If a collaborator records a new revision, then all other collaborators can download and apply the revision in their own environments to update their project to the same state as the central server:
To avoid conflicting changes on the same file by multiple collaborators, such as concurrent modifications to the same lines, collaborators have to work in lockstep: after collaborator A has made some changes, collaborator B must first download those changes before adding any new changes of their own.
Thanks to its simplicity, this is still a very popular workflow today, used by many large and famous projects and organizations. However, despite their popularity, centralized systems have serious drawbacks:
Network access to the central server is required for all the operations that change the repository or access the revision history. As such, network outage and slowness can seriously impact productivity.
The central server is a single point of failure—if the server is unavailable or lost, so is the revision history of the entire project.
Administrative overhead—to prevent unauthorized access, user account and permission management must be configured and maintained.
Distributed version control systems were created to make collaboration possible without a central server, and thus overcome many of the common issues with CVCS. This can work based on a few core principles:
Each collaborator has the full revision history
Collaborators can branch and merge from each other easily
The result is an architecture where there is no technical center, and any participant can potentially be the center:
Instead of a central server with the complete revision history, each collaborator has the full history in their own personal branches. Although technically there is no need for a central server, typically there is a designated common "official" public branch aggregating the work of all collaborators:
In general, a DVCS can do everything that a CVCS can, and enables many additional features. One of the most interesting added features is the many possible workflows for exchanging revisions between collaborators, such as:
Merging revisions peer-to-peer
Centralized—a branch is designated as the "official" branch, which can be used by collaborators in exactly the same way as in centralized version control
Centralized with gatekeepers—the "official" branch is accessible by designated maintainers of the project, who merge changes peer-to-peer and publish releases in the "official" branch
Distributed version control is especially suitable for large teams with physically disconnected collaborators, such as most open source projects. However, it can be just as useful at smaller scales too, even in a solo project.
Distributed version control has important implications in terms of keeping backups of the project. By design, it is very easy to replicate the full revision history on a remote location or even on a local backup disk, thus providing a simple and consistent backup method. Considering that every collaborator begins working on new revisions by first grabbing the full history of the project, the vast majority of the revision history is very difficult to lose; the full history can only get lost if all the collaborators lose all their work. On the other hand, since the changes of all the collaborators are not necessarily at a single central location but distributed across all their local environments, there is also no single place to back up all the work done in the project. Thus, it is up to each individual collaborator to make sure that their local changes don't get lost before they are merged into the official branch or into other collaborator branches. Fortunately, this is not difficult to achieve, and we will provide examples to demonstrate how you can enjoy the benefits of distributed version control and at the same time stay safe by replicating your new revisions at another location.
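A minimal sketch of such replication, assuming bzr is installed and on your PATH: after recording a revision, bzr push copies the branch with its full history to a second location. Here the backup location is simply another local directory created for the demo; in practice it would more likely be a different disk or a remote server. All names in the example are hypothetical.

```shell
# Sketch: replicate a branch, with its history, to a backup location.
# Assumes bzr is on PATH; paths, file names, and messages are illustrative.
backup_demo() {
    if ! command -v bzr >/dev/null 2>&1; then
        echo "bzr not found; skipping the demo"
        return 0
    fi
    demo_dir=$(mktemp -d) || return 1
    (
        cd "$demo_dir" &&
        export BZR_EMAIL="Demo User <demo@example.com>" &&
        mkdir project && cd project &&
        bzr init &&
        echo "draft" > report.txt &&
        bzr add report.txt &&
        bzr commit -m "Start the report" &&
        bzr push "$demo_dir/backup" &&       # copy the revisions to the backup
        bzr log --line "$demo_dir/backup"    # the backup has the same history
    )
    rc=$?
    rm -rf "$demo_dir"
    return "$rc"
}

backup_demo
```

Repeating the push after each new revision keeps the second location up to date, so local changes are not lost even before they are merged anywhere else.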
Bazaar is a distributed version control system, and as such one of the most powerful version control tools that exist today. At the same time, it is friendly, flexible, consistent, and easy to learn. It can be used effectively for very small solo projects, very large distributed projects, and everything in between.
Bazaar is written in Python, is open source and completely free, and is an official GNU project, licensed under the GNU GPL version 2 or later. It is sponsored by Canonical and used by many large projects, such as the Ubuntu operating system, Launchpad, MySQL, OpenStack, Inkscape, and many others. The official website for hosting Bazaar projects is Launchpad (http://launchpad.net/), where you can find many interesting projects that use Bazaar.
This book will explain how to make the most out of version control, and how to accomplish all the features outlined earlier with Bazaar and much more. The next chapters will explain how to use Bazaar in increasingly advanced use cases. Each scenario will build on the previous one, gradually revealing the added benefits of each increasingly sophisticated setup, and how they will improve your productivity, whether you are working solo or as part of a large team.
Bazaar is implemented in Python, so it should work on any system where a supported version of Python is installed (2.4, 2.5, 2.6, or 2.7). The core module of Bazaar consists of bzrlib, a Python library that implements the core functionality of Bazaar, and bzr, the command-line interface. Bazaar is highly extensible, and a wide selection of official and unofficial plugins exists to enrich its functionality. For the purpose of this book, the most important plugins to include are:
On Windows and Mac OS X, the official installer includes the core module and a good selection of commonly used plugins by default. On GNU/Linux and other systems, the core module and each plugin are packaged separately, and you must install them individually.
Visit the official download page to find the right installer and installation instructions for your system at http://wiki.bazaar.canonical.com/Download.
Here, we explain only the most typical and simple installation options. For more advanced scenarios, please refer to the download page for details.
GNU/Linux distributions include Bazaar in their official binary repositories, in the package bzr. This package typically includes only the core functionality of Bazaar, and plugins are found in separate packages. The most important plugin we will use throughout the book is the Bazaar Explorer plugin, usually in the package bzr-explorer. You can discover other plugins with additional functionality in packages whose names start with bzr-.
In Debian or Ubuntu:
$ sudo apt-get install bzr bzr-explorer
In Red Hat or Fedora:
$ sudo yum install bzr bzr-explorer
In openSUSE:
$ sudo zypper install bzr bzr-explorer
Keep in mind that generally it is recommended to install Bazaar using the official binary repository of your distribution, in order to benefit from the advanced package management features of your operating system, such as automatic security update notifications and software upgrades.
pip is the next generation Python package management tool. The benefit of using pip to install Bazaar is that it provides the latest stable and unstable versions of Bazaar, whereas the official binary repository of your operating system may be a bit out of date. If you prefer to have the latest version, then using pip can be a good option. Another potential benefit of using pip is that it allows you to install Bazaar inside your home directory rather than system-wide, which makes it possible to install Bazaar even if you don't have administrator rights on a system.
If you don't already have pip, you can install it using the graphical or the command-line package manager of your distribution; for example, in Ubuntu:
$ sudo apt-get install python-pip
If you don't have administrator rights, another way to install pip is using easy_install, which is the legacy package manager utility of Python:
$ easy_install --user pip
Once you have pip, you can install Bazaar system-wide to make it available to all users, as follows:
$ sudo pip install bzr bzr-explorer
Or install only for your user (into ~/.local/), as follows:
$ pip install --user bzr bzr-explorer
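After a per-user installation, the bzr command is only found if the directory holding user-installed scripts is on your PATH. The following sketch checks this, assuming the usual Linux location ~/.local/bin (the location differs on other systems). The path_contains helper is a hypothetical name introduced for this example.

```shell
# Sketch: check whether the per-user script directory is on PATH.
# Assumes the usual Linux location ~/.local/bin; adjust for other systems.
path_contains() {
    # $1: directory to look for, $2: colon-separated PATH-style list
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

user_bin="$HOME/.local/bin"
if path_contains "$user_bin" "$PATH"; then
    echo "$user_bin is already on PATH"
else
    echo "Add this line to your shell profile:"
    echo "export PATH=\"$user_bin:\$PATH\""
fi
```

Wrapping the list in colons before matching avoids false positives from directories that merely share a prefix, such as ~/.local/bin-old.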
To discover other Bazaar plugins with additional functionality, search for packages whose names start with bzr-, as follows:
$ pip search bzr-
There is a more detailed explanation on the Bazaar download page, which can be useful if you are not using the latest version of these distributions, if you are using another distribution, or if you prefer to build and install Bazaar from source:
The download page at http://wiki.bazaar.canonical.com/WindowsDownloads offers different types of Bazaar installers, such as standalone and Python-based.
The standalone installer includes all the dependencies of Bazaar, most notably a Python interpreter. This installer is about 20 MB in size, and will use between 50 MB and 70 MB of disk space on your computer, depending on the components and plugins you select during installation. If you are not sure which installer to choose, then choose this one.
The Python-based installers assume that you already have a specific version of Python installed. This can be a good option if you want to save disk space. However, these installers do not include some dependencies of Bazaar, and you will have to install them by yourself. See the following documentation for details:
Depending on the type of installer you choose, there may be different releases of Bazaar available. It is recommended that you pick the latest stable release.
During installation, you can choose the components to install. The default selection includes the Bazaar Explorer, the documentation, and a good set of additional plugins. You may simply accept the defaults for now. If you want to install additional components later, simply run the installer again:
If you prefer to install Bazaar using Cygwin, you can use the standard Cygwin installer, setup.exe, and look for the Bazaar package.
During installation, you can choose the components to install. By default, all the components and plugins are selected, including the Bazaar Explorer and documentation, which will take up about 50 MB disk space on your computer. If you deselect some components now, you can install them later by running the installer again:
pip is the next generation Python package manager. If it is not installed in your shared hosting environment, you can try to install it with easy_install:
$ easy_install --user pip
Before installing Bazaar itself, it is recommended to install Pyrex and Paramiko:
$ pip install --user pyrex
$ pip install --user paramiko
At the time of this writing, when installing Bazaar with pip, it chooses the latest beta release instead of the latest stable release. If that is not what you want, you can specify the version like this:
$ pip install --user bzr==2.5 bzr-explorer
The most straightforward way to interact with Bazaar is the command-line interface. In this book, we will cover both the command-line interface and Bazaar Explorer, which is the official graphical user interface, but keep in mind that the latter is still in beta.
A good way to confirm that the installation was successful is to check the version. Open a terminal application, such as the Command Prompt in Windows or a terminal in other operating systems, and run the following command:
$ bzr version
The output should look something similar to the following:
Bazaar (bzr) 2.5.0
  Python interpreter: /usr/bin/python 2.6.6
  Python standard library: /usr/lib/python2.6
  Platform: Linux-3.2.0-2-amd64-x86_64-with-debian-wheezy-sid
  bzrlib: /usr/lib/python2.7/dist-packages/bzrlib
  Bazaar configuration: /home/jack/.bazaar
  Bazaar log file: /home/jack/.bzr.log

Copyright 2005-2012 Canonical Ltd.
http://bazaar.canonical.com/

bzr comes with ABSOLUTELY NO WARRANTY. bzr is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.

Bazaar is part of the GNU Project to produce a free operating system.
In addition to the version number, the command prints other useful information, such as the location of the Python interpreter used, the Bazaar libraries (bzrlib), and the user's configuration directory.
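If you want to repeat this check later, for example after an upgrade, it can be wrapped in a small script. The sketch below is one possible way to do it: it reports whether bzr is on your PATH, prints the first line of the version output, and looks for the explorer plugin in the output of bzr plugins. The function name check_bazaar is hypothetical, and the plugin check simply searches for the word "explorer".

```shell
# Sketch: verify that bzr and the Bazaar Explorer plugin are available.
# check_bazaar is a hypothetical helper name used only in this example.
check_bazaar() {
    if ! command -v bzr >/dev/null 2>&1; then
        echo "bzr: not found on PATH"
        return 1
    fi
    bzr version | head -n 1                 # for example: Bazaar (bzr) 2.5.0
    if bzr plugins 2>/dev/null | grep -qi 'explorer'; then
        echo "explorer plugin: installed"
    else
        echo "explorer plugin: not found"
    fi
}

check_bazaar || echo "Install Bazaar first, then run this check again."
```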
To launch Bazaar Explorer from the command line, run the following command:
$ bzr explorer
In Windows, another way to launch Bazaar Explorer is from Program Files | Bazaar | Bazaar Explorer.
Bazaar Explorer will open with the Welcome view as follows:
The top part is a toolbar with buttons to perform the most common version control operations. The main part of the screen shows some typical operations you might want to perform, such as open existing projects, start a new project, or customize Bazaar. All of these options will be explained in the next chapter; for now, we just wanted to confirm that it works.
Bazaar Explorer is similar to a regular file explorer, except that it is specialized for viewing Bazaar project directories. When you open an existing Bazaar project, the Working Tree panel on the right looks just like a regular file explorer, showing the list of files and subdirectories in the project:
In addition, the Status column in the Working Tree panel and the left panel indicate files that have not been added to version control yet (nonversioned), or files that have been modified since the last recorded revision.
The "working tree" is the main place where you interact with Bazaar and perform the various version control operations on a project. This concept will be explained in detail in the next chapter.
Another very practical use case is browsing the change history, with the various branches of the project presented in a nicely formatted way:
You can perform the most common operations with either interface, but each has advantages and disadvantages depending on the situation. In general, the command-line interface can be faster and more efficient once you are familiar with Bazaar's commands. On the other hand, for viewing operations such as browsing or searching the history, or comparing revisions, Bazaar Explorer is often more practical. In this way, the two user interfaces complement each other.
Throughout this book, we will focus more on the command-line interface, mainly for the sake of clarity. Command-line expressions tend to be more accurate and unambiguous in general. For this reason, understanding the command-line interface is essential, while the graphical user interface is optional.
Running bzr without any arguments prints a summary of the basic commands:
$ bzr
Bazaar 2.5.0 -- a free distributed version-control tool
http://bazaar.canonical.com/

Basic commands:
  bzr init           makes this directory a versioned branch
  bzr branch         make a copy of another branch

  bzr add            make files or directories versioned
  bzr ignore         ignore a file or pattern
  bzr mv             move or rename a versioned file

  bzr status         summarize changes in working copy
  bzr diff           show detailed diffs

  bzr merge          pull in changes from another branch
  bzr commit         save some or all changes
  bzr send           send changes via email

  bzr log            show history of changes
  bzr check          validate storage

  bzr help init      more help on e.g. init command
  bzr help commands  list all commands
  bzr help topics    list all help topics
If you cannot find something in the built-in documentation, then you can explore the official online documentation, which contains everything from short tutorials to complete in-depth references, FAQ, glossary, and further resources at http://doc.bazaar.canonical.com/en/.
You should now have a good idea of why it is so important to use a version control system, and of how Bazaar can make this easy for you. With Bazaar, you will be able to revert to any past state, work on multiple ideas in parallel, and effectively use the various operations of version control. With Bazaar installed on your computer, you are now ready to dive in and learn how to use all these features of version control.
In the next chapter, you will learn to convert any directory on your computer to a Bazaar repository, record changes, view the history of changes, and revert to a previous state. You will be able to apply these steps to any of your existing projects, and begin to enjoy the benefits of version control.