- Mastering Git
- Jakub Narębski
- 1882 words
- 2021-07-09 19:37:27
Directed Acyclic Graphs
What makes version control systems different from backup applications is, among other things, the ability to represent more than linear history. This is necessary both to support simultaneous parallel development by different developers (each developer working in his or her own clone of the repository) and to allow independent parallel lines of development, that is, branches. For example, one might want to keep ongoing development isolated from work on bug fixes for the stable version; this is possible by using individual branches for the separate lines of development. A version control system (VCS) thus needs to be able to model such a non-linear way of development and to have some structure to represent multiple revisions.

Fig 1. A generic example of a Directed Acyclic Graph (DAG). The same graph is represented on both sides: in free-form on the left, and in left-to-right order on the right.
The structure that Git uses (on the abstract level) to represent the possible non-linear history of a project is called a Directed Acyclic Graph (DAG).
A directed graph is a data structure from computer science (and mathematics) composed of nodes (vertices) that are connected with directed edges (arrows). A directed graph is acyclic if it doesn't contain any cycles, which means that there is no way to start at some node and follow a sequence of the directed edges to end up back at the starting node.
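The definition can be made concrete with a small Python sketch. This is an illustration, not code from Git. A directed graph is stored as a mapping from each node to the nodes its outgoing edges point to. A depth-first search then reports whether any sequence of edges leads back to a node on the current path. The node names and the is_acyclic helper are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: node -> nodes that its outgoing edges point to.
Graph = Dict[str, List[str]]


def is_acyclic(graph: Graph) -> bool:
    """Return True if no path of directed edges returns to its start node."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in graph}

    def visit(node: str) -> bool:
        color[node] = GRAY
        for target in graph.get(node, []):
            state = color.get(target, WHITE)
            if state == GRAY:  # edge back into the current path: a cycle
                return False
            if state == WHITE and not visit(target):
                return False
        color[node] = BLACK
        return True

    return all(color[node] != WHITE or visit(node) for node in list(graph))


dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
cyclic = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(is_acyclic(dag))     # True
print(is_acyclic(cyclic))  # False
```

The recursion is fine for toy graphs. A history with many thousands of commits would call for an iterative traversal instead.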
In concrete examples of graphs, each node represents some object or piece of data, and each edge from one node to another represents some kind of relationship between the objects or data represented by the nodes that this edge connects.
The DAG of revisions in distributed version control systems (DVCS) uses the following representation:
- Nodes: In DVCS, each node represents one revision (one version) of a project (of the entire tree). These objects are called commits.
- Directed edges: In DVCS, each edge is based on the relationship between two revisions. The arrow goes from a later child revision to an earlier parent revision it was based on or created from.
As the directed edges represent a causal relationship between revisions, the arrows in the DAG of revisions cannot form a cycle. Usually, the DAG of revisions is laid out left-to-right (root nodes on the left, leaves on the right) or bottom-to-top (the most recent revisions on top). Figures in this book and ASCII-art examples in Git documentation use the left-to-right convention, while the Git command line uses the bottom-to-top, that is, most-recent-first, convention.
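Because every edge points from a child to its parent, the DAG of revisions can always be listed so that no commit appears after one of its parents. A most-recent-first display needs exactly this property. The following Python sketch produces such an order for a hypothetical history given as a commit-to-parents mapping. It is a variant of Kahn's topological sort, not Git's own implementation. It mirrors the guarantee documented for git log --topo-order: no parent is shown before all of its children.

```python
from typing import Dict, List


def newest_first(parents: Dict[str, List[str]]) -> List[str]:
    """List commits so that each one appears before all of its parents.

    `parents` maps a commit to the commits it was based on, so edges point
    from child to parent, as in the DAG of revisions.
    """
    # For each commit, count the children that have not been listed yet.
    pending_children = {commit: 0 for commit in parents}
    for parent_list in parents.values():
        for parent in parent_list:
            pending_children[parent] = pending_children.get(parent, 0) + 1

    # Leaves (commits with no children) can be listed first.
    ready = sorted(c for c, n in pending_children.items() if n == 0)
    order: List[str] = []
    while ready:
        commit = ready.pop(0)
        order.append(commit)
        for parent in parents.get(commit, []):
            pending_children[parent] -= 1
            if pending_children[parent] == 0:  # all its children are listed
                ready.append(parent)
    return order


# Hypothetical history: A is the root, B and C are based on A, D merges B and C.
history = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(newest_first(history))  # ['D', 'B', 'C', 'A']
```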
There are two special types of nodes in any DAG (see Fig 1):
- Root nodes: These are the nodes (revisions) that have no parents (no outgoing edges). There is at least one root node in the DAG of revisions, which represents the initial (starting) version of a project.
Note
There can be more than one root node in Git's DAG of revisions. Additional root nodes can be created when joining two originally independent projects together; each joined project brings its own root node.
Another source of root nodes is orphan branches, that is, disconnected branches having no history in common. They are used, for example, by GitHub to manage a project's web pages together with its code in one repository, and by the Git project to store the pregenerated documentation (the man and html branches) or related projects (todo).
- Leaf nodes (or leaves): These are the nodes that have no children (no incoming edges); there is at least one such node. They represent the most recent versions of the project, not having any work based on them. Usually, each leaf in the DAG of revisions has a branch head pointing to it.
The fact that the DAG can have more than one leaf node means that there is no inherent notion of the latest version, as there was in the linear history paradigm.
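Both kinds of special nodes can be found directly from a commit-to-parents mapping. Roots are commits with no parents, and leaves are commits that are nobody's parent. The Python sketch below uses a hypothetical history with two root nodes, as happens when two independent projects are joined. It also has two leaves, which shows that neither "the first version" nor "the latest version" has to be unique.

```python
from typing import Dict, List, Set, Tuple


def roots_and_leaves(parents: Dict[str, List[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (roots, leaves) of a DAG of revisions given as commit -> parents."""
    commits = set(parents)
    for parent_list in parents.values():
        commits.update(parent_list)
    # Commits that something is based on, i.e. nodes with incoming edges.
    has_children = {p for parent_list in parents.values() for p in parent_list}
    roots = {c for c in commits if not parents.get(c)}  # no outgoing edges
    leaves = commits - has_children                      # no incoming edges
    return roots, leaves


history = {
    "A": [], "B": ["A"], "C": ["B"],  # first project, root A
    "X": [], "Y": ["X"],              # independent project, root X
    "M": ["C", "Y"],                  # merge joining the two projects
    "F": ["C"],                       # another line of work based on C
}
roots, leaves = roots_and_leaves(history)
print(sorted(roots), sorted(leaves))  # ['A', 'X'] ['F', 'M']
```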
Whole-tree commits
In DVCS, each node of the DAG of revisions (a model of history) represents a version of the project as a single whole entity: all the files, all the directories, and the whole directory tree of the project.
This means that each developer will always get the history of all the files in his or her clone of the repository. He or she can choose to get only a part of the history (shallow clone and/or cloning only selected branches) and check out only the selected files (sparse checkout), but to date, there is no way to get only the history of the selected files in the clone of the repository. Chapter 9, Managing Subprojects - Building a Living Framework, will show some workarounds for when you want the equivalent of a partial clone, for example, when working with large media files that are needed only by a selected subset of your developers.
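A toy model shows why the history of a single file comes bundled with everything else. If each commit is a snapshot of the whole tree, the history of one file can only be recovered by walking the whole-tree commits. This Python sketch is a simplification. Git stores tree and blob objects rather than a plain dictionary, and this toy follows a single parent, so it only models linear history. The Commit class and file_history function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Commit:
    """Hypothetical whole-tree commit: a snapshot of every file, plus a parent."""
    files: Dict[str, str]              # path -> content, for the entire tree
    parent: Optional["Commit"] = None  # single parent keeps the toy linear


def file_history(tip: Commit, path: str) -> List[str]:
    """Distinct versions of one file, newest first, found by walking commits."""
    versions: List[str] = []
    commit: Optional[Commit] = tip
    while commit is not None:
        content = commit.files.get(path)
        # Record a version only when it differs from the newer one just seen.
        if content is not None and (not versions or versions[-1] != content):
            versions.append(content)
        commit = commit.parent
    return versions


c1 = Commit({"README": "hello", "main.c": "return 0;"})
c2 = Commit({"README": "hello, world", "main.c": "return 0;"}, parent=c1)
c3 = Commit({"README": "hello, world", "main.c": "return 1;"}, parent=c2)
print(file_history(c3, "README"))  # ['hello, world', 'hello']
```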
Branches and tags
A branch operation is what you use when you want your development process to fork into two different directions to create another line of development. For example, you might want to create a separate branch to keep managing bug fixes to the released stable version, isolating this activity from the rest of the development.
A tag operation is a way to associate a meaningful symbolic name with a specific revision in the repository. For example, you might want to create the v1.3-rc3 tag for the third release candidate before releasing version 1.3 of your project. This makes it possible to go back to this specific version, for example, to check the validity of a bug report.
Branches and tags, together sometimes called references (refs), have the same meaning (the same representation) within the DAG of revisions. They are external references (pointers) to the graph of revisions, as shown in Fig 2.

Fig 2. Example graph of revisions in a version control system, with two branches, "master" (the current branch) and "maint", a single tag, "v0.9", one branching point with the shortened identifier 34ac2, and one merge commit, 3fb00.
A tag is a symbolic name (for example, v1.3-rc3) for a given revision. It always points to the same object; it does not change. The idea behind tags is that every developer on the project can refer to a given revision by a symbolic name, and that this symbolic name means the same thing for each and every developer. Checking out or viewing a given tag should have the same results for everyone.
A branch is a symbolic name for the line of development. The most recent commit (leaf revision) on such a line of development is referred to as the top or tip of the branch, or branch head, or just a branch. Creating a new commit will generate a new node in the DAG, and advance the appropriate branch ref.
As a line of development, a branch in the DAG is the subgraph made up of the revisions that are reachable from the tip of the branch (the branch head); in other words, the revisions that you can walk to by following the parent edges starting from the branch head.
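Reachability is a plain graph walk. The Python sketch below collects every revision reachable from a branch head by following parent edges. The commit names are hypothetical, arranged as two branches that share history. The set difference at the end selects what git log master..maint would show: the commits reachable from maint but not from master.

```python
from collections import deque
from typing import Dict, List, Set


def reachable(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """All revisions reachable from `tip` by following parent edges."""
    seen = {tip}
    queue = deque([tip])
    while queue:
        commit = queue.popleft()
        for parent in parents.get(commit, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical history: two lines of development fork after commit C.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"], "F": ["C"]}
master, maint = "E", "F"  # branch heads
print(sorted(reachable(history, master)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(reachable(history, maint) - reachable(history, master)))  # ['F']
```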
Git, of course, needs to know which branch tip to advance when creating a new commit, that is, which branch is the current one and is checked out into the working directory. Git uses the HEAD pointer for this, as shown in Fig 2 of this chapter. Usually, HEAD points to one of the branch tips, which, in turn, points to some node in the DAG of revisions, but not always: see Chapter 3, Developing with Git, for an explanation of the detached HEAD situation, that is, when HEAD points directly to a node in the DAG.
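The roles of the DAG, the refs, and HEAD can be sketched as a toy Python model. The Repo class is hypothetical and is not how Git stores anything. Real commit identifiers are SHA-1 hashes, not the caller-chosen names used here. In this model, committing adds a node whose parent is the current HEAD commit and then advances the branch that HEAD names. Tags are never moved. With a detached HEAD, only HEAD itself moves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Repo:
    """Toy model: a DAG of revisions plus external references into it."""
    parents: Dict[str, List[str]] = field(default_factory=dict)
    branches: Dict[str, str] = field(default_factory=dict)  # name -> tip
    tags: Dict[str, str] = field(default_factory=dict)      # name -> fixed commit
    head: str = "master"    # a branch name, or a commit id when detached
    detached: bool = False

    def head_commit(self) -> Optional[str]:
        """The commit HEAD currently resolves to (None in an empty repo)."""
        return self.head if self.detached else self.branches.get(self.head)

    def commit(self, new_id: str) -> None:
        """Create a node based on the HEAD commit, then advance the right ref."""
        current = self.head_commit()
        self.parents[new_id] = [current] if current else []
        if self.detached:
            self.head = new_id                 # no branch is advanced
        else:
            self.branches[self.head] = new_id  # advance the current branch


repo = Repo()
repo.commit("c1")
repo.commit("c2")
repo.tags["v0.9"] = "c2"       # a tag names one revision for good
repo.branches["maint"] = "c2"  # a new branch starting at c2
repo.commit("c3")              # HEAD names master, so only master moves
print(repo.branches)  # {'master': 'c3', 'maint': 'c2'}
print(repo.tags)      # {'v0.9': 'c2'}
```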
Note
Full names of references (branches and tags)
Originally, Git stored branches and tags in files inside the .git administrative area, in the .git/refs/heads/ and .git/refs/tags/ directories, respectively. Modern Git can store information about tags and branches inside the .git/packed-refs file to avoid handling a very large number of small files. Nevertheless, active references use the original loose format, one file per reference.
The HEAD pointer (usually a symbolic reference, for example, ref: refs/heads/master) is stored in .git/HEAD.
The master branch is stored in .git/refs/heads/master, and has refs/heads/master as its full name (in other words, branches reside in the refs/heads/ namespace). The tip of a branch is referred to as the head of the branch, hence the name of the namespace. In the loose format, the file contains the SHA-1 identifier of the most recent revision on the branch (the branch tip) as plain text, in hexadecimal digits. Using the full name is sometimes required if there is ambiguity among refs.
The remote-tracking branch origin/master, which remembers the last seen position of the master branch in the remote repository origin, is stored in .git/refs/remotes/origin/master, and has refs/remotes/origin/master as its full name. The concept of remotes will be explained in Chapter 5, Collaborative Development with Git, and that of remote-tracking branches in Chapter 6, Advanced Branching Techniques.
The v1.3-rc3 tag has refs/tags/v1.3-rc3 as its full name (tags reside in the refs/tags/ namespace). To be more precise, in the case of annotated and signed tags, this file stores a reference to the tag object, which, in turn, points to the node in the DAG, rather than pointing directly to a commit. Tags are the only type of ref that can point to any type of object.
These full names (fully qualified names) can be seen when using commands intended for scripts, for example, git show-ref.
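The on-disk layout described in this note can be read with a few lines of Python. The sketch below is simplified. The functions resolve and read_packed_refs are hypothetical, not Git functions, and the sketch ignores reference-name validation and other details. It follows a symbolic ref such as HEAD and prefers a loose ref file over an entry in .git/packed-refs. Short names are tried in the order documented in gitrevisions(7):

1. the name itself in .git
2. refs/
3. refs/tags/
4. refs/heads/
5. refs/remotes/
6. refs/remotes/<name>/HEAD

This order is why, if a tag and a branch share a short name, the tag wins and the full name is needed to reach the branch. The demonstration uses a throwaway directory with made-up object identifiers.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional

# Short-name lookup order, as documented in gitrevisions(7).
SEARCH_RULES = [
    "{}",
    "refs/{}",
    "refs/tags/{}",
    "refs/heads/{}",
    "refs/remotes/{}",
    "refs/remotes/{}/HEAD",
]


def read_packed_refs(git_dir: Path) -> Dict[str, str]:
    """Parse '<sha1> <full-name>' lines of packed-refs.

    Lines starting with '#' (header) or '^' (peeled target of the annotated
    tag on the previous line) are skipped.
    """
    refs: Dict[str, str] = {}
    packed = git_dir / "packed-refs"
    if packed.is_file():
        for line in packed.read_text().splitlines():
            if line and not line.startswith(("#", "^")):
                sha1, name = line.split(" ", 1)
                refs[name] = sha1
    return refs


def resolve(git_dir: Path, name: str, depth: int = 0) -> Optional[str]:
    """Return the SHA-1 that a short or full ref name points to, or None."""
    if depth > 5:  # guard against loops of symbolic refs
        return None
    packed = read_packed_refs(git_dir)
    for rule in SEARCH_RULES:
        full_name = rule.format(name)
        loose = git_dir / full_name
        if loose.is_file():  # a loose ref takes precedence over a packed one
            content = loose.read_text().strip()
            if content.startswith("ref: "):  # symbolic ref, e.g. HEAD
                return resolve(git_dir, content[len("ref: "):], depth + 1)
            return content
        if full_name in packed:
            return packed[full_name]
    return None


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "HEAD").write_text("ref: refs/heads/master\n")
    (git_dir / "refs" / "heads" / "master").write_text("a" * 40 + "\n")
    (git_dir / "packed-refs").write_text(
        "# pack-refs with: peeled fully-peeled sorted \n"
        + "b" * 40 + " refs/tags/v1.3-rc3\n"
        + "^" + "c" * 40 + "\n"
        + "d" * 40 + " refs/remotes/origin/master\n"
    )
    head = resolve(git_dir, "HEAD")
    tag = resolve(git_dir, "v1.3-rc3")
    remote = resolve(git_dir, "origin/master")
    missing = resolve(git_dir, "no-such-ref")

print(head == "a" * 40, tag == "b" * 40, remote == "d" * 40, missing)
# True True True None
```

For an annotated tag, this returns the identifier of the tag object, not of the commit, matching the note above. The ^ line that follows such an entry in packed-refs records the object the tag ultimately points to.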
Branch points
When you create a new branch starting at a given version, the lines of development usually diverge. The act of creating a divergent branch is denoted in the DAG by a commit that has more than one child, that is, a node pointed to by more than one arrow.
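A commit records only its parents, so branch points have to be recognized after the fact by counting incoming arrows. A minimal Python sketch, using hypothetical commit names:

```python
from collections import Counter
from typing import Dict, List, Set


def branch_points(parents: Dict[str, List[str]]) -> Set[str]:
    """Commits with more than one child, i.e. more than one incoming arrow."""
    child_count = Counter(p for parent_list in parents.values() for p in parent_list)
    return {commit for commit, n in child_count.items() if n > 1}


# Hypothetical history: two lines of development diverge after commit B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "X": ["B"], "Y": ["X"]}
print(branch_points(history))  # {'B'}
```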
Note
Git does not track information about creating (forking) a branch, and does not mark branch points in any way that is preserved across clones and pushes. There is information about this event in the reflog (branch created from HEAD), but this is local to the repository where the branching occurred, and is temporary. However, if you know that branch B started from branch A, you can find the branching point with git merge-base A B; in modern Git, you can use the --fork-point option to make it also use the reflog.
In Fig 2, the commit 34ac2 is a branching point for the master and maint branches.
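The idea behind git merge-base can be sketched in Python on a commit-to-parents mapping. The git merge-base documentation defines one common ancestor as better than another if the latter is an ancestor of the former; the merge bases are the best common ancestors. This brute-force version is for illustration only. Git's implementation is far more efficient, and the reflog-based --fork-point mode is not modeled. The commit names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def ancestors(parents: Dict[str, List[str]], commit: str) -> Set[str]:
    """`commit` itself plus everything reachable through parent edges."""
    seen = {commit}
    queue = deque([commit])
    while queue:
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def merge_bases(parents: Dict[str, List[str]], a: str, b: str) -> Set[str]:
    """Best common ancestors of a and b."""
    common = ancestors(parents, a) & ancestors(parents, b)
    # Drop any common ancestor that is an ancestor of another common ancestor.
    return {
        c for c in common
        if not any(o != c and c in ancestors(parents, o) for o in common)
    }


# Hypothetical history: maint (tip Y) was branched from master (tip D) at B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "X": ["B"], "Y": ["X"]}
print(merge_bases(history, "D", "Y"))  # {'B'}
```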
Merge commits
Typically, when you have used branches to enable independent parallel development, you will later want to join them. For example, you would want bug fixes applied to the stable (maintenance) branch to be included in the main line of development as well (if they are applicable and were not fixed accidentally during the main-line development).
You would also want to merge changes created in parallel by different developers working simultaneously on the same project, each using their own clone of the repository and creating their own lines of commits.
Such a merge operation will create a new revision, joining two lines of development. The result of this operation will be based on more than one commit: the node in the DAG representing this revision will have more than one parent. Such an object is called a merge commit.
You can see a merge commit, 3fb00, in Fig 2.
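If history is represented as a mapping from each commit to its parents, a merge commit is simply an entry with more than one parent. In the minimal Python sketch below, the commit names are hypothetical and M joins the two lines of development.

```python
from typing import Dict, List, Set


def merge_commits(parents: Dict[str, List[str]]) -> Set[str]:
    """Commits based on more than one parent."""
    return {commit for commit, parent_list in parents.items() if len(parent_list) > 1}


# Hypothetical history: maint (tip Y) forked from master (tip D) at B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "X": ["B"], "Y": ["X"]}
# Merging maint into master creates M. Parent order is kept: in Git, the
# first parent is the commit that was checked out when the merge was made.
history["M"] = ["D", "Y"]
print(merge_commits(history))  # {'M'}
```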