Friday, 28 December 2018

GIT: Submodules

Often in a project, you want to include libraries and other resources. The manual way is to simply download the necessary code files, copy them to your project, and commit the new files into your Git repository.
While this is a valid approach, it's not the cleanest one. By casually throwing those library files into your project, we're inviting a couple of problems:
  • This mixes external code with our own, unique project files. The library is actually a project in itself and should be kept separate from our work. There's no need to keep these files in the same version control context as our project.
  • Should the library change (because bugs were fixed or new features added), we'll have a hard time updating the library code. Again, we need to download the raw files and replace the original items.
Since these are quite common problems in everyday projects, Git of course offers a solution: Submodules.

Repositories Inside Repositories

A "Submodule" is just a standard Git repository. The only specialty is that it is nested inside a parent repository. In the common case of including a code library, you can simply add the library as a Submodule in your main project.
A Submodule remains a fully functional Git repository: you can modify files, commit, pull, push, etc. from inside it like with any other repository.
Let's see how to work with Submodules in practice.

Adding a Submodule

In our sample project, we create a new "lib" folder to host this (and future) library code.
$ mkdir lib
$ cd lib
With the "git submodule add" command, we'll add a little JavaScript library from GitHub:
$ git submodule add https://github.com/djyde/ToProgress


Let's have a look at what just happened:
  • (1) The command started a simple cloning process of the specified Git repository:
Cloning into 'lib/ToProgress'...
remote: Counting objects: 180, done.
remote: Compressing objects: 100% (89/89), done.
remote: Total 180 (delta 51), reused 0 (delta 0), pack-reused 91
Receiving objects: 100% (180/180), 29.99 KiB | 0 bytes/s, done.
Resolving deltas: 100% (90/90), done.
Checking connectivity... done.

  • (2) Of course, this is reflected in our file structure: our project now contains a new "ToProgress" folder inside the "lib" directory. As you can see from the ".git" entry it contains (a subfolder in older Git versions, a small pointer file in newer ones), this is a fully-featured Git repository.
CONCEPT
It's important to understand that the actual contents of a Submodule are not stored in its parent repository. Only its remote URL, the local path inside the main project and the checked out revision are stored by the main repository.
Of course, the Submodule's working files are placed inside the specified directory in your project - in the end, you want to use the library's files! But they are not part of the parent project's version control contents.

  • (3) A new ".gitmodules" file was created. This is where Git keeps track of our Submodules and their configuration:
[submodule "lib/ToProgress"]
    path = lib/ToProgress
    url = https://github.com/djyde/ToProgress

  • (4) In case you're interested in the inner workings of Git: besides the ".gitmodules" configuration file, Git also keeps record of the Submodule in your local ".git/config" file. Finally, it also keeps a copy of each Submodule's .git repository in its internal ".git/modules" folder.
CONCEPT
Git's internal management of Submodules is quite complex (as you can already guess from all the .gitmodules, .git/config, and .git/modules entries...). Therefore, it's highly recommended not to mess with configuration files and values manually. Please do yourself a favor and always use proper Git commands to manage Submodules.


Let's have a look at our project's status:
$ git status
On branch master
Changes to be committed:
    (use "git reset HEAD <file>..." to unstage)

    new file:   .gitmodules
    new file:   lib/ToProgress


Git regards adding a Submodule as a modification like any other - and requests you to commit it to the repository:
$ git commit -m "Add 'ToProgress' JavaScript library as Submodule"
Congratulations: we've now successfully added a Submodule to our main project! Before we look at a couple of use cases, let's see how you can clone a project that already has Submodules added.
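To make the point about what the parent repository really stores more tangible, here is a minimal sketch you can run in a scratch directory. It assumes a small local repository ("library", a hypothetical stand-in for the GitHub project) so that no network access is needed; the "protocol.file.allow" setting is only there because newer Git versions refuse local-path Submodule URLs by default.

```shell
set -eu
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical stand-in for the library's remote repository.
git init -q library
echo "// library code" > library/toprogress.js
git -C library add toprogress.js
git -C library commit -q -m "Initial library version"

# The main project.
git init -q project
cd project

# Add the library as a Submodule and commit the result.
git -c protocol.file.allow=always submodule add "$demo/library" lib/ToProgress
git commit -q -m "Add 'ToProgress' library as Submodule"

# What the parent stores: the path and URL in .gitmodules ...
git config -f .gitmodules --get submodule.lib/ToProgress.path
git config -f .gitmodules --get submodule.lib/ToProgress.url
# ... plus a single "gitlink" entry (mode 160000) holding the commit hash.
git ls-files --stage lib/ToProgress
```

The last command lists one entry for the whole Submodule instead of one per library file, which is exactly the "pointer to a revision" described above.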

Cloning a Project with Submodules

You already know that a project repository does not contain its Submodules' files; the parent repository only saves the Submodules' configurations as part of version control.
This shows when you clone a project that contains Submodules: by default, the "git clone" command only downloads the project itself. Our "lib" folder, however, would stay empty.
You have two options to end up with a populated "lib" folder (or wherever else you choose to save your Submodules; "lib" is just an example):
  • (a) You can add the "--recurse-submodules" option to "git clone"; this tells Git to also initialize all Submodules when the cloning is finished.
  • (b) If you used a simple "git clone" command without this option, you need to initialize the Submodules afterwards with "git submodule update --init --recursive".
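Both options can be tried side by side. The sketch below again assumes a local, hypothetical "library" repository in place of the real remote (hence the "protocol.file.allow" setting); the directory names "plain-clone" and "full-clone" are made up for this illustration.

```shell
set -eu
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library plus a project that uses it as a Submodule.
git init -q library
echo "// library code" > library/toprogress.js
git -C library add toprogress.js
git -C library commit -q -m "Initial library version"
git init -q project
git -C project -c protocol.file.allow=always submodule add "$demo/library" lib/ToProgress
git -C project commit -q -m "Add Submodule"

# (b) A plain clone leaves the Submodule folder empty ...
git clone -q project plain-clone
entries_before=$(ls -A plain-clone/lib/ToProgress | wc -l)
echo "entries before init: $entries_before"
# ... until the Submodules are initialized and checked out.
git -C plain-clone -c protocol.file.allow=always submodule update --init --recursive

# (a) Or let "git clone" do both steps at once.
git -c protocol.file.allow=always clone -q --recurse-submodules project full-clone
ls full-clone/lib/ToProgress
```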

Checking Out a Revision

A Git repository can have countless committed versions, but only one version's files can be in your working directory. Therefore, like with any Git repository, you have to decide which revision of your Submodule shall be checked out.
CONCEPT
Unlike normal Git repositories, Submodules always point to a specific commit - not a branch. This is because the contents of a branch can change over time, as new commits arrive. Pointing at a specific revision, on the other hand, guarantees that the correct code is always present.
Let's say we want to have an older version of our "ToProgress" library in our project. First, we'll have a look at the library's commit history. We change into the Submodule's base folder and call the "log" command:
$ cd lib/ToProgress/
$ git log --oneline --decorate
Before we take a look at the actual history, I'd like to stress an important point: Git commands are context-sensitive! By moving into the Submodule directory on the command line, all Git commands that we perform will be executed in the context of the Submodule, not its parent repository.
Now, in the log output, we spot a commit that is tagged "0.1.1":
83298f7 (HEAD, master) update .gitignore
a3b6186 remove page
ed693b7 update doc
3557a0e (tag: 0.1.1) change version code
2421796 update readme

This is the version we want to have in our project. To start with, we can simply check out this commit:
$ git checkout 0.1.1
Let's see what our parent repository thinks about all this. In the main project's base folder, execute:
$ git submodule status
+3557a0e0f7280fb3aba18fb9035d204c7de6344f   lib/ToProgress (0.1.1)
With "git submodule status", we're told which revision each Submodule is checked out at. The little "+" symbol in front of the hash is especially important: it tells us that the Submodule is at a different revision than is officially recorded in the parent repository. This makes sense - since we just changed the checked out revision to the commit tagged "0.1.1".
When performing a simple "git status" in the parent repository, we see that Git regards moving the Submodule's pointer as a change like any other:
$ git status
On branch master
Changes not staged for commit:
    (use "git add <file>..." to update what will be committed)
    (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   lib/ToProgress (new commits)

We need to commit this to the repository in order to make it official:
$ git commit -a -m "Moved Submodule pointer to version 0.1.1"
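If you'd like to watch the "+" marker appear and disappear, the following sketch replays the steps of this section against a hypothetical local "library" repository that has a "0.1.1" tag and one newer commit. The file name and commit messages are assumptions made for the demo.

```shell
set -eu
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library with a tagged release and one newer commit.
git init -q library
echo "version 0.1.1" > library/toprogress.js
git -C library add toprogress.js
git -C library commit -q -m "change version code"
git -C library tag 0.1.1
echo "newer work" >> library/toprogress.js
git -C library commit -q -a -m "update doc"

# Project that includes the library at its newest commit.
git init -q project
cd project
git -c protocol.file.allow=always submodule add "$demo/library" lib/ToProgress
git commit -q -m "Add Submodule"

# Check out the tagged revision inside the Submodule.
git -C lib/ToProgress checkout -q 0.1.1

# The parent notices the mismatch: the line starts with "+".
before=$(git submodule status)
echo "$before"

# Record the new pointer in the parent repository.
git commit -q -a -m "Moved Submodule pointer to version 0.1.1"

# The "+" is gone: checked out and recorded revisions match again.
after=$(git submodule status)
echo "$after"
```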

Updating a Submodule When its Pointer was Moved

We just saw how to check out a Submodule at a specific revision. But what if one of our teammates does this in our project? Let's say we integrate his changes (through pull, merge, or rebase for example) after he has moved the Submodule pointer to a different revision:
$ git pull
Updating 43d0c47..3919c52
Fast-forward
 lib/ToProgress | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Git informs us, in a rather shy way, that "lib/ToProgress" was changed. Again, "git submodule status" provides more detailed information:
$ git submodule status
+83298f72c975c29f727c846579c297938492b245 lib/ToProgress (0.1.1-8-g83298f7)
Remember that little "+" sign? It tells us that the Submodule revision was moved - the version we currently have checked out in our project is not the one that is "officially" committed.
The "update" command helps us correct this:
$ git submodule update lib/ToProgress
Submodule path 'lib/ToProgress': checked out '3557a0e0f7280fb3aba18fb9035d204c7de6344f'
NOTE
In most cases, you can use the "git submodule" family of commands without specifying a particular Submodule. By providing a path like in the example above, however, you can address just a certain Submodule.
We now have the same version of the Submodule checked out that our teammate had committed to the repository.
Note that the "update" command also downloads changes for you: imagine that your teammate moved the Submodule's pointer to a revision that you don't have yet. In that case, Git fetches the corresponding revision in the Submodule and then checks it out for you. Very handy.
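The teammate scenario can be simulated with two local clones. In this sketch, "alice" and "bob" are hypothetical working copies (Bob clones directly from Alice to keep things short), and "library" is a local stand-in for the real remote.

```shell
set -eu
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library with a tagged release and one newer commit.
git init -q library
echo "version 0.1.1" > library/toprogress.js
git -C library add toprogress.js
git -C library commit -q -m "change version code"
git -C library tag 0.1.1
echo "newer work" >> library/toprogress.js
git -C library commit -q -a -m "update doc"

# Alice's project uses the library's newest commit.
git init -q alice
git -C alice -c protocol.file.allow=always submodule add "$demo/library" lib/ToProgress
git -C alice commit -q -m "Add Submodule"

# Bob clones the project, Submodules included.
git -c protocol.file.allow=always clone -q --recurse-submodules alice bob

# Alice moves the Submodule pointer back to 0.1.1 and commits.
git -C alice/lib/ToProgress checkout -q 0.1.1
git -C alice commit -q -a -m "Moved Submodule pointer to version 0.1.1"

# Bob integrates her change; his Submodule working copy is now out of sync.
cd bob
git -c protocol.file.allow=always pull -q --ff-only
before=$(git submodule status)
echo "$before"

# "update" checks out the revision recorded in the parent repository.
git -c protocol.file.allow=always submodule update lib/ToProgress
after=$(git submodule status)
echo "$after"
```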

Checking for New Changes in the Submodule

Normally, you don't want library code to change very often: you'll want to use a version of the Submodule that you've tested and which you know works flawlessly with your own code.
However, one of the best things about Submodules is that you can easily keep up with new releases (or minor new improvements).
Let's see if there's new code available in the Submodule:
    $ cd lib/ToProgress
    $ git fetch
    remote: Counting objects: 3, done.
    remote: Compressing objects: 100% (3/3), done.
    remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
    Unpacking objects: 100% (3/3), done.
    From https://github.com/djyde/ToProgress
        83298f7..3e20bc2  master     -> origin/master
Note that, to do this, we simply change into the Submodule folder - and can then work like with any normal Git repository (because it is a normal Git repository).
The "git fetch" command, in this case, shows that there are indeed some new changes on the Submodule's remote.
CONCEPT
Before we go ahead and integrate these changes, I'd like to stress an important point once more. When checking the Submodule's status, we're informed that we're on a detached HEAD:
$ git status
    HEAD detached at 3557a0e
    nothing to commit, working directory clean
Normally, in Git, you always have a certain branch checked out. However, you can also choose to check out a specific commit (one that is not the tip of a branch). This is a rather rare case in Git and should normally be avoided.
However, when working with Submodules, it is the normal state to have a certain commit (and not a branch) checked out. You want to make sure you have an exact, static commit checked out in your project - not a branch which, by its nature, moves on with newer commits.


Now, let's integrate the new changes by pulling them into our local Submodule repository. Note that the shorthand "git pull" syntax does not work here.
This is because of the "detached HEAD" state we're in: since you're not on a local branch at the moment, Git has no branch to integrate the pulled down changes into. The safest route is to check out the branch first and then pull, naming the remote and branch explicitly:
$ git checkout master
$ git pull origin master
The local "master" branch now points at the newest commit, and so does the Submodule's HEAD. (Had we run "git pull origin master" while still on the detached HEAD, Git would have merged the changes into that detached commit and left the local "master" branch where it was, so a later "git checkout master" would have taken us back to older code.)
We're done working in our Submodule; let's move back into our main project:
$ cd ../..
$ git submodule status
+3e20bc25457aa56bdb243c0e5c77549ea0a6a927 lib/ToProgress (0.1.1-9-g3e20bc2)
Since we've just moved the Submodule pointer to a different revision, we need to commit this change to the main repository to make it official.
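As a rough end-to-end sketch of this section, the script below fetches a new upstream commit, moves the Submodule's local branch forward and then commits the new pointer in the main project. It checks out the branch before pulling so that the local branch itself moves. The local "library" repository, its "master" branch and the commit messages are assumptions made for the demo.

```shell
set -eu
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library whose default branch is called "master".
git init -q library
git -C library symbolic-ref HEAD refs/heads/master
echo "// library code" > library/toprogress.js
git -C library add toprogress.js
git -C library commit -q -m "Initial library version"

# Project with the library as a Submodule, in the usual detached HEAD state.
git init -q project
cd project
git -c protocol.file.allow=always submodule add "$demo/library" lib/ToProgress
git commit -q -m "Add Submodule"
git -C lib/ToProgress checkout -q --detach

# Meanwhile, a new commit lands in the library.
echo "// bugfix" >> "$demo/library/toprogress.js"
git -C "$demo/library" commit -q -a -m "fix a bug"

# Inside the Submodule: fetch, switch to the branch, bring it up to date.
cd lib/ToProgress
git fetch -q
git checkout -q master
git pull -q --ff-only origin master
cd ../..

# Back in the main project: make the new pointer official.
git submodule status
git add lib/ToProgress
git commit -q -m "Update 'ToProgress' Submodule to latest master"
after=$(git submodule status)
echo "$after"
```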

Working in a Submodule

In some cases, you might want to make some custom changes to a Submodule. You've already seen that working in a Submodule is like working in any other Git repository: any Git commands that you perform inside a Submodule directory are executed in the context of that sub-repository.
Let's say you want to change a tiny bit in a Submodule; you make your changes in the corresponding files, add them to the staging area and commit them.
This might already be the first banana skin: you should make sure you currently have a branch checked out in the Submodule before you commit. That's because if you're in a detached HEAD situation, your commit can easily get lost: it's not attached to any branch, so as soon as you check out anything else, nothing points to it anymore and Git will eventually garbage-collect it.
Apart from that, everything else you've already learned still applies: in the main project, "git submodule status" will tell you that the Submodule pointer was moved and that you'll have to commit the move.
By the way: In case you have uncommitted local changes inside the Submodule, Git will also tell you in the main project:
$ git status
...
    modified:   lib/ToProgress (modified content)
Make sure to always keep a clean state in your Submodules.
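Here is one possible way to guard against the detached HEAD trap before committing inside a Submodule. It is only a sketch: the branch name "local-fix" and the local "library" repository are hypothetical, and "git symbolic-ref -q HEAD" is used simply because it fails when no branch is checked out.

```shell
set -eu
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library and a project that includes it.
git init -q library
echo "// library code" > library/toprogress.js
git -C library add toprogress.js
git -C library commit -q -m "Initial library version"
git init -q project
cd project
git -c protocol.file.allow=always submodule add "$demo/library" lib/ToProgress
git commit -q -m "Add Submodule"
git -C lib/ToProgress checkout -q --detach   # the usual Submodule state

cd lib/ToProgress
# If no branch is checked out, create one before changing anything.
git symbolic-ref -q HEAD >/dev/null || git checkout -q -b local-fix
echo "// local tweak" >> toprogress.js
cd ../..

# Uncommitted Submodule changes show up in the main project ...
git status | grep "lib/ToProgress"

# ... so commit them inside the Submodule first,
git -C lib/ToProgress commit -q -a -m "Local tweak"
# and then record the moved pointer in the main project.
git commit -q -a -m "Use locally patched 'ToProgress'"
```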

Deleting a Submodule

You will only rarely want to remove a Submodule from your project. But if you really need to, please don't do it manually: trying to edit all the configuration files correctly will almost inevitably cause problems. Use the dedicated commands instead:
$ git submodule deinit lib/ToProgress
$ git rm lib/ToProgress
$ git status
...
    modified:   .gitmodules
    deleted:    lib/ToProgress
With "git submodule deinit", we made sure that the Submodule is cleanly removed from the configuration files.
With "git rm", we finally delete the actual Submodule files - and other obsolete parts of your configuration.
Commit this and your Submodule will be cleanly removed from the project.
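The removal can be rehearsed safely in a throwaway directory. As before, this sketch assumes a hypothetical local "library" repository in place of the real remote.

```shell
set -eu
demo=$(mktemp -d) && cd "$demo"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical library and a project that includes it.
git init -q library
echo "// library code" > library/toprogress.js
git -C library add toprogress.js
git -C library commit -q -m "Initial library version"
git init -q project
cd project
git -c protocol.file.allow=always submodule add "$demo/library" lib/ToProgress
git commit -q -m "Add Submodule"

# Unregister the Submodule from the local configuration ...
git submodule deinit lib/ToProgress
# ... then remove its entry from the index and from .gitmodules.
git rm -q lib/ToProgress
git status --short
git commit -q -m "Remove 'ToProgress' Submodule"

# Git keeps the Submodule's repository data under .git/modules.
ls .git/modules/lib
```

As the last line shows, the Submodule's internal repository data stays behind in ".git/modules" after the commit. That is harmless, and it can be deleted by hand if you are sure you will not need the Submodule again.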
