Version Control#

The primary purposes of version control are to:

maintain a history of changes to a repository, allowing ‘undo’ or ‘rewind’ and auditing
maintain multiple copies of a repository, such as a remote version
allow collaboration between multiple contributors
allow management of multiple workstreams
facilitate automation of tests and deployment
allow release management

All software projects should be version controlled; it is a fundamental element of modern development practice for reliable, secure software and Open Science. Something being ‘rough work’ or ‘work-in-progress’ is not a reason to keep it out of version control. Aside from the other benefits, you do yourself and your colleagues a disservice by not sharing your work for feedback and collaborative learning.

‘Checking out’ an existing repository#

Fig. 19 shows some of the flows and terminology involved in working with the Git version control software. With Git, multiple copies of a software repository (repo) are held: locally on developers’ machines and also on a centralized server. The server is usually set as a ‘remote’ relative to the copy on someone’s machine, with the default name origin.

../_images/version-control-terminology.png — Fig. 19 A schematic of some of the flows and terminology used in the Git version control software.#

The ICHEC GitLab hosts a ‘remote’ version of our repositories. You can get a copy by ‘cloning’ it:

git clone git@git.ichec.ie:my-group/my-project.git

which downloads it to your machine and ‘checks out’ a working copy of the files on a default ‘branch’. The term ‘branch’ derives from the graph-based structure that Git uses to record changes to repository content. Branches allow multiple people to work on shared code at once. When working on a collaborative project (likely most projects at ICHEC) you should work on a ‘feature branch’ rather than the ‘main’ or ‘master’ branch.
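As a rough illustration of what cloning sets up, the sketch below clones from a throwaway local repository that stands in for the GitLab server, so it runs without network access. The sandbox paths, the demo identity and the README content are all made up for the example.

```shell
set -eu
# Throwaway sandbox: a local bare repository stands in for the GitLab server.
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/server.git"

# Seed the stand-in server with one commit so there is something to clone.
git init -q "$sandbox/seed"
git -C "$sandbox/seed" config user.name "Demo User"
git -C "$sandbox/seed" config user.email "demo@example.com"
echo "# my-project" > "$sandbox/seed/README.md"
git -C "$sandbox/seed" add README.md
git -C "$sandbox/seed" commit -q -m "Add README"
git -C "$sandbox/seed" push -q "$sandbox/server.git" HEAD

# Clone it, as you would with the git@git.ichec.ie:... URL.
git clone -q "$sandbox/server.git" "$sandbox/my-project"

# The clone records where it came from as a remote named 'origin'.
remote_name="$(git -C "$sandbox/my-project" remote)"
git -C "$sandbox/my-project" remote -v
echo "remote name: $remote_name"
```

The working copy in my-project contains the checked-out files of the default branch, and origin points back at the repository it was cloned from.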

You can ‘checkout’ a new feature branch, which splits it off from a nominated common ‘trunk’ or ‘main’ branch, and when you have finished working on it you can ‘merge’ your work back into the ‘main’ branch.

You can do this with the following steps after cloning:

git checkout -b my_feature_branch

Now you can make any changes you like to the repo, for example modifying or adding some code.
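To check which branch you ended up on, a minimal sketch in a throwaway repository could look like the following. The repository, file name and demo identity are placeholders invented for the example.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
git -C "$repo" config user.name "Demo User"
git -C "$repo" config user.email "demo@example.com"
echo "hello" > "$repo/notes.txt"
git -C "$repo" add notes.txt
git -C "$repo" commit -q -m "Initial commit"

# Remember the default branch name (it could be 'main' or 'master').
trunk="$(git -C "$repo" rev-parse --abbrev-ref HEAD)"

# Create the feature branch and switch to it in one step.
git -C "$repo" checkout -q -b my_feature_branch

current="$(git -C "$repo" rev-parse --abbrev-ref HEAD)"
echo "trunk: $trunk, now on: $current"

# List all local branches; the current one is marked with '*'.
git -C "$repo" branch
```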

Working with GitLab’s Web IDE#

To edit the repository directly in GitLab’s Web IDE (for quick and easy edits, minor fixes and documentation changes):

Navigate to the Repository: Go to the ICHEC project on GitLab, ensuring you’re on the correct branch.
Open the Web IDE: On the project homepage, click the “Edit” button between the “Find File” and “Code” buttons.
Edit Files: Use the Web IDE to browse and edit your project’s files.
Commit Changes: After making changes, click the source control icon on the left. Provide a commit message, then click the down arrow to select “Create new branch and commit” before committing.

After committing, GitLab will offer an option to create a Merge Request (MR) directly from the Web IDE, where you can add details, link relevant issues, and submit it for review by your colleagues.

Inspecting Changes before Committing#

Before committing your changes, it is essential to inspect what modifications you’ve made. Git provides several commands to help you do this, for example:

View changes by specific file:

This will show line-by-line differences between the current version of the file and the last committed version:

git diff src/filename.exe

View Staged Changes: This will show you what will be committed, i.e., the differences between the last commit and the staged changes:

git diff --cached

Discarding Changes (If Necessary): In case you decide that you do not want to keep certain changes, you can discard them and restore the file to its previous state by running:

git restore src/filename.exe
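The three commands above can be tried end to end in a scratch repository. This is only a sketch: notes.txt and the demo identity are invented for the example, and `git restore --staged` (which unstages a file) is an addition to the commands listed above.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'line one\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# Make an unstaged edit and summarise it.
printf 'line one\nline two\n' > notes.txt
git diff --stat -- notes.txt

# Stage the edit: it now appears under --cached rather than plain 'git diff'.
git add notes.txt
staged="$(git diff --cached --name-only)"
echo "staged: $staged"

# Change of mind: unstage the file, then discard the edit in the working tree.
git restore --staged notes.txt
git restore notes.txt
content="$(cat notes.txt)"
echo "restored content: $content"
```

Note that `git restore` throws the edit away for good, so it is worth looking at the diff first.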

Hint

The first time you log into ICHEC GitLab you will be put in a pending approval state. A GitLab admin will need to approve your access, so reach out and ask.

Starting a new repository#

From GitLab#

If you would like to start a new repository for a new project it is easiest to do this in GitLab first. You should decide if it is a ‘personal’ project or a ‘group’ project.

If it is a personal project you can create it under your own name. This would be intended for code only you plan to use.

If it is a group project you should consider whether it belongs to an ICHEC ‘theme’ or activity, e.g. Quantum or Performance, and if so create the repository under one of these groups. This may require discussion with the theme/activity lead. If it doesn’t fit under an existing one you could come up with a broad theme to place it under, ideally avoiding creating projects at the ‘top level’.

Your new project should at least include a descriptive README, saying what the project is and who to contact for more details. You may want to consider adding a LICENSE if the repository will become public or shared externally at any point.

Using existing code#

If you have existing code elsewhere and want to add it to a repository you can create a ‘stub’ repository in GitLab. It will give you a URL like ‘git@git.ichec.ie:my-group/my-project.git’.

If your project isn’t under version control you can add it with:

git init

in the top-level directory.

You can then ‘push’ it to the remote repository. Commit your files first, then add the remote and push your branch to it (substitute your own branch name if it is not ‘main’):

git remote add origin git@git.ichec.ie:my-group/my-project.git
git push --set-upstream origin main
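The whole sequence, from an unversioned directory to a first push, might look like the sketch below. A local bare repository stands in for the GitLab ‘stub’ so that it runs offline; the directory names, README content and demo identity are made up. It uses `--set-upstream` on the first push, which records which remote branch the local branch tracks.

```shell
set -eu
sandbox="$(mktemp -d)"
# Stand-in for the empty 'stub' repository created in GitLab.
git init -q --bare "$sandbox/stub.git"

# An existing project directory that is not yet under version control.
mkdir "$sandbox/my-project"
cd "$sandbox/my-project"
echo "# my-project" > README.md

git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git add README.md
git commit -q -m "Import existing code"

# Name of the branch that 'git init' created (e.g. 'main' or 'master').
branch="$(git symbolic-ref --short HEAD)"

git remote add origin "$sandbox/stub.git"
# Link the local branch to origin so later pushes and pulls need no arguments.
git push -q --set-upstream origin "$branch"

upstream="$(git rev-parse --abbrev-ref "@{upstream}")"
echo "upstream: $upstream"
```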

Merging changes back to the repo#

Assuming you are working on your branch ‘my_branch’, when you are happy with your change you can do:

git status

which will show the files you have changed or added, and whether those changes are ‘staged’ for the next commit. You can inspect them to make sure they are as expected. If happy, you can stage them:

git add .

Finally you can ‘commit’ these changes, with a descriptive message about what they are:

git commit -m "Descriptive message about the change"

You can then ‘push’ your changes to the remote repository, which is by default named ‘origin’:

git push origin my_branch
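For the local part of this flow (status, stage, commit), a small sketch in a scratch repository is shown below. The file names and demo identity are invented, and it stages one named file rather than using `git add .`, to show how an unrelated file can be left out of the commit.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > code.txt
git add code.txt
git commit -q -m "Initial commit"
git checkout -q -b my_branch

echo "v2" > code.txt       # modify a tracked file
echo "scratch" > tmp.log   # an unrelated file we do not want to commit

# Short status: ' M' means modified but unstaged, '??' means untracked.
git status --short

# Stage only the intended file.
git add code.txt
staged="$(git diff --cached --name-only)"

git commit -q -m "Update code to v2"
subject="$(git log -1 --format=%s)"
echo "staged: $staged / last commit: $subject"
```

After the commit, tmp.log is still reported as untracked and has not entered the history.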

After you push to remote you can go to the GitLab webview for your project. If you are signed in you should get a ‘banner’ near the top of the page asking if you want to create a ‘Merge Request’ for your branch. After agreeing to that you can fill out the Merge Request template giving more details on the change, and possibly linking to a GitLab Issue if working from one.

You should check the ‘diff’ in the web UI, which will show the changes you are about to make to the target branch. You may be required to wait for automated tests to pass or for a colleague to review the code, depending on your project’s settings.

After these steps you will get a ‘green light’ in the UI and will be able to hit the ‘merge’ button to merge your changes to the common branch.

Changing repo location#

At times it can be necessary to change the location of a repo, e.g. moving it between namespaces. If this occurs, you must change the origin in any local copies of the repo. To check the origin, use:

git remote -v

This will give an output such as

origin	git@git.ichec.ie:my_group/my_project.git (fetch)
origin	git@git.ichec.ie:my_group/my_project.git (push)

The remote can be changed using

git remote set-url origin git@git.ichec.ie:new_location/new_project_name.git

Verify your results by running git remote -v once more and you should see the new locations listed.
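If you want to script the check, `git remote get-url` prints just the URL for one remote. The sketch below runs in a scratch repository and only edits local configuration, so it never contacts the server; the URLs are the placeholder ones used above.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# Old location, as recorded in a local copy.
git remote add origin git@git.ichec.ie:my_group/my_project.git
old_url="$(git remote get-url origin)"

# Point the same remote name at the new location.
git remote set-url origin git@git.ichec.ie:new_location/new_project_name.git
new_url="$(git remote get-url origin)"

echo "$old_url -> $new_url"
```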

Signing Git Commits#

By adding a digital signature to your commit, you provide extra assurance that the commit originated from you and not an impostor. When a commit has been signed, it will display a ‘verified’ badge in the GitLab UI. On GitLab, commits can be signed with GPG keys, SSH keys, or a personal X.509 certificate. Here we will walk through the process of setting up commit signing using SSH keys.

The same SSH key may be used for authentication to GitLab and signing commits, provided the key usage is listed as ‘authentication and signing’.

Configuring SSH key signing#

If you have not already done so, generate an SSH key; the recommended type is ed25519. Upload this to your GitLab account with the usage type ‘Authentication and Signing’.
Configure git to use the SSH key for signing:

git config --global gpg.format ssh

You can verify this by printing the contents of your global git config file, e.g.

cat ~/.gitconfig

Add the path to your key to the git config file, to specify which key to use. The filename and location of your key may differ depending on how it was created; substitute ~/.ssh/my_key.pub with the location of your public key. Never share your private key.

git config --global user.signingkey ~/.ssh/my_key.pub
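If you would rather try these settings out without touching your global configuration, one option is to set them for a single repository with `--local` and read them back with `git config --get`. This is a sketch of that variation rather than part of the setup above: the scratch repository is a throwaway, and ~/.ssh/my_key.pub is the same placeholder path as before (the file does not need to exist for the configuration to be written).

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"

# --local writes to this repository's .git/config instead of ~/.gitconfig.
git config --local gpg.format ssh
git config --local user.signingkey ~/.ssh/my_key.pub   # placeholder key path

# Read individual values back instead of printing the whole config file.
fmt="$(git config --get gpg.format)"
key="$(git config --get user.signingkey)"
echo "format: $fmt"
echo "signing key: $key"

# --show-origin also reports which config file a value came from.
git config --show-origin --get gpg.format
```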

Using your SSH key to sign commits#

Once you have set up signing with your SSH key, you can sign your commits using:

git commit -S -m "Commit message"

Optionally, to configure all commits to be signed run

git config --global commit.gpgsign true

You will be prompted to enter the passphrase for your SSH key if it is passphrase protected. Once you have pushed your commit to GitLab, you should see a ‘verified’ badge alongside the commit hash.

Version Control Do’s and Don’ts#

Here are some suggestions for keeping a repository clean and workable:

Do not commit large files to version control - really nothing binary should be in there. They can significantly slow down git operations and make working on the repo painful. If you have no other option consider the git-lfs tool to manage large files, but beware it will complicate things for anyone collaborating on the repo with you.
Do not commit secrets or credentials - GitLab has a built-in secret manager.
Do not push your local version of the ‘main’ or ‘master’ branch directly to the repo - this makes it difficult to collaborate on the repo; use the Merge Request flow instead.
Do clean up remote branches when finished with them. Avoid leaving partially completed work on the remote for a long time - others won’t know what to do with it in future.
Do not add third-party repository contents to the repo - this will make the repository slower and the version history noisy. Use a local package manager (preferable) or git submodules to handle third-party libraries.
Do carefully check your staged files if doing git add . - it is very easy to add noise or, worse, secrets to a repo this way.
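For the branch clean-up point, a sketch of removing a finished branch both on the remote and locally is given below. A local bare repository stands in for GitLab, and a local fast-forward merge stands in for the Merge Request being merged; all names and the demo identity are invented for the example.

```shell
set -eu
sandbox="$(mktemp -d)"
git init -q --bare "$sandbox/server.git"   # stand-in for the GitLab remote

git init -q "$sandbox/work"
cd "$sandbox/work"
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin "$sandbox/server.git"
echo "v1" > code.txt
git add code.txt
git commit -q -m "Initial commit"
trunk="$(git symbolic-ref --short HEAD)"
git push -q -u origin "$trunk"             # seed the stand-in remote

# A feature branch that has been pushed and then merged.
git checkout -q -b my_branch
echo "v2" > code.txt
git commit -q -am "Finish feature"
git push -q -u origin my_branch
git checkout -q "$trunk"
git merge -q my_branch                     # stands in for the merged MR

# Clean up: delete the branch on the remote, then the local copy.
git push -q origin --delete my_branch
git branch -q -d my_branch

remaining="$(git ls-remote --heads origin | wc -l | tr -d ' ')"
echo "branches left on remote: $remaining"
```

GitLab can also delete the source branch for you when a Merge Request is merged, in which case only the local `git branch -d` step remains.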

Choosing a suitable Git workflow#

Fig. 20 shows a schematic of three common ways for managing a git project. If you are the only contributor (and likely to remain the only contributor) then you may not need a feature branch and can commit straight to the default ‘main’ branch. If you need to make releases you can do so with the git tag feature.

If you have a collaborative project with expensive or long running automated tests, or have concerns about your ability to maintain quality and stability on the ‘main’ branch you can create a ‘dev’ or ‘devel’ branch and merge feature branches to it. You can then set up automated commits to the ‘main’ branch if suitable tests pass. This approach has the downside of being complicated to implement and encouraging anti-patterns as it does not encourage the development of fast and reliable automation.

Merging feature branches directly to ‘main’ is a simple flow for collaborative projects. Releases and hot-fixes can be managed by Git Tags, with branching off tag points if needed. A downside to this approach is that it is easier for main to be destabilized. The need for a stable main branch depends on how you are planning to deliver releases to your project downstreams.
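A minimal sketch of tagging a release and later branching off the tag point for a hot-fix is shown below. The version number v1.0.0, the branch name hotfix-1.0.1 and the demo identity are made-up examples, not a naming scheme this guide prescribes.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > code.txt
git add code.txt
git commit -q -m "First release content"

# Annotated tag marking the release.
git tag -a v1.0.0 -m "Release 1.0.0"

# Development continues on the trunk after the release.
echo "v2" > code.txt
git commit -q -am "Work towards next release"

# A hot-fix branch can start from the tag point rather than the trunk tip.
git checkout -q -b hotfix-1.0.1 v1.0.0

tags="$(git tag --list)"
base="$(git describe --tags --exact-match HEAD)"
echo "tags: $tags, hot-fix branch starts at: $base"
```

Tags are not pushed by default; `git push origin v1.0.0` publishes a single tag to the remote.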

../_images/version-control-flows.png — Fig. 20 An example of some common methods for managing collaborative git projects.#

Setting a global configuration#

macOS and certain IDEs put files specific to your setup into the source tree of your repositories. These are often hidden files and are easy to commit by accident. To avoid this, you can create a global git ignore file, which excludes such files in all of your repos, by creating and populating the file ~/.config/git/ignore.
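As an illustration of what such a file might contain and how to check that it is picked up, here is a sketch that writes an example ignore file into a sandbox instead of your real ~/.config/git/ignore. It relies on git reading $XDG_CONFIG_HOME/git/ignore when that variable is set; the listed patterns are examples only, and the check assumes you have not overridden core.excludesFile in your own configuration.

```shell
set -eu
sandbox="$(mktemp -d)"

# For a real setup this file would be ~/.config/git/ignore.
mkdir -p "$sandbox/config/git"
cat > "$sandbox/config/git/ignore" <<'EOF'
# macOS Finder metadata
.DS_Store
# Editor / IDE settings (example entries, adjust to the tools you use)
.idea/
.vscode/
EOF

git init -q "$sandbox/repo"
touch "$sandbox/repo/.DS_Store"

# 'git check-ignore -v' reports which pattern, from which file, ignores a path.
XDG_CONFIG_HOME="$sandbox/config" git -C "$sandbox/repo" check-ignore -v .DS_Store

# With the global ignore in effect the file no longer shows up in the status.
visible="$(XDG_CONFIG_HOME="$sandbox/config" git -C "$sandbox/repo" status --short)"
echo "status output: '$visible'"
```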

Troubleshooting#

Removing a large file from the git history#

Addition of large files to a repo can make it difficult to work with in future. Removal of these files afterwards is very problematic as it breaks the project history. Adding large files is one of the most damaging things you can do to a repo without admin rights, so please try to avoid it.

If it is too late, we can use BFG to clean up. First obtain BFG on our local machine:

wget https://repo1.maven.org/maven2/com/madgag/bfg/1.14.0/bfg-1.14.0.jar

Assuming we have checked out our repository, first make sure you have it backed up in case anything goes wrong!

Then find the large ‘deleted’ files in the .git/objects subfolder of the repository and manually delete them with rm.

Then we run

java -jar bfg-1.14.0.jar --strip-blobs-bigger-than 100M name_of_git_repository.git

where our repository is named ‘name_of_git_repository.git’ and we want to remove all git objects of size greater than 100M (change this number as required).

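This rewrites the repository history so that it no longer contains files larger than the size we specified. BFG only updates commits and references, so there is nothing new to commit; to physically remove the unwanted data, run git reflog expire --expire=now --all followed by git gc --prune=now --aggressive inside the repository. Then we can do git push (a force push is likely to be needed, since the history has been rewritten) and we are done.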
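To pick a sensible size threshold before running BFG, it can help to list the largest blobs anywhere in the history, including files that have since been deleted. The sketch below does this with plain git commands in a scratch repository. The file names, sizes and demo identity are invented, and the awk step assumes paths without spaces.

```shell
set -eu
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "small" > small.txt
head -c 2048 /dev/zero > big.bin     # stand-in for a 'large' file
git add small.txt big.bin
git commit -q -m "Add files"
git rm -q big.bin
git commit -q -m "Remove big file"   # gone from the working tree, still in history

# Walk every object reachable from any ref, keep blobs, sort by size in bytes.
largest="$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $2, $3 }' \
  | sort -rn \
  | head -n 1)"
echo "largest blob in history: $largest"
```

Replacing `head -n 1` with, say, `head -n 20` gives a short list of candidates to review.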