---
icreports:
  tags: ['public'] 
---

# Software Engineering #


## Refactoring and Migration ##

Refactoring is the restructuring of software or services without changing their external behaviour. It can be undertaken to improve the design of a piece of software or a system, or as part of introducing new elements.

Refactoring of a production system can be done in two broad ways:

* incrementally replacing elements of a live system
* bringing up a parallel system and switching to it

For an established production system, the former should be preferred where feasible. Switching to a parallel system (the shiny new thing) is often tempting but has the following pitfalls:

* you will end up maintaining two systems - at least during the development phase, and very often after too. 
* if new features need to be added during the refactor, they either need to be added to both systems or users will end up confused about which system a given feature is available on
* without care, the two systems can interfere with each other - for example in terms of namespacing
* established production systems can have quirks that aren't in the public API but that users have discovered and depend on. These are often only discovered post-switch, and all at once
* the new system will have undiscovered bugs - which users will hit after the switch. Service quality may drop for some time post-switch.
* it can introduce unhealthy habits and culture - of software being throw-away or of quickly jumping to a new technology. Seeing software as disposable can reduce accountability and motivation. It is contrary to agile practices of incremental improvement.

Incremental refactoring should be done in the following stages:

1) Rough scoping or sketching of planned changes, including required resources and estimated timelines
2) Agreement in principle with project stakeholders, including end users, software developers and budget holders. This includes, very importantly, establishing a business priority for the project. Those doing the refactoring should be able to suitably prioritize their work and get the support they need from others. A decision may need to be made to sacrifice other projects or time spent on other activities related to the production system to support the refactor.
3) Documentation of the existing system, including functional and behavioural requirements, architecture and, for services, maintenance and support playbooks and degree of existing deployment automation and observability
4) Development of a refactoring plan, including timelines with milestones and checkpoints
5) Agreement of plan with project stakeholders.
6) Evaluation of automated test coverage of public APIs and deployment automation, including use of pre-production environments. Performance testing of the original system for benchmarking.
7) Filling broad automation and observability gaps
8) Replacement of an atomic (small) feature - backed by automated testing and monitoring
9) Iteration on 8) and gradual replacement of features 
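
Steps 8) and 9) depend on an automated check that the replacement behaves like the original. As a minimal sketch, the script below runs a legacy implementation and its replacement on the same inputs and reports any difference. The functions `legacy_greet` and `new_greet` are hypothetical stand-ins; in a real refactor they would wrap the old and new code paths, for example two service endpoints.

```shell
#!/bin/sh
# Hypothetical stand-ins: legacy_greet is the existing feature,
# new_greet is the replacement being introduced.
legacy_greet() { printf 'Hello, %s!\n' "$1"; }
new_greet()    { printf 'Hello, %s!\n' "$1"; }

# Run both implementations on every input and compare their output.
# Returns non-zero if any input produces a different result.
check_parity() {
    rc=0
    for input in "$@"; do
        expected=$(legacy_greet "$input")
        actual=$(new_greet "$input")
        if [ "$expected" != "$actual" ]; then
            printf 'MISMATCH for input "%s": expected "%s", got "%s"\n' \
                "$input" "$expected" "$actual"
            rc=1
        fi
    done
    return "$rc"
}

check_parity "world" "ICHEC" "" && echo "parity OK"
```

A check like this is most useful when it runs in the pre-production pipeline on every change, so that undocumented quirks users depend on are captured as inputs before the feature is replaced.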

Before doing any refactoring, the following bare minimum should be in place:

* The locations of project materials have been established, including:
  * source code
  * scripts
  * live machines
  * documentation
  * point(s) of contact for questions
  * access credentials
* Stakeholders who need to document the project have at least read access to project materials. Project experts have the time to support the documentation effort.
* The project is documented
* Project materials are held in repositories and those doing the refactoring have edit access to them:
  * source code, scripts and workflows/pipelines are in version control on ICHEC's Gitlab
  * documentation is on an accessible system with version tracking - Sharepoint, Handbooks, Gitlab READMEs or wikis, Draw.io or similar. Ideally architecture diagrams can be version controlled - e.g. stored as diffable plain text.
  * credentials are stored in a suitable system with access controls and auditing - TBD with systems but Gitlab credential management is a start.
* There is at least a documented deployment and maintenance process. 

The following should be targeted early or as high priority during a systems refactoring:

* addition of observability and monitoring - this includes system health, security including access and change auditing, and system performance.
* removal of root access to all systems - system setup should be standardized so that root access is no longer needed
* stopping manual interaction with production systems, e.g. 'ssh-ing in to a live system'. System state should be managed by dedicated tooling such as Ansible, and changes tested and rolled out through pre-production environments.
* addition of automated tests - at least E2E tests for primary APIs.
* removal of reliance on out-of-date/deprecated dependencies or APIs

The following should be reviewed during system refactoring:

* Use of suitable technologies:
  * security setup - access and data security
  * should the system rely on containers, VMs, system services (systemd etc) and orchestration - and where
  * storage platform - hardware and software
  * backup and restore
  * scalability
  * hosting platform - hardware and software
  * vendor lock-in
* Performance
* Functionalities

Elements of captured documentation should include:

* system architecture
* design decisions, rationale and history
* development guide
* operations guide
* user guide
* code documentation - readme, APIs, internal

## Version Control ##

### Git Subtree ###


#### Motivation ####

Git subtrees let us keep track of the history of our projects' software dependencies.
This works by nesting the dependency's git repository inside our project's git repository.
Correct usage of Git subtrees can simplify workflows in the following cases:

* our project requires source code from another project 
* we are working with twin projects
* we wish to create a superbuild
* a patched version of a dependency project is required - changes may be stored in the primary project's repository


Git subtrees are interacted with through the `git subtree` subcommand. 

#### Adding Subtrees to a Project ####

Subtrees should reside in an `external/` directory at the root of our project directory.
Create one if it does not already exist.
Suppose we wish to add the branch `branch-name` (usually `master` or some release) of a project `proj-name`, with repository url `repo-url`, as a subtree of our project.
This can be achieved using the following sequence of commands:

```
git remote add [proj-name] [repo-url]
git fetch [proj-name] [branch-name]
git subtree add --squash -P external/[proj-name] -m "Add [INSERT PROJ NAME PLUS VERSION] to external/[proj-name]" [proj-name] [branch-name]
```

We use a squashed commit to preserve a clean commit history and to better manage repository size.
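
As a worked sketch of the sequence above, the script below builds two throwaway repositories in a temporary directory and adds one to the other as a subtree. The name `dep`, the file names and the commit messages are hypothetical stand-ins for `proj-name` and its contents, and a local path takes the place of `repo-url`. It assumes the `git subtree` subcommand is installed alongside git.

```shell
#!/bin/sh
# Sketch only: "dep" and the temporary repositories are hypothetical
# stand-ins for [proj-name] and [repo-url].
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Fixed identity so commits succeed without a global git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in dependency repository with a single commit.
git init -q "$work/dep"
cd "$work/dep"
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "dep: initial version"
branch=$(git rev-parse --abbrev-ref HEAD)

# Our project, with an initial commit for the subtree to be merged into.
git init -q "$work/project"
cd "$work/project"
echo "demo project" > README.md
git add README.md
git commit -q -m "Initial commit"

# The three commands from this section, with the placeholders filled in.
git remote add dep "$work/dep"
git fetch -q dep "$branch"
git subtree add --squash -P external/dep \
    -m "Add dep v1 to external/dep" dep "$branch"

# The dependency's files now live under external/dep.
ls external/dep
git log --oneline
```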

#### Pulling Changes to a Subtree ####

The recommended way to update the subtree for a project `proj-name` to the latest version of the branch `branch-name` is the following command:

```
git subtree pull --squash -P external/[proj-name] -m "external/[proj-name]: Update [proj-name] to [INSERT COMMIT HASH]" [proj-name] [branch-name]
```

Once again, a squashed commit preserves a clean commit history and limits the subtree's footprint in the repository.
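
The update flow can be tried end to end with the sketch below. It sets up a throwaway dependency and a project that already holds it as a subtree, moves the dependency forward by one commit, and pulls that change in. As before, `dep`, the file names and the commit messages are hypothetical, and the script assumes the `git subtree` subcommand is available.

```shell
#!/bin/sh
# Sketch only: "dep" and the temporary repositories are hypothetical stand-ins.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no   # never open an editor for the merge commit

# Setup: a dependency at "v1" and a project that holds it as a subtree.
git init -q "$work/dep"
cd "$work/dep"
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "dep: initial version"
branch=$(git rev-parse --abbrev-ref HEAD)

git init -q "$work/project"
cd "$work/project"
echo "demo project" > README.md
git add README.md
git commit -q -m "Initial commit"
git remote add dep "$work/dep"
git subtree add --squash -P external/dep \
    -m "Add dep v1 to external/dep" dep "$branch"

# Upstream moves on by one commit.
cd "$work/dep"
echo "v2" > lib.txt
git commit -q -am "dep: second version"
new_hash=$(git rev-parse --short HEAD)

# Pull the update into the subtree, recording the upstream hash in the message.
cd "$work/project"
git subtree pull --squash -P external/dep \
    -m "external/dep: Update dep to $new_hash" dep "$branch"

# external/dep/lib.txt now holds the upstream content.
cat external/dep/lib.txt
git log --oneline -n 1
```

Recording the upstream commit hash in the message, as the template above suggests, makes it possible to tell from the project history alone which version of the dependency is in `external/`.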