Software Engineering#
In the context of this Handbook, Software Engineering is regarded as the set of processes and activities needed to deliver and maintain a software project.
Software engineering is thus central to many ICHEC activities, whether that is working with a data-intensive application, troubleshooting a large HPC code or writing user tutorials or recipes.
This chapter introduces basic concepts and tooling used in software projects. It gives an overview of the Software Development Lifecycle to introduce how software projects can be managed.
Given its important place in software project management, version control is covered next. This is followed by an overview of how to set up a Local Development Environment and how to collaborate with colleagues on a Cloud Development Platform.
The chapter finishes with some information on Licensing and Copyright, which are important to be aware of when using others’ software and when making our own software available for re-use.
Before covering those topics, this introduction gives some quick tips and best practices for developing software, which are covered in more detail in subsequent sections.
Introduction#
Quick Tips and Best Practices#
These tips are intended to form a quick guide or checklist when contributing to a software project. At ICHEC we rely on software to perform our work and to support others performing theirs. It is important that we produce high quality software and that we do it efficiently, generating the maximum value from the resources we put into it.
Make it easy for others to understand your code#
Software is a communication mechanism between humans, and between humans and computers. Just as natural languages have commonly agreed rules and structures that make it easy for humans to communicate, so should software.
Follow existing styles and patterns#
If everyone makes up their own styling rules when working on a project then communication and collaboration become more difficult. When working on a project take note of the existing conventions in the project and of the chosen language (e.g. Python) or framework (e.g. Django/React) that you are using. Note:
the naming patterns and structures of directories and files
the use of casing in file, directory, method, module, variable and class names (MyFile.dat, my_file.dat, etc.)
standards and style guides, such as PEP 8 for Python or the Google style guides per language and framework
the availability of automated tooling, such as linters and static analysers which are available in most IDEs as plugins.
Document your project and code#
Source code alone is insufficient to give information on the intent, scope and future plans of a software project. If you want others (or your future self) to get value from your work it should come with clear documentation, including:
An overall project README file stating the project purpose and how someone can get more info
A wiki and/or issue tracker recording future plans and past decision making
Design documentation including architecture diagrams and rationale for previous decisions
Use of in-project READMEs describing the content of non-trivial directories. For example, if a directory has a standalone script it should have info on how to run it and expected inputs and outputs.
Use of source code documentation with suitable style, for example Sphinx for Python or Doxygen for C++.
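As an illustration of the last point, the following is a minimal sketch of a Python function documented with the reStructuredText field lists that Sphinx can render. The function name `scale_values` and its parameters are hypothetical and only serve to show the docstring layout; a project may prefer a different docstring style.

```python
def scale_values(values, factor=1.0):
    """Multiply every value in a sequence by a constant factor.

    :param values: Numbers to scale.
    :type values: list[float]
    :param factor: Multiplier applied to each value.
    :type factor: float
    :returns: A new list with the scaled values; the input is not modified.
    :rtype: list[float]
    :raises ValueError: If ``values`` is empty.
    """
    # Hypothetical example function: fail early with a descriptive message.
    if not values:
        raise ValueError("values must contain at least one number")
    return [value * factor for value in values]
```

The docstring states the intent, inputs, outputs and failure modes, so a reader does not need to study the function body to use it.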
Make the code itself understandable#
Easily understandable code is associated with higher quality and more reliable software. Code that follows an expected structure or style is easier to understand. In addition, the following steps help:
Follow the ‘Don’t Repeat Yourself’ (DRY) principles:
Avoid copy-pasting large blocks of code
If two scripts have largely similar code they should take it from a single common module
Regularly refactor code to reduce repetition and to make it simpler
Don’t use or introduce code you don’t understand - it is easy to introduce bugs and others will struggle to understand it too. This applies particularly to code from Stack Overflow and LLM outputs. Take the time to look at the documentation for the code or API you are using and bookmark it for future reference. This is a way to improve as a developer long-term.
Avoid functions and files with large amounts of code - it is difficult for a reader to keep more than 100 lines of code in their head at a time. Break them into small, clearly named and documented, units.
Avoid functions with many arguments (more than 5) and classes with many attributes.
Avoid deeply nested loops and branches (if-else)
Use descriptive variable and function naming. Length should be related to scope - variables with a large scope should have a long name. Variables with short scope (e.g. a loop counter in a small loop) can have short names.
Keep scopes small - programs with variables in global or file level scope are hard to debug and follow
Don’t mix data and code - avoid hardcoding any (magic) numbers or settings into code, they should all be exposed for modification by users. Make these parameters available for input via command line and/or config file.
Data should have a sensible model - think about what it will look like when saved as JSON for example.
Use patterns and APIs from standard libraries, for example the STL in C++ or built-in modules and structures in Python (for example list comprehensions and generator expressions).
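Several of the points above can be sketched in one short Python example: a setting exposed on the command line instead of being hardcoded, small clearly named functions, and a generator expression from the standard language features. The names `DEFAULT_THRESHOLD`, `build_parser`, `count_above` and `main` are hypothetical and chosen only for illustration.

```python
import argparse

# Hypothetical default, named once here rather than buried in the logic.
DEFAULT_THRESHOLD = 0.5


def build_parser():
    """Create the command line parser that exposes the settings to users."""
    parser = argparse.ArgumentParser(description="Count readings above a threshold.")
    parser.add_argument("readings", nargs="+", type=float, help="values to check")
    parser.add_argument(
        "--threshold",
        type=float,
        default=DEFAULT_THRESHOLD,
        help="readings strictly above this value are counted",
    )
    return parser


def count_above(readings, threshold):
    """Return how many readings are strictly greater than threshold."""
    # A generator expression avoids building an intermediate list.
    return sum(1 for reading in readings if reading > threshold)


def main(argv=None):
    """Parse arguments (sys.argv by default), report and return the count."""
    args = build_parser().parse_args(argv)
    count = count_above(args.readings, args.threshold)
    print(f"{count} of {len(args.readings)} readings exceed {args.threshold}")
    return count
```

Calling `main(["0.1", "0.6", "0.9", "--threshold", "0.8"])` runs the same path a user would trigger from a shell, and the counting logic stays in a small function that can be reused or tested without touching the argument parsing.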
Make software behaviour predictable#
Software should not surprise users. If something does go wrong they should be able to quickly troubleshoot and understand the problem. Some ways to allow this are:
Use error handling - if you are in an unexpected situation throw an error with a descriptive message. Anticipate and handle errors from dependencies.
Make extensive use of logging with different levels (info, warn, error). Make logging messages clear and actionable. Use a logger instead of print statements, it gives the user more control over what is output and where.
Your software should have an easy to use CLI, GUI or API as appropriate. Users should not need to modify or read your source code to use the software’s features.
Make it possible to reproduce a computation - record inputs and cache them. Include checkpoints and recovery mechanisms for long-running computations.
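The error handling and logging advice above could look roughly like the following sketch, which uses Python's standard `logging` module. The exception class `InputDataError` and the functions `mean_of` and `safe_mean` are hypothetical names introduced for illustration, and returning a fallback value is only one possible way of handling the error.

```python
import logging

# A module-level logger lets the application decide what is output and where.
logger = logging.getLogger(__name__)


class InputDataError(ValueError):
    """Hypothetical error type raised when input data cannot be used."""


def mean_of(values):
    """Return the arithmetic mean, raising a descriptive error on empty input."""
    if not values:
        raise InputDataError("cannot compute a mean: the input list is empty")
    logger.info("computing mean of %d values", len(values))
    return sum(values) / len(values)


def safe_mean(values, fallback=None):
    """Return the mean, or the fallback (with a warning) if the input is unusable."""
    try:
        return mean_of(values)
    except InputDataError as error:
        # Anticipate the failure and tell the user what happened and what was done.
        logger.warning("using fallback %r: %s", fallback, error)
        return fallback
```

An application entry point would typically configure the output once, for example with `logging.basicConfig(level=logging.INFO)`, so that users can raise or lower the verbosity without changing the library code.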
Don’t mess up your user’s systems#
Using software developed by someone else requires trust. Your users trust that the software you write won’t spy on them, consume excessive resources or otherwise break or damage their systems. To earn that trust:
Use automated tests - you should add testing early in the software development process. Testable code has a different design to untestable code. It is difficult to add good tests to already written code - the code will need to be changed to facilitate the tests. Testing is a habit - it only slows you down until you get familiar and comfortable with doing it.
Automate the infrastructure related to planning, building and delivering your software, aiming for fully reproducible deployments.
Take care with third-party dependencies, use enough to provide suitable software functionality but don’t include or use more than you need. This reduces bloat and the attack surface of your software on people’s machines. For example, in Python packages audit your requirements.txt to remove unneeded packages and consider version pinning or vendoring in your release process.
Correctly package your software so it can be installed in a predictable and reversible way on people’s systems. Make it available in standard package managers and repos, e.g. pip/PyPI, following best practice for package development. Don’t make users ‘sudo’ to use your software unless it is a system tool - even then having a rootless installation mechanism is best.
Let your users try your software without having to modify their system - Python packages or containers (Docker etc) are a good way to do this.
Take basic measures to avoid excessive resource consumption, such as filesystem, CPU, memory and network use.
Use a secure release and deployment process via version control and Continuous Integration. Never push binaries from your machine to a release package or a production system. Never modify production code on a live system. Release binaries and production changes should only come from a secure CI/CD process.
Provide options (e.g. via command line) for users to choose the directory for all program outputs. Don’t delete user files or cause data loss.
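To make the automated testing advice above more concrete, here is a minimal sketch using Python's built-in `unittest` module. The function `parse_reading` and the 'name,value' line format are hypothetical. The point is that logic kept in a small function, separate from file or network access, is straightforward to test.

```python
import unittest


def parse_reading(line):
    """Convert one 'name,value' text line into a (name, value) tuple."""
    # Hypothetical input format: exactly one comma separating name and number.
    name, raw_value = line.strip().split(",")
    return name, float(raw_value)


class ParseReadingTest(unittest.TestCase):
    def test_parses_name_and_value(self):
        self.assertEqual(parse_reading("temp,21.5\n"), ("temp", 21.5))

    def test_rejects_malformed_line(self):
        # A line without a comma cannot be unpacked and raises ValueError.
        with self.assertRaises(ValueError):
            parse_reading("no comma here")
```

Saved in a test module, these tests can be run with `python -m unittest`, both locally and as a step in a CI pipeline.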
Get and give feedback often#
Reading other people’s code is a very effective way to grow as a developer. The code review process is a great mechanism for knowledge exchange, as well as helping code quality. To benefit from this, all work from both experienced and junior developers should be visible, and an opportunity given to give and receive effective feedback. To this end you should:
use a version control system and collaborative (cloud) development platform (e.g. GitLab)
make small commits often (multiple times per day) and push them to the cloud platform
commit to ‘main’/trunk often via merge requests - they are the most effective way to get feedback on code
don’t keep long-running personal branches (lasting more than one day) - they go stale and are bad for collaboration
clean old branches - don’t leave stale ones lying around, it is hard to know if that work is ongoing
use an issue tracker to explain ongoing and planned work
read other repos and code contributions in the organization. Browse the organization’s projects and note good or useful techniques and areas that could be helped or improved.
look out for a chance to use or propose a set of common tools. Common tooling is a great way to collaborate.