---
icreports:
  tags: ['public'] 
---

# C #

This chapter is a basic introduction to the C programming language. C is an imperative, procedural language. It is relatively simple and is primarily used in 'low-level' applications where explicit control of resources such as memory is required.

Although simple to get started with, C places a lot of responsibility on the developer to correctly acquire, use and free resources. C code can also be verbose: because the language is relatively unexpressive, there can often be several levels of duplication.

C is ubiquitous in systems programming and is commonly used in HPC. Knowledge of C is useful for understanding how Unix-like operating systems work and for working with some HPC codes. However, higher-level languages such as C++ or Julia are recommended for most new HPC work, with C mainly used in this area for low-level optimizations or for custom hardware.


# Hello world #

A 'hello world' program in C looks as follows:

```c
#include <stdio.h>

int main(void)
{
    printf("Hello world\n");
    return 0;
}
```

First we `include` the 'header' file `stdio.h`, which has a declaration of the function `printf` that we will call. We define the function `main`, which takes no arguments (in this use case) and returns an integer (`int`) as a return code. It is a special function which is called as the entry point to a C program.

The function `printf` takes a character string `"Hello world\n"`, where the combination `\n` is an escape sequence meaning 'new line'. Statements in the function are terminated with a semi-colon `;`. The `main` function returns 0 as a return code to indicate that it has finished successfully.

We save this in a file `hello.c` and can compile it, i.e. transform it to an executable binary, with:

```sh
cc -o hello hello.c
```

where `cc` is the C compiler on Unix-like systems and the `-o` flag is followed by the name we want for the produced executable. The `cc` command is usually just a link to an actual compiler on the system, such as the GNU Compiler Collection (GCC) C compiler, `gcc`.

After executing this we can run the `hello` executable binary and see 'Hello world' and a new line printed to the terminal:

```sh
./hello
>>"Hello world"
>>
```

# Strings, Arrays and Pointers #

C is a simple language, with basic conditionals via `if-else` and `switch` statements and loops through `for`, `do-while` and `while` structures. The main source of confusion for new users, and of defects, is the use of strings, arrays and pointers.

Two relevant fundamental types in C are characters (`char`), which occupy one byte (8 bits on most modern systems), and integers (`int`), which are typically four bytes on current platforms (the standard only guarantees at least 16 bits). Since an 8-bit byte can represent `2^8 = 256` different values, characters are often used to hold a single ASCII symbol (ASCII encodes common characters from the English alphabet, numbers and some computer control and punctuation symbols).
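
These sizes can be checked on a given platform. The following is a minimal sketch (the function name `print_type_sizes` is hypothetical) using the `sizeof` operator and the `CHAR_BIT` constant from `limits.h`:

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: report how much storage char and int use here. */
void print_type_sizes(void)
{
    printf("bits per byte: %d\n", CHAR_BIT);       /* at least 8 */
    printf("sizeof(char):  %zu\n", sizeof(char));  /* always 1 by definition */
    printf("sizeof(int):   %zu\n", sizeof(int));   /* commonly 4 */
}
```

Calling `print_type_sizes()` from `main` prints the values for the machine the program was compiled for.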

In C we can declare, assign or both declare and assign these basic types as follows:

```c
char a; // declare a
int b; // declare b
a = 'a'; // assign a
b = 1;
char c = 'c'; // declare and assign c
int d = 2;
```

Here we have assigned the value of the ASCII character `a` to the variable `a`, which has `char` type. If you were to look at the bits in memory stored in the variable `a` and convert them to decimal you would get '97', which is the decimal value of the ASCII 'a' character.
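
To see that a `char` is just a small integer, a minimal sketch (the helper names `char_code` and `print_char_code` are hypothetical, and an ASCII-compatible character set is assumed) can convert the character to an `int`:

```c
#include <stdio.h>

/* Hypothetical helper: return the integer code stored in a char.
   Assumes an ASCII-compatible execution character set. */
int char_code(char ch)
{
    return (int)ch;
}

/* Print a character next to its decimal code using %c and %d. */
void print_char_code(char ch)
{
    printf("%c = %d\n", ch, char_code(ch));
}
```

On an ASCII system `char_code('a')` evaluates to 97.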

So we can represent ASCII characters with the `char` type - but how about ASCII representable words, like `apple`? We can use an array of characters as follows:

```c
char my_fruit[] = "apple";
```

Here the `[]` brackets appended to the variable name indicate that `my_fruit` is an array of characters; the compiler works out its size from the initializer. In C an array of a type is a sequence of instances of that type stored contiguously in memory. We can print the value of `my_fruit` to the terminal to inspect it with:

```c
printf("%s\n", my_fruit);
```

Here the combination `%s` is a format specifier, which says that we want to print `my_fruit` as a character string. In the construction of the output the value of `my_fruit` is substituted in place of `%s`, while the trailing `\n` again gives us a new line in the terminal.
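
Other format specifiers work the same way, e.g. `%c` for a single character and `%d` for an integer. As a minimal sketch (the function name `describe_fruit` is hypothetical), the standard `snprintf` function builds the same kind of formatted output in a buffer instead of printing it:

```c
#include <stdio.h>

/* Hypothetical helper: format a description of `fruit` into `out`,
   which has room for `out_size` bytes. Returns snprintf's result: the
   number of characters the full text needs, excluding the '\0'. */
int describe_fruit(char *out, size_t out_size, const char *fruit, int count)
{
    /* %d takes an int, %s a string and %c a single character */
    return snprintf(out, out_size, "%d x %s (starts with %c)\n",
                    count, fruit, fruit[0]);
}
```

The arguments after the format string are consumed in order, one per specifier, so their types must match the specifiers used.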

Let's say we want to know how many characters are in the string `my_fruit`. We could count them as follows:

```c
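#include <sys/types.h>

// Count the characters in s, not including the terminating '\0'
ssize_t get_len(const char s[])
{
    ssize_t pos = 0;
    char c;
    do {
        c = s[pos]; // read the character at the current position
        pos++;
    } while (c != '\0');
    return pos - 1; // the last increment counted the terminator
}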
```

There is a lot to take in here. First we include a system header which gives us the `ssize_t` type, a signed integer type used on POSIX systems for sizes and counts (portable code more commonly uses the standard `size_t`). Then we define a function `get_len` which takes a character array as input and returns the length of the string it holds. We use a `do-while` construct to loop over the individual array characters, using `[pos]` to index into the array. For each iteration we increment `pos` by one with `++`. We end the loop when we get a character of value `\0`. This 'null character' is a special 'end of string' value in C. It marks the end of a string, even if the containing array has a larger size than the string. This is the source of much confusion and many bugs in C. Although the C standard library is quite minimal, it does at least give us a built-in equivalent of this function, called `strlen`, declared in `string.h`.
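
The difference between the length of a string and the size of the array holding it can be sketched as follows. The function names and the 16-byte buffer are hypothetical choices for illustration; `strlen` and `sizeof` are standard:

```c
#include <stddef.h>
#include <string.h>

/* Length of the string stored in a 16-byte array: counts characters
   up to, but not including, the first '\0'. */
size_t fruit_string_length(void)
{
    char buffer[16] = "apple"; /* unused bytes are set to zero */
    return strlen(buffer);
}

/* Size of the array itself in bytes, regardless of its contents. */
size_t fruit_buffer_size(void)
{
    char buffer[16] = "apple";
    return sizeof(buffer);
}
```

Here `fruit_string_length()` gives 5 while `fruit_buffer_size()` gives 16, since the terminator sits at index 5 and the rest of the array is unused.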

So far we were able to construct a string by hard-coding the value `"apple"` - but what happens if we need to work with strings dynamically, perhaps based on some user-provided input? To work with dynamic memory in C we need to manage it ourselves, unlike in many languages we may be familiar with. Let's say (as a quite contrived example) we want to dynamically create a string "pear". We can do:

```c
#include <stdlib.h>

// The statements below belong inside a function body, e.g. main
char* my_fruit = malloc(sizeof(char)*5); // room for "pear" plus '\0'
my_fruit[0] = 'p';
my_fruit[1] = 'e';
my_fruit[2] = 'a';
my_fruit[3] = 'r';
my_fruit[4] = '\0';
```

Again, there is a lot going on here. First, we include the header `stdlib.h` to get the declaration of `malloc` (`sizeof` is a built-in operator and needs no header). Next we declare the variable `my_fruit`, but now with type `char*`. Here the `*` indicates that we are dealing with a 'pointer' type. This 'pointer' type's value is a memory address (i.e. the variable 'points to' a memory address) while the `char` element of the type definition tells the compiler that we expect things at this memory address to be of `char` type. The `malloc` function reserves (or allocates) some memory and returns its address, which we store in our pointer `my_fruit`. The amount of memory in bytes is the size of the character type (always 1 byte, by definition) multiplied by the number of characters we want to store (4 for `p`, `e`, `a`, `r` and an extra one for the string terminator `\0`).

Once the memory has been reserved and we have its address we can assign our character values to the address locations, e.g. `my_fruit[0] = 'p'`. Here we are doing something unusual - we are bringing back our array notation to index with. This is allowed because C defines `p[i]` as `*(p + i)`, so array access and pointer offsetting are interchangeable. We use it here as it semantically reflects that we are dealing with a string but we could also work directly with the pointer by doing `*(my_fruit + 1) = 'e'`. Here we are taking the address `my_fruit` and advancing it by one element, i.e. by the size of the pointed-to type (`char`, so one byte), to get the next character's address (`my_fruit + 1`) and then using the `*` operator to refer to the value of the memory at that address. Finally we assign the character `e` to that location.
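
The equivalence of indexing and pointer offsetting can be sketched with two small functions. Both names are hypothetical and the copy routine is for illustration only (the standard library already provides `strcpy`):

```c
#include <stddef.h>

/* Copy the string src into dst using pointer arithmetic only.
   dst must have room for the characters of src plus the terminator. */
void copy_with_pointers(char *dst, const char *src)
{
    size_t i = 0;
    while (*(src + i) != '\0') {   /* same as src[i] */
        *(dst + i) = *(src + i);   /* same as dst[i] = src[i] */
        i++;
    }
    *(dst + i) = '\0';             /* terminate the copy */
}

/* Returns 1 if indexing and pointer offsetting name the same byte. */
int same_element(const char *s, size_t i)
{
    return &s[i] == (s + i);
}
```

For any valid index `i`, `same_element` returns 1, because `s[i]` and `*(s + i)` are two spellings of the same access.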

Once we are finished working with `my_fruit` (e.g. printing it to the terminal) we need to explicitly free the allocated memory. This is done by calling `free(my_fruit)`. A common bug arises from forgetting to free the allocated memory, or doing it incorrectly (for example freeing it twice or using it after it has been freed). This can cause memory leaks or memory corruption.

For ease of use we will likely want our string creation to happen in a function. This could look as follows:

```c
#include <stdio.h>
#include <stdlib.h>

void get_fruit(char** fruit)
{
    *fruit = malloc(sizeof(char)*5);
    // Parentheses are needed: [] binds more tightly than *
    (*fruit)[0] = 'p';
    (*fruit)[1] = 'e';
    (*fruit)[2] = 'a';
    (*fruit)[3] = 'r';
    (*fruit)[4] = '\0';
}

int main(void)
{
    char* my_fruit;
    get_fruit(&my_fruit);
    printf("My fruit is: %s\n", my_fruit);
    free(my_fruit);
    return 0;
}
```

This needs some careful reading. In the main function we define the character pointer `my_fruit`. We then pass the address of the pointer itself using the `&` (address-of) operator. The `&` operator gets the address of any variable. The `get_fruit` function takes a pointer to a character pointer as its argument (`char**`). This is because we are going to modify the value of the character pointer itself (the number representing the memory address) by assigning it the output of the `malloc` call. Inside the `get_fruit` function we use the `*` operator to dereference `fruit` and access its value - the character pointer.

After printing the value of `my_fruit` we free the memory allocated in the `get_fruit` function. This raises a further source of bugs in C: not cleaning up resources allocated inside functions, especially external or third-party functions. It is important when reading function documentation to understand who is responsible for allocating or cleaning up resources. This is less of a concern in a language with smart pointers such as C++, and largely a non-issue in a language with a garbage collector such as Python.
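
One common way to make ownership explicit is to pair each allocating function with a matching clean-up function. A minimal sketch, assuming a hypothetical helper `release_fruit` that is not part of the example above:

```c
#include <stdlib.h>

/* Hypothetical counterpart to an allocating function: free the string
   that *fruit points to and reset *fruit to NULL, so that an accidental
   second call becomes a harmless free(NULL). */
void release_fruit(char **fruit)
{
    free(*fruit);
    *fruit = NULL;
}
```

A caller would then write `release_fruit(&my_fruit);` instead of calling `free` directly. Resetting the pointer does not prevent every misuse, but it turns a double free into a no-op and makes use-after-free easier to spot.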

In this basic example we haven't included any error handling. In addition to checking resource use we should pay attention to any errors that can occur when calling external functions. In particular `malloc` returns a null pointer (`NULL`) if it fails to allocate the requested memory, which we should check for and handle before using the result.
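
A minimal sketch of such a check is shown below. The function name `get_fruit_checked` and the convention of returning 0 on success and -1 on failure are assumptions made for illustration, not part of the example above:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of get_fruit that reports allocation failure.
   Returns 0 on success and -1 if malloc failed.
   On success the caller owns *fruit and must free() it. */
int get_fruit_checked(char **fruit)
{
    const char *text = "pear";
    *fruit = malloc(strlen(text) + 1); /* +1 for the '\0' terminator */
    if (*fruit == NULL) {
        return -1;                     /* nothing was allocated */
    }
    strcpy(*fruit, text);              /* copies the terminator too */
    return 0;
}
```

The caller checks the return code before touching the string, e.g. `if (get_fruit_checked(&my_fruit) != 0) { /* report the error and stop */ }`.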

This section covers what are typically regarded as the most confusing or error-prone elements of C. Once you are comfortable with the concepts introduced here it is mainly a matter of familiarizing yourself with the many available utilities in the standard library and operating system provided headers. `man` pages, available online or in your terminal, are a great way to do this.


# Build Tooling #

Aside from tutorials or proofs of concept it is rare to have a C project with a single file or with dependencies only on system libraries. `make` is a commonly used tool for building more complex C projects. It takes a project definition via a `makefile` and, when the `make` command is run, generates or builds the project output, usually an executable binary. An important feature of `make` in larger projects is that it only re-builds the necessary elements when a file changes, based on modification timestamps, which can save a lot of time.

Below is a simple, but slightly contrived, makefile:

```make
hello: hello.o
	gcc hello.o -o hello

hello.o: hello.c
	gcc -c hello.c

.PHONY: clean
clean:
	rm -f hello hello.o
```

Here we declare three targets `hello`, `hello.o` and `clean`. The `hello` target depends on the `hello.o` target (and associated file), which in turn depends on the presence of a file `hello.c`. The body of the `hello.o` target compiles the `hello.c` file to an object file using the `gcc` compiler. The `hello` target links the resulting object file `hello.o` to create the `hello` executable. The target `clean` cleans up build output and is marked `.PHONY` since it doesn't generate any files itself.


# Memory Checking #

Checking for memory leaks or corruption is vital in any non-trivial C project, since they are very difficult to avoid entirely when using the language. `valgrind` is a popular memory checking tool for this purpose.
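
As a minimal sketch of the kind of defect such a tool reports, the hypothetical function `leaky_length` below allocates memory and never frees it, while `clean_length` shows the fix. The `valgrind` invocation in the comment uses its `--leak-check=full` option; `./program` stands in for whatever executable calls these functions.

```c
#include <stdlib.h>
#include <string.h>

/* Deliberately buggy: the buffer is never freed, so each call leaks
   5 bytes. Running a program that calls this function with
       valgrind --leak-check=full ./program
   should list the allocation in its leak summary. */
size_t leaky_length(void)
{
    char *fruit = malloc(5);
    if (fruit == NULL) {
        return 0;
    }
    strcpy(fruit, "pear");
    return strlen(fruit);     /* bug: missing free(fruit) */
}

/* Fixed version: read what we need, then release the memory. */
size_t clean_length(void)
{
    char *fruit = malloc(5);
    if (fruit == NULL) {
        return 0;
    }
    strcpy(fruit, "pear");
    size_t length = strlen(fruit);
    free(fruit);
    return length;
}
```

Both functions return the same value, which is why leaks like this tend to go unnoticed without a dedicated tool.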









