Julia#

The purpose of this section is to create a list and reference of Julia libraries that ICHEC staff may find useful.

Contents#

  • Acceleration

  • Logging

  • Measurement - All things for probing code and extracting performance metrics

  • Numerical

  • Serialisation

  • Testing

  • Utility

Benchmarking#

JuliaCI/BenchmarkTools.jl

The BenchmarkTools library supplies a framework for writing and running groups of benchmarks in Julia. It can be used both in a development environment for optimisation purposes and in CI to monitor possible performance regressions.

The library is used for running the base benchmarks for the Julia language itself.

Introduction#

Due to a large number of contributing factors, effectively benchmarking software can be challenging.

A simple, naive way of benchmarking code is to record the time before and after the code you would like to execute and calculate the elapsed time. This can be done in Julia using the @time macro:

julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> @time x = A\b;
  0.951087 seconds (3.32 M allocations: 160.341 MiB, 2.83% gc time)

Seems pretty slow, and a lot of memory was allocated for the small task it was given… This is because Julia is Just-In-Time (JIT) compiled, so the numbers we’re seeing here include the compilation time. Running the same line again results in the following:

julia> @time x = A\b;
  0.000177 seconds (5 allocations: 79.984 KiB)

Much better!

To try and overcome these sources of variance in timings, we can run a large number of samples and simply average the times. While this is obviously better, we must then worry about how many samples to run, the distribution of the timings, whether the average is a reasonable metric, etc.
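As a rough sketch (reusing the A and b defined above, and assuming we simply take the minimum and the mean of the raw timings), this manual approach might look like:

julia> times = [@elapsed(A\b) for _ in 1:1000];  # 1000 naive samples, in seconds

julia> minimum(times), sum(times) / length(times)  # best time and average time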

So, this compilation phase, coupled with the usual suspects that make benchmarking non-trivial, calls for a robust benchmarking framework. That’s where this library comes in.

The BenchmarkTools manual states that the library was created to facilitate the following tasks:

  1. Organize collections of benchmarks into manageable benchmark suites

  2. Configure, save, and reload benchmark parameters for convenience, accuracy, and consistency

  3. Execute benchmarks in a manner that yields reasonable and consistent performance predictions

  4. Analyze and compare results to determine whether a code change caused regressions or improvements

The manual also outlines some terminology used throughout:

  • “evaluation”: a single execution of a benchmark expression.

  • “sample”: a single time/memory measurement obtained by running multiple evaluations.

  • “trial”: an experiment in which multiple samples are gathered (or the result of such an experiment).

  • “benchmark parameters”: the configuration settings that determine how a benchmark trial is performed.

The difference between a sample and an evaluation matters because a single evaluation may execute faster than your timing resolution and therefore result in an invalid measurement. In such cases, a number of evaluations must be run per sample to give a representative time per evaluation.
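For instance, the number of evaluations per sample (and the number of samples) can be set explicitly as keyword arguments to the macros covered below; a minimal sketch:

julia> using BenchmarkTools

julia> x = 0.5;

julia> @benchmark sin($x) evals=1000 samples=500  # each of the 500 samples times 1000 evaluations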

Installation#

As with all Julia packages, the library is installed using the built-in package manager:

julia> using Pkg
julia> Pkg.add("BenchmarkTools")

Usage#

BenchmarkTools offers three main macros:

  • @btime

  • @benchmark

  • @benchmarkable

which will be covered here.

@btime#

The @btime macro is used as a drop-in replacement for the @time macro mentioned above. It is mostly used for quick sanity checks, giving an indication of the time required to run the code as well as the memory allocated. Its main advantage is that it excludes the initial compilation step from the reported timing.

julia> using BenchmarkTools
julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> @btime x = $A\$b;
  81.713 μs (5 allocations: 79.98 KiB)

Note the use of $A and $b when calling @btime, which will be discussed in the gotchas section.

@benchmark#

The @benchmark macro is used to give a more detailed analysis of a code section including statistics on the timings.

Following on with the example outlined above of solving for x in Ax = b (the $A and $b syntax is used here also, which is discussed in the gotchas section):

julia> using BenchmarkTools
julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> @benchmark x = $A\$b
  BenchmarkTools.Trial:
    memory estimate:  79.98 KiB
    allocs estimate:  5
    --------------
    minimum time:     88.310 μs (0.00% GC)
    median time:      129.171 μs (0.00% GC)
    mean time:        149.087 μs (5.14% GC)
    maximum time:     6.987 ms (0.00% GC)
    --------------
    samples:          10000
    evals/sample:     1

@benchmarkable#

The @benchmarkable macro is used to create a benchmark without running it immediately. The benchmark can be started by passing it to the run function:

julia> using BenchmarkTools
julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> bench = @benchmarkable x = $A\$b
  Benchmark(evals=1, seconds=5.0, samples=10000)
julia> run(bench)
  BenchmarkTools.Trial:
    memory estimate:  79.98 KiB
    allocs estimate:  5
    --------------
    minimum time:     82.495 μs (0.00% GC)
    median time:      123.419 μs (0.00% GC)
    mean time:        142.598 μs (6.47% GC)
    maximum time:     8.160 ms (81.93% GC)
    --------------
    samples:          10000
    evals/sample:     1

You can optionally pass the benchmark structure to the tune! function, which is discussed further in the Tuning section. This tuning is carried out automatically by the @benchmark macro; however, this is not always desired.
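A quick sketch of combining the two (output omitted):

julia> tune!(bench);  # pick a reasonable number of evaluations for `bench`

julia> run(bench);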

The @benchmarkable macro is especially useful when creating suites of benchmarks, which are discussed below.

Gotchas#

Globals#

A common performance tip with Julia is to avoid global variables. This is because a global variable’s value, and therefore its type, may change at any time, which makes it difficult for the compiler to optimise code that uses it.

To work around this issue, it is suggested to “interpolate” the variables into the benchmark using the $(...) syntax:

# rand(1000) is executed for each evaluation
julia> @benchmark sum(rand(1000))
  BenchmarkTools.Trial:
    memory estimate:  7.94 KiB
    allocs estimate:  1
    --------------
    minimum time:     1.566 μs (0.00% GC)
    median time:      2.135 μs (0.00% GC)
    mean time:        3.071 μs (25.06% GC)
    maximum time:     296.818 μs (95.91% GC)
    --------------
    samples:          10000
    evals/sample:     10

# rand(1000) is evaluated at definition time, and the resulting
# value is interpolated into the benchmark expression
julia> @benchmark sum($(rand(1000)))
  BenchmarkTools.Trial:
    memory estimate:  0 bytes
    allocs estimate:  0
    --------------
    minimum time:     101.627 ns (0.00% GC)
    median time:      101.909 ns (0.00% GC)
    mean time:        103.834 ns (0.00% GC)
    maximum time:     276.033 ns (0.00% GC)
    --------------
    samples:          10000
    evals/sample:     935

Compiler Optimisations#

Note that for trivial, non-observable tasks, the compiler may very well optimise the code away entirely:

julia> a = 1; b = 2;
julia> @btime $a + $b
  0.024 ns (0 allocations: 0 bytes)
3

To combat this in the above example, we instead interpolate a reference to a variable and dereference it within the benchmark:

julia> a = 1; b = 2;
julia> @btime $(Ref(a))[] + $(Ref(b))[]
  1.277 ns (0 allocations: 0 bytes)
3

Benchmark Parameters#

The default benchmark parameters are stored in BenchmarkTools.DEFAULT_PARAMETERS whose structure and default values are shown below:

| Parameter | Type | Default | Explanation |
| --- | --- | --- | --- |
| samples | Int64 | 10000 | Number of samples to take |
| seconds | Float64 | 5.0 | Seconds budgeted for benchmarking |
| evals | Int64 | 1 | Number of evals per sample |
| overhead | Float64 | 0.0 | Offset to subtract due to loop overhead (in ns) |
| gctrial | Bool | true | Run gc() before each trial |
| gcsample | Bool | false | Run gc() before each sample |
| time_tolerance | Float64 | 0.05 | Percentage noise tolerance on times used for analysis |
| memory_tolerance | Float64 | 0.01 | Percentage noise tolerance on memory used for analysis |

The values can be changed by setting the individual fields of BenchmarkTools.DEFAULT_PARAMETERS or by passing them as part of an @benchmarkable (or @benchmark) macro call:

julia> bench = @benchmarkable sum($(rand(1000))) seconds=1 time_tolerance=0.01
julia> run(bench)
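Changing the global defaults by mutating the fields directly looks like the following sketch (this affects all benchmarks defined afterwards in the session):

julia> using BenchmarkTools

julia> BenchmarkTools.DEFAULT_PARAMETERS.samples = 5000;

julia> BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.5;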

Setup and teardown#

Per-benchmark setup and teardown code is supported via setup and teardown expressions.

Back to our Ax = b example above, it may be written as follows using a setup expression:

julia> using BenchmarkTools
julia> @benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n))

Note the lack of variable interpolation in the benchmarked expression. setup and teardown expressions are executed once per sample, not once per evaluation (as described in the Introduction). This is something to keep in mind when mutating state within evaluations. For example, if running N evaluations of a sorting algorithm that sorts in place, then N-1 of the evaluations will be carried out on an already-sorted array.
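One way to guard against this is to force a single evaluation per sample so that the setup expression runs before every evaluation; a minimal sketch:

julia> using BenchmarkTools

julia> @benchmark sort!(x) setup=(x = rand(1000)) evals=1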

Handling Results#

BenchmarkTools has four types related to benchmark results:

  1. Trial - Contains all collected samples and benchmark parameters

  2. TrialEstimate - A single estimate to summarise a Trial

  3. TrialRatio - A comparison between two Trials

  4. TrialJudgement - A classifier of a TrialRatio as invariant, regression, or improvement

Trials are the return type of running benchmarks:

julia> using BenchmarkTools
julia> trial = @benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n))
  BenchmarkTools.Trial:
    memory estimate:  79.98 KiB
    allocs estimate:  5
    --------------
    minimum time:     88.470 μs (0.00% GC)
    median time:      128.398 μs (0.00% GC)
    mean time:        140.650 μs (5.12% GC)
    maximum time:     5.099 ms (97.06% GC)
    --------------
    samples:          10000
    evals/sample:     1

The representation of a Trial shows the results of the minimum, median, mean, and maximum TrialEstimates. These are calculated using overloads of their respective functions.
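For example, individual estimates can be extracted from the trial above (output omitted):

julia> minimum(trial)  # TrialEstimate of the fastest sample

julia> median(trial)   # TrialEstimate of the median sample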

When considering which estimator to use, refer to the following:

  • The minimum is a robust estimator for the location parameter of the time distribution, and it should not be considered an outlier

  • The median, as a robust measure of central tendency, should be relatively unaffected by outliers

  • The mean, as a non-robust measure of central tendency, will usually be positively skewed by outliers

  • The maximum should be considered a primarily noise-driven outlier, and it can change drastically between benchmark trials.

TrialRatios are computed by calling the ratio function with two TrialEstimates. These ratios can then be passed to judge to classify whether the benchmark is an invariant, a regression, or an improvement:

julia> bench1 = median(@benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n)));
julia> bench2 = median(@benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n)));
julia> bench_ratio = ratio(bench1, bench2)
  BenchmarkTools.TrialRatio:
    time:             0.9872530444481794
    gctime:           1.0
    memory:           1.0
    allocs:           1.0
julia> judge(bench_ratio) # or judge(bench1, bench2)
  BenchmarkTools.TrialJudgement:
    time:   -1.27% => invariant (5.00% tolerance)
    memory: +0.00% => invariant (1.00% tolerance)

Custom time and memory tolerances may be passed to judge via the time_tolerance and memory_tolerance keyword arguments.
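For example (reusing the estimates from above):

julia> judge(bench1, bench2; time_tolerance = 0.1, memory_tolerance = 0.05)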

Tuning#

It can be difficult, or just time consuming, to determine the correct number of evaluations required to get reasonable results from a benchmark. To aid in this task, BenchmarkTools offers a utility called tune! that automatically tunes a benchmark.

Back to the Ax = b example:

julia> bench = @benchmarkable x = A\b setup=(n=10; A=rand(n,n); b=rand(n))
Benchmark(evals=1, seconds=5.0, samples=10000)

julia> tune!(bench)
Benchmark(evals=10, seconds=5.0, samples=10000)

Here, we use a smaller value for n so that individual evaluations will run quickly. This will result in inaccurate timings as described in the Introduction. Passing the benchmarkable structure to tune! updates the number of evaluations required to achieve reliable timings.

As mentioned previously, the automatic tuning applied with the @benchmark macro is not always desirable as it is time consuming and may change from one benchmark run to another. This is not ideal when you would like your benchmarks to be repeatable and consistent. A workaround for this is shown below in the Benchmark Suites section.

Benchmark Suites and Regression Testing#

Benchmark Suites#

A benchmark suite can be defined as shown in the script below. The suite is defined to have three groups: string, trig, and dot, each of which is given tags such as "unicode".

Each of the benchmark groups is then assigned expressions to benchmark using the @benchmarkable macro.

Before running the benchmark suite, the script checks for a file containing tuning parameters. If the file is found, the parameters are loaded; if not, the benchmarks are tuned and the parameter file is saved for future runs.

Finally, the benchmarks are run and the output saved.

using BenchmarkTools
using Random

# Define a parent BenchmarkGroup to contain our suite
const suite = BenchmarkGroup()

# Add some child groups to our benchmark suite.
suite["string"] = BenchmarkGroup(["unicode"])
suite["trig"] = BenchmarkGroup(["math", "triangles"])
suite["dot"] = BenchmarkGroup(["broadcast", "elementwise"])

# This string will be the same every time because we're seeding the RNG
teststr = join(rand(MersenneTwister(1), 'a':'d', 10^4))

# Add some benchmarks to the "string" group
suite["string"]["replace"] = @benchmarkable replace($teststr, "a" => "b") seconds=Float64(π)
suite["string"]["join"] = @benchmarkable join($teststr, $teststr) samples=42

# Add some benchmarks to the "trig" and "dot" groups
for f in (sin, cos, tan)
    for x in (0.0, pi)
        suite["trig"][string(f), x] = @benchmarkable $(f)($x)
        suite["dot"][string(f), x] = @benchmarkable $(f).([$x, $x, $x])
    end
end

# If a cache of tuned parameters already exists, use it, otherwise, tune and cache
# the benchmark parameters. Reusing cached parameters is faster and more reliable
# than re-tuning `suite` every time the file is included.
paramspath = joinpath(dirname(@__FILE__), "params.json")

if isfile(paramspath)
    loadparams!(suite, BenchmarkTools.load(paramspath)[1], :evals);
else
    tune!(suite)
    BenchmarkTools.save(paramspath, params(suite));
end

@show suite

results = run(suite; verbose=true);
BenchmarkTools.save("results.json", results);

The suite has the following structure:

suite = 3-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "string" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["unicode"]
      "join" => Benchmark(evals=1, seconds=5.0, samples=42)
      "replace" => Benchmark(evals=1, seconds=3.141592653589793, samples=10000)
  "dot" => 6-element BenchmarkTools.BenchmarkGroup:
      tags: ["broadcast", "elementwise"]
      ("cos", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("cos", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
  "trig" => 6-element BenchmarkTools.BenchmarkGroup:
      tags: ["math", "triangles"]
      ("cos", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("cos", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)

The results produced by BenchmarkTools are not particularly reader-friendly: the output of the above snippet is ~1.2 MB of JSON. However, the output can be very useful for regression testing on merge requests for a given repository.

Indexing and Filtering BenchmarkGroups#

Indexing#

BenchmarkGroups may be indexed by their keys to run benchmarks or inspect results for a subset of the total suite:

julia> run(suite["string"])
  2-element BenchmarkTools.BenchmarkGroup:
    tags: ["unicode"]
    "join" => Trial(147.403 ms)
    "replace" => Trial(158.841 μs)

or

julia> @show results["string"]
  results["string"] = 2-element BenchmarkTools.BenchmarkGroup:
    tags: ["unicode"]
    "join" => Trial(151.097 ms)
    "replace" => Trial(158.255 μs)

Furthermore, most of the functions defined on result-related types (Trial, TrialEstimate, TrialRatio, and TrialJudgement) work on BenchmarkGroup structures as well. Therefore, it is possible to extract TrialEstimates from a BenchmarkGroup:

julia> @show median(results["string"])
  2-element BenchmarkTools.BenchmarkGroup:
    tags: ["unicode"]
    "join" => TrialEstimate(165.516 ms)
    "replace" => TrialEstimate(165.529 μs)

Filtering#

In the case of a complex benchmark suite, it is possible that you would like to run a number of benchmarks spanning different groups without running the whole suite. This can be achieved by using the @tagged macro and supplying a boolean expression consisting of the tags that you would like to run combined with !, (), ||, and &&.

For example, given the following BenchmarkGroup:

julia> g = BenchmarkGroup([], # no tags in the parent
                          "c" => BenchmarkGroup(["5", "6", "7"]), # tagged "5", "6", "7"
                          "b" => BenchmarkGroup(["3", "4", "5"]), # tagged "3", "4", "5"
                          "a" => BenchmarkGroup(["1", "2", "3"],  # contains tags and child groups
                                                "d" => BenchmarkGroup(["8"], 1 => 1),
                                                "e" => BenchmarkGroup(["9"], 2 => 2)));

we can whittle down the benchmark group using something like

julia> g[@tagged ("3" || "7") && !("1")]
  BenchmarkTools.BenchmarkGroup:
    tags: []
    "c" => BenchmarkGroup(["5", "6", "7"])
    "b" => BenchmarkGroup(["3", "4", "5"])

A group, g, is considered to have a given tag, t, if one of the following is true:

  • t is attached explicitly to g by construction (e.g. g = BenchmarkGroup([t]))

  • t is a key that points to g in g’s parent group (e.g. BenchmarkGroup([], t => g))

  • t is a tag of one of g’s parent groups (all the way up to the root group)

Regression Testing#

A regression test can be carried out using the regressions function shown in the following code block. This assumes that you have run the benchmarks for both the master branch and the merge request and saved the results as master.json and mr.json respectively.

julia> master = BenchmarkTools.load("master.json")[1]
julia> mr = BenchmarkTools.load("mr.json")[1]
julia> regs = regressions(judge(minimum(mr), minimum(master)))
julia> pairs = leaves(regs) # leaves creates an iterator over the BenchmarkGroup's leaf nodes

which may result in something like the following:

julia> @show pairs
  1-element Array{Any,1}:
    (Any["string","join"],BenchmarkTools.TrialJudgement:
     time:   +41.13% => regression (1.00% tolerance)
     memory: +0.00% => invariant (1.00% tolerance))

This output highlights the benchmarks that have regressed, thus showing where a user’s attention should be focused in writing a fix.

A simple interface like this can also be very useful for continuous integration systems.
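A minimal sketch of how this might be wired into a CI job is shown below; the file names and the fail-on-regression policy are assumptions for illustration, not part of BenchmarkTools:

using BenchmarkTools

# Load the saved results for the two branches
master = BenchmarkTools.load("master.json")[1]
mr = BenchmarkTools.load("mr.json")[1]

# Judge the merge request against master and keep only the regressions
regs = regressions(judge(minimum(mr), minimum(master)))

# Fail the CI job if any benchmark regressed
if !isempty(leaves(regs))
    for (id, judgement) in leaves(regs)
        println("Regression in ", id, ": ", judgement)
    end
    exit(1)
end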

Further Reading#