Julia#
The purpose of this section is to create a list and reference of Julia libraries that ICHEC staff may find useful.
Contents#
Acceleration
Logging
Measurement - All things for probing code and extracting performance metrics
Numerical
Serialisation
Testing
Utility
Benchmarking#
The BenchmarkTools library supplies a framework for writing and running groups of benchmarks in Julia. It can be used in both a development environment for optimisation purposes as well as for CI to monitor possible performance regressions.
The library is used for running the base benchmarks for the Julia language itself.
Introduction#
Due to a large number of contributing factors, effectively benchmarking software can be challenging.
A simple, naive way of benchmarking code is to record the time before and after the code you would like to measure and calculate the elapsed time.
This can be done in Julia using the @time macro:
julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> @time x = A\b;
0.951087 seconds (3.32 M allocations: 160.341 MiB, 2.83% gc time)
Seems pretty slow, and a lot of memory was used for the small task it was given… This is because Julia is Just-In-Time (JIT) compiled, and therefore the numbers we’re seeing here include the compilation. Running the same line again results in the following:
julia> @time x = A\b;
0.000177 seconds (5 allocations: 79.984 KiB)
Much better!
To try to overcome these sources of variance in the timings, we can run a large number of samples and simply average the times. While this is an improvement, we must then worry about how many samples to run, the distribution of the timings, whether the average is a reasonable metric, and so on.
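For illustration, such a naive averaging loop might look something like the following sketch (the naive_time helper is made up for this example and is not part of any library):
julia> function naive_time(f, nruns)
           t0 = time_ns()
           for _ in 1:nruns
               f()                      # run the code under test
           end
           (time_ns() - t0) / nruns     # average nanoseconds per run
       end;
julia> naive_time(() -> A\b, 1000)      # reuses A and b from above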
So, the compilation phase, coupled with the usual suspects that make benchmarking non-trivial, calls for a robust benchmarking framework. That’s where this library comes in.
The BenchmarkTools manual states that the library was created to facilitate the following tasks:
Organize collections of benchmarks into manageable benchmark suites
Configure, save, and reload benchmark parameters for convenience, accuracy, and consistency
Execute benchmarks in a manner that yields reasonable and consistent performance predictions
Analyze and compare results to determine whether a code change caused regressions or improvements
The manual also outlines some terminology used throughout:
“evaluation”: a single execution of a benchmark expression.
“sample”: a single time/memory measurement obtained by running multiple evaluations.
“trial”: an experiment in which multiple samples are gathered (or the result of such an experiment).
“benchmark parameters”: the configuration settings that determine how a benchmark trial is performed.
The distinction between a sample and an evaluation matters because a single evaluation may execute faster than your timer’s resolution and therefore yield an invalid measurement. For this reason, a number of evaluations must be run per sample to give a representative time per evaluation.
Installation#
As with all Julia packages, the library is installed using the built-in package manager:
julia> using Pkg
julia> Pkg.add("BenchmarkTools")
Usage#
BenchmarkTools offers three main macros:
@btime
@benchmark
@benchmarkable
which will be covered here.
@btime#
The @btime macro is used as a drop-in replacement for the @time macro mentioned above.
It is mostly used for quick sanity checks giving an indication of the time required to run the code as well as the allocated memory.
Its main advantage is that it ignores the timing of the initial compilation step.
julia> using BenchmarkTools
julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> @btime x = $A\$b;
81.713 μs (5 allocations: 79.98 KiB)
Note the use of $A and $b when calling @btime, which will be discussed in the Gotchas section.
@benchmark#
The @benchmark macro is used to give a more detailed analysis of a code section, including statistics on the timings.
Following on from the example above of solving for x in Ax = b (the $A and $b syntax is used here also, which is discussed in the Gotchas section):
julia> using BenchmarkTools
julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> @benchmark x = $A\$b
BenchmarkTools.Trial:
memory estimate: 79.98 KiB
allocs estimate: 5
--------------
minimum time: 88.310 μs (0.00% GC)
median time: 129.171 μs (0.00% GC)
mean time: 149.087 μs (5.14% GC)
maximum time: 6.987 ms (0.00% GC)
--------------
samples: 10000
evals/sample: 1
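The benchmark parameters can also be overridden when invoking the macro. As a quick sketch (the sample and evaluation counts here are arbitrary), the number of samples and the number of evaluations per sample could be set explicitly:
julia> @benchmark x = $A\$b samples=500 evals=2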
@benchmarkable#
The @benchmarkable macro is used to create a benchmark without running it immediately.
The benchmark can be started by passing it to the run function:
julia> using BenchmarkTools
julia> n = 100;
julia> A = rand(n, n);
julia> b = rand(n);
julia> bench = @benchmarkable x = $A\$b
Benchmark(evals=1, seconds=5.0, samples=10000)
julia> run(bench)
BenchmarkTools.Trial:
memory estimate: 79.98 KiB
allocs estimate: 5
--------------
minimum time: 82.495 μs (0.00% GC)
median time: 123.419 μs (0.00% GC)
mean time: 142.598 μs (6.47% GC)
maximum time: 8.160 ms (81.93% GC)
--------------
samples: 10000
evals/sample: 1
You can optionally pass the benchmark structure to a tuning function, which will be discussed in the Tuning section.
This tuning is carried out automatically by the @benchmark macro; however, that is not always desired.
Deferring execution and tuning in this way is especially useful when creating suites of benchmarks, which are discussed below.
Gotchas#
Globals#
A common performance tip with Julia is to avoid global variables: a global’s value, and therefore its type, may change at any time, which makes it difficult for the compiler to optimise code that uses it.
To work around this when benchmarking, it is suggested to “interpolate” the variables into the benchmark using the $(...) syntax:
# rand(1000) is executed for each evaluation
julia> @benchmark sum(rand(1000))
BenchmarkTools.Trial:
memory estimate: 7.94 KiB
allocs estimate: 1
--------------
minimum time: 1.566 μs (0.00% GC)
median time: 2.135 μs (0.00% GC)
mean time: 3.071 μs (25.06% GC)
maximum time: 296.818 μs (95.91% GC)
--------------
samples: 10000
evals/sample: 10
# rand(1000) is evaluated at definition time, and the resulting
# value is interpolated into the benchmark expression
julia> @benchmark sum($(rand(1000)))
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 101.627 ns (0.00% GC)
median time: 101.909 ns (0.00% GC)
mean time: 103.834 ns (0.00% GC)
maximum time: 276.033 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 935
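The same gotcha applies when a benchmarked expression refers to a global variable directly. As a rough sketch (timings omitted, since the exact numbers vary by machine), interpolating the global makes the expression behave as if the variable were a local:
julia> v = rand(1000);
julia> @btime sum(v);   # v is a non-constant global; its type is unknown to the compiler
julia> @btime sum($v);  # interpolated: benchmarked as if v were a local variable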
Compiler Optimisations#
Note that for trivial, non-observable tasks, the compiler may very well optimise the code away entirely:
julia> a = 1; b = 2;
julia> @btime $a + $b
0.024 ns (0 allocations: 0 bytes)
3
To combat this in the above example, we instead interpolate a reference to a variable and dereference it within the benchmark:
julia> a = 1; b = 2;
julia> @btime $(Ref(a))[] + $(Ref(b))[]
1.277 ns (0 allocations: 0 bytes)
3
Benchmark Parameters#
The default benchmark parameters are stored in BenchmarkTools.DEFAULT_PARAMETERS, whose fields and default values are shown below:
Parameter | Type | Default | Explanation |
---|---|---|---|
samples | Int | 10000 | Number of samples to take |
seconds | Float64 | 5.0 | Seconds budgeted for benchmarking |
evals | Int | 1 | Number of evals per sample |
overhead | Float64 | 0.0 | Offset to subtract due to loop overhead (in ns) |
gctrial | Bool | true | Run gc() before running the trial |
gcsample | Bool | false | Run gc() before each sample |
time_tolerance | Float64 | 0.05 | Percentage noise tolerance on times used for analysis |
memory_tolerance | Float64 | 0.01 | Percentage noise tolerance on memory used for analysis |
The values can be changed by setting the individual fields of BenchmarkTools.DEFAULT_PARAMETERS or by passing values as part of a @benchmarkable macro call:
julia> bench = @benchmarkable sum($(rand(1000))) seconds=1 time_tolerance=0.01
julia> run(bench)
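For example, the global defaults can also be adjusted by mutating the fields of BenchmarkTools.DEFAULT_PARAMETERS directly (a quick sketch; the values chosen here are arbitrary):
julia> BenchmarkTools.DEFAULT_PARAMETERS.samples = 5000;
julia> BenchmarkTools.DEFAULT_PARAMETERS.seconds = 2.0;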
Setup and teardown#
Setup and teardown are supported via setup and teardown expressions.
Returning to the Ax = b example above, the benchmark may be written as follows using a setup expression:
julia> using BenchmarkTools
julia> @benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n))
Note the lack of variable interpolation in the benchmarked expression.
setup and teardown expressions are executed once per sample, not once per evaluation (as described in the Introduction).
This is something to keep in mind when mutating state within evaluations.
For example, if running N evaluations of a sorting algorithm where the sorting is carried out in place, then N-1 of the evaluations will be carried out on a sorted array.
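One way to avoid this, at the cost of extra setup work per sample, is to force a single evaluation per sample so that every evaluation receives a freshly generated array from the setup expression; a sketch:
julia> @benchmark sort!(v) setup=(v = rand(1000)) evals=1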
Handling Results#
BenchmarkTools has four types related to benchmark results:
Trial - Contains all collected samples and the benchmark parameters
TrialEstimate - A single estimate used to summarise a Trial
TrialRatio - A comparison between two TrialEstimates
TrialJudgement - A classifier of a TrialRatio as invariant, regression, or improvement
Trials are the return type of running benchmarks:
julia> using BenchmarkTools
julia> trial = @benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n))
BenchmarkTools.Trial:
memory estimate: 79.98 KiB
allocs estimate: 5
--------------
minimum time: 88.470 μs (0.00% GC)
median time: 128.398 μs (0.00% GC)
mean time: 140.650 μs (5.12% GC)
maximum time: 5.099 ms (97.06% GC)
--------------
samples: 10000
evals/sample: 1
The representation of a Trial shows the results of the minimum, median, mean, and maximum TrialEstimates.
These are calculated using overloads of their respective functions.
When considering which estimator to use, refer to the following:
The minimum is a robust estimator for the location parameter of the time distribution, and it should not be considered an outlier
The median, as a robust measure of central tendency, should be relatively unaffected by outliers
The mean, as a non-robust measure of central tendency, will usually be positively skewed by outliers
The maximum should be considered a primarily noise-driven outlier, and it can change drastically between benchmark trials.
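For example, the individual estimates can be computed directly from the trial obtained above (output omitted for brevity):
julia> minimum(trial)   # TrialEstimate based on the fastest sample
julia> median(trial)    # TrialEstimate based on the median sample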
TrialRatios are computed by calling the ratio function with two TrialEstimates.
These ratios can then be passed to judge to classify whether the benchmark is an invariant, a regression, or an improvement:
julia> bench1 = median(@benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n)));
julia> bench2 = median(@benchmark x = A\b setup=(n=100; A=rand(n,n); b=rand(n)));
julia> bench_ratio = ratio(bench1, bench2)
BenchmarkTools.TrialRatio:
time: 0.9872530444481794
gctime: 1.0
memory: 1.0
allocs: 1.0
julia> judge(bench_ratio) # or judge(bench1, bench2)
BenchmarkTools.TrialJudgement:
time: -1.27% => invariant (5.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance)
Custom time and memory tolerances may be passed to judge via the time_tolerance and memory_tolerance keyword arguments.
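For instance, a stricter comparison than the defaults might look like the following sketch (the tolerance values chosen here are arbitrary):
julia> judge(bench1, bench2; time_tolerance=0.01, memory_tolerance=0.01)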
Tuning#
It can be difficult, or just time consuming, to determine the correct number of evaluations required to get reasonable results from a benchmark.
To aid in this task, BenchmarkTools offers a utility to automatically tune a benchmark: tune!.
Returning to the Ax = b example:
julia> bench = @benchmarkable x = A\b setup=(n=10; A=rand(n,n); b=rand(n))
Benchmark(evals=1, seconds=5.0, samples=10000)
julia> tune!(bench)
Benchmark(evals=10, seconds=5.0, samples=10000)
Here, we use a smaller value for n so that individual evaluations run quickly.
With a single evaluation per sample, this would result in inaccurate timings, as described in the Introduction.
Passing the benchmarkable structure to tune! updates the number of evaluations required to achieve reliable timings.
As mentioned previously, the automatic tuning applied by the @benchmark macro is not always desirable, as it is time consuming and its results may change from one benchmark run to another.
This is not ideal when you would like your benchmarks to be repeatable and consistent.
A workaround for this is shown below in the Benchmark Suites section.
Benchmark Suites and Regression Testing#
Benchmark Suites#
A benchmark suite can be defined as shown in the script below.
The suite is defined to have three groups: string, trig, and dot, each of which is given tags such as “unicode”.
Each of the benchmark groups is then assigned expressions to benchmark using the @benchmarkable macro.
Before running the benchmark suite, the script checks for a file containing cached tuning parameters. If the file is found, the parameters are loaded; if not, the benchmarks are tuned and the parameter file is saved for future runs.
Finally, the benchmarks are run and the output saved.
using BenchmarkTools
using Random
# Define a parent BenchmarkGroup to contain our suite
const suite = BenchmarkGroup()
# Add some child groups to our benchmark suite.
suite["string"] = BenchmarkGroup(["unicode"])
suite["trig"] = BenchmarkGroup(["math", "triangles"])
suite["dot"] = BenchmarkGroup(["broadcast", "elementwise"])
# This string will be the same every time because we're seeding the RNG
teststr = join(rand(MersenneTwister(1), 'a':'d', 10^4))
# Add some benchmarks to the "string" group
suite["string"]["replace"] = @benchmarkable replace($teststr, "a" => "b") seconds=Float64(π)
suite["string"]["join"] = @benchmarkable join($teststr, $teststr) samples=42
# Add some benchmarks to the "trig" and "dot" groups
for f in (sin, cos, tan)
    for x in (0.0, pi)
        suite["trig"][string(f), x] = @benchmarkable $(f)($x)
        suite["dot"][string(f), x] = @benchmarkable $(f).([$x, $x, $x])
    end
end
# If a cache of tuned parameters already exists, use it, otherwise, tune and cache
# the benchmark parameters. Reusing cached parameters is faster and more reliable
# than re-tuning `suite` every time the file is included.
paramspath = joinpath(dirname(@__FILE__), "params.json")
if isfile(paramspath)
    loadparams!(suite, BenchmarkTools.load(paramspath)[1], :evals);
else
    tune!(suite)
    BenchmarkTools.save(paramspath, params(suite));
end
@show suite
results = run(suite; verbose=true);
BenchmarkTools.save("results.json", results);
The suite has the following structure:
suite = 3-element BenchmarkTools.BenchmarkGroup:
  tags: []
  "string" => 2-element BenchmarkTools.BenchmarkGroup:
      tags: ["unicode"]
      "join" => Benchmark(evals=1, seconds=5.0, samples=42)
      "replace" => Benchmark(evals=1, seconds=3.141592653589793, samples=10000)
  "dot" => 6-element BenchmarkTools.BenchmarkGroup:
      tags: ["broadcast", "elementwise"]
      ("cos", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("cos", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
  "trig" => 6-element BenchmarkTools.BenchmarkGroup:
      tags: ["math", "triangles"]
      ("cos", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("cos", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("sin", π) => Benchmark(evals=1, seconds=5.0, samples=10000)
      ("tan", 0.0) => Benchmark(evals=1, seconds=5.0, samples=10000)
The results produced by BenchmarkTools are not particularly reader-friendly; the output of the above snippet is roughly 1.2 MB. However, the output can be very useful for regression testing on merge requests for a given repository.
Indexing and Filtering BenchmarkGroups#
Indexing#
BenchmarkGroups may be indexed by their keys to run benchmarks for, or inspect the results of, a subset of the total suite:
julia> run(suite["string"])
2-element BenchmarkTools.BenchmarkGroup:
tags: ["unicode"]
"join" => Trial(147.403 ms)
"replace" => Trial(158.841 μs)
or
julia> @show results["string"]
results["string"] = 2-element BenchmarkTools.BenchmarkGroup:
tags: ["unicode"]
"join" => Trial(151.097 ms)
"replace" => Trial(158.255 μs)
Furthermore, most of the functions expecting result-related types (Trial, TrialEstimate, TrialRatio, and TrialJudgement) work on BenchmarkGroup structures as well.
It is therefore possible to extract TrialEstimates from a BenchmarkGroup:
julia> @show median(results["string"])
2-element BenchmarkTools.BenchmarkGroup:
tags: ["unicode"]
"join" => TrialEstimate(165.516 ms)
"replace" => TrialEstimate(165.529 μs)
Filtering#
In the case of a complex benchmark suite, you may want to run a number of benchmarks spanning different groups without running the whole suite.
This can be achieved by using the @tagged macro and supplying a boolean expression consisting of the tags that you would like to run, combined with !, (), ||, and &&.
For example, given the following BenchmarkGroup:
julia> g = BenchmarkGroup([], # no tags in the parent
           "c" => BenchmarkGroup(["5", "6", "7"]), # tagged "5", "6", "7"
           "b" => BenchmarkGroup(["3", "4", "5"]), # tagged "3", "4", "5"
           "a" => BenchmarkGroup(["1", "2", "3"],  # contains tags and child groups
                  "d" => BenchmarkGroup(["8"], 1 => 1),
                  "e" => BenchmarkGroup(["9"], 2 => 2)));
we can whittle down the benchmark group using something like
julia> g[@tagged ("3" || "7") && !("1")]
BenchmarkTools.BenchmarkGroup:
tags: []
"c" => BenchmarkGroup(["5", "6", "7"])
"b" => BenchmarkGroup(["3", "4", "5"])
A group, g, is considered to have a given tag, t, if one of the following is true:
t is attached explicitly to g by construction (e.g. g = BenchmarkGroup([t]))
t is a key that points to g in g’s parent group (e.g. BenchmarkGroup([], t => g))
t is a tag of one of g’s parent groups (all the way up to the root group)
Regression Testing#
A regression test can be carried out using the regressions function, as shown in the following code block.
This assumes that you have run the benchmarks for both the master branch and the merge request and saved the results as master.json and mr.json respectively.
julia> master = BenchmarkTools.load("master.json")[1]
julia> mr = BenchmarkTools.load("mr.json")[1]
julia> regs = regressions(judge(minimum(mr), minimum(master)))
julia> pairs = leaves(regs) # leaves creates an iterator over the BenchmarkGroup's leaf nodes
which may result in something like the following:
julia> @show pairs
1-element Array{Any,1}:
(Any["string","join"],BenchmarkTools.TrialJudgement:
time: +41.13% => regression (1.00% tolerance)
memory: +0.00% => invariant (1.00% tolerance))
This output highlights the benchmarks that have regressed, thus showing where a user’s attention should be focused in writing a fix.
A simple interface like this can also be very useful for continuous integration systems.
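As a minimal sketch of how this might be wired into a CI job, the build could simply be failed whenever any regressed leaves are found (reusing the pairs variable from the example above):
julia> isempty(pairs) || error("Performance regressions detected relative to master")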