CPUs#
A processor is a physical computer element that can perform computer processing operations on a data stream, usually from some form of computer memory. A ‘central processing unit’ (CPU) is a processor with sufficient circuitry to execute the instructions of a computer program. It is distinct from ‘coprocessors’, such as GPUs, in that it is intended to provide a system’s general compute functionality.
CPUs have their own hardware cache, separate from main system memory. It is physically closer to the CPU and usually implemented with a different technology that favours speed over capacity. CPU caches are organised in levels (L1, L2, L3 etc.): L1 is the smallest but fastest, and each higher level is slower but has more capacity.
A multi-core processor is a processor on a single integrated circuit with two or more processing units, called cores. Each core can independently operate on program instructions. Cores will typically share some memory cache, often at the L2 level with an L1 cache per core, although other topologies are also common, and have some form of inter-core communication.
The term ‘multi-cpu’ is typically used to describe systems with multiple physically separate processing units with a different form of communication between the units than inter-core.
Multithreading is the ability of a core to support multiple concurrent ‘threads of execution’, that is, independent elements of a program that can be run concurrently. In multithreading the threads share the resources of the core, in contrast to multiprocessing where they have exclusive access to the core. The efficiency of multithreading relative to multiprocessing depends heavily on the program in question and the ability of the core to schedule execution of the thread instructions.
Simultaneous multithreading (SMT) is a multithreading technique that can execute instructions from more than one thread in a single pipeline stage. On Intel chips this is known as Hyper-Threading.
Querying CPU Info#
There are many tools available to query the available CPU topology on a system. Some lower-level tools are introduced first, with higher-level tools (which are built on top of them but include extra logic or features) introduced after.
Mac#
Recent Apple Macs use ARM-based System-on-Chip (SoC) designs. SoC systems combine many of the computer’s components on a single integrated circuit - in contrast to conventional systems, which may have independent electronic circuitry for communication between heterogeneous system elements.
macOS is based on BSD kernel elements so we can use BSD tools for system inspection. sysctl is a command-line utility to get or set the kernel state. Running sysctl -a $KEY will list all system values for the provided key.
The hw key can be used to query system processor, cache and topology.
Relevant keys for Mac SoC CPUs are:
hw.activecpu / hw.logicalcpu: number of enabled logical processors on the SoC.
hw.ncpu / hw.logicalcpu_max: number of logical processors on the SoC, noting that some may not be active or available to the OS.
hw.physicalcpu: number of enabled physical cores on the SoC.
hw.physicalcpu_max: number of physical processors on the SoC.
Memory, page and cache line sizes are also given under the hw key. The sysctl.c source can give further hints on parameter meanings. The SoC can have processors with different performance levels - the number of which is given by hw.nperflevels. The hw.perflevelN key gives the topology of cores sharing LX caches and the cache sizes for level N.
The machdep namespace holds the subset of ‘machine dependent’ keys. Key names are described in the sysctl source. Running sysctl with the machdep namespace on an Apple M1 Pro chip gives something like:
machdep.cpu.cores_per_package: 10
machdep.cpu.core_count: 10
machdep.cpu.logical_per_package: 10
machdep.cpu.thread_count: 10
machdep.cpu.brand_string: Apple M1 Pro
with key meanings:
cpu.cores_per_package: number of physical processors on the SoC, aka hw.physicalcpu_max
cpu.core_count: number of active processors on the SoC, aka hw.physicalcpu
cpu.logical_per_package: number of logical cores on the SoC, aka hw.logicalcpu_max
cpu.thread_count: number of active logical cores on the SoC, aka hw.logicalcpu
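As a minimal sketch, the `key: value` lines printed by sysctl can be parsed into a dictionary for further processing. The function name `parse_sysctl` and the `SAMPLE` constant are illustrative assumptions rather than part of any library; the sample text is the machdep output shown above, and on a Mac it could instead be captured from the sysctl command.

```python
def parse_sysctl(text):
    """Parse 'key: value' lines, as printed by sysctl, into a dict of strings."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that do not have a 'key: value' shape
            values[key.strip()] = value.strip()
    return values


# Sample copied from the machdep output above (parse_sysctl is a hypothetical helper).
SAMPLE = """\
machdep.cpu.cores_per_package: 10
machdep.cpu.core_count: 10
machdep.cpu.logical_per_package: 10
machdep.cpu.thread_count: 10
machdep.cpu.brand_string: Apple M1 Pro
"""

info = parse_sysctl(SAMPLE)
# Active logical cores divided by active physical cores.
threads_per_core = int(info["machdep.cpu.thread_count"]) // int(info["machdep.cpu.core_count"])
print(info["machdep.cpu.brand_string"], "threads per core:", threads_per_core)
```

Values are kept as strings because some keys, such as the brand string, are not numeric; callers convert the counts they need.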
Linux#
This section has some notes on the Linux /proc/cpuinfo file - which details available CPUs on a system. The file format is:
key0: val0
key1: val1

key0: val0
key1: val1

key0: val0
key1: val1
with one block per logical processor, blocks being separated by a blank line. Block keys are optional - absence of a key is informative regarding processor capabilities. Keys in a single processor, 1 core system with no SMT will be:
processor: 0 # Logical processor id
model name: Vendor Model String
cache size: 64 KB
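The block format described above can be read with a few lines of Python. This is a minimal sketch: `parse_cpuinfo` and `read_cpuinfo` are hypothetical helper names, and the sample text is a shortened, made-up instance of the format rather than output from a real machine.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    blocks = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current block.
            if current:
                blocks.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:  # the last block may not be followed by a blank line
        blocks.append(current)
    return blocks


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read and parse the live file (only meaningful on Linux)."""
    with open(path) as handle:
        return parse_cpuinfo(handle.read())


SAMPLE = (
    "processor : 0\nmodel name : Vendor Model String\n\n"
    "processor : 1\nmodel name : Vendor Model String\n"
)
blocks = parse_cpuinfo(SAMPLE)
print(len(blocks), "logical processors")
```

Because keys are optional, later lookups on a block are safer with `block.get(key)` than with direct indexing.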
For a single processor with 1 core and SMT it will be:
processor : 0
model name : Vendor Model String
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1

processor : 1
model name : Vendor Model String
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1
Here we have two logical processors (0 and 1) with the same physical processor id (physical id) and core id (core id), and a sibling count (siblings) of 2. When the number of cores is NOT equal to the sibling count on a given physical processor (socket) then SMT is active. For a multi-core processor with 4 cores on a socket we have:
processor : 0
model name : Vendor Model String
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4

processor : 1
model name : Vendor Model String
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 1
cpu cores : 4

processor : 2
model name : Vendor Model String
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 2
cpu cores : 4

processor : 3
model name : Vendor Model String
cache size : 6144 KB
physical id : 0
siblings : 4
core id : 3
cpu cores : 4
Here the socket, or multi-core processor, has physical id 0 and each core has its own core id. The number of cores matches the number of siblings so SMT is not active. For a multi-core processor with two cores:
processor : 0
model name : Vendor Model String
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2

processor : 1
model name : Vendor Model String
cache size : 2048 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
The number of siblings is equal to the number of cores, so there is no SMT. Next, a multiprocessor system with two physical CPUs, each with a single core and SMT, will have:
processor : 0
model name : Vendor Model String
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1

processor : 1
model name : Vendor Model String
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1

processor : 2
model name : Vendor Model String
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 1

processor : 3
model name : Vendor Model String
cache size : 1024 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 1
Here the sibling count on each physical processor (2) is not equal to its number of cores (1), so SMT is enabled. Finally, a two-CPU system, each CPU with two cores and no SMT:
processor : 0
model name : Vendor Model String
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2

processor : 1
model name : Vendor Model String
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2

processor : 2
model name : Vendor Model String
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 0
cpu cores : 2

processor : 3
model name : Vendor Model String
cache size : 4096 KB
physical id : 3
siblings : 2
core id : 1
cpu cores : 2
Here we have two physical ids, on each there are two core ids and a sibling count of 2 - so no SMT.
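The rules used in the examples above (count distinct physical ids for sockets, distinct core ids per socket for cores, and compare siblings with cpu cores to detect SMT) can be sketched in Python. `summarise_topology` is a hypothetical helper, and treating a missing key as a single socket, core or sibling is an assumption made for illustration.

```python
from collections import defaultdict


def summarise_topology(blocks):
    """Summarise sockets, cores and SMT from per-processor cpuinfo dicts.

    `blocks` holds one dict of string values per logical processor.
    """
    cores_by_socket = defaultdict(set)
    smt_active = False
    for block in blocks:
        # Assumption: absent keys mean one socket, one core and one sibling.
        socket_id = block.get("physical id", "0")
        core_id = block.get("core id", "0")
        cores_by_socket[socket_id].add(core_id)
        siblings = int(block.get("siblings", 1))
        cpu_cores = int(block.get("cpu cores", 1))
        if siblings != cpu_cores:
            smt_active = True
    return {
        "logical_processors": len(blocks),
        "sockets": len(cores_by_socket),
        "physical_cores": sum(len(ids) for ids in cores_by_socket.values()),
        "smt_active": smt_active,
    }


# The two-socket, single-core, SMT example: (processor, physical id, core id).
rows = [("0", "0", "0"), ("1", "3", "0"), ("2", "0", "0"), ("3", "3", "0")]
blocks = [
    {"processor": p, "physical id": s, "core id": c, "siblings": "2", "cpu cores": "1"}
    for p, s, c in rows
]
summary = summarise_topology(blocks)
print(summary)
```

For the sample data this reports four logical processors on two sockets with one core each, and SMT active.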
Higher Level Tools#
hwloc is a library for obtaining information about the topology of the system resources. It is developed and used as part of OpenMPI, so is a good way to understand the topology as seen by that ecosystem.
psutil is a Python library for querying system resources in general, including CPU info.
The PyTorch cpuinfo library can analyse CPU info across platforms. This info is made available in higher-level PyTorch APIs.
ICHEC’s icsystemutils library has examples of using lower level libraries to get CPU info.
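As a small illustration of the higher-level route, psutil exposes logical and physical core counts through `psutil.cpu_count`. The sketch below assumes psutil may not be installed and falls back to the standard library, which only reports logical processors; `cpu_counts` is a hypothetical helper name.

```python
import os

try:
    import psutil  # third-party; optional for this sketch
except ImportError:
    psutil = None


def cpu_counts():
    """Return logical and physical core counts, using psutil when available."""
    logical = os.cpu_count()  # logical processors only
    physical = None           # unknown without psutil
    if psutil is not None:
        logical = psutil.cpu_count(logical=True)
        # May be None if psutil cannot determine the physical core count.
        physical = psutil.cpu_count(logical=False)
    return {"logical": logical, "physical": physical}


counts = cpu_counts()
print(counts)
```

When both values are known, a logical count larger than the physical count suggests SMT is active.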