caching
Caching is configured through `.yaml` files. The `.bw.yaml` file (see bw_yaml) defines the default caching configuration, but this can be overridden on the command line with the `--cache-config` option.
Each caching configuration must use a `!Caching` tag as its root element, as shown in the example below:
The fields of the `!Caching` tag are:

Field | Default | Description |
---|---|---|
`enabled` | `True` | Whether to enable caching by default (overridable on the command line) |
`targets` | `False` | Whether to pull targeted¹ transforms from the cache |
`trace` | `False` | Whether to enable (computationally intensive) debug tracing |
`caches` | `[]` | A list of `!Cache` configurations (see below) |
¹ Targeted transforms are those selected to be run by a workflow, in contrast to those that are only dependencies of targeted transforms.
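To make the defaults and the interaction between `enabled` and `targets` concrete, here is a minimal Python sketch. `CachingConfig` and `may_fetch` are hypothetical names, not part of the tool's API. The sketch ignores each cache's own `fetch_condition` and assumes that dependency-only transforms are fetched whenever caching is enabled.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class CachingConfig:
    """Hypothetical in-memory mirror of a !Caching tag, using the documented defaults."""
    enabled: bool = True
    targets: bool = False
    trace: bool = False
    caches: List[Any] = field(default_factory=list)  # the !Cache configurations


def may_fetch(config: CachingConfig, is_targeted: bool) -> bool:
    """Return True if a transform may be pulled from the cache.

    Assumption: dependency-only transforms may be fetched whenever caching is
    enabled; targeted transforms additionally require `targets` to be True.
    Per-cache fetch conditions are not modelled here.
    """
    if not config.enabled:
        return False
    return config.targets or not is_targeted


default = CachingConfig()
print(may_fetch(default, is_targeted=False))  # True: dependencies come from the cache
print(may_fetch(default, is_targeted=True))   # False: targets are re-run by default
```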
The fields of the `!Cache` tag are:

Field | Required | Default | Description |
---|---|---|---|
`name` | Yes | | The name of the cache (used in logging, etc.) |
`path` | Yes | | The Python import path to the cache implementation² |
`max_size` | No | `None` | The maximum cache size, which the cache will self-prune down to |
`store_condition` | No | `False` | The condition for storing to the cache³ |
`fetch_condition` | No | `False` | The condition for fetching from the cache³ |
`check_determinism` | No | `True` | Whether to check object determinism⁴ |
² Specified as `<package>.<sub-package>.<class-name>`.
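A dotted path of this form can be resolved with the standard `importlib` module. The sketch below shows one plausible way to do it; `load_cache_class` is a hypothetical helper, and the real loader may differ (for example, in how it reports errors). It is demonstrated with a standard-library class because no concrete cache implementation path is given here.

```python
import importlib


def load_cache_class(path: str) -> type:
    """Resolve '<package>.<sub-package>.<class-name>' to a class object."""
    # Split off the final component: everything before it is the module path.
    module_path, _, class_name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"expected '<package>.<sub-package>.<class-name>', got {path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


cls = load_cache_class("collections.OrderedDict")
print(cls.__name__)  # OrderedDict
```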
³ Specified as one of:

- `True`: Always store to or fetch from this cache.
- `False`: Never store to or fetch from this cache.
- `<x>B/s`: Only store to or fetch from this cache if the transform's byte rate is below the provided value. This is useful when a cache is networked, since it may be more efficient to re-compute quick-to-run, high-output transforms than to pull them down. Some example values are:
  - `1B/s`: more than 1 second to create each byte.
  - `1GB/h`: more than 1 hour to create each gigabyte.
  - `5MB/4m`: more than 4 minutes to create each 5 megabytes.

Note: The rate specification may be removed in the future in favour of a dynamic scheme.
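The rate form of a condition reads as "amount of output per unit of compute time". The following is a minimal sketch of how such a condition could be evaluated, not the tool's implementation. It assumes decimal size prefixes, only the `s`, `m` and `h` time units seen in the examples, and a byte rate computed as output bytes divided by the transform's run time; `parse_rate` and `condition_allows` are hypothetical names.

```python
import re
from typing import Union

# Assumption: decimal size prefixes (1 KB = 1000 B); the docs do not say
# whether decimal or binary multiples are intended.
_SIZE_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
# Assumption: only the time units used in the examples (s, m, h) are accepted.
_TIME_UNITS = {"s": 1, "m": 60, "h": 3600}

# <amount><size unit>/<optional duration><time unit>, e.g. "1B/s", "1GB/h", "5MB/4m"
_RATE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*/\s*(\d+(?:\.\d+)?)?\s*([smh])\s*$")


def parse_rate(spec: str) -> float:
    """Convert a rate specification into bytes per second."""
    match = _RATE_RE.match(spec)
    if match is None:
        raise ValueError(f"unrecognised rate specification: {spec!r}")
    amount, size_unit, duration, time_unit = match.groups()
    seconds = float(duration or 1) * _TIME_UNITS[time_unit]  # a missing duration means 1
    if seconds == 0:
        raise ValueError(f"zero duration in rate specification: {spec!r}")
    return float(amount) * _SIZE_UNITS[size_unit] / seconds


def condition_allows(condition: Union[bool, str], output_bytes: int, runtime_s: float) -> bool:
    """Decide whether a store/fetch condition permits using the cache for one transform."""
    if isinstance(condition, bool):
        return condition  # True: always use this cache; False: never use it
    if runtime_s <= 0:
        return False  # effectively instantaneous, so re-computing is assumed cheaper
    byte_rate = output_bytes / runtime_s  # bytes produced per second of run time
    return byte_rate < parse_rate(condition)


# 1 MB produced in 10 minutes is ~1.7 KB/s, well below 5MB/4m (~20.8 KB/s).
print(condition_allows("5MB/4m", output_bytes=1_000_000, runtime_s=600.0))  # True
```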
⁴ When enabled, if a transform's hash already exists in the cache and the transform is re-run, check that both runs produced the same output hashes. It is recommended that this is left on, but it may be desirable to turn it off if cache lookups are expensive for a particular cache. Note that this results in fetches of the key data even when `fetch_condition` is `False`.
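The determinism check amounts to comparing output hashes keyed by transform hash. The sketch below is illustrative only: `check_determinism`, `DeterminismError` and the in-memory `cached_index` dictionary are hypothetical stand-ins for whatever the real cache stores. Because the comparison needs the cached hashes, it also shows why key data is fetched even when `fetch_condition` is `False`.

```python
from typing import Mapping


class DeterminismError(Exception):
    """Raised when a re-run transform's output hashes differ from the cached ones."""


def check_determinism(
    transform_hash: str,
    new_output_hashes: Mapping[str, str],
    cached_index: Mapping[str, Mapping[str, str]],
) -> None:
    """Compare a fresh run's output hashes with those stored for the same transform hash.

    cached_index maps transform hash -> {output name: output hash}; it stands in
    for the key data a real cache would have to fetch for this comparison.
    """
    cached = cached_index.get(transform_hash)
    if cached is None:
        return  # nothing cached yet, so there is nothing to compare against
    if dict(cached) != dict(new_output_hashes):
        raise DeterminismError(
            f"transform {transform_hash} produced {dict(new_output_hashes)}, "
            f"but the cache holds {dict(cached)}"
        )


index = {"abc": {"out": "h1"}}
check_determinism("abc", {"out": "h1"}, index)  # same hashes: passes silently
```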