BCM is a C++ software package for the Bayesian analysis of computational models. It provides efficient implementations of eleven sampling algorithms for generating posterior samples and calculating marginal likelihoods. Additional tools are included which facilitate the process of specifying models and visualizing the sampling output. This toolkit can be used for analyzing the uncertainty in the parameters and the predictions of computational models using Bayesian statistics.
The toolkit is further described in the manuscript "BCM: toolkit for Bayesian analysis of Computational Models using samplers".
If you have any questions, comments or suggestions, please feel free to contact us at b.thijssen@nki.nl.
Version 1.0.4 [download .tar.gz or .zip] — this is the release used in the BCM publication.
Version 2.0.1 [download .tar.gz or .zip] — new release of 2017-04-20, which adds support for SBML, NetCDF and some ODE functionality using CVode, as well as many other changes.
BCM has the following prerequisites: a C++ compiler, CMake and the Boost library.
For Linux users: these prerequisites are usually available through your distribution's package manager.
For Mac users: CMake and Boost can be most easily installed using Homebrew.
Building on Windows:
1. Set the BOOST_ROOT environment variable to point to the root directory of the Boost library.
2. Set the BCM_ROOT environment variable to point to the root directory of BCM. This is only necessary if you wish to use the model parsing tool.
3. Run cmake .. in a command prompt, or use the CMake GUI tool.
4. After building, the binaries will be placed in %BCM_ROOT%/bin/Release.

Building on Linux and Mac:
1. Set the BCM_ROOT environment variable to point to the root directory of BCM. This is only necessary if you wish to use the model parsing tool.
2. Set the BOOST_ROOT environment variable to point to the root directory of the Boost library.
3. Run cmake .. in a command prompt, or use the CMake GUI tool.
4. Run make; the binaries will be placed in $BCM_ROOT/bin.
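As an illustration, on Linux the whole procedure might look as follows. The paths are only examples, and the separate build directory is an assumption (cmake .. simply expects to be run from a directory one level below the source root):

export BCM_ROOT=$HOME/BCM
export BOOST_ROOT=$HOME/boost_1_55_0
cd $BCM_ROOT
mkdir build
cd build
cmake ..
make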
There are three main ways of using BCM: (1) specifying a model with the model parsing tool, (2) implementing the likelihood function in a dynamic library, or (3) using the sampling toolkit directly from a C++ application.
An example of how to use the model parsing tool is located in the src/examples/GaussianShellsUsingMdlParse directory. First, generate the model from the model description file:

On Windows: %BCM_ROOT%\scripts\make_model.bat shells10D.txt
On Linux/Mac: $BCM_ROOT/scripts/make_model.sh shells10D.txt

Then run the inference:

On Windows: %BCM_ROOT%\bin\Release\mdlinf shells10D -c config_smc.txt
On Linux/Mac: $BCM_ROOT/bin/mdlinf shells10D -c config_smc.txt
An example of a basic dynamic library is provided in the src/examples/LikelihoodDynamicLibrary directory.
A model for inference always consists of two files: a DLL/.so file which contains the likelihood function, and an XML file which describes the parameters to be inferred and their prior distributions.
In the DLL/.so, two functions need to be implemented: an initialization function, which is called at startup and receives, among other things, the total number of threads that will be used, and a likelihood function, which receives a thread index together with a set of parameter values and evaluates the likelihood.
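As a rough illustration of this interface, a likelihood library might be structured as shown below. This is only a sketch: the function names, signatures and return conventions are hypothetical, and the actual exported symbols should be taken from the LikelihoodDynamicLibrary example.

// Hypothetical sketch of a BCM likelihood library; the real
// interface is defined in src/examples/LikelihoodDynamicLibrary.
#include <cmath>
#include <vector>

namespace {
// One model instance per thread, so the likelihood function
// can safely be called from multiple threads at once.
struct ModelInstance {
    std::vector<double> scratch; // per-thread working memory
};
std::vector<ModelInstance> instances;
}

extern "C" {

// Called once at startup; BCM supplies the total number of
// threads that will be used (hypothetical name/signature).
int initialize(int num_threads)
{
    instances.resize(static_cast<size_t>(num_threads));
    return 0;
}

// Called for every proposed sample; receives a thread index
// and the parameter values, and writes the log-likelihood
// to *logp (hypothetical name/signature).
int evaluate_log_likelihood(int thread_index,
                            const double* parameters,
                            int num_parameters,
                            double* logp)
{
    ModelInstance& model = instances[static_cast<size_t>(thread_index)];
    (void)model; // a real model would use its per-thread state here

    // Toy stand-in for a real model: independent standard-normal
    // log-density over all parameters.
    const double log_two_pi = std::log(2.0 * 3.14159265358979);
    double sum = 0.0;
    for (int i = 0; i < num_parameters; i++) {
        sum += -0.5 * parameters[i] * parameters[i] - 0.5 * log_two_pi;
    }
    *logp = sum;
    return 0;
}

} // extern "C"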
Both the DLL and the XML file should have the same base file name, e.g. my_model.dll/.so and my_model.xml. Furthermore, both files need to be stored in a folder which again has the same name, my_model. The configuration of the sampling algorithm can be specified in a configuration file. This gives the following directory structure:
some_path\
\config.txt
\my_model\my_model.dll [or .so on Linux/Mac]
\my_model\my_model.xml
You can then run the inference as follows:
On Windows: %BCM_ROOT%\bin\Release\mdlinf my_model -c config.txt
On Linux/Mac: $BCM_ROOT/bin/mdlinf my_model -c config.txt
If you wish to run BCM using multiple threads, the two functions in the DLL need to be re-entrant. A thread index is supplied which can be used to store local data; the total number of threads that will be used is supplied to the DLL at startup as well. The easiest way to ensure thread safety may be to allocate several model instances as shown in the example.
There is a C++ example in the src/examples/GaussianShells directory which shows how the sampling toolkit can be used from inside a C++ application.
The CPU and memory usage of BCM depends on which algorithm is selected for the inference.
The CPU usage is controlled through the --numthreads (-j) option. The default is to use the number of hardware threads available on the current machine. For the algorithms based on parallel tempering, the number of threads that can be used is limited to the number of parallel chains, so there is no benefit in increasing the number of threads beyond the number of chains.
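For example, to run an inference with at most four threads:

$BCM_ROOT/bin/mdlinf my_model -c config.txt -j 4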
The memory usage is proportional to the number of variables that are being sampled. For the algorithms based on parallel tempering, only summary statistics and a buffer of 100 samples are stored in memory; this class of algorithms therefore usually has the lowest memory usage. For the algorithms based on nested sampling, the current population of samples is stored in memory, so the memory usage is proportional to number_of_variables * population_size. For the algorithms based on sequential Monte Carlo, both the current and the previous population of samples are stored in memory, so the memory usage is proportional to 2 * number_of_variables * population_size.
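As a rough worked example, assuming samples are stored as 8-byte double-precision values (an assumption about the implementation): an SMC run over 50 variables with a population size of 10,000 would keep about 2 * 50 * 10,000 * 8 bytes ≈ 8 MB of samples in memory.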
Basic usage:
mdlinf model_name -c config_file
There should be a subdirectory with the model's name, e.g. "my_model". This subdirectory should contain a dynamic library for the likelihood, named "my_model.dll/.so", and an XML file for the prior, named "my_model.xml". Please see the examples for how the XML prior file should be formatted.
All options can be supplied either on the command line or in a configuration file. The configuration file follows the Boost Program Options format. For example, there is an option "--sampler.type". This option can be specified as such on the command line, for example "--sampler.type=smc". Alternatively, it can be specified in a configuration file; in this case there should be a section "[sampler]" in the configuration file, in which the option can be specified as "type=smc". It is usually easiest to specify most options in a configuration file. Some example configuration files are given in the examples folder. Note that options given on the command line will override the configuration file.
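As an illustration, a minimal config.txt in this format could look as follows (the model name and option values are only examples):

model = my_model
output.folder = output

[sampler]
type = smc
population_size = 1000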
The full list of options is given below:
Generic options:
  --help                                 produce help message
  -j [ --numthreads ] arg (=0)           number of threads to use; set to 0 for using all available hardware concurrency

Command-line only options:
  -c [ --config_file ] arg (=config.txt) Configuration file

Configuration:
  --model arg (=model)                   Name of the model, determines the filename: library will be [x].dll/.so, parameters will be [x].xml
  --output.folder arg (=output)          Folder in which output files will be generated, subfolder of the model directory.
  --output.posterior_format arg (=bin)   Format of the posterior samples, can be either "bin" or "tsv".
  --sampler.type arg                     Sampler type, can be "mh", "pt", "smc" or "nested".
  --sampler.burnin_period arg (=1000)    [MH/PT] Number of samples for the burn-in period; these are discarded.
  --sampler.sampling_period arg (=5000)  [MH/PT] Number of samples to produce, after the burn-in period.
  --sampler.use_every_nth arg (=1)       [MH/PT] Subsampling; a value of 10 means that every 10th sample is used and the 9 samples in between are discarded.
  --sampler.num_mcmc_moves_per_iteration arg (=1)  [MH/PT] Number of MCMC moves to make per iteration, mainly useful for decreasing the number of context switches when using multiple threads.
  --sampler.proposal_scaling arg (=1)    [MH] Scaling factor for the proposal distribution.
  --sampler.numchains arg (=8)           [PT] Number of chains.
  --sampler.min_temperature arg (=0)     [PT] Temperature of the lowest chain. Note that the evidence calculation is not correct if this is not set to 0.
  --sampler.adapt_proposal arg (=2500)   [PT] Number of samples after which the proposal variance should be adapted; 0 for no adaptation.
  --sampler.exchange_probability arg (=0.5)  [PT] Probability of choosing an exchange step instead of an MH move.
  --sampler.temperatures arg             [PT] Name of a file containing the temperatures to be used for each chain; if empty, the default temperature schedule is used.
  --sampler.initial_temperature_power arg (=5)  [PT] Specifies the rate of the power law used for the initial temperature schedule; used if min_temperature=0.
  --sampler.adapt_temperatures_times arg (=0)  [PT] Number of iterations of temperature adaptation steps.
  --sampler.adapt_temperatures_period arg (=1000)  [PT] Number of samples to produce during each temperature adaptation step.
  --sampler.proposal_type arg (=autoblock)  [PT] Proposal type, can be "scalar", "vector" or "autoblock".
  --sampler.population_size arg (=1000)  [SMC/Nested] Number of particles.
  --sampler.start_temperature arg (=0)   [SMC] Starting temperature, between 0 and 1.
  --sampler.end_temperature arg (=1)     [SMC] End temperature, between 0 and 1, typically 1.
  --sampler.temperature_schedule arg (=adaptive)  [SMC] Schedule for the temperatures, can be "fixed_linear", "fixed_loglinear" or "adaptive".
  --sampler.temperature_schedule_adaptive_ratio arg (=1.0)  [SMC] Ratio specifying the target of the next population's ESS, in [0,1). Only meaningful if temperature_schedule=adaptive.
  --sampler.temperature_schedule_fixed_number arg (=1000)  [SMC] The number of iterations in a fixed temperature schedule. Only meaningful if temperature_schedule=fixed_linear or fixed_loglinear.
  --sampler.resampling_threshold arg (=0.5)  [SMC] Threshold for resampling; the population will be resampled if ESS < threshold * population_size.
  --sampler.smc_strategy arg (=mcmc_global)  [SMC] Strategy for sampling the next population, can be "mcmc_global", "mcmc_global_covariance", "mcmc_local" or "kde_local".
  --sampler.smc_mcmc_steps arg (=1)      [SMC] Number of MCMC steps per population iteration. Only meaningful if smc_strategy=mcmc_global or mcmc_local.
  --sampler.smc_mcmc_recalculate arg (=20)  [SMC] After how many iterations to recalculate the proposal distribution. Only meaningful if smc_strategy=mcmc_global or mcmc_local.
  --sampler.smc_mcmc_local_num_nearby arg (=2)  [SMC] Number of nearby particles used to calculate the local proposal, as a multiple of the dimensionality. Only meaningful if smc_strategy=mcmc_local.
  --sampler.smc_mcmc_max_correlation arg (=1)  [SMC] Run MCMC until the correlation of all marginal parameters between successive steps is less than or equal to this threshold, in chunks of smc_mcmc_steps.
  --sampler.sample_weights arg (=1)      [Nested] Specify whether to calculate the weights by sampling ("true") or by expectation value ("false"); sampling the weights allows an estimate of the uncertainty in the evidence approximation.
  --sampler.termination_criterion arg (=remaining_evidence)  [Nested] Termination criterion, can be "fixed", "ntimesh" or "remaining_evidence".
  --sampler.termination_num_steps arg (=1000)  [Nested] Number of steps. Only meaningful if termination_criterion=fixed.
  --sampler.termination_remaining_evidence arg (=0.01)  [Nested] Stop sampling when the current population can at most contribute this much to the evidence term. Only meaningful if termination_criterion=remaining_evidence.
  --sampler.termination_ntimesh_factor arg (=2.0)  [Nested] Stop sampling when the number of steps exceeds this factor * population_size * H. Only used if termination_criterion=ntimesh.
  --sampler.nested_sampling_strategy arg (=multinest)  [Nested] Strategy to use for proposing new samples, can be "prior", "ellipsoid", "multinest" or "mcmc".
  --sampler.nested_num_posterior_samples arg (=1000)  [Nested] Number of posterior samples to be generated; can be different from population_size.
  --sampler.rngseed arg (=0)             Random number generator seed; set to 0 to use an arbitrary seed based on the computer time.
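As an example of combining these options, the following command (with illustrative values) runs parallel tempering with 16 chains and a fixed random seed, overriding any values given in config.txt:

$BCM_ROOT/bin/mdlinf my_model -c config.txt --sampler.type=pt --sampler.numchains=16 --sampler.rngseed=42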