Wire Cell Toolkit Manual
Table of Contents
- 1. Installation
- 2. Configuration
- 3. Howtos
- 4. Internals
- 5. Packages
- 6. Data Flow Programming
- 7. Other Topics
1 Installation
The Wire Cell Toolkit (WCT) should be easy to build on any POSIX’y system with a recent C++ compiler. This section describes how to build releases and development branches, it gives guidance for supplying the few software dependencies, and documents how releases are made.
1.1 Toolkit installation
This assumes you already have available the required dependencies. See section 1.3.
Installation requires four steps:
- get the source
- configure the source
- build the code
- install the results
1.1.1 Source code
WCT source is composed of several packages (see section 5) and all source is available from the Wire Cell GitHub organization. Releases of each package are made and documented on GitHub (eg here) and can be downloaded as archives. However, using git to assemble a working source area is recommended and easier. Releases and development branches are handled slightly differently.
To obtain a release requires no GitHub authentication:
$ git clone --recursive --branch 0.5.x \ https://github.com/WireCell/wire-cell-build.git
This gets the tip of the release branch for the 0.5.x
series. If a specific release is desired a few more commands are needed. For example, if the 0.5.0
release that started the series is wanted:
$ git checkout -b 0.5.0 0.5.0 $ git submodule init $ git submodule update $ git submodule foreach git checkout -b 0.5.0 0.5.0
To obtain the development branch requires SSH authentication with GitHub:
$ git clone --recursive \ git@github.com:WireCell/wire-cell-build.git wct
Which ever way the source is obtained, enter the resulting directory
$ cd wire-cell-build/
At some time later if there is a need to switch between HTTP or SSH a switch-git-urls
script is available in this directory.
1.1.2 Configuring the source
At a minimum, the source must be configured with an installation location for the build results and to allow it to find its dependencies. This, and the remaining steps are done with the provided wcb
script which is an instance of Waf.
$ ./wcb --prefix=/path/to/install configure
This will print the results of the attempts to detect required and optional dependencies. Missing but optional dependencies will not cause failure. See below for guidance on installing dependencies if this step fails or if desired optional dependencies are not found.
Dependencies will likely be found automatically if pkg-config
is available possibly by suitably setting the PKG_CONFIG_PATH
environment variable. If automatic location fails then missing locations can be explicitly specified. The following shows an example where all externals are installed at a single location identified by the WCT_EXTERNALS
environment variable (not, this variable has no other special meaning other than to make this example brief).
$ ./wcb configure --prefix=/path/to/install \ --boost-includes=$WCT_EXTERNALS/include \ --boost-libs=$WCT_EXTERNALS/lib --boost-mt \ --with-root=$WCT_EXTERNALS \ --with-fftw=$WCT_EXTERNALS \ --with-eigen=$WCT_EXTERNALS \ --with-jsoncpp=$WCT_EXTERNALS \ --with-jsonnet=$WCT_EXTERNALS \ --with-tbb=$WCT_EXTERNALS
If the externals are not all in one directory then their locations must be accordingly specified individually.
1.1.3 Building the source
After the above is successful the configure
results are cached and all other build related commands are brief. To build the code to the temporary build/
directory one simply runs:
$ ./wcb
If there are build failures more information can be obtained by repeating the build with more verbosity:
$ ./wcb -vv
The build will try to run tests which can be avoided to save time:
$ ./wcb --notests
1.1.4 Running unit tests
Unit tests are meant to be small, focused tests. Some have grown beyond this intention and into full, if ad-hoc, applications themselves. These need to be reworked or moved into the wire-cell-validate package.
Unless --notests
are passed as above, the build system will build and run the many unit test programs. In general, all unit tests should run successfully (in practice some small fraction may not). To give them a chance to succeed they must at least be run with a properly set up environment. In particular LD_LIBRARY_PATH
must contain all library directories for external packages. Setting this is user/system dependent and so is left to the user.
Developers wishing to run unit tests that exercise code they are developing should take care in setting LD_LIBRARY_PATH
. If the WCT installation area is included then the unit tests will run against those libraries, effectively masking the locally built versions in the development area. Alternatively, they must run ./wcb install
and then manually re-run the unit test.
Setting LD_LIBRARY_PATH
is not as above required for building. To avoid polluting the build environment with superfluous settings it is possible to create a little shell script that will be used to run each test. As an example, we create tester.sh
that looks like:
#!/bin/sh /usr/bin/env LD_LIBRARY_PATH=/path/to/extern1/lib:/path/to/extern2/lib "$@"
After making this script executable it can be used like:
$ ./wcb --testcmd="/path/to/tester.sh %s"
Another useful option is --dump-test-scripts
which will produce a test_<name>_run.py
file for each test_<name>
that bakes in the environment and gives you a per-test runner that you can execute directly. You can use the same tester.sh
script here
$ /path/to/tester.sh ./wcb --dump-test-scripts --alltests $ ./build/util/test_fft_run.py
Where these two commands are executed in a shell that has no LD_LIBRARY_PATH
set.
1.1.5 Install the results
To install the build results into the location given by --prefix
simply issue:
$ ./wcb install
1.1.6 Other build commands
These other commands may be useful:
$ ./wcb clean # clean build products $ ./wcb distclean # also clean configuration # build with debug symbols $ ./wcb configure --build-debug=-ggdb3 [...] # to save some time, just # rebuild the given test # and don't run any tests $ ./wcb --notests --target=test_xxx $ ./wcb --help # see more options.
1.2 Runtime environment
Managing environment is usually a personal choice or computer facility policy and WCT does not place any significant requirements on this. The usual setting of PATH
like variables will likely be needed.
FIXME: we should look into setting RPATH
.
Internally, WCT does not require any environment however it will search a WIRECELL_PATH
when locating configuration or other (non data) input files. More information is in the section 2.
1.3 Guide for installation of dependencies
The WCT depends on a number of third-party “external” software packages which are not expected to be provided by a typical unix-like system:
- Boost
- various functions
- Eigen3
- matrix representation, interface to FFTW
- FFTW3
- for fast Fast Fourier Transforms
- JsonCPP
- basis for configuration and input data files
- Jsonnet
- structured configuration files.
- ROOT
- for many tests and I/O packages, but not the core library code
Additional, optional package are needed for additional functionality:
- TBB
- for data flow programming paradigm support
This list may not represent current reality.
To get a full, up-to-date list of what packages WCT can use run ./wcb --help
.
The following subsections gives some guidance for obtaining these “external” packages.
1.3.1 Manual Installation of Externals
In the DIY mode, the installer is free to provide the third-party packages in any convenient way. Many of them are available on well supported operating systems such as Debian/Ubuntu. Homebrew for Mac OS X is not a core developer platform but may provide many. Redhat derived Linux distributions may find suitable package on EPEL. Most of the required packages are fairly easy to build from source.
However the installer decides to build in DIY-mode the WCT build system should be able to be given proper installation locations via the --with-*
flags as described above. If it seems not to be the case, please contact the developers.
1.3.2 Automated Installation with Spack
Spack is a “meta build system” that runs the individual build systems that come with packages. It allows one to manage an ever growing installation area which can accommodate multiple versions of a package. It also comes with support for Environment Modules to handle your users’ setup of these packages or can make targeted release “views” of its package tree.
WCT provides a package wire-cell-spack which collects instructions and an Spack “repo” that builds WCT and its third-party dependencies. This leverages Spacks built-in “repo” to provide dependencies needed by WCT’s direct dependencies. Using it will tend to build packages that one may already have installed through the OS (eg, Python). However, this duplication should not add much to the overall build time which is automatic nor lead to any problems.
An installer that wishes to use wire-cell-spack to provide the dependencies should begin by following its README file.
1.3.3 Externals provided by UPS
Fermi National Accelerator Lab (FNAL) uses a user environment system similar to Environment Modules. It is typical to download binaries provided by FNAL, either manually of automatically via a CVMFS mount, and then use the UPS shell function setup
to configure a user environment with many environment variables. For each package (“UPS product”) that is so setup there is a variable that gives the installation location. These can be used to provide suitable values for the --with-*
flags to wcb
as described above. The source provides a script waftools/wct-configure-for-ups.sh
which may help run ./wcb configure
in such an environment.
1.4 Release management
Releases are made by developers as needed and as described in this section.
1.4.1 Release versions
WCT label releases are made following a fixed procedure. Releases are labeled with the common three-number convention: X.Y.Z
. These take the following semantic meanings:
- X
- a major release is made when developers believe some substantial milestone has been achieved or to being wholly new or a globally breaking development path.
- Y
- a minor or feature release is made when substantial new and in particular any breaking development is made.
- Z
- a bug release fixes problems without otherwise substantial changes.
1.4.2 Branch policy
Any new major or minor releases produce a new Git branch in each package. Only bug fixes are made to this branch. Where applicable, release bug fixes should be applied to master
. Nominally, all development is on the master
branch however developers are free to make their own feature branches. They are encourage to do this if their development is expected to be disruptive to other developers.
1.4.3 Branch mechanics
To make releases, the above details are baked into two test scripts make-release.sh and test-release.sh. See comments at the top of each for how to run them. These scripts can be used by others but are meant for developers to make official releases.
2 Configuration
As the Wire Cell Toolkit (WCT) is a toolkit, it is up to the parent application to provide some mechanism for the user to provide configuration information to WCT components. Users should refer to the application’s documentation for details. This section of the manual documents the configuration mechanism that is provided by WCT itself. If an application decides to use the WCT file format then its users may refer to this document. Developers of WCT components should read it as well.
2.1 Introduction
WCT itself provides a mechanism which is exposed to the user by the wire-cell
command line application. Any application may easily adopt this same mechanism by making use of the WireCell::ConfigManager
class.
This WCT configuration mechanism is described here from the point of view of user and developer. Details for each role are given in the following sections. However, both user and developer must understand one aspect of WCT internal design in order to understand configuration: A WCT application is composed of a number of component classes. Components work together in some way to enact the job of the application. A component is specifically a C++ class which implements one or more interface base classes. One interface pertinent here is IConfigurable
. A component that implements this interface is called a configurable component or just configurable. A configurable then is the atomic unit of WCT configuration and this unit is reflected in what the user provides for configuration and what developers should expect if they write configurable components.
The user then provides an ordered sequence of configuration objects or simply configurations. Each configuration is associated (by WCT) with exactly one instance of a configurable component class. This association is done via two string identifiers:
- type
- specifies the “configurable type” which often matches the C++ class name with any C++ namespace removed. However, developers of configurable components are free to chose any unique type name.
- name
- specifies a “configurable instance”, that is an C++ object instance of the C++ class associated with the configurable type identifier. The name is free form and may be omitted in which case it defaults to the empty string. A specific name is needed if multiple instances are required or if multiple configurables require sharing a component.
A type/name pair is are also used to initially construct and later locate any instance of a WCT component (not just configurable components).
Finally, configurations have a third attribute:
- data
- specifies a data structure following a schema specific to the configurable type. This is the “payload” that WCT gives to the instance of the configurable component.
In the next section, WCT user-configuration support is described. After it, the following section gives guidance to developers who wish to write their own configurable components.
2.2 Configuration from a user point of view user
Users of the WCT command line interface wire-cell
(or any WCT application that uses WireCell::ConfigManager
) can provide configuration information in the form of one or more files. These files express the ordered configuration sequence that is conceptually described above.
2.2.1 File formats
WCT supports two related configuration file formats: JSON and Jsonnet. Of the two, JSON is more fundamental while the Jsonnet data templating language provides a powerful way to organize and construct complex configurations. Jsonnet files are compiled into JSON by WCT and the result is then fed to the WCT configurable components.
2.2.2 Basic command line
A user gives one or more configuration files to the wire-cell
application each with a -c
flag:
$ wire-cell -c myparameters.cfg [...]
If a relative path is given, the file will be searched for starting in the current working directory and then in each directory listed in a WIRECELL_PATH
environment variable, if given. When multiple configuration are used, their top-level arrays are conceptually concatenated in the order on which they are given on the command line.
2.2.3 Diving into JSON
An example JSON configuration for a single component might look like:
[ { "data" : { "clight" : 1, "step_size" : 0.10000000000000001, "tracks" : [] }, "name" : "", "type" : "TrackDepos" } ]
Here we see an array holding one element which is an object with the type
, (instance) name
and payload data= structure as described above. If wire-cell
were to load this configuration it would create a default instance of the component type TrackDepos
which happens to correspond to the C++ class WireCell::Gen::TrackDepos
(see the simulation package manual for more information). This component is responsible for produces deposition (IDepo
) objects using a simple linear source model.
The tracks
array in this example is empty and no depositions would be produced. The user most certainly should specify a nonempty set of tracks. In principle, the user may produces a huge tracks
array. WCT support bzip2 compressed JSON files (see the section on persistence in the util package manual).
2.2.4 Limitations of JSON
As the complexity of a wire-cell
job grows, hand crafting JSON becomes tedious and error prone. Splitting the files and/or using WIRECELL_PATH
can provide some rudimentary means of organizing a large, complex configuration.
However, a user will quickly outgrow direct authoring of JSON files. An accomplished user will likely turn to some form of JSON generation using a more expressive language maybe by developing some scripts. Or, some part of a configuration may need to be extracted or converted from another source. For example, Geant4 steps might be extracted and formatted into a TrackDepos
configuration as a long tracks
array.
Another limitation is that any numerical quantities must be expressed in the base units used by the WCT system of units (see the section on units in the Utilities manual). This places a burden on the configuration author and is a source of error.
The user is free to generate JSON in any manner they wish as long as the result conforms to the required schema. However, WCT provides a second, more powerful JSON-like configuration file format which described next.
2.2.5 Learning Jsonnet
WCT provides support for configuration files following the Jsonnet data templating language. This language is evaluated to produce JSON. WCT can evaluation Jsonnet files directly. The user may also install the jsonnet
command line program which is useful for validating Jsonnet files. Either the valid Jsonnet or the JSON it produces may be given to WCT.
To learn how to write Jsonnet in general, the user should refer to its documentation which is excellent. There are many ways to structure Jsonnet and the wire-cell-cfg package provides a number of examples. It also provides support files that can help the user craft their configuration in Jsonnet. In particular the WCT system of units and some common data structures used by WCT are exported to Jsonnet in wirecell.jsonnet. Some of this exported functionality is illustrated below.
WCT locates Jsonnet files as it does JSON files through the environment variable WIRECELL_PATH
. Unlike JSON files, Jsonnet files may not be compressed.
2.2.5.1 System of units
Wire Cell provides an internal system of units as described in the section on units in the Utilities manual and as stated above, users must take care to give numerical quantities in WCT units when providing JSON. However, when writing Jsonnet one can provide explicit units which is easy and far less error prone. For example:
local wc = import "wirecell.jsonnet"; [ { type:"TrackDepos", data: { step_size: 1.0 * wc.millimeter, // or could abreviate with wc.mm } } ]
2.2.5.2 Functions
Some data sub-structures are needed in multiple laces and it can be laborious to write them by hand. Jsonnet provides functions to assist in this. A number of functions are defined to assist in representing common data types. For example point()
and ray()
:
{ // ... tracks : [ wc.ray(wc.point(10,0,0,wc.cm), wc.point(100,10,10,wc.cm)) ] },
2.2.5.3 Default parameters
It is typical that different components must share common values, or separate values which derive from common values. Jsonnet allows for this to be expressed in the configuration in a simple manner. For example, in the gen
package both Drifter
and Ductor
may apply statistical fluctuations. For debugging it can be useful to turn this feature off. This can be done in a consistent manner like in a global parameter file
// in uboone/globals.json { // ... // True if simulation should do fluctuations fluctuate: true, // ... }
This file can then be imported so that this variable may be applied where ever it is needed.
// in uboone/components.jsonnet local params = import "uboone/globals.jsonnet"; { // ... drifter: { type : "Drifter", data : { // ... other parameters ... fluctuate : params.fluctuate, } }, ductor: { type : 'Ductor', data : { // ... other parameters ... fluctuate : params.fluctuate, } }, // ... }
See next how these definitions are used.
2.2.5.4 Default structures
One useful way to factor a configuration is to have one Jsonnet file which holds default values and one or more that customize on top of those defaults. For example one the MicroBooNE configuration provided by wire-cell-cfg
defines a default configuration for the FourDee
WCT app.
An “app” is a top level main class run by WCT while an “application” refers to a program built with WCT that a user runs.
This app is configured with a list of components to use for certain portions of the “FourDee” simulation. By default these can are configured with the default types provided directly in the gen
package. Note, these configuration are generally in the form "TypeName:InstanceName"
but the defaults to not specify an instance name.
// in uboone/components.json { // ... fourdee : { type : 'FourDee', data : { DepoSource: "TrackDepos", Drifter: "Drifter", Ductor: "Ductor", Dissonance: "SilentNoise", Digitizer: "Digitizer", FrameSink: "DumpFrames", }, }, // ... }
The default type for FrameSink
is given as DumpFrames
. This component just prints a little bit of info to the terminal. The user probably wants to be able to save the result of the simulation in some more useful way. The simple I/O package provides a FrameSink
which will save the resulting simulated waveforms as 2D ROOT histograms. The user merely needs to override FrameSink
like:
// assumes user has this directory in their WIRECELL_PATH local uboone = import "uboone/components.jsonnet"; [ // ... skip other overrides ... uboone.fourdee { data : super.data { FrameSink: "HistFrameSink", } }, ]
This says to override uboone.fourdee
with what’s given. The type
is inherited. The data
is replaced by the parent’s via super.data
plus the additional override of the FrameSink
attribute.
2.2.5.5 Commas
One of the most irritating aspect of crafting JSON files by hand is that any array or object must not have an internal trailing comma. Jsonnet allows this otherwise extraneous comma, as shown in the example above. For this reason alone and if no other features are used, writing Jsonnet instead of raw JSON is worth the added dependency!
2.2.6 Specific detector support
The wire-cell-cfg
package also provides support for popular LArTPC detectors. You can find these files under a directory named for the experiment (such as that for MicroBooNE).
2.2.7 Using the jsonnet
command line program
Jsonnet’s command line program jsonnet
is fast and gives good error messages. It’s often easiest to develop a Jsonnet configuration using it for periodic validating. Assuming the current working directory is the top of the WCT source then running the following:
$ jsonnet -J cfg cfg/uboone/fourdee.jsonnet
should reward you with a big screen full of JSON. You can then run wire-cell
something like:
$ wire-cell -c uboone/fourdee.jsonnet
This relies on the WIRECELL_PATH
to include the cfg/
directory as well as any other directories holding any configuration data files referenced by the configuration.
2.3 TODO Configuration from a developer point of view devel
For the C++ part of developing WCT components or applications the developer should refer to the configuration section in the manual on WCT Internals and the section on configuration implementation.
In addition, a developer is encouraged to provide Jsonnet files that abstract away any less important details and give users a simplified way to configure the developers components.
In particular, if the developer writes multiple components, an application component or a component that refers to another component, working example configuration files should be provided.
3 Howtos
This section of the manual gives brief guidance on how to do various things with WCT.
3.2 Add a new component class
This describes the steps to add a Wire Cell Toolkit component. As an example, it walks through the creation of a component which will produce noise waveforms. The sections below are organized more or less in the order in which a developer advances from initial concept to design and implementation.
3.2.1 Conceptual design
Noise waveforms will be generated based on a voltage amplitude spectrum represented in the frequency domain. This amplitude will be sampled and abide by some fluctuation distribution and may have some parameters that allow for parametric scaling or other transformation so that the user may explore the results. For simplicity, the time domain noise waveforms will be produced given a fixed sample period and readout time which will also be configurable. It is assumed that the user arranges that these parameters, if needed elsewhere, are set consistently. (See Sec. 2).
Producing noise waveforms as described here require no other input data and in particular, none that would change over time or be a function of some “event” So, the component will follow the pattern of being a source of data. That is, waveforms will be produced on demand and in a manner which they are independent. Below will show how this pattern is realized.
3.2.2 Estimating dependencies
Before starting coding up an implementation it is prudent to understand what software dependencies are required to develop it. Some trade-offs need to be understood. Relying on “external” 3rd party packages to perform the “heavy lifting” of the implementation can make that implementation easier to develop. However, adding dependencies increase the difficulty in installing and using the result on a wide variety of operating systems and system architectures.
Investigating dependencies before implementation also forces the developer to make an optimal decision that balances performance, support, documentation, ease of development, portability and a host of other qualifiers. Relying on whatever software happens to be familiar and available forgoes making this optimal choice.
One touchstone that has been made by the WCT developers is that the “core” packages of WCT shall not depend on ROOT. Depending on ROOT brings in many additional dependencies and this can limit, for example, usage on high-core architectures with limited RAM. ROOT is very useful, and indeed parts of WCT depend on it (test, I/O packages) but for a package to be considered “core” its library must not require ROOT. If a component requires ROOT or other package not accepted by the “core” packages then the component must be placed in an optional package (see Section 3.2.3).
For the noise source, the main functionality required is drawing random variables from an arbitrary distribution (the noise spectrum) and drawing from some conventional distributions (for the fluctuation). The spectrum can be provided through WCT configuration services and drawing from it is a simple algorithm based on forming the cumulative distribution that can be directly integrated. Pseudo random number generation can be done directly in C++ or when this ticket is closed the WCT pRNG interface.
3.2.3 Selecting a package
While the implementer is free to develop their component in any manner they wish, if the component is to be distributed as part of the WCT then its code needs to be in some WCT package. For here, the main thing to note is that the util
package is the lowest in the dependency tree, on top of it is iface
(more on interfaces below) and above that are all the implementation packages. The developer must choose an existing package or elect to make a new one depending on two main criteria: what dependencies are required and what implementation category does the component satisfy.
For the noise waveform source, given the relative lack of dependencies and the fact that it provides a simulation, the wire-cell-gen package provides a suitable home. If no suitable package exists the developer can see Section 5 for details on WCT packages.
3.2.4 Selecting interfaces
All major functionality, and indeed the defining characteristic of a WCT component is that it implements one or more WCT interfaces which are C++ abstract base classes. Each interface defines some number of methods that the implementation must provide.
More information is in the Section 4.3 but for here, what is needed is that there are several categories of interfaces. First, any set of related methods can be grouped into an otherwise anonymous interface. One special interface is IConfigurable
which is used if the implementation wishes to receive user-provided configuration information. Another category contains the interfaces inheriting from INode
. An implementation inherits from one of these if it will participate in the 6 paradigm as implemented by WCT. Or, more generally, if the component is expected to share or pass “event” data (but really any data) with other component.
The noise waveform source requires configuration and will be an IFrameSource
as it will produce frames of “traces” (waveform segments). As development progresses, it may come to light that portions of the implementation are general and are factored into separate classes which themselves may be accessed via an Interface.
3.2.5 Component header
Developing the noise source component begins with a header file which ties together the interfaces through inheritance, declares the interface methods to be implemented and any private methods and data the implementation may need. Following the layout conventions this header is placed in gen/inc/WireCellGen/NoiseSource.h. Some features to note:
- use
#ifndef/#define/#endif
include protection - place inside
WireCell
namespace and a subnamespace that names the implementation (Gen
) in this case - add methods for each interface. Due to the templating used in the node interfaces it’s not always obvious which methods must be implemented unless one follows their inheritence tree. However, most high level interfaces based on
INode
should provide a comment in their header file giving what to implement.
3.2.6 Component implementation
The implementation of a component, following the layout conventions this header, is placed in gen/src/NoiseSource.cxx.
3.2.6.1 Component boilerplate
A few lines of boilerplate are needed so that the component can be dynamically resolved by WCT. Toward the top of the file, and in particular before any using namespace
statements the following is needed.
#include "WireCellUtil/NamedFactory.h" WIRECELL_FACTORY(NoiseSource, WireCell::Gen::NoiseSource, WireCell::IFrameSource, WireCell::IConfigurable);
This is a CPP macro with the following arguments:
- The “type name” of the component (without quotes). This is usually the C++ class name with any namespaces removed but it may differ. It should be unique across all components
- The C++ type of the component
- The remaining arguments are variable in length and enumerate all interfaces through which this component may be accessed.
3.2.6.2 Configuration implementation
A WCT configurable component must provide two methods. The first returns a default configuration object via the default_configuration()
method. This object should represent as much of a working configuration as is possible to specify using hard-coded or otherwise default knowledge. If some parameter can not meaningfully be given a default value it should nonetheless be included with some default, possibly bogus value, eg null
, 0
or empty string or list. This can then be dumped via the wire-cell
command line program as JSON to give the user guidance on how to provide correct input.
The second method configure()
accepts a configuration object from WCT and applies it to any internal state. The component should expect this configuration object to follow a data schema determined by the component itself. The developer of the component should document this schema so that users know what to provide. When accessing the configuration object the code should, where possible, allow for missing parameters by substituting defaults. The code should also be written to allow and ignore any unknown parts of the data structure to the extent that the intended data schema is not otherwise violated.
The component may also provide a constructor or other method which takes configuration in any form. This can be useful to facilitate developing unit tests for the component to allow configuration to be directly set. A common pattern is to let configuration information “flow” starting from the constructor, into private data members of the component, and then out through default_configuration()
and finally used to provide default values when accessing the user configuration object inside configure()
. This is illustrated in this NoiseSource
example.
A component must have a constructor that takes no arguments. If a constructor which takes arguments is added its arguments must either all have default values or a second argument-free constructor must be also included.
4 Internals
This doc describes the Wire Cell Toolkit (WCT) internal structure and support facilities. It is intended for developers to read carefully, understand and follow. It may be of interest to users as well. It does not cover the “batteries included” or “reference implementations” such as the simulation, signal processing, imaging, etc which are described in section 5.
4.1 Toolkit packages
The WCT is composed of a number of packages. Each package has an associated with a Git source repository. Most packages produce a shared library, which may also be a WCT plugin library, C++ header files, some number of main or test applications. Others include a single package holding all Python code in various modules, a package providing support for developing WCT configuration files and the documentation package holding this document. One special type of package is a build package described more in section on the build package.
4.1.1 Names
Package repositories are named like wire-cell-<name>
where <name>
is some short identifier giving indication of the main scope of the package. In the documentation the wire-cell-
prefix is often dropped and only this short name is used.
If a package produces a shared library it should be named in CamelCase
with a prefix WireCell
. For example the gen
package produces a library libWireCellGen.so
. As a plugin name or an entry in the build system, the lib
and .so
are dropped. If the package has public header files to expose to other packages they should use this same name for a subdirectory in which to hold them. Package layout is described move below.
4.1.2 Dependencies
Some of the C++ packages are designated as core packages. These include the packages providing the toolkit C++ structure (described later in this document) as well as the reference implementations (eg, gen
, sigproc
). These packages have strict requirements on what dependencies may be introduced and in particular their shared libraries are not allowed to depend on ROOT (although their apps and tests are, see sections 4.1.3 and 4.1.4).
The base package is util
and it must not depend on any other WCT package. The next most basic is iface
and it must not depend on any other WCT except util
. Core implementation packages such as gen
or sigproc
may depend on both but should not depend on each other.
Fixme: there is a need to factor some general utility routines and data structures that depend on iface
and which the implementation packages should use that needs to be created.
WCT also provides a number of peripheral implementation packages, which are free to have more dependencies, including ROOT, than “core” packages. These are mostly for the purpose of providing WCT components which provide file I/O. The sst
package in particular support the so called celltree
ROOT TTree
format used by the Wire Cell prototype code.
Finally, there may be third-party implementation packages. They are free to mimic WCT packages but WCT itself will not depend on them. They should not make use of the WireCell::
C++ namespace.
4.1.3 Package structure
The WCT package layout and file extensions must follow some conventions in order to greatly simplify the build system. In the description below, WireCellName
is as described above.
src/*.cxx
- C++ source file for libraries with .cxx extensions or private headers
inc/WireCellName/*.h
- public/API C++ header files with .h extensions
test/test_*.cxx
- main C++ programs named like test*.cxx, may also hold Python, shell scripts, private headers
apps/*.cxx
- main application(s), one appname.cxx file for each app named
appname
, should be very limited in number
In the root of each C++ package directory must exist a file called wscript_build
. It typically consist of a single line with a method call like:
bld.smplpkg('WireCellName', use='...')
The bld
object is automagically available. If the package has no dependencies then only the name is given. Most packages will need to specify some dependencies via use
or may specify a different list of dependencies just for any applications (using app_use
) or for the test programs (via test_use
). Dependencies are transitive so one must only list those on which the package directly depends.
Fixme: make a script that generates a dot file and show the graph.
4.1.4 Build package
To actually build WCT see the section on toolkit installation (section 1). The build system is based on Waf and uses the wcb
command and a wscript
file provided by the top level build package. More details on the build system are given in section Waf tools.
Besides holding the main build instructions this package aggregates all the other packages via Git’s “submodule” feature. In principle, there may be more than one build package maintained. This allows developers working on a subset to avoid having to build unwanted code. In practice there is a single build package which is at: https://github.com/wirecell/wire-cell-build.
4.1.5 Adding a new code package
To add a new code package to a build package from scratch, select a <name>
following guidance above and do something like:
$ mkdir <name> $ cd <name>/ $ echo "bld.smplpkg('WireCell<Name>', use='WireCellUtil WireCellIface')" > wscript_build $ git init $ git add wscript_build $ git commit -a -m "Start code package <name>"
Replace <name>
with your package name. You can create and commit actual code at this time as well following the layout in 4.1.3.
Now, make a new repository by going to the WireCell GitHub and clicking “New repository” button. Give it a name like wire-cell-<name>
. Copy-and-paste the two command it tells you to use:
$ git remote add origin git@github.com:WireCell/wire-cell-<name>.git $ git push -u origin master
If you made your initial package directory inside the build package move it aside. Then, from the build package directory, add this new repository as a Git submodule:
$ cd wire-cell-build/ # or whatever you named it $ git submodule add -- git@github.com:WireCell/wire-cell-<name>.git <name> $ git submodule update $ git commit -a -m "Added <name> to top-level build package." $ git push
In order to be picked up by the build the new package short name must be added to the wscript
file.
4.2 Coding conventions
4.2.1 C++ code formatting
- Base indentation should be four spaces.
- Tabs should not be used.
- Opening braces should not be on a line onto themselves, closing braces should be.
- Class names should follow
CamelCase
, method and function names should followsnake_case
, class data attributes should be prefixed withm_
(signifying “member”). - Doxygen triple-slash
///
or double-star/** */
comments must be used for in-source reference documentation. - Normal comments may be used for implementation documentation.
- Interface classes and their types and methods must each have a documenting Doxygen comment.
- Header files must have
#ifndef/#define/#endif
protection. - The C++
using namespace
keyword must not be used at top file scope in a header. - Unused headers should not be retained.
- Any =#include# need in an implementation file but not the corresponding header file should not be in the header file.
4.2.2 C++ namespaces
- All C++ code part of WCT proper and which may be accessed by other packages (eg, exported via “public headers”) must be under the
WireCell::
namespace. - WCT core code (
util
andiface
packages) may exist directly underWireCell::
but bare functions must be in a sub namespace. - Non-core, WCT implementation code (eg contents of
gen
package) must use secondary namespace (egWireCell::Gen::
). - Any third-party packages providing WCT-based components or otherwise depending on WCT should not use the
WireCell::
namespace.
4.3 Interfaces
A central design aspect of the WCT is that all “important” functionality which may have more than one implementation must be accessed via an pure abstract interface class. All such interface classes are held in the iface package. Interface classes should present a very limited number of purely abstract methods that express a single, cohesive concept. Implementations typically inherent from more than one interface. If two concepts are close but not cohesive they are best put into two interface classes. Besides defining the method interface, Interface classes may define types. They may also be templated.
After an implementation of an interface is instantiated and leaves local scope it should be referenced only through one of its interfaces. It should be held through an appropriately typed std::shared_ptr<>
of which one should be defined as ITheInterface::pointer
.
Interfaces are used not only to access functionality but the data model for major working data is defined in terms of interfaces inheriting from WireCell::IData
. Once an instance is created it is immutable.
Another category of interfaces are those which express the “node” concept. They inherit from WireCell::INode
. These require implementation of an operator()
method. Nodes make up the main unit of code. They are somewhat equivalent to Algorithm
concept from the Gaudi framework where the operator()
method is equivalent to Gaudi’s execute()
method. They also require some additional instrumenting in order to participate in the data flow programming paradigm described below.
4.4 Components
Components are implementations an interface which itself inherits from the WireCell::IComponponent
interface class (this interface class is in util
as a special case due to dependency issues. fixme: needs to be solved with a general package depending on both iface and =util
). This inheritance follows CRTP.
Components also must have some tooling added in their implementation file. This is in the form of a single CPP macro which generates a function used to load a factory that can create and retain instances based on a type name and an instance name. For WireCell::Gen::TrackDepos
the tooling looks like:
#include "WireCellUtil/NamedFactory.h" WIRECELL_FACTORY(TrackDepos, WireCell::Gen::TrackDepos, WireCell::IDepoSource, WireCell::IConfigurable);
Note, this macro needs to appear before any using namespace
directives. The arguments to the macro are:
- The “type name” which is typically the class name absent any namespace prefixes. It must be unique across the entire WCT application.
- The full class name.
- A list of all interfaces that it implements.
A component may be retrieved as an interface using the named factory pattern implemented in WCT. If the component has yet to be instantiated it will be through this lookup. This is performed with code like:
#include "WireCellUtil/NamedFactory.h" auto a = Factory::lookup<IConfigurable>("TrackeDepos"); // or auto b = Factory::lookup<IConfigurable>("TrackeDepos","some instance name"); // or auto c = Factory::lookup_tn<IConfigurable>("TrackeDepos:"); // or auto d = Factory::lookup_tn<IConfigurable>("TrackeDepos:some instance name");
The four example differ in if an instance name is known and if it is known separately from the type name or in the canonical join (eg as type:name
). The returned value in this example is a std::shared_ptr<const IConfigurable>
. This example accesses the IConfigurable
interface of TrackDepos
.
Not typically required by most code but there exists also a function lookup_factory()
to get the factory that constructs the component instance.
4.5 Configuration
One somewhat special component interface is IConfigurable
. A class inheriting from this interface is considered a configurable component such as TrackDepos
in the above example. It is required for any main application using the WCT toolkit to adhere to the Wire Cell Toolkit Configuration Protocol. This is a contract by which the main application promises to do the following:
- Load in user-provided configuration information (see the configuration section of hte manual)
- Instantiate all configurables referenced in that configuration.
- Request the default configuration object from each instance.
- Update that object with, potentially partial, information provided by the user.
- Give the instance the updated configuration object.
- Do this before entering any execution phase of the application.
If the main application uses WireCell::Toolkit
then the protocol can be enacted with code similar to
using namespace WireCell; ConfigManager cfgmgr(); // ... load up cfgmgr for (auto c : cfgmgr.all()) { string type = get<string>(c, "type"); string name = get<string>(c, "name"); auto cfgobj = Factory::lookup<IConfigurable>(type, name); // throws Configuration cfg = cfgobj->default_configuration(); cfg = update(cfg, c["data"]); cfgobj->configure(cfg); }
FIXME: shouldn’t we put this all inside ConfigManager
?
Developers of new configurables should keep this protocol in mind and should refer to existing configurables for various useful patterns to provide their end of the exchange.
4.6 Execution Models
4.6.4 Data flow programming execution
Using abstract DFP. A whole section on 6 is also available.
5 Packages
5.1 wire-cell-python
Wire Cell Toolkit provides some support for Python and requires it for some preprocessing, plotting and validation. This support is held in the wire-cell-python package
5.1.1 Installing wire-cell-python
It’s just a “normal” Python package. Use your favorite method, such as
$ virtualenv --system-site-packages venv $ source venv/bin/activate $ git clone https://github.com/WireCell/wire-cell-python.git $ cd wire-cell-python/ $ python setup.py develop $ wirecell-sigproc --help
5.1.2 Python command line programs
A number of python command line programs are provided. They are typically named like:
wirecell-<name>
Where <name>
is one a package short name (eg, util
, sigproc
, gen
).
All use the same command line interface (CLI) module (Click) so have similar usage. In particular, run the program without any arguments to get a help screen.
5.3 wire-cell-iface
Brief overview but it’s also in ./internals.html so don’ t over do it.
5.3.1 Data
tbd
5.3.2 Nodes
tbd
5.3.3 Misc
tbd
5.4 wire-cell-gen
The wire-cell-gen
package provides components for the generation of data. It primarily includes components which perform the grand convolution of drifted electron distribution, field and electronics response and associate statistical fluctuations (aka, the “drift simulation”).
5.4.1 Depositions
Depositions (IDepo
data objects, aka depo) are provided by IDepoSource
components. A single depo is provided on each call to the source and are expected to be produce in strict time order. In general a depo represents a 2D distribution (ie, Gaussian) of drifting electrons spread in longitudinal and transverse directions. A depo may be confined to a single point where the two extents of the distribution are zero.
The IDepoSource
components adapt to external sources of information about initial activity in the detector. These sources may provide \(dE\) and \(dX\) in which case two models can be applied to produce associated number of ionized electrons. The external source may provide only \(dE\) in which case the number of ionized electrons will be calculated for the deposition on the assumption that the particle is a MIP. Finally, the ionization process may be handled by the external source and the number of electrons may be given directly.
5.4.2 Drifting
The IDrifter
components are responsible for transforming a depo at one location and time into another depo at a different location and time while suitably adjusting the number of ionization electrons and their 2D extents. Each call of the component accepts a single depo and returns zero or more output depos. Input depos are assumed to be strictly time ordered and each batch of output depos likewise. In general a drifter must cache depos for some length of time in order to assure it has seen all possible depos to satisfy causality for the output.
5.4.3 Response
The field and electronics response of the detector is calculated in an IDuctor
component. This is typically done by accepting depos at some input plane or response plane. Up to this plane, any drifting depo is assumed to induce a negligible detector response. For drifting beyond this plane some position dependent response is applied (ie, a field response calculated by 2D Garfield or 3D LARF). Each call to an IDuctor
components accepts one depo and produces zero or more frames (IFrame
data object). In general an IDuctor
component must cache depos long enough to assure the produced frames satisfy causality. Output frames may be sparse in that not all channels may have traces (ITrace
data objects) and in any given channel the traces may not cover the same span of time. The unit for the waveforms in the frame depend on the detector response applied. If field response alone is applied then the waveform is in units of sampled current (fixme, check code, it may be integrated over tick and thus charge.) If both field and electronics response is applied the waveform is in units of voltage.
5.4.4 Digitizing
An IDigitizer
component applies a transformation to the waveform, typically but not necessarily in order to truncate it to ADC. These components are functional in that each call takes and produces one frame. Even if truncating to ADC the frame is still expressed as floating point values.
5.4.5 Noise
t.b.d.
5.4.6 Frame Summing
Right now, frames can be summed by a bare function FrameUtil::sum()
. This is better put into a component.
5.4.7 Execution Graphs
The gen
package provides high-level IApplication
components. Primarily, these provide aggregation of the various components described above (and others). The aggregation is conceptually of the form of a call graph which connects outputs of components to inputs of others. This aggregation is hard-coded so that the application determines the connectivity of the resulting execution graph.
- Fourdee
- provides a simple linear drift simulation chain from depos to digitized frame and with noise included. Only a single detector response is allowed and a single frame with signal and noise summed together is produced. (Aside: the “dee” stands for the various components with names starting with “D”.)
- Multidee
- provides a drift simulation that takes depositions through multiple paths and which produce multiple semantically different frames for the same set of depositions. The overall execution graph is shown in the figure below. The
DepoFanout
represents sending the same depo to multipleDuctor
components. TheMultiDuctor
will apply particular detector response depending on where the depo is (measured in wire-space). This can be used to emulate MicroBooNE’s shorted wires. A second pointer to a depo goes into the “truth”Ductor
which has a set of field response functions that produce some kind of “true” signal waveforms from the input depositions (eg, a simple Gaussian smearing instead of bipolar/unipolar field response). Finally, each frame is sunk for output by theMultiFrameSink
which is just anIFrameSink
that collates based on frame identifier numbers.
5.4.7.1 Hard-coded vs Configurable
WCT supports construction of general execution graphs through user-provided configuration.
t.b.d. this code needs reworking and retesting.
5.5 wire-cell-waftools
The WCT build system is based on Waf. The parts of the build system include:
- the
wcb
command provided bywire-cell-build
which is a bundled version of Waf’swaf
command. - a number of Waf tools provide instructions for finding the required and optional software dependencies
- the main
wscript
and per-packagewscript_build
files provide the high-level instructions for building WCT (ie, they are like old fashionedMakefile
files).
5.5.1 Recreating wcb
The wcb
command bundles some optional Waf tools which are not included in the default version of the waf
command. In case new versions of Waf or new tools are needed it can be recreated like this:
$ git clone https://github.com/waf-project/waf.git $ cd waf/ $ ./waf-light --tools=compat15,doxygen,boost,bjam $ cp waf /path/to/wire-cell-build/wcb $ cd /path/to/wire-cell-build $ git commit [...]
5.5.2 Included Waf tools
A number of Waf tools are provided in the waftools submodule. This provides a Python module for each software package which is a required or optional dependency and which is not already covered by Waf itself. New dependencies can be added by using existing modules as examples. It is the smplpkgs.py
module which handles the building of the WCT packages themselves. The wcb.py
module is used as a simple aggregate of all the other modules. It is this that is loaded by the main wscript
.
The scripts to make and test releases are also housed in this package.
7 Other Topics
7.1 Garfield 2D Support
Garfield 2D is used to provide field response functions for WCT signal processing and simulation. This section collects some documentation of this.
7.1.1 Garfield 2D data
Garfield 2D is used to calculate various things:
- the electrostatic or drift field near the wire planes
- electron drift paths through this field starting from an array of points at a (nominal) fixed drift distance and terminating on an electrode
- the Shockley-Ramo weighting field for one central wire of each wire plane
- the instantaneous current in each central wire separately for each drift path and regularly sampled over time.
The Garfield 2D user is free to pick the array drift path starting points however, a choice was made for initial field calculations and processing of Garfield 2D output assumes it holds. In particular:
- The drift paths all start with a fixed “X” coordinate value
- There is one drift path exactly aligned with each wire in the transverse direction.
- There is one drift path on one side of each wire starting exactly at the transverse midpoint with its neighbor
- There are four more equal spaced drift paths between these two.
These paths are said to start at impact positions. These impact positions extend across the entire transverse domain, bounded by one half wire pitch below the lowest wire and one half pitch above the highest wire. The impact positions where Garfield 2D paths start represent half of the total. The remaining half are defined through symmetry.
In the initial ub_10
data set with 21 wires \(\times\) 6 impact positions \(\times\) 3 planes the counting of impact position numbers and wire numbers are in opposite transverse directions. This is corrected for in the preprocessing. If future Garfield 2D runs attempt to “fix” this it will break the preprocessing.
7.1.2 Preprocessing of Garfield 2D data
WCT provides a Python module to assist in generating a WCT field response file in compressed JSON format. It also is responsible for filling in missing drift paths, correcting any vagaries, normalizing units.
The output field response functions must be given in the form of a sampled, electric current waveform due to the passage of a single electron along the drift path and the current must be expressed in the WCT system of units for electric current. The response function is not in units of electric charge.
The main entry point into the Garfield 2D support is:
import wirecell.sigproc.garfield as garfield
The next section gives some detailed examples of use and subsequent ones give some plots.
7.1.3 Eyeballing Garfield 2D with wirecell.sigproc.garfield
This is from the ub_10
data set.
Taking raw Garfield output shows, apparently, two electrons were drifted per path. It’s unclear why it’s slightly more than 2.0 electrons of charge though.
import wirecell.sigproc.garfield as garfield dat = garfield.load("/opt/bviren/wct-dev/share/wirecell/data/ub_10.tar.gz") w = [r for r in dat if r.impact == 0 and r.region == 0 and r.plane == 'w'][0] sum(w.response) # -> -0.020265450377670812 w.times[1] / units.us # -> 0.1 sum(w.response) * w.times[1] / units.coulomb # -> -3.2468828093569448e-19 sum(w.response) * w.times[1] / units.eplus # -> -2.0265450377670811 w0 = [r for r in dat if r.region == 0 and r.plane == 'w'] len(w0) # -> 6 w0[0].region # -> 0 sum(w0[0].response)/units.microampere # -> -3.2577267395296085e-06 [sum(w.response)*w.times[1]/units.eplus for w in w0] # -> [-2.033313287245619, # -2.0184390655262097, # -2.061704680164814, # -2.057934020579546, # -2.0501410482117408, # -2.0265450377670811]
Modify garfield.load()
to normalize all responses so that this last array averages to 1.0. After that change:
[sum(r.response)*r.times[1]/units.eplus for r in dat2 if r.region == 0 and r.plane == 'w'] # -> [0.99606489937379161, # 0.98877842254157267, # 1.0099730708830847, # 1.0081259272658833, # 1.0043083619718129, # 0.99274931796385024] sum([sum(r.response)*r.times[1]/units.eplus for r in dat2 if r.region == 0 and r.plane == 'w'])/6.0 # -> 0.99999999999999944
Same region 0 average in units of eplus
for U is -6.9e-3 and for V is -5.5e-3.
7.1.4 Validation plots
Some critical validation plots can be made from the command line using the wirecell-sigproc
program. In the examples below it is assumed, for brevity that the Garfield 2D data set is available via the $G2D
environment variable set for example like:
$ export G2D=/opt/bviren/wct-dev/share/wirecell/data/ub_10.tar.gz
7.1.4.1 Perpendicular ideal track
A perpendicular ideal track can be “simulated” by summing response functions. This is because the response on neighboring wires due to a charge drifting near the central wire is equivalent to the response on the central wire due to charge drifting near neighboring wires.
There are three main data tiers:
- instantaneous induced current
- sampled voltage after preamplifier gain and shaping done in the FEE
- digitized ADC waveform of that voltage
This command can be used to make plots of these with different parameterizations:
$ wirecell-sigproc plot-garfield-track-response --help Usage: wirecell-sigproc plot-garfield-track-response [OPTIONS] GARFIELD_FILESET PDFFILE Plot Garfield response assuming a perpendicular track. Options: -o, --output TEXT Set output data file -g, --gain FLOAT Set gain in mV/fC. -s, --shaping FLOAT Set shaping time in us. -t, --tick FLOAT Set tick time in us (0.1 is good for no shaping). -n, --norm INTEGER Set normalization in units of electron charge. -a, --adc-gain FLOAT Set ADC gain (unitless). --adc-voltage FLOAT Set ADC voltage range in Volt. --adc-resolution INTEGER Set ADC resolution in bits. --help Show this message and exit.
If the shaping is zero, then the induced current is plotted. If it is nonzero but the ADC gain is zero then pre-ADC voltages are plotted. If both are nonzero (default) then the ADC waveforms are plotted
7.1.4.2 Induced current
Garfield 2D provides instantaneous, sampled induced current waveforms. Their per-plane sum, as described above equivalent to a perpendicular track, can be plotted with zero shaping time:
$ wirecell-sigproc plot-garfield-track-response -s 0.0 $G2D figs/track-response-current.svg
The default normalization is such that there are 16000 electrons per pitch (MIP) and no diffusion. Just doing the unit conversions, this many electrons arriving over 2-3 us should give a current of about a nanoamp.
7.1.4.3 Amplified Voltage
The results of convolution with electronics response can be plotted with a zero per-ADC gain:
$ wirecell-sigproc plot-garfield-track-response -a 0.0 $G2D figs/track-response-voltage.svg
The default preamplifier gain is 14 mV/fC. This means that a delta-function current integrating to 1 fC would produce a smooth voltage curve with a peak of 14 mV. The default 16000 electrons per pitch, if producing a delta-function of current, would produce 36 mV. Because of broadening and finite width as shown in the previous plot, this peak should be reduced and smeared.
7.1.5 Producing WCT Field Response Data File
The WCT does not directly read Garfield 2D data sets but instead requires the information to be compiled into a single, compressed JSON file. This is done like:
$ wirecell-sigproc convert-garfield $G2D garfield-1d-3planes-21wires-6impacts-v5.json.bz2
FIXME: distribution of these data files needs some formal mechanism. For now, they may be available here: http://www.phy.bnl.gov/~bviren/tmp/wctsim/wct-dev/share/wirecell/data/.
7.2 Running inside of art/LArSoft
Wire Cell Toolkit can be run “stand alone” via the wire-cell
command line program. However, WCT is designed to run as part of some greater application. On primary example is to run as a component in a program constructed as art modules as provided by LArSoft (LS). This section describes the so called Wire Cell Toolkit / LArSoft integration (WC/LS).
7.2.1 Overview
As of WCT 0.6.1 and LArSoft 6.48.0 there is support for running WCT inside of art in an extensible and maintainable manner. The software that provides this integration is maintained in the LArSoft package larwirecell maintained in FNAL’s Redmine (github mirror).
The sections below cover the integration software design, special issues relating to configuration inside of art, how to prepare an art runtime environment which includes WCT and how to develop WCT in the context of art as the driving application.
7.2.2 Design
The design of the integration software is summarized in the following UML diagram.
Figure 5: Wire Cell Toolkit / LArSoft integration design UML diagram.
The classes in blue are in larwirecell
. The dark blue module and the WCLS
tool classes are implementations of art concepts and remaining follow WCT. A WireCellToolkit
art module is provided in larwirecell
. Instances of this class may be used directly or it can serve as a reference example. Regardless of the module, the WCT processing is delegated to the WCLS
tool. It is effectively a copy of the wire-cell
program but written to be called as an art tool instead of from a command line. In the section 2 below it’s shows that it shares most of the same options as wire-cell
.
The WCLS
tool is responsible for WCT-side configuration and execution and special LS-side execution. This latter entails calling the implementations to a LS-specific interface class IArtEventVisitor
. These objects likely also implement some other common WCT interface. Two examples of the roles these objects have are:
- data converter
- translating data products from LS to WCT, or vice versa.
- service facade
- providing WCT access to some LS service.
Bracketed by these “two (inter)faced” converter objects are the usual “core” components of the WCT job. Shown in the figure are two frame filters implementing noise filtering and signal process, respectively. Not shown is the WCT “application” object which is responsible for aggregating the top level components and marshalling data through them (in this example this would be the class sigproc::Omnibus=
).
7.2.3 Configuration
The bulk of configuring WCT to run inside of LS is identical to what it would be to run from the wire-cell
command line. The only difference pertains to:
- configuring art and LS components
- configuring WC/LS converter components
It is recommended, and indeed easy, to structure the WCT configuration so that the parts that depend on the WC/LS layer are factored out from those that depend on LS-side WCT components.
In addition there is the need to provide a family of job-independent WCT configuration “data” files. These include, for example, the special field-response functions WCT uses. They are provided by the wire-cell-data
package.
The FHiCL fragment to configure the WireCellToolkit
art module and its WCLS
tool for noise filtering and signal processing would look something like:
physics :{ producers: { nfsp : { module_type : WireCellToolkit wcls_main: { tool_type: WCLS # (1) apps: ["Omnibus"] # (2) plugins: ["WireCellGen", "WireCellSigProc", # (3) "WireCellSio", "WireCellLarsoft"] configs: ["uboone-nf-sp.jsonnet"] # (4) inputers: ["wclsRawFrameSource", # (5) "wclsChannelNoiseDB"] outputers: ["wclsCookedFrameSink"] # (6) params: { # (7) detector: "uboone" } } } } # ... }
Notes which refer to the parenthetical numbers:
- Declare that the configuration applies to an art class tool of type
WCLS
. - The
apps
list defines WCT application objects which are responsible for performing top-level execution. They are somewhat conceptually equivalent to art modules. - The
plugins
list defines WCT plugin libraries in which WCT may find definitions of component classes. A plugin name matches its library name with the leadinglib
and trailing extension remove. TheWireCellLarsoft
plugin contains WC/LS integration components from thelarwirecell
package. - The
configs
list gives an ordered list of all top-level WCT configuration files. As shown, these are in Jsonnet data tempting but may also be in JSON. They, like all WCT configuration files (including job-independent configuration “data” files) are found via theWIRECELL_PATH
environment variable. More information on WCT configuration is in 2. - The
inputers
listIArtEventVisitor
components that should be called before the WCT application objects are executed. Here, the component which converts from LS’sraw::RawDigit
collections to WCT’sIFrame
instances is included. The second a channel noise DB object which inherits from WCT’sOmniNoiseChannelDB
and augments the fully but statically configured information with some portion that is taken dynamically from LS services. - The
outputers
list is interpreted identically toinputers
but its components are executed after the WCT app objects. - The
params
dictionary may define WCT configuration parameters that are referenced by the Jsonnet files that are loaded. This allows the bulk of the configuration to be made more generic by pushing out meta-parameters to the end user.
WCT components are named in the apps
, inputers
and outputers
parameter lists. As shown, just their WCT component “type” names are given (these are not necessarily their C++ class names, but are usually similar). Like all references to components in WCT configuration, these may be specialized by giving an optional “instance” name. This allows for multiple instances of the same component class which may then be configured uniquely. Again, more details on WCT configuration are in 2.
7.2.4 Runtime
Running WCT inside of art entails running art which means setting up a runtime environment in the “Fermilab way”. This requires obtaining binary packages (“UPS products”) from Fermilab for your host OS. If supported, this can be done in a number of ways.
7.2.4.1 CVMFS mount
CVMFS is a way to deliver files (usually ready-to-run binary software builds) to a client via HTTP. If not already available to you (check for the existence of /cvmfs) the system administrator of the host will need to provide it. One starting point is here.
If CVMFS is available to you, start by setting:
$ export PRODUCTS=/cvmfs/fermilab.opensciencegrid.org/products/larsoft
7.2.4.2 Local binaries
It is possible to semi-automatically provide binary software builds
via the pullProducts
downloader script. This can (and should) be
exercised without root permissions. It requires dedicated disk space
of 10-100 GB. More OSes are supported with these binaries than are
available in CVMFS (in particular Ubuntu)
To prepare a base installation
find a desired larsoft version from from this scisoft directory and navigate to the download guide eg the one for LS 6.48.00 and download the pullProducts
script and run it according to the guide. One example:
$ mkdir -p ~/dev/pp/products $ cd ~/dev/pp $ wget http://scisoft.fnal.gov/scisoft/bundles/tools/pullProducts $ chmod +x pullProducts $ ./pullProducts `pwd`/products u16 larsoft-v06_48_00 s50-e14 prof $ rm *.tar.bz2
That last rm
command is optional but cleans up some unneeded tarballs.
$ export PRODUCTS=$HOME/dev/pp/products
7.2.4.3 General
After providing a base software installation in one of the methods
above and setting PRODUCTS
continue as:
$ source $PRODUCTS/setup $ setup larsoft v06_48_00 -q e14:prof
Note, in general PRODUCTS
can be a “:”-separated list. If for some reason your must have more than one element in PRODUCTS at this point be sure to use proper directory to locate the setup
script.
A simple test to see that the art
and wire-cell
programs are now available:
$ art --version art 2.07.03 $ wire-cell --help Options: -h [ --help ] wire-cell [options] [arguments] -a [ --app ] arg application component to invoke -c [ --config ] arg provide a configuration file -p [ --plugin ] arg specify a plugin as name[:lib] -V [ --ext-str ] arg specify a Jsonnet external variable=value -P [ --path ] arg add to JSON/Jsonnet search path
With this, the user is ready to run art, LArSoft and Wire-Cell.
7.2.5 Development
The challenge to do development on WC/LS integration code is that the larwirecell
expects WCT to be provided as a released and built UPS product. Development of course requires constant rebuilding and adding releases and full UPS product building is prohibitive. The solution is to cheat and produce what looks like an installed wirecell
UPS product area into which the development WCT code is directly built.
7.2.5.1 Prepare development UPS products area
If the $PRODUCTS
area defined above is writable, it can be used. Otherwise, make a new one:
$ mkdir ~/dev/myproducts $ cp -a $PRODUCTS/.upsfiles ~/dev/myproducts
Declare a fictional version (here v0_7_dev
) for what will become the dev wirecell
UPS product area:
$ ups declare wirecell v0_7_dev -f Linux64bit+4.4-2.23 -q e14:prof -r wirecell/v0_7_dev -z ~/dev/myproducts -U ups -m wirecell.table $ mkdir -p ~/dev/myproducts/wirecell/v0_7_dev/ups $ cp $PRODUCTS/wirecell/v0_6_1/ups/wirecell.table ~/dev/myproducts/wirecell/v0_7_dev/ups/
If the versions of dependencies listed in the wirecell.table
file require updating, this is the time to change them and make sure they have UPS corresponding UPS products available.
Now, let UPS know about this new products area for future ups
incantations by putting it first in $PRODUCTS
and “setup” this new version. There’s nothing there yet, that’s okay.
$ PRODUCTS=~/dev/myproducts:$PRODUCTS $ unsetup wirecell $ setup wirecell v0_7_dev -q e14:prof $ echo $WIRECELL_VERSION v0_7_dev
There will not yet be a code actually installed into the just declared wirecell
UPS product area. As a consequence, the above setup
command will not actually set some important environment variables (in particular LD_LIBRARY_PATH
). This is rectified below.
7.2.5.2 WCT Source
Independent of the above, and as per usual, get the WCT source. You may wish to get the source in one of two ways:
- Development on the
master
branch, maybe in anticipation of a future release
$ git clone --recursive git@github.com:WireCell/wire-cell-build.git wct-master
- Bug fixing an existing release branch called
<BRANCH>
(named after major.minor versions likeA.B.x
, eg0.6.x
)
$ git clone --recursive --branch <BRANCH> \ https://github.com/WireCell/wire-cell-build.git wct-<BRANCH>
The exact name for the source directory is up to you. Below, wct-src
is used as a placeholder.
7.2.5.3 Configuring WCT source
The WCT source is configured in the same manner as it is for any environment. It must be told where to find the installed dependencies. When building against UPS products, its environment variables may be used to locate the provided dependencies. As this is tedious a script is provided to assist this configuration. It assumes that the calling environment is already properly “setup”.
This script configures the source to install WCT into the wirecell
UPS product area. This will overwrite any existing files. Be sure that the environment is properly “setup” and in particular $WIRECELL_FQ_DIR
is set as intended.
$ cd wct-src/ $ ./waftools/wct-configure-for-ups.sh
7.2.5.4 Building and running WCT
As usual, build and install with the provided wcb
.
$ ./wcb build install
If the source was configured to install into the UPS product area then everything is ready to run. If it was installed locally then the usual PATH
like variables need to be set to point into that installation location.
7.2.5.5 Fix UPS environment hysteresis
As warned above, the UPS setup
command fails to set important variables after a UPS product is “declared” but before any code is installed. After WCT is installed as above this UPS inadequacy can be rectified by “turning it off and on again”:
$ unsetup wirecell $ setup wirecell v0_7_dev -q e14:prof
7.2.5.6 Preparing mrb
development area
To co-develop WCT (in the form of wirecell
UPS product) and some mrb
developed package, most likely, larwirecell
one sets up more or less as usual.
$ export MRB_PROJECT=larsoft $ setup mrb $ mkdir ~/dev/ls-6.48.00 $ cd ~/dev/ls-6.48.00 $ mrb newDev $ source localProducts_larsoft_v06_48_00_e14_prof/setup # if cloning via SSH $ kinit bv@FNAL.GOV $ mrb g -b feature/<identifier>_<my_feature> larwirecell
Be sure to follow branch naming conventions outlined here. If this fails due to the branch not yet existing,
$ cd larwirecell $ git flow feature start <identifier>_<my_feature>
Update the source to use the new version of wirecell
by editing the product_deps
file:
$ sed -i 's/^wirecell.*/wirecell v0_7_dev/' ~/dev/ls-6.48.00/srcs/larwirecell/ups/product_deps
Finish setting up development environment and do a build:
$ cd ~/dev/ls-6.48.00/build_u16.x86_64/ $ mrbsetenv $ mrb build
If this build fails in CMake with cryptic statements about SOURCE is required
it likely means that your wirecell
UPS product environment is broken as warned above. See section 7.2.5.5.