Project
Project(array_store, table_store, label='prj')
Bases: Container
The project is the core interface for interacting with data through atomea.
Cadence
All data derived from atomistic calculations fall on one of two cadences: ensemble or microstate. Both concepts come from statistical mechanics.
[!NOTE] We do not support ensembles in which the number of particles changes (e.g., the grand canonical ensemble).
Both are defined in the Cadence enum in atomea.
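For orientation, a two-member cadence enum could look like the sketch below. Only the member name MICROSTATE appears on this page; ENSEMBLE is assumed by analogy, and the `auto()` values and the `EXAMPLE_CADENCES` mapping are hypothetical illustrations drawn from the definitions in the following subsections.

```python
from enum import Enum, auto


class Cadence(Enum):
    """Hypothetical sketch of atomea's Cadence enum (member values are assumptions)."""

    ENSEMBLE = auto()    # constant across all microstates of an ensemble
    MICROSTATE = auto()  # may change from one microstate to the next


# Illustrative assignments that follow the definitions on this page.
EXAMPLE_CADENCES = {
    "coordinates": Cadence.MICROSTATE,
    "instantaneous_temperature": Cadence.MICROSTATE,
    "force_field": Cadence.ENSEMBLE,
    "barostat_set_point": Cadence.ENSEMBLE,
}

if __name__ == "__main__":
    for quantity, cadence in EXAMPLE_CADENCES.items():
        print(f"{quantity}: {cadence.name}")
```

Note the split between an instantaneous temperature (it fluctuates per microstate) and a thermostat or barostat set point (it is fixed for the ensemble).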
Microstate
A microstate is a single, distinguishable configuration of particles (i.e., atoms) where the system's thermodynamic variables are unchanged. In our calculations, a microstate could be a:
- frame of a molecular dynamics simulation trajectory;
- protein-ligand pose in a docking calculation;
- transition state of a chemical reaction.
Data that could change between microstates, such as atomistic coordinates, energy, docking score, instantaneous temperature or pressure, dipole moment, electronic state, etc., are given a cadence of MICROSTATE.
Ensemble
An ensemble is a collection of microstates in which the "thermodynamic variables" (e.g., Hamiltonian, temperature, number of particles) are constant. Changing any of these variables produces a different ensemble. This also extends to calculation parameters that could, intentionally or not, change properties of the atomistic system (e.g., force field, integration algorithm, docking scoring algorithm, barostat set point).
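One way to make the "same ensemble" rule concrete is to bundle every defining variable into a hashable key, so two calculations share an ensemble only when all fields match. The sketch below assumes a small, hypothetical set of fields (`EnsembleKey`, `same_ensemble`, and the field names are illustrative, not atomea's API); a real project would include whatever parameters can alter the system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EnsembleKey:
    """Hypothetical record of the variables that define one ensemble.

    Any field that differs between two calculations places them in
    different ensembles. The random seed is deliberately absent: runs
    that differ only by seed sample the same ensemble.
    """

    force_field: str      # stands in for the Hamiltonian
    temperature_k: float
    n_particles: int
    integrator: str
    pressure_set_point_bar: Optional[float] = None  # None: no barostat


def same_ensemble(a: EnsembleKey, b: EnsembleKey) -> bool:
    """Return True only if every defining variable matches."""
    return a == b


if __name__ == "__main__":
    nvt = EnsembleKey("ff-A", 300.0, 1000, "langevin")
    hotter = EnsembleKey("ff-A", 310.0, 1000, "langevin")
    print(same_ensemble(nvt, nvt))     # True
    print(same_ensemble(nvt, hotter))  # False: temperature changed
```

Because the dataclass is frozen, instances are hashable and could key a dictionary of ensembles; changing the integrator or adding a barostat yields a distinct key, mirroring the parameter list above.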
Stores
The dimensionality of data determines how we represent, store, and analyze it. We define two working categories: scalar and n-dimensional data.
Scalars
Quantities with a single value per microstate are always stored in tables, using DataFrames, with each quantity in its own column. This includes energies, thermodynamic variables, calculation parameters, and other relevant quantities. A quantity's cadence has no influence on how it is stored.
All data should be stored on the assumption that multiple ensembles and microstates will be present. Each table row must include:
- ens_id (str): A unique identification label for an ensemble, such as "1", "default", or "exp3829".
- run_id (str): A unique, independent run within the same ensemble. Multiple runs often arise when running several independent molecular simulation trajectories with different random seeds.
- micro_id (uint): An index specifying a microstate, with some meaningful ordering. This can be a frame in a molecular simulation trajectory, a docking pose ranked from best to worst score, an optimization step, etc.
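In a DataFrame, the three keys above would simply be columns next to each scalar quantity. The sketch below keeps to plain Python rows so it has no dependencies; `ScalarRow`, `select`, the `energy` column, and the numeric values are all illustrative assumptions, not atomea code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScalarRow:
    """One table row: the three required keys plus one scalar column."""

    ens_id: str    # ensemble label, e.g. "default"
    run_id: str    # independent run within that ensemble
    micro_id: int  # non-negative microstate index, e.g. a trajectory frame
    energy: float  # hypothetical scalar column


def select(rows: List[ScalarRow], ens_id: str, run_id: str) -> List[ScalarRow]:
    """Return the rows of one run of one ensemble, ordered by micro_id."""
    hits = [r for r in rows if r.ens_id == ens_id and r.run_id == run_id]
    return sorted(hits, key=lambda r: r.micro_id)


# Illustrative values only: two runs (different seeds) of the same ensemble.
TABLE = [
    ScalarRow("default", "0", 0, -10.2),
    ScalarRow("default", "0", 1, -10.5),
    ScalarRow("default", "1", 0, -9.8),
    ScalarRow("default", "1", 1, -10.1),
]

if __name__ == "__main__":
    for row in select(TABLE, "default", "1"):
        print(row.micro_id, row.energy)
```

Keeping every row fully keyed means data from new runs or ensembles can be appended to the same table without changing its shape.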
N-dimensional
Data with more than one value per microstate, such as atomic coordinates, must be stored in arrays with the appropriate number of dimensions to hold multiple values, even if there is only one. Data from all runs of an ensemble are stored in a single array, since they are, in principle, sampled from the same ensemble.
Array data must also be stored in the same order as the table rows of that Container, so row indices from its tables can be used to slice its arrays. Note that row indices can differ between Containers, since not all data is collected on the same cadence.
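Because arrays share the row order of their Container's table, selecting rows in the table yields positions that also index the array. The sketch below shows that idea with plain lists; `row_indices`, `KEYS`, and `COORDS` are hypothetical names, and the coordinates are illustrative.

```python
from typing import List, Sequence, Tuple

Frame = List[Tuple[float, float, float]]  # one (x, y, z) per atom


def row_indices(keys: Sequence[Tuple[str, str, int]], ens_id: str, run_id: str) -> List[int]:
    """Positions of the table rows that belong to one run of one ensemble."""
    return [i for i, (e, r, _) in enumerate(keys) if e == ens_id and r == run_id]


# Table keys (ens_id, run_id, micro_id), stored in the same order as the array.
KEYS = [("default", "0", 0), ("default", "0", 1), ("default", "1", 0)]

# Coordinate "array": one frame per table row, two atoms per frame (illustrative).
COORDS: List[Frame] = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.1, 0.0, 0.0), (1.1, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
]

if __name__ == "__main__":
    idx = row_indices(KEYS, "default", "0")
    run0 = [COORDS[i] for i in idx]  # the same positions slice the array
    print(idx, len(run0))
```

With a NumPy array of shape (n_rows, n_atoms, 3), the same list of indices could be passed directly as `coords[idx]`.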
| PARAMETER | DESCRIPTION |
|---|---|
| `array_store` | Storage backend for all arrays. |
| `table_store` | Storage backend for all tables. |
| `label` | Unique ID for this container. Defaults to `'prj'`. |
energy = Energy(self)
label = label
quantum = Quantum(self)
time = Time(self)
__getitem__(ens_id)
__repr__()
add_ensemble(ens_id)
Create and register a new Ensemble with the given ID, using the project's stores.
| PARAMETER | DESCRIPTION |
|---|---|
| `ens_id` | Unique label for the ensemble. |
| RETURNS | DESCRIPTION |
|---|---|
| `Ensemble` | The newly created Ensemble. |
get_ensemble(ens_id)
Retrieve an existing Ensemble by its ID, or None if not found.
list_ensembles()
Return all ensemble IDs managed by this project.
remove_ensemble(ens_id)
Remove an Ensemble from the project by its ID.
TODO: Need to implement dropping tables and arrays.
| RAISES | DESCRIPTION |
|---|---|
| `KeyError` | If the ensemble does not exist. |
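Taken together, `add_ensemble`, `get_ensemble`, `list_ensembles`, and `remove_ensemble` describe a simple registry contract. The in-memory sketch below models only that contract: the `Ensemble` stand-in and the `EnsembleRegistry` class are hypothetical, rejecting duplicate IDs with `ValueError` is an assumption (this page does not say what happens), and, in line with the TODO, no tables or arrays are dropped.

```python
from typing import Dict, List, Optional


class Ensemble:
    """Hypothetical stand-in for atomea's Ensemble; it only records its ID."""

    def __init__(self, ens_id: str) -> None:
        self.ens_id = ens_id


class EnsembleRegistry:
    """Sketch of the add/get/list/remove contract documented for Project."""

    def __init__(self) -> None:
        self._ensembles: Dict[str, Ensemble] = {}

    def add_ensemble(self, ens_id: str) -> Ensemble:
        # Assumption: duplicate IDs are rejected (not specified on this page).
        if ens_id in self._ensembles:
            raise ValueError(f"ensemble {ens_id!r} already exists")
        ens = Ensemble(ens_id)
        self._ensembles[ens_id] = ens
        return ens

    def get_ensemble(self, ens_id: str) -> Optional[Ensemble]:
        # Returns None, rather than raising, when the ID is unknown.
        return self._ensembles.get(ens_id)

    def list_ensembles(self) -> List[str]:
        return list(self._ensembles)

    def remove_ensemble(self, ens_id: str) -> None:
        # KeyError for unknown IDs, as documented; stored tables and
        # arrays are not touched in this sketch.
        if ens_id not in self._ensembles:
            raise KeyError(ens_id)
        del self._ensembles[ens_id]


if __name__ == "__main__":
    reg = EnsembleRegistry()
    reg.add_ensemble("default")
    print(reg.list_ensembles(), reg.get_ensemble("missing"))
```

The asymmetry is worth noting: looking up a missing ensemble returns None, while removing one raises KeyError.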