Data models included in gufe¶
The core of the gufe data model is the GufeTokenizable class, but gufe features more than just this base data structure.
To ensure interoperability, gufe also defines classes of objects that represent the core chemistry and alchemistry of a free energy pipeline, including molecules, chemical systems, and alchemical transformations. In other words, gufe provides a shared language used by tools across the OpenFE ecosystem.
Below, you will learn how the various pieces of gufe fit together. Generally speaking, ChemicalSystems can be thought of as the what or the nouns that we are simulating, Transformations are the how or the verbs that encode how we are simulating these objects and moving between them, and an AlchemicalNetwork is like a sentence that groups all of these together.
Note
Some of these classes are designed to be subclassed, and constitute the extensible points of the library. These include (but are not limited to) Component, Protocol, ProtocolUnit, ProtocolResult, ComponentMapping, AtomMapping, and AtomMapper; see the How-To Guides for more information on how to extend each.
Component¶
The Component class represents a portion of a system of molecules, where a single Component is capable of representing anything from an individual drug-like molecule, to an entire protein, to a solvent with ions.
Components are often used as the building blocks of ChemicalSystems, which form the nodes of an AlchemicalNetwork.
The same Component may be present within multiple ChemicalSystems, such as a ProteinComponent in an AlchemicalNetwork featuring relative binding transformations between ligands.
As another distinct example: the SmallMoleculeComponent class (which is a subclass of Component) is used to form the nodes of a LigandNetwork.
This is useful for representing relative transformations between a series of small molecules without invoking the additional complexity of an AlchemicalNetwork.
Note
The Component class is an extensible point of the library,
and is intended to be subclassed to enable new applications.
For details on how to create your own Component classes, see How to define a new Component.
ChemicalSystem¶
A ChemicalSystem represents a complete system of molecules and is often composed of multiple Components.
These are most often used as nodes of an AlchemicalNetwork, with pairs of ChemicalSystems connected by Transformations.
Because a ChemicalSystem functions as a kind of container of Components, more than one ChemicalSystem can feature the same Component.
This allows even very large AlchemicalNetworks to be relatively small in memory, since a few large Components, such as ProteinComponents, can be shared among hundreds of ChemicalSystems rather than duplicated in each one.
See Deduplication in memory (flyweight pattern) for more details about this memory optimization.
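The sharing described above can be illustrated with a small, self-contained sketch. It does not use gufe: `ToyComponent`, `intern_component`, and `_REGISTRY` are hypothetical names invented for this example, and the registry keyed by content is only an assumption about how a flyweight pattern might be arranged.

```python
import weakref
from dataclasses import dataclass

# Hypothetical registry: maps a content-derived key to the one live instance.
# Weak references let unused components be garbage collected.
_REGISTRY = weakref.WeakValueDictionary()


@dataclass(frozen=True)
class ToyComponent:
    """Stand-in for a Component; not the gufe class."""
    name: str
    payload: str  # e.g. serialized coordinates, potentially large

    @property
    def key(self):
        # Key derived purely from content, so equal content gives an equal key.
        return (type(self).__name__, self.name, self.payload)


def intern_component(name, payload):
    """Return the existing instance for this content, or register a new one."""
    candidate = ToyComponent(name, payload)
    return _REGISTRY.setdefault(candidate.key, candidate)


protein = intern_component("protein", "ATOM..." * 1000)
same_protein = intern_component("protein", "ATOM..." * 1000)

# Two toy "chemical systems" as plain dicts of label -> component.
system_a = {"protein": protein, "ligand": intern_component("ligand_a", "c1ccccc1")}
system_b = {"protein": same_protein, "ligand": intern_component("ligand_b", "Cc1ccccc1")}

# Both systems point at the very same protein object in memory.
shared = system_a["protein"] is system_b["protein"]
```

Because both systems hold a reference to one protein object, adding more systems that reuse it costs only a reference each, not another copy of the payload.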
When used as inputs to a Transformation, ChemicalSystems represent the set of Components for which a free energy difference will be estimated.
Alchemical methods performing free energy perturbation (FEP) between the two ChemicalSystems of a Transformation will simulate these Components using some sampling approach, obtaining enough information to derive a free energy difference estimate.
Transformation¶
A Transformation represents an alchemical transformation between two ChemicalSystems.
Transformation objects are often used as the edges of an AlchemicalNetwork.
In addition to referencing the two ChemicalSystems it spans, a Transformation also includes the Protocol used to actually perform the alchemical transformation, as well as a ComponentMapping specifying what portions of the Components are being transformed across the ChemicalSystems.
A Transformation functions as a container for all the information needed to obtain an estimate of the free energy difference between its two ChemicalSystems.
NonTransformation¶
A NonTransformation represents non-alchemical sampling of a single ChemicalSystem.
In the context of an AlchemicalNetwork, a NonTransformation is effectively a self-loop, featuring the same ChemicalSystem at either end.
Similar to a Transformation, it features a Protocol used to perform sampling on its ChemicalSystem, but does not feature a ComponentMapping since there is no second ChemicalSystem.
An example of a Protocol that would be appropriate for a NonTransformation is one that performs equilibrium molecular dynamics of the ChemicalSystem.
A NonTransformation cannot be used to obtain a free energy difference estimate, since by definition the free energy difference of transforming a ChemicalSystem into itself is exactly 0.
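To make the contrast between the two edge types concrete, here is a minimal sketch using plain dataclasses. `ToySystem`, `ToyTransformation`, and `ToyNonTransformation` are hypothetical stand-ins written for this example, not the gufe classes, and the protocol is reduced to a string label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToySystem:
    name: str


@dataclass(frozen=True)
class ToyTransformation:
    """Edge between two systems; carries a protocol and an optional mapping."""
    stateA: ToySystem
    stateB: ToySystem
    protocol: str
    mapping: Optional[tuple] = None  # ((index_in_A, index_in_B), ...)


@dataclass(frozen=True)
class ToyNonTransformation:
    """Self-loop: one system, one protocol, no mapping."""
    system: ToySystem
    protocol: str

    @property
    def stateA(self):
        return self.system

    @property
    def stateB(self):
        return self.system


def is_self_loop(edge):
    """True when both ends of the edge are the same system."""
    return edge.stateA == edge.stateB


complex_a = ToySystem("protein + ligand A")
complex_b = ToySystem("protein + ligand B")
alchemical = ToyTransformation(complex_a, complex_b, "toy-fep", mapping=((0, 0), (1, 2)))
equilibrium = ToyNonTransformation(complex_a, "toy-md")
```

Exposing `stateA` and `stateB` on both classes is a design choice of this sketch: it lets code that walks a network treat every edge uniformly and detect self-loops with a single comparison.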
Protocol¶
A Protocol represents the specific sampling approach used to transform one ChemicalSystem into another (as in a Transformation), or to simply sample a single ChemicalSystem (as in a NonTransformation).
Protocol objects are often used as part of a Transformation, although they can be used on their own alongside ChemicalSystems and ComponentMappings (when needed) to obtain free energy difference estimates.
Individual Protocol subclasses obtain these estimates in a wide variety of ways, with varying domains of applicability and effectiveness.
The Protocol.create() method is used to generate ProtocolDAGs that can be executed to produce ProtocolDAGResults.
The Protocol.gather() method is then used to aggregate the contents of many ProtocolDAGResults into a ProtocolResult.
Note
The Protocol is an extensible point of the library,
and is intended to be subclassed to enable new applications.
For details on how to create your own Protocol classes, see How to define a new Protocol.
ProtocolDAG¶
A ProtocolDAG is an executable object that performs a Protocol.
A ProtocolDAG is created via Protocol.create() in combination with ChemicalSystem(s) and a ComponentMapping (when needed).
It is a directed acyclic graph (DAG) of ProtocolUnits and their dependency relationships.
The ProtocolUnits of this ProtocolDAG can be executed in dependency-order to yield information needed for a free energy difference estimate.
ProtocolDAGs are generally only handled directly by ecosystem tools that perform Transformation execution.
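Dependency-ordered execution can be sketched with the standard library's `graphlib.TopologicalSorter`. The unit names and the `run_in_dependency_order` helper below are hypothetical, and this is only an illustration of the idea, not how gufe or any execution engine implements it.

```python
from graphlib import TopologicalSorter

# Hypothetical unit names mapped to the set of units each one depends on.
dependencies = {
    "setup": set(),
    "simulate_0": {"setup"},
    "simulate_1": {"setup"},
    "analyze": {"simulate_0", "simulate_1"},
}


def run_in_dependency_order(deps, run_unit):
    """Run every unit after all of its dependencies, collecting results by name."""
    results = {}
    for name in TopologicalSorter(deps).static_order():
        # Hand each unit only the results of the units it depends on.
        inputs = {dep: results[dep] for dep in deps[name]}
        results[name] = run_unit(name, inputs)
    return results


def toy_unit(name, inputs):
    # Stand-in for real work: record which upstream results were visible.
    return {"unit": name, "saw": sorted(inputs)}


results = run_in_dependency_order(dependencies, toy_unit)
order = list(results)
```

The two `simulate_*` units have no dependency on each other, so an engine would be free to run them in either order or in parallel; only `setup` first and `analyze` last are fixed.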
ProtocolUnit¶
A ProtocolUnit is the unit of execution of a ProtocolDAG, functioning as a node with dependency relationships within the directed acyclic graph (DAG).
A ProtocolUnit retains all of its inputs as attributes, including any ProtocolUnits present among those inputs.
An execution engine performing the ProtocolUnit feeds the ProtocolUnitResults corresponding to its dependencies to its ProtocolUnit.execute() method, which returns its own ProtocolUnitResult upon success.
If the ProtocolUnit fails to execute, a ProtocolUnitFailure is returned instead.
Because ProtocolUnits are only a function of their inputs and dependencies, they can be executed and retried by an execution engine in a variety of ways, in different processes, on different machines, etc.
Their outputs can also be preserved to allow for partial execution and a form of checkpointing for ProtocolDAGs.
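The retry and checkpointing behaviour described above follows from units depending only on their inputs. The sketch below shows one way an engine might exploit that; `execute_with_checkpoints`, the `cache` dictionary, and the retry policy are all assumptions made for illustration.

```python
def execute_with_checkpoints(units, order, cache, max_retries=2):
    """Run units in order, reusing cached outputs and retrying failures.

    units: name -> (callable taking a dict of dependency outputs, list of dependency names)
    cache: name -> output preserved from an earlier run; updated in place
    Returns a dict of name -> number of attempts made during this call.
    """
    attempts = {}
    for name in order:
        if name in cache:
            continue  # checkpoint hit: output already preserved
        func, deps = units[name]
        inputs = {dep: cache[dep] for dep in deps}
        for attempt in range(1, max_retries + 2):
            attempts[name] = attempt
            try:
                cache[name] = func(inputs)
                break
            except RuntimeError:
                if attempt == max_retries + 1:
                    raise  # retries exhausted


    return attempts


calls = {"count": 0}


def flaky_simulation(inputs):
    # Fails on the first call only, to mimic a transient error.
    calls["count"] += 1
    if calls["count"] == 1:
        raise RuntimeError("transient failure")
    return inputs["setup"] * 2


units = {
    "setup": (lambda inputs: 21, []),
    "simulate": (flaky_simulation, ["setup"]),
}
cache = {}
first_run = execute_with_checkpoints(units, ["setup", "simulate"], cache)
second_run = execute_with_checkpoints(units, ["setup", "simulate"], cache)
```

On the second call nothing is executed, because every output is already in `cache`. Persisting that dictionary to disk would give a simple form of checkpointing across processes.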
Note
The ProtocolUnit is an extensible point of the library alongside Protocol,
and is intended to be subclassed to enable new applications.
For details on how to create your own ProtocolUnit classes, see How to define a new Protocol.
ProtocolUnitResult¶
A ProtocolUnitResult retains the results from successful execution of a ProtocolUnit.
A ProtocolUnitResult retains as attributes all of its inputs, including any ProtocolUnitResults present among those inputs.
It is returned by a successful call to its corresponding ProtocolUnit.execute() method, and retains all outputs from execution.
It also retains its start and end datetime, and potentially other provenance information.
ProtocolUnitFailure¶
A ProtocolUnitFailure retains the results from failed execution of a ProtocolUnit.
A ProtocolUnitFailure retains the same information as a ProtocolUnitResult,
but because it is returned by a failed call to its corresponding ProtocolUnit.execute() method, it has no outputs to retain.
It does, however, retain the Exception and traceback of the error.
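The difference between the two result types can be sketched as follows. `ToyUnitResult`, `ToyUnitFailure`, and `execute_unit` are hypothetical names for this example; the field names are assumptions and do not mirror the gufe attributes.

```python
import traceback
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ToyUnitResult:
    name: str
    inputs: dict
    outputs: dict
    start_time: datetime
    end_time: datetime

    def ok(self):
        return True


@dataclass
class ToyUnitFailure(ToyUnitResult):
    exception: tuple = ()     # (exception type name, message)
    traceback_text: str = ""

    def ok(self):
        return False


def execute_unit(name, func, inputs):
    """Run func(**inputs); return a result on success, a failure on any exception."""
    start = datetime.now(timezone.utc)
    try:
        outputs = func(**inputs)
    except Exception as exc:
        return ToyUnitFailure(
            name=name, inputs=inputs, outputs={},
            start_time=start, end_time=datetime.now(timezone.utc),
            exception=(type(exc).__name__, str(exc)),
            traceback_text=traceback.format_exc(),
        )
    return ToyUnitResult(name, inputs, outputs, start, datetime.now(timezone.utc))


def divide(a, b):
    return {"ratio": a / b}


good = execute_unit("divide", divide, {"a": 1.0, "b": 4.0})
bad = execute_unit("divide", divide, {"a": 1.0, "b": 0.0})
```

Returning a failure object instead of letting the exception propagate keeps the provenance (inputs, timing, traceback) attached to the failed unit, so an engine can record it and decide later whether to retry.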
ProtocolDAGResult¶
A ProtocolDAGResult retains the results from executing a ProtocolDAG.
A ProtocolDAGResult contains the same information as a ProtocolDAG (including ProtocolUnits and their dependency relationships), while also featuring the set of ProtocolUnitResults (and ProtocolUnitFailures, if present) that resulted from each.
Each individual ProtocolDAGResult always contains enough information to obtain a free energy difference estimate, though perhaps undersampled and unconverged.
Multiple ProtocolDAGResults can be aggregated together via Protocol.gather() to yield a ProtocolResult, giving the best estimate for the free energy difference possible given the data presented among the ProtocolDAGResults.
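As a rough illustration of aggregation, the sketch below combines several per-run estimates into a mean and a standard error. The `gather_estimates` function and the dictionary layout of each run are hypothetical, and a plain mean is only a stand-in: each real Protocol chooses its own estimator.

```python
import math
import statistics


def gather_estimates(dag_results):
    """Combine per-run estimates into a mean and a standard error.

    dag_results: iterable of dicts with an "ok" flag and an "estimate" float,
    a layout invented for this sketch.
    """
    values = [r["estimate"] for r in dag_results if r["ok"]]
    if not values:
        raise ValueError("no successful results to aggregate")
    mean = statistics.fmean(values)
    if len(values) > 1:
        stderr = statistics.stdev(values) / math.sqrt(len(values))
    else:
        stderr = math.nan  # a single run gives no spread to measure
    return mean, stderr


runs = [
    {"ok": True, "estimate": -1.0},
    {"ok": True, "estimate": -2.0},
    {"ok": False, "estimate": None},  # a failed run is skipped
    {"ok": True, "estimate": -3.0},
]
mean, stderr = gather_estimates(runs)
```

Adding more successful runs to `runs` and gathering again refines the estimate without rerunning anything that already completed.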
ProtocolResult¶
A ProtocolResult aggregates the results from one or more ProtocolDAGResults to yield a free energy difference estimate.
ProtocolResult objects are created from Protocol.gather(), and feature the Protocol-specific methods necessary to obtain actual free energy difference estimates from a set of ProtocolDAGResults.
Note
The ProtocolResult is an extensible point of the library alongside Protocol,
and is intended to be subclassed to enable new applications.
For details on how to create your own ProtocolResult classes, see How to define a new Protocol.
ComponentMapping¶
A ComponentMapping expresses that two Components are related to each other via some kind of mapping.
A ComponentMapping is the most minimal extensible point for relating two Components to each other, as it does not require that any details of the relationship be defined.
See AtomMapping for an extensible point that is more specific to atom-based Components.
Note
The ComponentMapping is an extensible point of the library,
and is intended to be subclassed to enable new applications.
AtomMapping¶
An AtomMapping expresses that two Components are related to each other via a mapping between their atoms.
AtomMappings describe the relationship between componentA and componentB in terms of their atoms’ indices with the methods AtomMapping.componentA_to_componentB() and AtomMapping.componentB_to_componentA().
An AtomMapping is typically generated by an AtomMapper, as described below.
A specialized example of an AtomMapping is a LigandAtomMapping, which is used to define the edges in a LigandNetwork.
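The two directional lookups can be pictured as a dictionary and its inverse. The `ToyAtomMapping` class below is a hypothetical stand-in, not the gufe class; it borrows the method names mentioned above, and its requirement that each atom is mapped at most once is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAtomMapping:
    """Index-based mapping between two components; a stand-in, not the gufe class."""
    componentA: str
    componentB: str
    pairs: tuple  # ((index_in_A, index_in_B), ...)

    def __post_init__(self):
        a_indices = [a for a, _ in self.pairs]
        b_indices = [b for _, b in self.pairs]
        if len(set(a_indices)) != len(a_indices) or len(set(b_indices)) != len(b_indices):
            raise ValueError("each atom may be mapped at most once")

    def componentA_to_componentB(self):
        return dict(self.pairs)

    def componentB_to_componentA(self):
        # Inverting is safe because the mapping is one-to-one.
        return {b: a for a, b in self.pairs}


mapping = ToyAtomMapping("benzene", "toluene", ((0, 1), (1, 2), (2, 3)))
a_to_b = mapping.componentA_to_componentB()
b_to_a = mapping.componentB_to_componentA()
```

Atoms that appear in neither dictionary are the ones that exist only in one Component, which is typically the part being transformed.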
Note
The AtomMapping is an extensible point of the library,
and is intended to be subclassed to enable new applications.
AtomMapper¶
Given two Components, an AtomMapper generates an iterable of AtomMappings via the AtomMapper.suggest_mappings() method.
As with an AtomMapping, it is assumed that the relationship between the Components can be described in terms of the atoms’ indices.
A specialized example of an AtomMapper is a LigandAtomMapper, which generates LigandAtomMappings.
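The "iterable of mappings" idea can be sketched as a generator. The function below reuses the name `suggest_mappings` for readability, but it is a free function written for this example with a deliberately naive heuristic; components are reduced to lists of element symbols, which is an assumption of the sketch.

```python
def suggest_mappings(elements_a, elements_b, max_shift=1):
    """Yield candidate index mappings (dicts of A index -> B index).

    Toy heuristic: slide B against A by 0..max_shift positions and pair
    atoms whose element symbols match.
    """
    for shift in range(max_shift + 1):
        mapping = {
            i: i + shift
            for i, element in enumerate(elements_a)
            if i + shift < len(elements_b) and elements_b[i + shift] == element
        }
        if mapping:
            yield mapping


ethanol = ["C", "C", "O"]
ethylamine = ["C", "C", "N"]
candidates = list(suggest_mappings(ethanol, ethylamine))

# One possible selection rule: keep the candidate that maps the most atoms.
best = max(candidates, key=len)
```

Yielding several candidates rather than returning one leaves the choice of scoring rule to the caller, which is why the selection step sits outside the generator here.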
Note
The AtomMapper is an extensible point of the library,
and is intended to be subclassed to enable new applications.
LigandNetwork¶
A LigandNetwork is a set of SmallMoleculeComponents and LigandAtomMappings organized into a directed network.
A LigandNetwork is a GufeTokenizable, but can also be represented as a networkx graph using the LigandNetwork.graph property.
An AlchemicalNetwork for a relative binding free energy calculation can be created from a LigandNetwork using a convenience method of LigandNetwork. This uses the LigandNetwork along with a user-defined SolventComponent, ProteinComponent, and Protocol to create the Transformation edges and ChemicalSystem nodes that constitute an AlchemicalNetwork.
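One way to picture this expansion is shown below. The `build_rbfe_edges` helper is hypothetical, and the choice to create a complex leg and a solvent leg for each ligand pair is an assumption of this sketch rather than a statement about the library's behaviour; systems are reduced to frozensets of component names and atom mappings are omitted.

```python
def build_rbfe_edges(ligand_edges, protein, solvent, protocol):
    """Expand each ligand pair into a complex leg and a solvent leg.

    Systems are frozensets of component names, so equal systems compare
    and hash equal.
    """
    legs = {"complex": {protein, solvent}, "solvent": {solvent}}
    transformations = []
    for lig_a, lig_b in ligand_edges:
        for leg, environment in legs.items():
            transformations.append({
                "name": f"{leg}_{lig_a}_{lig_b}",
                "stateA": frozenset(environment | {lig_a}),
                "stateB": frozenset(environment | {lig_b}),
                "protocol": protocol,
            })
    return transformations


ligand_edges = [("lig1", "lig2"), ("lig2", "lig3")]
transformations = build_rbfe_edges(ligand_edges, "protein", "water", "toy-protocol")

# Nodes are the distinct systems appearing at either end of any edge.
systems = {t["stateA"] for t in transformations} | {t["stateB"] for t in transformations}
```

Two ligand edges become four transformations but only six distinct systems, because the systems containing `lig2` are reused by both ligand edges.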
AlchemicalNetwork¶
An AlchemicalNetwork is a set of ChemicalSystems, Transformations, and NonTransformations, fully representing a set of alchemical and non-alchemical calculations to be performed.
An AlchemicalNetwork functions as a single container for a collection of (often related) Transformations and their ChemicalSystems.
It is simply a grouping of these objects, optionally with a name attached.
When many Transformations have ChemicalSystems in common, the AlchemicalNetwork effectively encodes the relationships among them.
Some execution engines, such as alchemiscale, ingest AlchemicalNetworks as their primary unit of input.
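The container role described in this section can be sketched with a small immutable class. `ToyNetwork` is a hypothetical stand-in, not the gufe class: edges are plain tuples, systems are strings, and the `degree` helper is added here only to show how shared systems surface once edges are grouped.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyNetwork:
    """Immutable grouping of edges; nodes are derived from the edges plus any extras."""
    edges: frozenset                      # {(stateA, stateB, protocol_name), ...}
    extra_nodes: frozenset = frozenset()  # systems not touched by any edge
    name: Optional[str] = None

    @property
    def nodes(self):
        endpoints = {s for a, b, _ in self.edges for s in (a, b)}
        return frozenset(endpoints) | self.extra_nodes

    def degree(self):
        """Count how many edges touch each node (a self-loop counts once)."""
        counts = Counter()
        for a, b, _ in self.edges:
            counts.update({a, b})
        return counts


network = ToyNetwork(
    edges=frozenset({
        ("complex_A", "complex_B", "toy-fep"),
        ("complex_A", "complex_C", "toy-fep"),
        ("complex_A", "complex_A", "toy-md"),  # NonTransformation-style self-loop
    }),
    name="toy_network",
)
hub = network.degree().most_common(1)[0]
```

Here `complex_A` emerges as the hub without being declared as one: the relationship is implied entirely by the edges that reference it.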
See the diagram at the top of this page for a graphical depiction of an AlchemicalNetwork.