gufe.tokenization¶

The machinery for tokenizing gufe objects lives in this module.

Module Attributes

`TOKENIZABLE_CLASS_REGISTRY`
`REMAPPED_CLASSES`
`TOKENIZABLE_REGISTRY`	Registry of tokenizable objects.

Functions

`dict_decode_dependencies`(dct)
`dict_encode_dependencies`(obj)
`from_dict`(dct)
`get_all_gufe_objs`(obj)	For GufeTokenizable obj, get all contained GufeTokenizables.
`get_class`(module, qualname)
`gufe_objects_from_shallow_dict`(obj)	Find GufeTokenizables within a shallow dict.
`gufe_to_digraph`(gufe_obj)	Recursively construct a DiGraph from a GufeTokenizable.
`import_qualname`(modname, qualname[, remappings])
`is_gufe_dict`(dct)
`is_gufe_key_dict`(dct)
`is_gufe_obj`(obj)
`key_decode_dependencies`(dct[, registry])
`key_encode_dependencies`(obj)
`key_renamed`(dct, old_name, new_name)	Serialization migration: Rename a key in the dictionary.
`modify_dependencies`(obj, modifier, is_mine, mode)
`module_qualname`(obj)
`nested_key_moved`(dct, old_name, new_name)	Serialization migration: Move nested key to a new location.
`new_key_added`(dct, new_key, default)	Serialization migration: Add a new key to the dictionary.
`old_key_removed`(dct, old_key, should_warn)	Serialization migration: Remove an old key from the dictionary.
`register_tokenizable_class`(cls)
`to_dict`(obj)
`tokenize`(obj)	Generate a deterministic, relatively-stable token from a GufeTokenizable object.

Classes

`GufeKey`
`GufeTokenizable`(*args, **kwargs)	Base class for all tokenizable gufe objects.
`KeyedChain`(keyed_chain)	Keyed chain representation encoder of a GufeTokenizable.

gufe.tokenization.TOKENIZABLE_CLASS_REGISTRY: dict[tuple[str, str], GufeTokenizable] = {
    ('gufe.archival', 'AlchemicalArchive'): <class 'gufe.archival.AlchemicalArchive'>,
    ('gufe.chemicalsystem', 'ChemicalSystem'): <class 'gufe.chemicalsystem.ChemicalSystem'>,
    ('gufe.components.component', 'Component'): <class 'gufe.components.component.Component'>,
    ('gufe.components.explicitmoleculecomponent', 'ExplicitMoleculeComponent'): <class 'gufe.components.explicitmoleculecomponent.ExplicitMoleculeComponent'>,
    ('gufe.components.proteincomponent', 'ProteinComponent'): <class 'gufe.components.proteincomponent.ProteinComponent'>,
    ('gufe.components.smallmoleculecomponent', 'SmallMoleculeComponent'): <class 'gufe.components.smallmoleculecomponent.SmallMoleculeComponent'>,
    ('gufe.components.solvatedpdbcomponent', 'ProteinMembraneComponent'): <class 'gufe.components.solvatedpdbcomponent.ProteinMembraneComponent'>,
    ('gufe.components.solvatedpdbcomponent', 'SolvatedPDBComponent'): <class 'gufe.components.solvatedpdbcomponent.SolvatedPDBComponent'>,
    ('gufe.components.solventcomponent', 'BaseSolventComponent'): <class 'gufe.components.solventcomponent.BaseSolventComponent'>,
    ('gufe.components.solventcomponent', 'SolventComponent'): <class 'gufe.components.solventcomponent.SolventComponent'>,
    ('gufe.ligandnetwork', 'LigandNetwork'): <class 'gufe.ligandnetwork.LigandNetwork'>,
    ('gufe.mapping.atom_mapper', 'AtomMapper'): <class 'gufe.mapping.atom_mapper.AtomMapper'>,
    ('gufe.mapping.atom_mapping', 'AtomMapping'): <class 'gufe.mapping.atom_mapping.AtomMapping'>,
    ('gufe.mapping.componentmapping', 'ComponentMapping'): <class 'gufe.mapping.componentmapping.ComponentMapping'>,
    ('gufe.mapping.ligandatommapping', 'LigandAtomMapping'): <class 'gufe.mapping.ligandatommapping.LigandAtomMapping'>,
    ('gufe.network', 'AlchemicalNetwork'): <class 'gufe.network.AlchemicalNetwork'>,
    ('gufe.protocols.protocol', 'Protocol'): <class 'gufe.protocols.protocol.Protocol'>,
    ('gufe.protocols.protocol', 'ProtocolResult'): <class 'gufe.protocols.protocol.ProtocolResult'>,
    ('gufe.protocols.protocoldag', 'ProtocolDAG'): <class 'gufe.protocols.protocoldag.ProtocolDAG'>,
    ('gufe.protocols.protocoldag', 'ProtocolDAGResult'): <class 'gufe.protocols.protocoldag.ProtocolDAGResult'>,
    ('gufe.protocols.protocolunit', 'ProtocolUnit'): <class 'gufe.protocols.protocolunit.ProtocolUnit'>,
    ('gufe.protocols.protocolunit', 'ProtocolUnitFailure'): <class 'gufe.protocols.protocolunit.ProtocolUnitFailure'>,
    ('gufe.protocols.protocolunit', 'ProtocolUnitResult'): <class 'gufe.protocols.protocolunit.ProtocolUnitResult'>,
    ('gufe.tokenization', 'GufeTokenizable'): <class 'gufe.tokenization.GufeTokenizable'>,
    ('gufe.transformations.transformation', 'NonTransformation'): <class 'gufe.transformations.transformation.NonTransformation'>,
    ('gufe.transformations.transformation', 'Transformation'): <class 'gufe.transformations.transformation.Transformation'>,
    ('gufe.transformations.transformation', 'TransformationBase'): <class 'gufe.transformations.transformation.TransformationBase'>,
}¶

gufe.tokenization.REMAPPED_CLASSES: dict[tuple[str, str], tuple[str, str]] = {}¶

gufe.tokenization.register_tokenizable_class(cls)¶

gufe.tokenization.new_key_added(dct, new_key, default)¶

Serialization migration: Add a new key to the dictionary.

This can be used when writing a serialization migration (see GufeTokenizable.serialization_migration()) where a new key has been added to the object’s representation (e.g., a new parameter has been added). To be migratable, the new key must have an associated default value.

Parameters:

dct (dict) – dictionary based on the old serialization version
new_key (str) – name of the new key
default (Any) – default value for the new key

Returns:

input dictionary modified to add the new key

Return type:

dict

gufe.tokenization.old_key_removed(dct, old_key, should_warn)¶

Serialization migration: Remove an old key from the dictionary.

This can be used when writing a serialization migration (see GufeTokenizable.serialization_migration()) where a key has been removed from the object’s serialized representation (e.g., an old parameter is no longer allowed). If a parameter has been removed, you will likely want to warn the user that it is no longer used; the should_warn option allows that.

Parameters:

dct (dict) – dictionary based on the old serialization version
old_key (str) – name of the key that has been removed
should_warn (bool) – whether to issue a warning for this (generally recommended)

Returns:

input dictionary modified to remove the old key

Return type:

dict

gufe.tokenization.key_renamed(dct, old_name, new_name)¶

Serialization migration: Rename a key in the dictionary.

This can be used when writing a serialization migration (see GufeTokenizable.serialization_migration()) where a key has been renamed (e.g., a parameter name has changed).

Parameters:

dct (dict) – dictionary based on the old serialization version
old_name (str) – name of the key in the old serialization representation
new_name (str) – name of the key in the new serialization representation

Returns:

input dictionary modified to rename the key from the old name to the new one

Return type:

dict

gufe.tokenization.nested_key_moved(dct, old_name, new_name)¶

Serialization migration: Move nested key to a new location.

This can be used when writing a serialization migration (see GufeTokenizable.serialization_migration()) where a key that is nested in a structure of dicts/lists has been moved elsewhere. It uses labels that match Python namespace and list-index notation. That is, if dct is the following dict:

{"first": {"inner": ["list", "of", "words"]}}

then the label 'first.inner[1]' would refer to the word 'of'.

In that case, the following call:

nested_key_moved(dct, "first.inner[1]", "second")

would result in the dictionary:

{"first": {"inner": ["list", "words"]}, "second": "of"}

This is particularly useful for things like protocol settings, which present as nested objects like this.

Parameters:

dct (dict) – dictionary based on the old serialization version
old_name (str) – label for the old location (see above for description of label format)
new_name (str) – label for the new location (see above for description of label format)

Returns:

input dictionary modified to move the value at the old location to the new location

Return type:

dict
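To make the label format concrete, here is a minimal sketch of how such a move could be carried out on plain dicts and lists. It is not gufe's implementation: parse_label and move_nested are hypothetical names, and the sketch assumes labels contain only dotted keys and integer list indices, and that the parent of the new location already exists.

```python
import re

def parse_label(label):
    """Split a label such as 'first.inner[1]' into ['first', 'inner', 1]."""
    parts = []
    for chunk in label.split("."):
        match = re.fullmatch(r"([^\[\]]+)((?:\[\d+\])*)", chunk)
        if match is None:
            raise ValueError(f"cannot parse label chunk {chunk!r}")
        parts.append(match.group(1))
        # Any trailing '[n]' groups become integer list indices.
        parts.extend(int(i) for i in re.findall(r"\[(\d+)\]", match.group(2)))
    return parts

def move_nested(dct, old_label, new_label):
    """Pop the value at old_label and store it at new_label (modifies dct)."""
    *old_path, old_last = parse_label(old_label)
    container = dct
    for step in old_path:
        container = container[step]
    value = container.pop(old_last)  # works for dict keys and list indices

    *new_path, new_last = parse_label(new_label)
    container = dct
    for step in new_path:
        container = container[step]  # sketch assumption: parents already exist
    container[new_last] = value
    return dct

dct = {"first": {"inner": ["list", "of", "words"]}}
result = move_nested(dct, "first.inner[1]", "second")
print(result)
```

Run on the dictionary from the example above, the sketch reproduces the documented result: 'of' leaves the inner list and reappears under 'second'.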

class gufe.tokenization.GufeTokenizable(*args, **kwargs)¶

Base class for all tokenizable gufe objects.

Subclassing from this provides sorting, equality and hashing operators, provided that the class implements the _to_dict() and _from_dict() methods.

This extra work in serializing is important for hashes that are stable across different Python sessions.

classmethod serialization_migration(old_dict: dict, version: int) → dict¶

Migrate old serialization dicts to the current form.

The input dict old_dict comes from some previous serialization version, given by version. The output dict should be in the format of the current serialization dict.

The recommended pattern to use looks like this:

@classmethod
def serialization_migration(cls, old_dict, version):
    if version == 1:
        ...  # do things for migrating version 1->2
    if version <= 2:
        ...  # do things for migrating version 2->3
    if version <= 3:
        ...  # do things for migrating version 3->4
    # etc
    return old_dict  # now in the current serialization format

This approach steps through each old serialization model on its way to the current version. It keeps code relatively minimal and readable.

As a convenience, the following functions are available to simplify the kinds of changes that are likely to occur as serialization versions change:

new_key_added()
old_key_removed()
key_renamed()
nested_key_moved()

Parameters:

old_dict (dict) – dict as received from a serialized form
version (int) – the serialization version of old_dict

Returns:

serialization dict suitable for the current implementation of from_dict.

Return type:

dict
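As an illustration of the stepwise pattern, the sketch below migrates dictionaries of a made-up class through three version changes. Everything here is an assumption for demonstration: Thermostat and its keys are hypothetical, and the three helper functions are simplified local stand-ins that follow the documented behaviour of the gufe.tokenization helpers, so the example runs without gufe installed.

```python
import warnings

# Local stand-ins for the gufe.tokenization helpers (simplified for the sketch).
def new_key_added(dct, new_key, default):
    dct[new_key] = default
    return dct

def key_renamed(dct, old_name, new_name):
    dct[new_name] = dct.pop(old_name)
    return dct

def old_key_removed(dct, old_key, should_warn):
    if should_warn:
        warnings.warn(f"'{old_key}' is no longer used and was dropped")
    del dct[old_key]
    return dct

class Thermostat:  # hypothetical class, currently at serialization version 4
    @classmethod
    def serialization_migration(cls, old_dict, version):
        if version == 1:  # 1 -> 2: a 'units' parameter was introduced
            old_dict = new_key_added(old_dict, "units", "kelvin")
        if version <= 2:  # 2 -> 3: 'temp' was renamed to 'temperature'
            old_dict = key_renamed(old_dict, "temp", "temperature")
        if version <= 3:  # 3 -> 4: 'legacy_flag' was dropped
            old_dict = old_key_removed(old_dict, "legacy_flag", should_warn=False)
        return old_dict

v1 = {"temp": 300.0, "legacy_flag": True}
v3 = {"temperature": 310.0, "units": "kelvin", "legacy_flag": False}
migrated_v1 = Thermostat.serialization_migration(dict(v1), version=1)
migrated_v3 = Thermostat.serialization_migration(dict(v3), version=3)
print(migrated_v1)
print(migrated_v3)
```

Because each condition also admits every older version, a version 1 dictionary passes through all three steps while a version 3 dictionary only passes through the last one.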

property logger¶: Return logger adapter for this instance.

property key¶: Tokenized representation of this object, aka ‘gufe key’.

classmethod defaults()¶

Dict of default key-value pairs for this GufeTokenizable object.

These defaults are stripped from the dict form of this object produced with to_dict(include_defaults=False) where default values are present.

to_dict(include_defaults=True) → dict¶

Generate full dict representation, with all referenced GufeTokenizable objects also given in full dict representations.

Parameters: include_defaults (bool) – If False, strip keys from dict representation with values equal to those in defaults.
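The effect of include_defaults=False can be pictured with plain dictionaries. This is a minimal sketch rather than gufe's code: strip_defaults is a hypothetical helper, and which entries count as defaults is assumed purely for illustration.

```python
def strip_defaults(full_dict, defaults):
    """Drop every entry whose value equals the declared default."""
    return {
        key: value
        for key, value in full_dict.items()
        if key not in defaults or defaults[key] != value
    }

# Keys borrowed from the SolventComponent example in these docs; the choice
# of defaults below is an assumption made for this sketch.
full = {"smiles": "O", "positive_ion": "Na+", "negative_ion": "Cl-", "neutralize": False}
assumed_defaults = {"positive_ion": "Na+", "negative_ion": "Cl-", "neutralize": True}

compact = strip_defaults(full, assumed_defaults)
restored = {**assumed_defaults, **compact}  # defaults fill the stripped keys back in
print(compact)
```

Only entries that differ from their default survive in the compact form, and merging the defaults back in recovers the full dictionary.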

classmethod from_dict(dct: dict)¶

Generate an instance from full dict representation.

Parameters: dct (Dict) – A dictionary produced by to_dict to instantiate from. If an identical instance already exists in memory, it will be returned. Otherwise, a new instance will be returned.

to_keyed_dict(include_defaults=True) → dict¶

Generate keyed dict representation, with all referenced GufeTokenizable objects given in keyed representations.

A keyed representation of an object is a dict of the form:

{':gufe-key:': <GufeTokenizable.key>}

These function as stubs to allow for serialization and storage of GufeTokenizable objects with minimal duplication.

The original object can be re-assembled with from_keyed_dict.
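The stub idea can be sketched on plain dictionaries. The code below is not gufe's implementation. It assumes that a serialized object is recognisable by its '__qualname__' and '__module__' entries (as in the KeyedChain example in these docs), and it uses a hypothetical toy_key in place of real gufe keys.

```python
def is_object_dict(value):
    # Sketch assumption: serialized objects carry these two entries.
    return isinstance(value, dict) and {"__qualname__", "__module__"} <= set(value)

def toy_key(obj_dict):
    # Hypothetical key; real gufe keys end in a hash token, not a name.
    return f"{obj_dict['__qualname__']}-{obj_dict['name']}"

def to_keyed(value, top=True):
    """Replace nested object dicts with {':gufe-key:': <key>} stubs."""
    if isinstance(value, list):
        return [to_keyed(item, top=False) for item in value]
    if isinstance(value, dict):
        if is_object_dict(value) and not top:
            return {":gufe-key:": toy_key(value)}
        return {k: to_keyed(v, top=False) for k, v in value.items()}
    return value

solvent = {
    "__qualname__": "SolventComponent",
    "__module__": "gufe.components.solventcomponent",
    "name": "water",
}
system = {
    "__qualname__": "ChemicalSystem",
    "__module__": "gufe.chemicalsystem",
    "name": "demo",
    "components": {"solvent": solvent},
}

keyed = to_keyed(system)
print(keyed["components"])
```

The outer dictionary keeps its own entries, while the nested object shrinks to a one-entry stub that can be resolved later from wherever the full object is stored.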

classmethod from_keyed_dict(dct: dict)¶

Generate an instance from keyed dict representation.

Parameters: dct (Dict) – A dictionary produced by to_keyed_dict to instantiate from. If an identical instance already exists in memory, it will be returned. Otherwise, a new instance will be returned.

to_shallow_dict() → dict¶: Generate shallow dict representation, with all referenced GufeTokenizable objects left intact.

See also

GufeTokenizable.to_dict(), GufeTokenizable.to_keyed_dict()

classmethod from_shallow_dict(dct: dict)¶

Generate an instance from shallow dict representation.

Parameters: dct (Dict) – A dictionary produced by to_shallow_dict to instantiate from. If an identical instance already exists in memory, it will be returned. Otherwise, a new instance will be returned.

copy_with_replacements(**replacements)¶

Make a modified copy of this object.

Since GufeTokenizables are immutable, this is essentially a shortcut to mutate the object. Note that the keyword arguments it takes are based on keys of the dictionaries used in the _to_dict/_from_dict cycle for this object; in most cases that is the same as parameters to __init__, but not always.

This will always return a new object in memory. So using obj.copy_with_replacements() (with no keyword arguments) is a way to create a shallow copy: the object is different in memory, but its attributes will be the same objects in memory as the original.

Parameters: replacements (Dict) – keyword arguments with keys taken from the keys given by the output of this object’s to_dict method.
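The dict round trip behind this method can be sketched with a toy class. Point below is a hypothetical stand-in, not a gufe class, and the method body is one plausible way to implement the behaviour described above rather than gufe's own code.

```python
class Point:  # hypothetical stand-in for a GufeTokenizable subclass
    def __init__(self, coords, label):
        self._coords = coords
        self._label = label

    def _to_dict(self):
        return {"coords": self._coords, "label": self._label}

    @classmethod
    def _from_dict(cls, dct):
        return cls(**dct)

    def copy_with_replacements(self, **replacements):
        dct = self._to_dict()     # current state as a dict
        dct.update(replacements)  # override only the requested keys
        return self._from_dict(dct)  # always builds a new object

original = Point(coords=[0.0, 1.0], label="a")
renamed = original.copy_with_replacements(label="b")
shallow = original.copy_with_replacements()
print(renamed._to_dict())
```

With no keyword arguments the result is a distinct object whose attributes are the very same objects as the original's, which is the shallow-copy behaviour described above.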

to_keyed_chain() → list[tuple[str, dict]]¶: Generate a keyed chain representation of the object.

See also

KeyedChain

classmethod from_keyed_chain(keyed_chain: list[tuple[str, dict]])¶

Generate an instance from keyed chain representation.

Parameters: keyed_chain (List[Tuple[str, Dict]]) – The keyed_chain representation of the GufeTokenizable.

See also

KeyedChain

to_json(file: PathLike | TextIO | None = None) → None | str¶

Generate a JSON keyed chain representation.

This will be written to the filepath or filelike object if passed.

Parameters: file – A filepath or filelike object to write the JSON to.
Returns: A minimal JSON representation of the object if file is None; else None.
Return type: None | str

See also

from_json

classmethod from_json(file: PathLike | TextIO | None = None, content: str | None = None)¶

Generate an instance from JSON keyed chain representation.

Can provide either a filepath/filelike as file, or JSON content via content.

Parameters:

file – A filepath or filelike object to read JSON data from.
content – A string to read JSON data from.

See also

to_json

to_msgpack(file: PathLike | BinaryIO | None = None, compress: bool = True) → None | bytes¶

Generate a MessagePack keyed chain representation.

This will be written to the filepath or filelike object if passed.

Parameters:

file – A filepath or filelike object to write the encoded msgpack to.
compress – Whether or not to zstandard compress the serialized bytes. The default is True.

Returns:

A minimal msgpack representation of the object if file is None; else None.

Return type:

None | bytes

See also

from_msgpack

classmethod from_msgpack(file: PathLike | BinaryIO | None = None, content: bytes | None = None)¶

Generate an instance from a MessagePack keyed chain representation.

Can provide either a filepath/filelike as file, or msgpack content via content.

Parameters:

file (BinaryIO | PathLike | None) – A filepath or filelike object to read msgpack data from.
content (bytes) – Bytes to read msgpack data from.

See also

to_msgpack

class gufe.tokenization.GufeKey¶

to_dict()¶

property prefix: str¶: Commonly indicates a classname

property token: str¶: Unique hash of this key, typically an MD5 value

capitalize()¶

Return a capitalized version of the string.

More specifically, make the first character have upper case and the rest lower case.

casefold()¶: Return a version of the string suitable for caseless comparisons.

center(width, fillchar=' ', /)¶

Return a centered string of length width.

Padding is done using the specified fill character (default is a space).

count(sub[, start[, end]]) → int¶: Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.

encode(encoding='utf-8', errors='strict')¶

Encode the string using the codec registered for encoding.

encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.

endswith(suffix[, start[, end]]) → bool¶: Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.

expandtabs(tabsize=8)¶

Return a copy where all tab characters are expanded using spaces.

If tabsize is not given, a tab size of 8 characters is assumed.

find(sub[, start[, end]]) → int¶

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

format(*args, **kwargs) → str¶: Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).

format_map(mapping) → str¶: Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).

index(sub[, start[, end]]) → int¶

Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

isalnum()¶

Return True if the string is an alpha-numeric string, False otherwise.

A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.

isalpha()¶

Return True if the string is an alphabetic string, False otherwise.

A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.

isascii()¶

Return True if all characters in the string are ASCII, False otherwise.

ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.

isdecimal()¶

Return True if the string is a decimal string, False otherwise.

A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.

isdigit()¶

Return True if the string is a digit string, False otherwise.

A string is a digit string if all characters in the string are digits and there is at least one character in the string.

isidentifier()¶

Return True if the string is a valid Python identifier, False otherwise.

Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.

islower()¶

Return True if the string is a lowercase string, False otherwise.

A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.

isnumeric()¶

Return True if the string is a numeric string, False otherwise.

A string is numeric if all characters in the string are numeric and there is at least one character in the string.

isprintable()¶

Return True if all characters in the string are printable, False otherwise.

A character is printable if repr() may use it in its output.

isspace()¶

Return True if the string is a whitespace string, False otherwise.

A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.

istitle()¶

Return True if the string is a title-cased string, False otherwise.

In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.

isupper()¶

Return True if the string is an uppercase string, False otherwise.

A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.

join(iterable, /)¶

Concatenate any number of strings.

The string whose method is called is inserted in between each given string. The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

ljust(width, fillchar=' ', /)¶

Return a left-justified string of length width.

Padding is done using the specified fill character (default is a space).

lower()¶: Return a copy of the string converted to lowercase.

lstrip(chars=None, /)¶

Return a copy of the string with leading whitespace removed.

If chars is given and not None, remove characters in chars instead.

static maketrans()¶

Return a translation table usable for str.translate().

If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.

partition(sep, /)¶

Partition the string into three parts using the given separator.

This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing the original string and two empty strings.

removeprefix(prefix, /)¶

Return a str with the given prefix string removed if present.

If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.

removesuffix(suffix, /)¶

Return a str with the given suffix string removed if present.

If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.

replace(old, new, count=-1, /)¶

Return a copy with all occurrences of substring old replaced by new.

count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.

If the optional argument count is given, only the first count occurrences are replaced.

rfind(sub[, start[, end]]) → int¶

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Return -1 on failure.

rindex(sub[, start[, end]]) → int¶

Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.

Raises ValueError when the substring is not found.

rjust(width, fillchar=' ', /)¶

Return a right-justified string of length width.

Padding is done using the specified fill character (default is a space).

rpartition(sep, /)¶

Partition the string into three parts using the given separator.

This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.

If the separator is not found, returns a 3-tuple containing two empty strings and the original string.

rsplit(sep=None, maxsplit=-1)¶

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.

When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.

maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the end of the string and works to the front.

rstrip(chars=None, /)¶

Return a copy of the string with trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

split(sep=None, maxsplit=-1)¶

Return a list of the substrings in the string, using sep as the separator string.

sep
The separator used to split the string.

When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.

maxsplit
Maximum number of splits. -1 (the default value) means no limit.

Splitting starts at the front of the string and works to the end.

Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.

splitlines(keepends=False)¶

Return a list of the lines in the string, breaking at line boundaries.

Line breaks are not included in the resulting list unless keepends is given and true.

startswith(prefix[, start[, end]]) → bool¶: Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.

strip(chars=None, /)¶

Return a copy of the string with leading and trailing whitespace removed.

If chars is given and not None, remove characters in chars instead.

swapcase()¶: Convert uppercase characters to lowercase and lowercase characters to uppercase.

title()¶

Return a version of the string where each word is titlecased.

More specifically, words start with uppercased characters and all remaining cased characters have lower case.

translate(table, /)¶

Replace each character in the string using the given translation table.

table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.

The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.

upper()¶: Return a copy of the string converted to uppercase.

zfill(width, /)¶

Pad a numeric string with zeros on the left, to fill a field of the given width.

The string is never truncated.

gufe.tokenization.gufe_objects_from_shallow_dict(obj: list | dict | GufeTokenizable) → list[GufeTokenizable]¶

Find GufeTokenizables within a shallow dict.

This function recursively looks through the list/dict structures encoding GufeTokenizables and returns a list of all GufeTokenizables found within those structures, which may be nested.

Parameters: obj – The input data structure to recursively traverse. For the initial call of this function, this should be the shallow dict of a GufeTokenizable. Input of a GufeTokenizable will immediately return a base case.
Returns: All GufeTokenizables found in the shallow dict representation of a GufeTokenizable.
Return type: List[GufeTokenizable]

gufe.tokenization.gufe_to_digraph(gufe_obj)¶

Recursively construct a DiGraph from a GufeTokenizable.

The DiGraph encodes the dependency structure of the GufeTokenizable on other GufeTokenizables.

class gufe.tokenization.KeyedChain(keyed_chain)¶

Keyed chain representation encoder of a GufeTokenizable.

The keyed chain representation of a GufeTokenizable provides a topologically sorted list of gufe keys and GufeTokenizable keyed dicts that can be used to fully recreate a GufeTokenizable without the need for a populated TOKENIZABLE_REGISTRY.

The class wraps around a list of tuples containing the gufe key and the keyed dict form of the GufeTokenizable.

Examples

We can create a keyed chain representation from any GufeTokenizable, such as:

>>> from gufe import SolventComponent
>>> from gufe.tokenization import KeyedChain
>>> s = SolventComponent()
>>> keyed_chain = KeyedChain.gufe_to_keyed_chain_rep(s)
>>> keyed_chain
[('SolventComponent-26b4034ad9dbd9f908dfc298ea8d449f',
  {'smiles': 'O',
   'positive_ion': 'Na+',
   'negative_ion': 'Cl-',
   'ion_concentration': '0.15 molar',
   'neutralize': True,
   '__qualname__': 'SolventComponent',
   '__module__': 'gufe.components.solventcomponent',
   ':version:': 1})]

And we can do the reverse operation as well to go from a keyed chain representation back to a GufeTokenizable:

>>> KeyedChain(keyed_chain).to_gufe()
SolventComponent(name=O, Na+, Cl-)
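The phrase "topologically sorted" can be illustrated with the standard library alone. The sketch below orders a handful of made-up keyed dicts so that every object appears after the objects it references. The keys and dict contents are placeholders (real gufe keys end in a hash token), find_stub_keys is a hypothetical helper, and this is not how gufe builds the chain internally.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Placeholder keyed dicts: nested objects appear only as ':gufe-key:' stubs.
keyed_dicts = {
    "ChemicalSystem-aaa": {
        "components": {
            "ligand": {":gufe-key:": "SmallMoleculeComponent-bbb"},
            "solvent": {":gufe-key:": "SolventComponent-ccc"},
        }
    },
    "SmallMoleculeComponent-bbb": {"name": "benzene"},
    "SolventComponent-ccc": {"smiles": "O"},
}

def find_stub_keys(value):
    """Yield every key referenced through a {':gufe-key:': ...} stub."""
    if isinstance(value, dict):
        if set(value) == {":gufe-key:"}:
            yield value[":gufe-key:"]
        else:
            for item in value.values():
                yield from find_stub_keys(item)
    elif isinstance(value, list):
        for item in value:
            yield from find_stub_keys(item)

# Map each key to the keys it depends on, then emit dependencies first.
graph = {key: set(find_stub_keys(kd)) for key, kd in keyed_dicts.items()}
order = list(TopologicalSorter(graph).static_order())
chain = [(key, keyed_dicts[key]) for key in order]
print([key for key, _ in chain])
```

Because dependencies always come first, a decoder can walk such a list from front to back and resolve every stub from objects it has already rebuilt, which is why no pre-populated registry is needed.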

classmethod from_gufe(gufe_object: GufeTokenizable) → Self¶: Initialize a KeyedChain from a GufeTokenizable.

to_gufe(tokenizable_map: dict[str, GufeTokenizable] | None = None) → GufeTokenizable¶: Initialize a GufeTokenizable.

decode_subchains(func: Callable) → Generator[GufeTokenizable, None, None]¶

Extract GufeTokenizable objects matching a pattern from a KeyedChain.

The func function is applied to each keyed dict contained in the KeyedChain. When it evaluates to a truthy value, the GufeTokenizable is created and yielded. Dependencies of this GufeTokenizable are derived from preceding portions of the KeyedChain.

Example

Suppose only the NonTransformation GufeTokenizable objects are wanted from a KeyedChain, an_kc, that encodes an AlchemicalNetwork.

>>> nontransformations = list(an_kc.decode_subchains(lambda kd: kd["__qualname__"] == "NonTransformation"))

classmethod from_keyed_chain_rep(keyed_chain: list[tuple[str, dict]]) → Self¶: Initialize a KeyedChain from a keyed chain representation.

to_keyed_chain_rep() → list[tuple[str, dict]]¶: Return the keyed chain representation of this object.

static gufe_to_keyed_chain_rep(gufe_object: GufeTokenizable) → list[tuple[str, dict]]¶

Create the keyed chain representation of a GufeTokenizable.

This represents the GufeTokenizable as a list of two-element tuples containing, as their first and second elements, the gufe key and keyed dict form of the GufeTokenizable, respectively, and provides the underlying structure used in the KeyedChain class.

Parameters: gufe_object – The GufeTokenizable for which the KeyedChain is generated.
Returns: The keyed chain representation of a GufeTokenizable.
Return type: list[tuple[str, dict]]

gufe_keys() → Generator[str, None, None]¶: Create a generator that iterates over the gufe keys in the KeyedChain.

keyed_dicts() → Generator[dict, None, None]¶: Create a generator that iterates over the keyed dicts in the KeyedChain.

gufe.tokenization.TOKENIZABLE_REGISTRY: WeakValueDictionary[str, GufeTokenizable] = <WeakValueDictionary>¶

Registry of tokenizable objects.

Used to avoid duplication of tokenizable gufe objects in memory when deserialized. Each key is a token, each value the corresponding object.

We use a weakref.WeakValueDictionary here to avoid holding references to objects that are no longer referenced anywhere else.
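The behaviour of a weak-value registry can be seen with the standard library alone. In this sketch Token and get_or_create are hypothetical names, not part of gufe; only weakref.WeakValueDictionary is the real mechanism named above. The final count relies on CPython releasing unreferenced objects promptly, with gc.collect() called as a safeguard.

```python
import gc
import weakref

class Token:  # hypothetical stand-in for a tokenizable object
    def __init__(self, key):
        self.key = key

registry = weakref.WeakValueDictionary()

def get_or_create(key):
    """Return the registered instance for key, creating it only if needed."""
    obj = registry.get(key)
    if obj is None:
        obj = Token(key)
        registry[key] = obj
    return obj

a = get_or_create("Token-abc")
b = get_or_create("Token-abc")
same_object = a is b            # the second call reuses the first instance
size_while_alive = len(registry)

del a, b                        # drop the last strong references
gc.collect()
size_after_release = len(registry)
print(same_object, size_while_alive, size_after_release)
```

The registry deduplicates objects for as long as something else holds them, and its entry disappears by itself once nothing does.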

gufe.tokenization.module_qualname(obj)¶

gufe.tokenization.is_gufe_obj(obj: Any)¶

gufe.tokenization.is_gufe_dict(dct: Any)¶

gufe.tokenization.is_gufe_key_dict(dct: Any)¶

gufe.tokenization.import_qualname(modname: str, qualname: str, remappings={})¶

gufe.tokenization.get_class(module: str, qualname: str)¶

gufe.tokenization.modify_dependencies(obj: dict | list, modifier, is_mine, mode, top=True)¶

Parameters:

obj (Dict or List) – Dictionary or list to traverse. Assumes that only mappings are dict and only iterables are list, and that no gufe objects are in the keys of dicts
modifier (Callable[[GufeTokenizable], Any]) – Function that modifies any GufeTokenizable found
is_mine (Callable[[Any], bool]) – Function that determines whether the given object should be subjected to the modifier
mode ({'encode', 'decode'}) – Whether this function is being used to encode a set of GufeTokenizables or decode them from dict or key-encoded forms. Required to determine when to modify objects found in nested dict/list.
top (bool) – If True, skip modifying obj itself; needed for recursive use to avoid early stopping on obj.
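The roles of modifier, is_mine and top can be sketched with a simplified traversal. This is not gufe's function: modify_nested is a hypothetical reduction that omits the mode argument, and Leaf is a made-up class standing in for a GufeTokenizable.

```python
def modify_nested(obj, modifier, is_mine, top=True):
    """Apply modifier to every nested value accepted by is_mine."""
    if is_mine(obj) and not top:
        return modifier(obj)
    if isinstance(obj, dict):
        return {k: modify_nested(v, modifier, is_mine, top=False) for k, v in obj.items()}
    if isinstance(obj, list):
        return [modify_nested(v, modifier, is_mine, top=False) for v in obj]
    return obj

class Leaf:  # hypothetical stand-in for a GufeTokenizable
    def __init__(self, key):
        self.key = key

leaf1, leaf2 = Leaf("Leaf-1"), Leaf("Leaf-2")
data = {"a": leaf1, "b": [leaf2, 3]}

# Encoding direction: swap each object for a key stub.
encoded = modify_nested(
    data,
    modifier=lambda o: {":gufe-key:": o.key},
    is_mine=lambda o: isinstance(o, Leaf),
)

# Decoding direction: swap each stub back for the object it names.
lookup = {leaf.key: leaf for leaf in (leaf1, leaf2)}
decoded = modify_nested(
    encoded,
    modifier=lambda d: lookup[d[":gufe-key:"]],
    is_mine=lambda o: isinstance(o, dict) and set(o) == {":gufe-key:"},
)
print(encoded)
```

The top flag keeps the outermost container from being replaced even if is_mine would accept it, so the traversal always descends at least one level.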

gufe.tokenization.to_dict(obj: GufeTokenizable) → dict¶

gufe.tokenization.dict_encode_dependencies(obj: GufeTokenizable) → dict¶

gufe.tokenization.key_encode_dependencies(obj: GufeTokenizable) → dict¶

gufe.tokenization.from_dict(dct) → GufeTokenizable¶

gufe.tokenization.dict_decode_dependencies(dct: dict) → GufeTokenizable¶

gufe.tokenization.key_decode_dependencies(dct: dict, registry=<WeakValueDictionary>) → GufeTokenizable¶

gufe.tokenization.get_all_gufe_objs(obj)¶

For GufeTokenizable obj, get all contained GufeTokenizables.

This is useful when deduplicating GufeTokenizables for serialization.

Parameters: obj (GufeTokenizable) – the container tokenizable
Returns: all contained GufeTokenizables
Return type: Set[GufeTokenizable]

gufe.tokenization.tokenize(obj: GufeTokenizable) → str¶

Generate a deterministic, relatively-stable token from a GufeTokenizable object.

Examples

>>> from gufe import SolventComponent
>>> from gufe.tokenization import tokenize
>>> s = SolventComponent()
>>> tokenize(s)
'e6eef7519854d35a5ce6c84136b3684c'

>>> tokenize(s) == tokenize(SolventComponent.from_dict(s.to_dict()))
True
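What "deterministic" means here can be shown with a small standard-library sketch: hash a canonical serialization of the dict form, so that equal content always yields the same token regardless of key order or Python session. The function toy_token is hypothetical, and gufe's actual normalization and hashing steps are not reproduced.

```python
import hashlib
import json

def toy_token(dct):
    """Hash a canonical JSON dump so equal content gives an equal token."""
    payload = json.dumps(dct, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(payload.encode("utf-8"), usedforsecurity=False)
    return digest.hexdigest()

first = {"smiles": "O", "positive_ion": "Na+", "negative_ion": "Cl-"}
reordered = {"negative_ion": "Cl-", "smiles": "O", "positive_ion": "Na+"}
changed = {**first, "positive_ion": "K+"}

print(toy_token(first) == toy_token(reordered))
print(toy_token(first) == toy_token(changed))
```

Sorting the keys before hashing is what removes the dependence on insertion order, while Python's built-in hash() would not be stable across sessions for strings.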