Duplicate finder
Computations with identical inputs are often repeated. The FireWorks workflow management system provides a basic duplicate finder class, DupeFinderExact, that prevents repeated execution of fireworks with identical spec dictionaries. If the specs differ, for example because of different but irrelevant data or metadata, DupeFinderExact does not detect the duplicates. In addition, the method used in the provided class does not scale well.
Some features described in issues #145 (closed) and #144 (closed) make extensive use of duplicates. In addition, a function may be called several times with the same parameters. In all these cases, however, the spec dictionaries of the duplicates are not strictly identical.
Implementation
FireWorks provides the base class DupeFinderBase. A new subclass, DupeFinderImproved, can be written to perform partial and incremental matching. Every firework created must then contain {'_dupefinder': DupeFinderImproved(...)} in its spec.
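A minimal sketch of the partial-match idea, assuming that duplicates may be recognized by comparing only the relevant spec keys. The method names verify and query follow what DupeFinderBase is understood to expect from subclasses; the ignored keys and the example specs are hypothetical, incremental matching is not covered, and the class is kept standalone so that it runs without FireWorks installed.

```python
# Hypothetical sketch. In FireWorks this class would subclass DupeFinderBase;
# it is kept standalone here so that it runs without FireWorks installed.

# Assumed set of spec keys that do not influence the result (illustrative only).
DEFAULT_IGNORED_KEYS = frozenset({"_dupefinder", "metadata"})


class DupeFinderImproved:
    """Partial match: two specs are duplicates if their relevant keys agree."""

    def __init__(self, ignored_keys=DEFAULT_IGNORED_KEYS):
        self.ignored_keys = frozenset(ignored_keys)

    def _relevant(self, spec):
        # Drop the keys that should not take part in the comparison.
        return {k: v for k, v in spec.items() if k not in self.ignored_keys}

    def verify(self, spec1, spec2):
        # Final check on two candidate specs.
        return self._relevant(spec1) == self._relevant(spec2)

    def query(self, spec):
        # MongoDB-style filter that narrows the candidates to fireworks
        # whose relevant spec keys have the same values.
        return {f"spec.{k}": v for k, v in self._relevant(spec).items()}


finder = DupeFinderImproved()
# Two specs that differ only in (assumed irrelevant) metadata.
spec_a = {"func": "f", "inputs": [1, 2], "metadata": {"user": "alice"}}
spec_b = {"func": "f", "inputs": [1, 2], "metadata": {"user": "bob"}}
print(finder.verify(spec_a, spec_b))
print(finder.query(spec_a))
```

Querying on individual keys rather than on the whole spec document is one possible way to let the database use indexes on those keys, which may help with the scaling problem mentioned above.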
Source code match
Within one workflow
Source code matching can make use of the mapping between variable statements and workflow nodes. In the following example the parameters of the two variables have identical values, but the variables are different objects in the model and two different workflow nodes are created for them:
a = f(1, 2)
b = f(1, 2)
In this case a and b should share the same value as soon as one of the two is computed. Another case is when an input differs only slightly:
a = f(1, 1.9999999999)
b = f(1, 2.0000000000)
Across several workflows
Several workflows may belong to one model instance (see issue #145 (closed)), and sub-workflows of the workflows in a group may be completely identical.
A sub-workflow from one model can be copied to another model (see issue #144 (closed)). In this case the differing variable names should also be taken into account in the match.
a = f(1, 2.0) # in the source model with ID f1234
a@f1234 = f(1, 2.0) # in target model
To avoid variable name conflicts, all variables can be copied with a namespace:
a = f(1, 2.0) # in the source model with ID f1234
a = 3.14 # in the target model, in conflict with a in the source model
a@f1234 = f(1, 2.0) # in target model
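The namespacing step can be sketched as follows. This assumes, purely for illustration, that a model is a dictionary mapping variable names to expression strings; the function name copy_with_namespace is hypothetical, and only the name@ID convention is taken from the example above.

```python
def copy_with_namespace(source_vars, source_id, target_vars):
    """Copy all source variables into the target under the name 'var@source_id'."""
    merged = dict(target_vars)  # leave the target model untouched
    for name, expression in source_vars.items():
        qualified = f"{name}@{source_id}"
        if qualified in merged:
            # The same source model has apparently been copied before.
            raise ValueError(f"variable {qualified!r} already exists in target")
        merged[qualified] = expression
    return merged


source = {"a": "f(1, 2.0)"}  # source model with ID f1234
target = {"a": "3.14"}       # target model, its own 'a' conflicts by name
merged = copy_with_namespace(source, "f1234", target)
print(merged)
```

Because the qualified name keeps the original variable name and the source model ID, a duplicate finder could strip the suffix again when comparing a copied statement with its original.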