Add computing resources and granularity
Concepts
Computing resources and granularity are optional in the grammar; specifying them has no impact on the computed results.
Computing resources:
- computing time
- number of cores
- main memory
- disk storage
- ...
Granularity:
- Data granularity
- Task-node granularity
Task-node and data granularity
Task-node granularity
This part has been moved to issue https://git.scc.kit.edu/virtmat-tools/vre-language/-/issues/95
Data granularity: chunk size
Syntax
<object> [in <int> chunks]
Example with a function
d = f2(c in 4 chunks, b) on 2 cores for 2 hours
is equivalent to
(c1, c2, c3, c4) = split(c, 4)
d1 = f2(c1, b) on 2 cores for 0.5 hours
d2 = f2(c2, b) on 2 cores for 0.5 hours
d3 = f2(c3, b) on 2 cores for 0.5 hours
d4 = f2(c4, b) on 2 cores for 0.5 hours
d = concat(d1, d2, d3, d4)
without using the chunks keyword.
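The rewriting above can be sketched in plain Python. Here split, concat and f2 are hypothetical stand-ins (a real implementation would schedule the per-chunk calls as parallel workflow nodes rather than call them sequentially):

```python
def split(seq, n):
    """Split seq into n contiguous chunks of nearly equal size."""
    k, r = divmod(len(seq), n)
    chunks, start = [], 0
    for i in range(n):
        size = k + (1 if i < r else 0)
        chunks.append(seq[start:start+size])
        start += size
    return chunks

def concat(*chunks):
    """Concatenate per-chunk results back into one list."""
    return [x for chunk in chunks for x in chunk]

def f2(c, b):
    # hypothetical elementwise function of c with parameter b
    return [x + b for x in c]

c, b = list(range(10)), 100
c1, c2, c3, c4 = split(c, 4)
d = concat(f2(c1, b), f2(c2, b), f2(c3, b), f2(c4, b))
assert d == f2(c, b)  # chunking does not change the result
```

Because f2 is applied elementwise, the chunked evaluation is exactly equivalent to the unchunked one.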
Example with an expression
Let us assume that energy is an iterable like pd.Series, np.array, ...
use exp from stdlib.functions
use kB from stdlib.constants
temperature = 300.0 K
energy_0 = 2.3 eV
energy = 0 eV to 4 eV step 0.2 eV # alternative to range(0 eV, 4 eV, 0.2 eV)
rate = exp(-(energy-energy_0)**2/(kB*temperature)) in 2 chunks
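The chunked evaluation of this expression can be sketched in plain Python with units dropped. The value of kB in eV/K, the inclusive range (21 points), and the helper name rate_expr are assumptions of this sketch:

```python
from math import exp

K_B = 8.617333262e-5     # Boltzmann constant in eV/K (assumed value)
temperature = 300.0      # K
energy_0 = 2.3           # eV
# energy = 0 eV to 4 eV step 0.2 eV, units dropped, endpoint included:
energy = [i * 0.2 for i in range(21)]

def rate_expr(chunk):
    """Evaluate the rate expression elementwise on one chunk."""
    return [exp(-(e - energy_0)**2 / (K_B * temperature)) for e in chunk]

# evaluate in 2 chunks and concatenate the partial results
mid = len(energy) // 2
rate = rate_expr(energy[:mid]) + rate_expr(energy[mid:])
assert rate == rate_expr(energy)  # same result as unchunked evaluation
```

As in the function example, elementwise expressions split cleanly over chunks.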
Implementation
For the first implementation we need a basic interpreter for delayed execution.
FireWorks
- use Firework objects to implement nodes
- use Firetask objects to implement tasks
- use PyTask and LambdaTask (both supporting chunk number) to implement data granularity
NOTE: ForeachTask is not necessary because the number of chunks is a known constant input.
NOTE: It can happen that we need specific subclasses of Firetask that better match the needs of the interpreter.
Example implementation of the expression example using FireWorks (pseudocode)
# Task 1: LambdaTask with ForeachTask
func: -(energy-energy_0)**2/(units.kB*temperature)
inputs: energy, energy_0, units.kB, temperature
split: energy
number of chunks: 2
outputs: exp_arg
# Task 2: PyTask with ForeachTask
func: numpy.exp(exp_arg)
inputs: exp_arg
outputs: rate
split: exp_arg
number of chunks: 2
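Because the chunk number is a known constant, each chunked task can be expanded statically into a fixed list of per-chunk tasks at model-processing time, which is why no ForeachTask-style runtime fan-out is needed. A minimal, FireWorks-free sketch of that expansion (the task dicts and naming scheme are stand-ins for real Firetask objects):

```python
def expand_chunked_task(func_name, split_input, n_chunks, other_inputs, output):
    """Statically expand one chunked task into n_chunks per-chunk task dicts."""
    tasks = []
    for i in range(n_chunks):
        tasks.append({
            'func': func_name,
            'inputs': [f'{split_input}_chunk{i}'] + list(other_inputs),
            'outputs': [f'{output}_chunk{i}'],
        })
    return tasks

# Task 2 from the example, expanded into 2 concrete per-chunk tasks
tasks = expand_chunked_task('numpy.exp', 'exp_arg', 2, [], 'rate')
assert len(tasks) == 2
assert tasks[0]['inputs'] == ['exp_arg_chunk0']
```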
Deferred execution of lambda functions and expressions
For named Python functions available via an API we can readily use the PyTask. For expressions and expressions with dummy identifiers (lambda functions), we need a way to transfer the code to the workflow system. There are two methods.
Via serialization
import dill
import base64
import json
from math import sqrt

def func(x):
    return 2*sqrt(x)

# a named function:
string = json.dumps(base64.b64encode(dill.dumps(func)).decode('utf-8'))
new_func = dill.loads(base64.b64decode(json.loads(string).encode()))  # reconstructed function in Python
print(new_func(4))  # 4.0

# a lambda function:
string = json.dumps(base64.b64encode(dill.dumps(lambda x: x**2)).decode('utf-8'))
new_func = dill.loads(base64.b64decode(json.loads(string).encode()))
print(new_func(4))  # 16
Expressions can be serialized, but note where they are evaluated: the expression is evaluated before serialization, so only its value is transferred, for example:
a = 4
string = json.dumps(base64.b64encode(dill.dumps(a**2)).decode('utf-8'))
expression = dill.loads(base64.b64decode(json.loads(string).encode()))
print(expression)  # 16
A drawback of this kind of serialization is that the serialized function may be non-portable (a different serialization package than dill, a different dill or Python version, 64/32-bit platforms, etc.).
Via Python source code
import ast
string = 'lambda x: x**2'
node = ast.parse(string, mode='eval')
assert isinstance(node.body, ast.Lambda)
func = eval(compile(node, '', 'eval'))
print(func(2)) # 4
Firetasks
It may happen that the existing "standard" Firetasks are not optimal for wrapping Python functions in the deferred executor. There are some issues:
- Lambda functions must be passed in a serialized form and not via a module.name string.
- The functions provided to PyTask must be aware of the serialized objects passed as arguments and returned.
- A mixture of args and inputs is generally not supported. We need a way to pass constant parameters via args intermixed with inputs, i.e. data from upstream nodes and previous Firetasks.
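One way to address the last issue is a thin binding layer that records, per positional slot, whether the value is a constant or the name of an upstream input. The function bind_call and the slot-template format below are hypothetical, not part of FireWorks:

```python
def bind_call(func, arg_spec, spec_data):
    """Call func with positional arguments taken either from constants
    ('const', value) or from upstream workflow data ('input', name)."""
    args = []
    for kind, value in arg_spec:
        if kind == 'const':
            args.append(value)
        elif kind == 'input':
            args.append(spec_data[value])
        else:
            raise ValueError(f'unknown slot kind: {kind}')
    return func(*args)

# a constant intermixed with data from upstream nodes
spec_data = {'x': 3, 'y': 4}
result = bind_call(lambda a, x, y: a * (x + y),
                   [('const', 10), ('input', 'x'), ('input', 'y')],
                   spec_data)
assert result == 70
```

A custom Firetask subclass could apply this binding in its run_task method, with spec_data coming from the Firework spec.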
A proposal: the construction of the Firetask (that must be sub-classed) can be carried out using a decorator function that is specific to every metamodel class that has a value property. This decorator will wrap and serialize every Python function correspondingly. A model processor can be used to construct the Firetasks per model object, add them to Fireworks, and then add the Fireworks to the database. The value property, which basically fetches the Firetask output, will be available only if the state property is COMPLETED.
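A very loose sketch of the wrap-and-serialize idea, using the source-code method from above. All names here (serializable, rebuild) are hypothetical; the real decorator would target metamodel classes with a value property and emit Firetask subclasses rather than plain functions:

```python
import ast

def serializable(source):
    """Hypothetical decorator factory: attach the function's source string so
    a model processor can ship it to the workflow system and rebuild it there."""
    def decorator(placeholder):
        placeholder.source = source
        return placeholder
    return decorator

def rebuild(source):
    """Worker side: reconstruct the callable from its source string."""
    node = ast.parse(source, mode='eval')
    assert isinstance(node.body, ast.Lambda)
    return eval(compile(node, '<model>', 'eval'))

@serializable('lambda x: x**2')
def square(x):  # placeholder body; the shipped code is the source string
    return x**2

worker_func = rebuild(square.source)
assert worker_func(5) == 25
```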