Enable different evaluation and input/output modes
Evaluation
The evaluation strategy in the DSL is lazy (non-strict) evaluation with memoization, which allows reuse/sharing of returned values, i.e. avoids repeated evaluations. This is in contrast to eager (strict) evaluation. A workflow system itself implements such an evaluation strategy.
Another aspect of evaluation is the time of evaluation: during the interpreter run (let us call it immediate evaluation) or after the interpreter has finished (deferred evaluation).
A third aspect is the location of the evaluation: we distinguish between local and remote evaluation. Remote evaluation is needed when we require additional computing resources that are not available locally (e.g. an HPC cluster with a batch system and a JupyterHub instance connected to it).
The current implementation is lazy immediate local evaluation. To use HPC resources for evaluation we need deferred evaluation. Due to the long lifetime and the laziness, this requires a workflow system and a batch system as backends.
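Lazy evaluation with memoization can be sketched in plain Python (a minimal illustration, not the actual DSL implementation) as a thunk that evaluates at most once:

```python
class Lazy:
    """A thunk: evaluation is delayed until the value is first needed,
    and the result is memoized so repeated accesses share one evaluation."""

    def __init__(self, compute):
        self._compute = compute
        self._evaluated = False
        self._value = None

    @property
    def value(self):
        if not self._evaluated:      # evaluate at most once
            self._value = self._compute()
            self._evaluated = True
        return self._value

calls = []
x = Lazy(lambda: calls.append('eval') or 42)
assert not calls             # nothing evaluated yet (lazy)
assert x.value == 42         # first access triggers evaluation
assert x.value == 42         # second access reuses the memoized value
assert calls == ['eval']     # evaluated exactly once (memoization)
```

This is the behavior that a workflow backend provides for free: each node runs once and its result is shared by all dependents.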
I/O
Depending on the time and order of input/output we have different situations. First of all, in a workflow (or in a functional program) we cannot strictly predict the order of I/O operations or when they are scheduled. Therefore we simply assume that the state of external inputs does not change with time. This is why inputs (file and URL inputs, and `input()` statements) can be performed immediately, i.e. during the interpreter phase and before runtime. Similarly, outputs (file and URL outputs, and all `print()` statements) can be performed as soon as the results are available. The only limitation is the locality of the sources and targets: expensive I/O operations (large data) should be carried out on the resources where the data will be processed or has been produced.
Example
Implementation
The implementation is closely related to issue https://git.scc.kit.edu/jk7683/vre-language/-/issues/6.
Provide the source code to the interpreter
Normally, the interpreter has no access to the source code. The source code file can be accessed via `get_location()`, but if the source code is provided as a string this is no longer possible. In the deferred mode, the executed statements from the source code must be captured in the backend workflow system. For this purpose, the source code can be provided with:
```python
metamodel.model_param_defs.add('source_code', 'Source code of the model')
model = metamodel.model_from_str(prog_input, source_code=prog_input)
...
# in interpreter:
source_code_str = model._tx_model_params['source_code']
```
Provide a program instance to the interpreter
There are two cases of operation in the deferred mode:
- Starting a program instance "from scratch". In this case, a new workflow is constructed before the interpreter starts.
- Continuing an existing program instance. A persisted instance is used as a checkpoint. In this case the instance is loaded from the database, the persisted objects from the model are reconstructed, and then the new code is parsed and interpreted.
```python
metamodel.model_param_defs.add('model_instance', 'Model instance')
model = metamodel.model_from_str(prog_input, model_instance=None)    # start from scratch
model = metamodel.model_from_str(prog_input, model_instance=123456)  # start model instance with fw_id 123456
...
# in interpreter:
instance_id = model._tx_model_params['model_instance']
```
In principle, the immediate execution mode may also use persistence, but then the persisted model instance is used for checkpointing only, because the objects have no state (all completed objects are in the instance; all failed or incomplete objects are not).
Use directives to manipulate _tx_model_params
The easiest way to let the user pass `_tx_model_params` to the main Python executable is via command line flags. Later on we can introduce "directives", i.e. instructions in the source code that are not part of the model. Similar instructions are passed after the `#!` character sequence in shell scripts or after `%` in Jupyter notebooks.
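A minimal sketch of such directive handling (the `#%` prefix and the key-value syntax are assumptions for illustration, not the actual directive syntax) could strip directive lines from the source before parsing:

```python
def split_directives(source, prefix='#%'):
    """Separate directive lines (not part of the model) from the model source.
    Directives use an assumed 'key = value' form after the prefix."""
    directives = {}
    model_lines = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            key, _, value = stripped[len(prefix):].strip().partition('=')
            directives[key.strip()] = value.strip()
        else:
            model_lines.append(line)
    return directives, '\n'.join(model_lines)

prog = "#% model_instance = 123456\na = 1\n"
directives, source = split_directives(prog)
assert directives == {'model_instance': '123456'}
assert source == 'a = 1'
```

The extracted directives could then be forwarded to `model_from_str` as model parameters.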
Mode of evaluation
The properties of the metamodel classes differ for the immediate and the deferred evaluation modes, and selecting the mode at the time of model instantiation is too late. This means the mode of evaluation is a metamodel property and not a model property. Therefore, it has to be specified right after the instantiation of the metamodel, for example:
```python
metamodel = metamodel_from_file(metamodel_file, auto_init_attributes=False)
...
add_properties(metamodel, deferred_mode=False)
...
program = metamodel.model_from_file(model_file)
```
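A possible shape for `add_properties` (a sketch under stated assumptions, not the actual implementation: the `Number` class, the `raw` attribute and the trivial evaluation are invented for illustration) is to attach mode-dependent evaluation properties to the metamodel classes:

```python
def add_properties(classes, deferred_mode=False):
    """Attach an evaluation property to each metamodel class, depending on
    the mode. In deferred mode 'func' returns a (function, parameters) pair
    for the workflow backend; in immediate mode 'value' evaluates in place."""
    for cls in classes:
        if deferred_mode:
            cls.func = property(lambda self: ((lambda: self.raw), tuple()))
        else:
            cls.value = property(lambda self: self.raw)

class Number:                      # stand-in for a generated metamodel class
    def __init__(self, raw):
        self.raw = raw

add_properties([Number], deferred_mode=False)
assert Number(3).value == 3        # immediate: evaluated on access

add_properties([Number], deferred_mode=True)
func, pars = Number(5).func        # deferred: a function to run later
assert func() == 5 and pars == tuple()
```

Because the properties are installed on the classes before any model is instantiated, the mode is fixed at the metamodel level, as required.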
Implementation strategy for `func` property methods
- The return value: a tuple of a function and a tuple of parameters that should be used in the deferred call: `(function, (par1, par2, ...))`.
- The returned function may not include a reference to any textX object. Any expressions and functions that require such references must be evaluated immediately in the `func` property method and included by value in the function definition. Examples:
  - Do not return `lambda x: x*self.value` but evaluate `self.value` like `val = self.value` (note that because `value` is a property, `self.value` triggers a call to the relevant evaluation method of the specific class) and then return the closure `lambda x: x*val`.
  - The same holds for `self.func` etc.
  - The same strategy applies for cases such as `lambda: len(self.parameters)`: here `len(self.parameters)` must be evaluated first.
- The returned tuple of parameters should only contain referenced parameters. Expressions, functions and other objects must be included in the function part of the returned value. Example: for the expression `x0*func1(x1, func2(x2))` the returned value will be `(lambda x, y, z: x*func1(y, func2(z)), (x0, x1, x2))` or the equivalent `(lambda *args: args[0]*func1(args[1], func2(args[2])), (x0, x1, x2))`, whereas `(lambda x, y, z: x*func1(y, z), (x0, x1, func2(x2)))` is not valid. If the function has no parameters, then the returned tuple must be empty, i.e. `tuple()`.
- No objects may be dereferenced and thus evaluated, unless it is obvious that the evaluation is trivial and necessary (see the second rule above).
- Special cases:
  - "Naive" deferred execution of if-expressions and if-functions `if(expr, true_b, false_b)`:

```python
expr_func, expr_pars = expr.func
expr_pars_len = len(expr_pars)
true_b_func, true_b_pars = true_b.func
true_b_pars_len = len(true_b_pars)
false_b_func, false_b_pars = false_b.func

def iffunc(*args):
    # the flat args tuple is split back into the three parameter groups
    if expr_func(*args[:expr_pars_len]):
        retval = true_b_func(*args[expr_pars_len:expr_pars_len+true_b_pars_len])
    else:
        retval = false_b_func(*args[expr_pars_len+true_b_pars_len:])
    return retval

return iffunc, (*expr_pars, *true_b_pars, *false_b_pars)
```
Naive in the sense that the condition will be evaluated on the same resources as the true-branch and the false-branch expressions. These can be split only with a special conditional Firetask that spawns a Firework in a detour-type `FWAction` dynamically.
  - `if` statement: Because the return (meta-)type is `Statement` (note that `Statement` is an abstract rule), the only way to implement this for deferred remote evaluation is via returning a `FWAction` that spawns a new Firetask or Firework. If the `Statement` is a `Variable`, then conditional assignment can be implemented in a Python function that can be wrapped in a single Firetask. But in this case we can use the if-function, i.e. this code

```
if cond then a = 4 else b = 3
```

and this

```
a = if(cond, 4, null)
b = if(cond, null, 3)
```

are equivalent.
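The rules above can be illustrated with a self-contained sketch of a `func` property method (the `Scale` class and its attributes are invented for illustration and are not part of the actual metamodel):

```python
class Scale:
    """Toy metamodel class for the expression 'factor * parameter'."""

    def __init__(self, factor, parameter):
        self._factor = factor          # primitive constant (literal)
        self.parameter = parameter     # reference to a named Variable

    @property
    def value(self):                   # immediate evaluation method
        return self._factor

    @property
    def func(self):
        # Rule: evaluate self.value immediately and close over the result;
        # the returned function must not reference the model object itself.
        val = self.value
        # Rule: only referenced parameters go into the parameters tuple.
        return (lambda x: x * val), (self.parameter,)

node = Scale(3, 'x0')                  # 'x0' stands for a Variable reference
func, pars = node.func
assert pars == ('x0',)                 # referenced parameter, passed by name
assert func(14) == 42                  # closure captured the constant by value
```

The returned pair can later be executed by the workflow backend without any access to the model object, which is exactly what deferred remote evaluation requires.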
Implementation strategy for the Firetask/Firework construction
Firetask
- Parameters that are `Variable` references are added to the argument list of the functions, and their names are passed as `inputs` to Firetasks.
- References to objects that have no `name` attribute must be resolved by wrapping the objects into functions and using references to named objects as inputs.
- Parameters that are other objects must be wrapped in the functions. If they include references, these are appended to the function argument list and their names appended to `inputs`.
- Returned values are `outputs` in Firetasks, containing the names of `Variable` objects.
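The mapping can be sketched in plain Python (a mock of the engine side, not the actual FireWorks integration; the task-dict layout only mimics a `PyTask`-like interface):

```python
def make_task(func, input_names, output_names):
    """Build a task description: the function receives the values of the
    'inputs' names from the spec and stores its result under 'outputs'."""
    return {'func': func, 'inputs': input_names, 'outputs': output_names}

def run_task(task, spec):
    """Mimic the workflow engine: pull inputs from the spec, push outputs."""
    args = [spec[name] for name in task['inputs']]
    result = task['func'](*args)
    spec.update(dict(zip(task['outputs'], [result])))
    return spec

# statement: a = b * 2
# 'b' is a Variable reference -> function argument, name listed in inputs;
# 'a' is the assigned Variable -> its name listed in outputs
task = make_task(lambda b: b * 2, ['b'], ['a'])
spec = {'b': 21}                 # workflow data lives in the spec
run_task(task, spec)
assert spec['a'] == 42           # the output name 'a' now holds the result
```

The real Firetask would additionally carry the task-private constants, as discussed in the open questions below.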
Open questions
- Add `args` for constant arguments? For which constants? Constants in this context are attributes of metamodel class instances that are neither references to other objects nor other metamodel objects, i.e. they are of primitive data types from integer, float, string and boolean literals. Examples: In `ObjectProperty` we have `start = INT`, `stop = INT`, `step = INT`. In `Series` we have `url = STRING`, `filename = STRING`. Another simple example is the program `a = 1`. This generates a Firetask returning the value 1, which is stored as `{'a': 1}` in the `spec` of this Firework and all children (due to `outputs = ['a']`). Where should the `1` be coded? 1) Inlined in the function: `lambda: 1`, `inputs = []`, or 2) provided as positional argument: `lambda x: x`, `args = [1]`, or 3) provided as keyword argument: `def a_func(param): return param`, `kwargs = {'param': 1}`.
- How to implement "constant" arguments: use `kwargs` or `args`?
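The three encoding options can be sketched side by side (plain Python, outside any workflow engine; the task-dict layout is illustrative only):

```python
# Option 1: inline the constant in the function, no arguments
task1 = {'func': (lambda: 1), 'args': [], 'kwargs': {}}

# Option 2: identity function, constant as positional argument
task2 = {'func': (lambda x: x), 'args': [1], 'kwargs': {}}

# Option 3: identity function, constant as keyword argument
def a_func(param):
    return param

task3 = {'func': a_func, 'args': [], 'kwargs': {'param': 1}}

# all three encodings produce the value stored as {'a': 1} in the spec
for task in (task1, task2, task3):
    result = task['func'](*task['args'], **task['kwargs'])
    assert result == 1
```

Options 2 and 3 keep the constant visible and editable in the serialized task, whereas option 1 buries it in the function body; this trade-off is what the open question is about.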
Firework
Regarding the granularity, this is more related to issue https://git.scc.kit.edu/virtmat-tools/vre-language/-/issues/6. Important is that the source code is stored task-wise; task-private parameters must be stored in the Firetasks, and workflow data, i.e. those specified in `inputs` and `outputs`, must be stored in the Firework `spec`.
I/O operations in deferred remote evaluation
`print()`, `obj to file` and `obj to url` statements
These are implemented as a database query followed by the actual output operation. All these operations should be non-blocking.
- `print()`: state-dependent, for interactive use, and always called immediately. If the object is not evaluated at the time of the call, the `null` value is printed. After Jupyter integration, `print()` is not necessary.
- `obj to file` and `obj to url` are state-independent: they should return immediately, wait (non-blocking) until the objects are evaluated (checked by database queries), and then be executed.
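A hedged sketch of such a non-blocking output operation (the `query_state` and `fetch_value` helpers stand in for the actual database queries and are hypothetical):

```python
import threading
import time

def deferred_output(name, write, query_state, fetch_value, poll_interval=0.01):
    """Return immediately; write the value once the object is evaluated."""
    def worker():
        while query_state(name) != 'COMPLETED':   # database query
            time.sleep(poll_interval)
        write(fetch_value(name))                  # the actual output operation
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread                                 # caller is not blocked

# toy in-memory "database" standing in for the workflow database
db = {'state': 'RUNNING', 'value': None}
out = []
t = deferred_output('a', out.append,
                    lambda name: db['state'],
                    lambda name: db['value'])
assert out == []                  # nothing written yet, call returned at once
db['value'] = 42
db['state'] = 'COMPLETED'         # object becomes evaluated
t.join(timeout=1)
assert out == [42]                # written after the object was evaluated
```

In the actual implementation the polling would be replaced by queries against the workflow database, but the control flow (return immediately, write later) is the same.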
`input()`, `Object from file` and `Object from url` statements
- `input()` should be blocking and is intended for interactive input on the console only. After Jupyter integration, `input()` is not necessary.
- `Object from file` and `Object from url` should block all operations depending on the inputs. These can be implemented as Firetasks / Fireworks parent to the dependent Fireworks / Firetasks.