cores resource annotations in vm script and ntasks in qadapter object clash if they do not match

This issue occurs when the WFEngine is employed to evaluate a model that was executed in workflow mode (and already added to a database). Any fireworks that contain resource annotations, for example:

props_H2O = Property energy, forces, dipole for structure geom_H2O
            with calculator calc with constraints (H2O_plane),
            task: local minimum
            on 5 cores for 10.0 [minutes] <--- THIS!

will be assigned the batch category and eventually submitted by the WFEngine to the queue of the HPC host for evaluation. To do so, the engine generates a SLURM submission script (FW_submit.script) for each batch firework. The annotation adds the following line to that script:

#SBATCH --ntasks-per-node=5

For this example, the specifications of the corresponding firework would contain the following:

"spec": {

    "_category": "batch",
    "_queueadapter": {
        "job_name": "890ee3c1-23ec-4b9c-bfd4-4c1c80435039",
        "nodes": 1,
        "ntasks_per_node": 5,
        "queue": "dev_cpuonly",
        "walltime": 10
    },

At the same time, when the WFEngine is used, a qadapter object should be created, either within the running Jupyter-API or loaded from a .yaml file. This qadapter object typically specifies the number of cores for the task through the ntasks parameter, for example:

ntasks: 10

This adds the following line to the FW_submit.script file:

#SBATCH --ntasks=10

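A problem arises if the values of ntasks and ntasks-per-node in the submission script don't match. In this example the script asks for 10 tasks in total but only 5 tasks per node on a single node, so SLURM rejects the job when the engine tries to submit it. The error message in the WFEngine log console is: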

qadapter.SLURM - ERROR - Error in job submission with SLURM file FW_submit.script and cmd ['sbatch', 'FW_submit.script']
The error response reads: b'sbatch: error: Batch job submission failed: Requested node configuration is not available
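
To check whether a generated script is affected, one could scan it for the conflicting directives. The following is only a minimal sketch and not part of the WFEngine: the function names are hypothetical, the `--nodes=1` line is an assumption based on the "nodes": 1 entry in the firework spec, and node counts are assumed to be plain integers.

```python
import re

# Matches directives such as "#SBATCH --ntasks=10" or "#SBATCH --ntasks 10".
SBATCH_LINE = re.compile(r"^#SBATCH\s+--([A-Za-z0-9-]+)(?:[=\s]+(\S+))?")


def parse_sbatch_options(script_text):
    """Collect the long-form #SBATCH options of a script into a dict."""
    options = {}
    for line in script_text.splitlines():
        match = SBATCH_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def find_task_conflict(options):
    """Return a message if ntasks exceeds nodes * ntasks-per-node, else None."""
    required = ("nodes", "ntasks", "ntasks-per-node")
    if any(options.get(key) is None for key in required):
        return None  # not enough information to decide
    nodes = int(options["nodes"])
    ntasks = int(options["ntasks"])
    per_node = int(options["ntasks-per-node"])
    if ntasks > nodes * per_node:
        return f"ntasks={ntasks} exceeds nodes*ntasks-per-node={nodes * per_node}"
    return None


# Assumed script content: only the two ntasks lines are quoted in this issue.
example_script = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=5
#SBATCH --ntasks=10
"""

conflict = find_task_conflict(parse_sbatch_options(example_script))
print(conflict)
```

For the values in this issue the check reports a conflict. With ntasks set to 5 it returns None.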

Aside from user awareness (and probably a WARNING entry in the manual), it would be useful to have some way to ensure consistency between these two options to avoid problems.
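
One possible shape for such a consistency check, run before the submission script is written, is sketched below. This is an illustration only: `check_task_counts` is a hypothetical helper, not an existing WFEngine or FireWorks function, and it assumes the firework's `_queueadapter` entry and the qadapter parameters are both available as plain dicts with the keys shown in this issue. It applies the strict rule that the totals must match exactly.

```python
def check_task_counts(fw_queueadapter, qadapter_params):
    """Raise ValueError if ntasks disagrees with nodes * ntasks_per_node.

    fw_queueadapter: the "_queueadapter" dict from the firework spec.
    qadapter_params: the parameters of the qadapter object (assumed dict).
    """
    nodes = fw_queueadapter.get("nodes", qadapter_params.get("nodes"))
    per_node = fw_queueadapter.get("ntasks_per_node")
    ntasks = qadapter_params.get("ntasks")

    # Nothing to compare unless all three values are set.
    if nodes is None or per_node is None or ntasks is None:
        return

    expected = nodes * per_node
    if ntasks != expected:
        raise ValueError(
            f"qadapter ntasks={ntasks} does not match the resource annotation "
            f"(nodes={nodes} * ntasks_per_node={per_node} = {expected})"
        )


# Values taken from the example in this issue.
fw_queueadapter = {"nodes": 1, "ntasks_per_node": 5, "walltime": 10}
qadapter_params = {"ntasks": 10}

try:
    check_task_counts(fw_queueadapter, qadapter_params)
except ValueError as err:
    print(f"WARNING: {err}")
```

Whether a mismatch should raise an error, log a warning, or be resolved automatically in favour of one of the two settings is a design decision left open here.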
