texts -l flag not working for batch nodes of different databases
Suppose one needs to work on two different models, model_A.vm and model_B.vm, each assigned to a different launchpad, launchpad_A.yaml and launchpad_B.yaml (and, for this test, to a different database):
#model_A
f(x) = x
a1 = f(1) on 2 cores with 2 [GB] for 2.0 [minutes]
a2 = f(1) on 4 cores for 1.0 [minute]
b = (numbers: 1, 2, 3, 4)
c = map((x: x**2), b) in 2 chunks for 1.0 [hour] on 1 core with 3 [GB]
and
#model_B
h2o = Structure water (
(atoms: ((symbols: 'O', 'H', 'H'),
(x: 0., 0., 0.) [nm],
(y: 0., 0.763239, -0.763239) [angstrom],
(z: 0.119262, -0.477047, -0.477047) [angstrom]
)
)
)
calc = Calculator emt (), task: single point
algo = Algorithm Langevin ((timestep: 1.) [fs], (steps: 5),
(temperature_K: 300.) [K], (friction: 0.05 [1/fs]),
(trajectory: true))
prop = Property energy, forces, trajectory ((calculator: calc), (structure: h2o), (algorithm: algo)) on 8 cores for 10.0 [minute]
The ~/.fireworks/FW_config.yaml file contains the following:
LAUNCHPAD_LOC: /home/.fireworks/launchpad_A.yaml
Next, each model is executed in workflow mode, and added to the corresponding database:
$~> texts script -m workflow -f model_A.vm
program UUID: uuid_of_model_A
program output: >>>
<<<
$~> texts script -l /home/.fireworks/launchpad_B.yaml -m workflow -f model_B.vm
program UUID: uuid_of_model_B
program output: >>>
<<<
Evaluation of model_A.vm using an interactive session (autorun, background launcher) is done by:
$~> texts session -r -a -u uuid_of_model_A
All batch jobs are queued, and eventually evaluated.
The "COMPLETED" status of all nodes can be verified with the %hist magic, or by inspecting the corresponding launchdir folder.
The intuitive way to evaluate model_B, on the other hand, would be to pass the -l flag pointing to the corresponding launchpad file:
$~> texts session -l /home/.fireworks/launchpad_B.yaml -r -a -u uuid_of_model_B
In this case, the batch job that evaluates the model's prop variable is queued and does in fact start running as a batch job, but it stops after nearly 30 seconds.
If one checks the model's history, it can be seen that the batch-type node is stuck in "RESERVED" status.
All other interactive nodes which do not depend on the batch node are "COMPLETED", and those that depend on the batch job are now in "WAITING" status.
Inspection of the corresponding .out file within the launcher folder of model_B's batch node shows the following error message:
No FireWorks are ready to run and match query! {'$or': [{'spec._fworker': {'$exists': False}}, {'spec._fworker': None}, {'spec._fworker': 'Automatically generated Worker'}]}
2024-11-21 18:25:48,942 INFO Rocket finished
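To help read the message above: the query printed by the rocket is quite permissive. The following is a minimal Python sketch, not FireWorks or MongoDB code; `matches_fworker_query` is a hypothetical helper that only imitates this one query on plain dictionaries, and the example specs are made up. It suggests that any spec without a `_fworker` entry would satisfy the query, so the message may point to the contacted database holding no ready Fireworks at all, rather than to a wrong worker name.

```python
def matches_fworker_query(spec, worker_name="Automatically generated Worker"):
    """Imitate the three-way '$or' from the error message on a plain dict.

    Assumption: in MongoDB, both {'$exists': False} and an equality match
    against None accept a document in which the field is missing.
    """
    if "_fworker" not in spec:  # {'spec._fworker': {'$exists': False}}
        return True
    value = spec["_fworker"]
    # {'spec._fworker': None} or {'spec._fworker': '<worker name>'}
    return value is None or value == worker_name


# Hypothetical specs, not taken from the actual databases.
print(matches_fworker_query({}))                     # no _fworker key
print(matches_fworker_query({"_fworker": None}))     # explicit null
print(matches_fworker_query({"_fworker": "other"}))  # pinned to another worker
```

Only the last spec is rejected, which is why the worker name alone seems an unlikely cause here.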
This seems to be a problem with the -l flag, which somehow does not work for batch-type nodes.
Is it failing to override the LAUNCHPAD_LOC in the FW_config.yaml file, so that the batch nodes cannot be run?
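One way to check this hypothesis is to compare the launchpad named in FW_config.yaml with the one passed via -l. A minimal Python sketch, assuming the configuration file is a flat `KEY: value` file like the one shown earlier; `read_launchpad_loc` and `same_launchpad` are hypothetical helpers, not part of textS or FireWorks, and the parsing is deliberately naive (no YAML library):

```python
import os


def read_launchpad_loc(config_text):
    """Return the LAUNCHPAD_LOC value from flat 'KEY: value' text, or None."""
    for line in config_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "LAUNCHPAD_LOC":
            return value.strip().strip("'\"") or None
    return None


def same_launchpad(config_text, cli_path):
    """True if the configured launchpad file equals the one given via -l."""
    configured = read_launchpad_loc(config_text)
    if configured is None:
        return False
    return os.path.normpath(configured) == os.path.normpath(cli_path)


# File contents and paths as quoted in this report.
config = "LAUNCHPAD_LOC: /home/.fireworks/launchpad_A.yaml\n"
print(read_launchpad_loc(config))
print(same_launchpad(config, "/home/.fireworks/launchpad_B.yaml"))
```

If the two differ, a batch job that re-reads FW_config.yaml at run time would presumably connect to the database of launchpad_A, where model_B's node does not exist. That is an assumption about the cause, not a confirmed diagnosis.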
If the ~/.fireworks/FW_config.yaml file is edited so that it points to the correct launchpad for model_B:
LAUNCHPAD_LOC: /home/.fireworks/launchpad_B.yaml
and the batch node is rerun (%rerun prop), then the evaluation can be successfully completed with:
$~> texts session -r -a -u uuid_of_model_B