Commit cd5530f4 authored by Ivan Kondov

adapted the configuration filenames, highlighting, spelling

parent 666cfee1
Advanced and productive use
===========================
Using FireWorks with a batch system on HPC clusters
---------------------------------------------------
* QAdapter and FWorker configuration
* Separate configuration, launches and templates
Use firework categories and fireworker names
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
......@@ -17,46 +9,75 @@ If fireworks have to be launched on different workers (for example in
heterogeneous and distributed computing workflows) the keywords ``_category``
and ``_fworker`` can be used to specify which categories of fireworks can be
launched by what fireworkers. In the following example, the firework can be
launched by any fireworker with name *uc1* that is configured to process category
*turbomole*::

    _category: turbomole
    _fworker: uc1

A matching fireworker is configured in the file
**demos/6_advanced/fworker_uc1_turbomole.yaml**::

    name: uc1
    category: [turbomole]
    query: '{}'

In this example, the fireworker may launch only fireworks that belong to the
category *turbomole* and are tagged with the fireworker name *uc1*. Note that the
same fireworker may also process more than one category, or non-categorised
fireworks, if configured accordingly.
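
The matching rule described above can be modelled in a few lines of Python.
This is a simplified sketch and not the actual FireWorks implementation: the
function name ``can_launch`` and the plain-dictionary representation of the
fireworker are assumptions made for illustration, as is the rule that a
fireworker without categories accepts any firework.

```python
def can_launch(spec, worker):
    """Return True if `worker` may launch a firework with the given `spec`.

    Simplified model (assumption): `worker` is a plain dict with the keys
    `name` and `category`, mirroring the fireworker configuration file.
    """
    # A firework tagged with `_fworker` is restricted to that worker name.
    fworker = spec.get("_fworker")
    if fworker is not None and fworker != worker["name"]:
        return False

    # Assumption: a worker without categories accepts any firework.
    categories = worker.get("category") or []
    if isinstance(categories, str):
        categories = [categories]
    if not categories:
        return True

    # Otherwise the firework category must be among the worker categories.
    return spec.get("_category") in categories


uc1 = {"name": "uc1", "category": ["turbomole"]}
print(can_launch({"_category": "turbomole", "_fworker": "uc1"}, uc1))
print(can_launch({"_category": "vasp", "_fworker": "uc1"}, uc1))
```

The first call matches both the category and the fireworker name, the second
one fails on the category.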
Submit jobs to the batch system using ``qlaunch``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If a firework launch requires substantial resources, it should be submitted to a
batch system on the HPC cluster. This is done with the ``qlaunch`` command, which
is configured with a qadapter configuration file. Example configuration files for
the queuing systems MOAB and SLURM are provided in the folder
**demos/6_advanced/config**.
In practical use cases, it is generally advantageous to separate the configuration
files, the workflow templates (i.e. the input) and the launches. This structure
is used in demo 6 in the folder **demos/6_advanced**.
To submit a single fireworker batch job on bwUniCluster, the ``qlaunch`` command
is called like this::

    cd launches
    qlaunch -q ../config/qadapter_uc1_vasp.yaml -w ../config/fworker_uc1_vasp.yaml singleshot

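
When many submissions have to be scripted, the same call can be assembled in
Python. The following is a minimal sketch: the helper ``qlaunch_command`` is
hypothetical and only builds the argument list for the command shown above; it
does not submit anything by itself.

```python
import shlex
from pathlib import PurePosixPath


def qlaunch_command(config_dir, qadapter_file, fworker_file, mode="singleshot"):
    """Assemble the argument list of a qlaunch call (hypothetical helper)."""
    config_dir = PurePosixPath(config_dir)
    return [
        "qlaunch",
        "-q", str(config_dir / qadapter_file),  # qadapter configuration
        "-w", str(config_dir / fworker_file),   # fireworker configuration
        mode,
    ]


cmd = qlaunch_command("../config", "qadapter_uc1_vasp.yaml", "fworker_uc1_vasp.yaml")
print(shlex.join(cmd))
```

The resulting list could then be passed to ``subprocess.run(cmd, cwd="launches", check=True)``
so that the job is submitted from within the launches folder.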
Deal with failures and crashes
------------------------------
The execution of a firework may fail for various reasons, in which case the
firework is set to the *FIZZLED* state. Depending on the cause of the error,
there are different ways to handle it:
* If the error is external (i.e. not in the *spec*) and has been fixed, then the
  firework can be rerun using the command ``lpad rerun_fws``.
* If the error is in the *spec* of the firework, then it can be fixed in place
  with the command ``lpad update_fws`` and the firework rerun with ``lpad rerun_fws``.
  Advantage: preservation of all independent *COMPLETED* fireworks is guaranteed.
* If the error is in the *spec*, then it can alternatively be fixed in the workflow
  template and the whole workflow added to the launchpad again. This approach
  becomes impractical as the number of errors and updates in the same workflow grows.
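
The first two approaches can be summarised in a small Python sketch. The helper
``recovery_commands`` and the spec key ``input_file`` are hypothetical. The
options ``-i`` (firework ID) and ``-u`` (update document) are assumed here and
should be checked with ``lpad rerun_fws --help`` and ``lpad update_fws --help``.

```python
import json


def recovery_commands(fw_id, spec_update=None):
    """Return the lpad calls to recover one FIZZLED firework (hypothetical helper)."""
    commands = []
    if spec_update:
        # Error in the spec: fix it in place before rerunning.
        commands.append(
            ["lpad", "update_fws", "-i", str(fw_id), "-u", json.dumps(spec_update)]
        )
    # In both cases the firework is rerun afterwards.
    commands.append(["lpad", "rerun_fws", "-i", str(fw_id)])
    return commands


# External error that has already been fixed: rerun only.
print(recovery_commands(7))
# Error in the spec: update first, then rerun ("input_file" is a made-up key).
print(recovery_commands(7, {"input_file": "corrected.inp"}))
```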
Detect lost runs
----------------
If a job is killed by the batch system, the *RUNNING* state of its firework is
never updated. To detect such fireworks we use the command ``lpad detect_lostruns``,
which returns the IDs of fireworks with lost runs. Optionally, these can be
rerun or set to *FIZZLED*.
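
The idea behind lost run detection can be illustrated with a simplified model.
It assumes that every running job periodically records a time stamp, and that a
run counts as lost once this time stamp is older than an expiration time. The
function ``find_lost_runs``, the dictionary layout and the four-hour default are
assumptions of this sketch, not the FireWorks implementation.

```python
from datetime import datetime, timedelta


def find_lost_runs(fireworks, now, expiration=timedelta(hours=4)):
    """Return the IDs of RUNNING fireworks whose last time stamp has expired."""
    return [
        fw["fw_id"]
        for fw in fireworks
        if fw["state"] == "RUNNING" and now - fw["last_ping"] > expiration
    ]


now = datetime(2024, 1, 1, 12, 0)
fireworks = [
    {"fw_id": 1, "state": "RUNNING", "last_ping": datetime(2024, 1, 1, 11, 30)},
    {"fw_id": 2, "state": "RUNNING", "last_ping": datetime(2024, 1, 1, 6, 0)},
    {"fw_id": 3, "state": "COMPLETED", "last_ping": datetime(2024, 1, 1, 1, 0)},
]
print(find_lost_runs(fireworks, now))  # [2]
```

Only firework 2 is reported: it is still *RUNNING* but has not been updated for
six hours.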
Detect duplicates
-----------------
Fireworks can reuse the data from the launches of identical Fireworks (duplicates).
To enable detection of duplicates the following key is added to the *spec*::

    _dupefinder:
      _fw_name: DupeFinderExact

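
Conceptually, exact duplicate detection compares the complete *spec* of a new
firework with the specs of fireworks that already have launches. The following
sketch models this with plain dictionaries. The function
``find_exact_duplicate`` and the data layout are assumptions for illustration
and do not reproduce the FireWorks implementation.

```python
def find_exact_duplicate(new_spec, existing_fireworks):
    """Return the ID of a launched firework with an identical spec, or None."""
    for fw in existing_fireworks:
        # Only fireworks that already have launches can share their data.
        if fw["launches"] and fw["spec"] == new_spec:
            return fw["fw_id"]
    return None


dupefinder = {"_fw_name": "DupeFinderExact"}
existing = [
    {"fw_id": 1, "launches": [1], "spec": {"x": 1, "_dupefinder": dupefinder}},
    {"fw_id": 2, "launches": [], "spec": {"x": 2, "_dupefinder": dupefinder}},
]
print(find_exact_duplicate({"x": 1, "_dupefinder": dupefinder}, existing))  # 1
print(find_exact_duplicate({"x": 2, "_dupefinder": dupefinder}, existing))  # None
```

The second lookup returns ``None`` because the matching firework has no
launches yet, so there is no data to reuse.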
......@@ -77,7 +98,7 @@ identical section including four fireworks (1-4)::
The second run of ``rlaunch`` detects four duplicate pairs, so only the last
firework of the second added workflow is executed. Afterwards both workflows are
in the *COMPLETED* state, which can be checked with::

    lpad get_wflows -t -m 2 --rsort created_on

......@@ -86,7 +107,8 @@ Let us now delete the first workflow, for which all fireworks have been executed::
lpad delete_wflows -i 1
We see *Remove launches []* in the output, i.e. its launches have not been
removed. When a workflow that includes duplicated fireworks is deleted, the shared
launches are removed from the launchpad only if all duplicated fireworks are deleted.
The launches are now related only to the relevant fireworks of the second
workflow. They will be removed if we remove the second workflow::
......@@ -113,24 +135,15 @@ us test this with re-running a firework that is identical to another firework::
# the two fireworks are COMPLETED
Security best practices
-----------------------
Configure security (MongoDB authentication and authorization)
Query and analyse data from fireworks and workflows
---------------------------------------------------
TODO
Use FilePad to store fireworks file inputs and file outputs
-----------------------------------------------------------
->
Not every feature of FireWorks is covered in this tutorial. Please visit the
documentation website https://materialsproject.github.io/fireworks/ for
......