From 6a712502c405008be5e516125ec1d0586803634b Mon Sep 17 00:00:00 2001
From: "ivan.kondov" <ivan.kondov@kit.edu>
Date: Thu, 5 Dec 2019 15:05:55 +0100
Subject: [PATCH] added the query tutorial

---
 docs/advanced.rst | 109 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 105 insertions(+), 4 deletions(-)

diff --git a/docs/advanced.rst b/docs/advanced.rst
index e5817be..2b8b6fa 100644
--- a/docs/advanced.rst
+++ b/docs/advanced.rst
@@ -138,10 +138,111 @@ us test this with re-running a firework that is identical to another firework::
 
 Query and analyse data from fireworks and workflows
 ---------------------------------------------------
-
-
-Use FilePad to store fireworks file inputs and file outputs
------------------------------------------------------------
+Fireworks and workflows are stored on the LaunchPad during their full life cycle:
+from the time they are added and enter the *READY* state, while they run in the
+*RUNNING* state, until they reach the *COMPLETED* state. Completed workflows
+hold not only the input parameters and the results but also provenance metadata
+that helps further use of the workflows. The life cycle of a workflow continues
+when it is extended, for example with the *append_wflow* command (see exercise 4).
+
+Another use of the stored workflows is to collect, reorganize, analyse
+and visualize their data programmatically. To show how data can be extracted
+from workflows and fireworks, a very simple query module is provided in
+**lib/lpad_query.py**. The module is driven by the command ``lpad_query``
+(installed in **bin**), whose syntax is similar to that of the ``lpad`` command.
+For a short help message, run::
+
+    lpad_query --help
+
+In **demos/6_advanced/analysis** three sample queries are prepared. The first
+query::
+
+    filters:
+      name: The coffee workflow
+      state: COMPLETED
+    selects: []
+
+returns all completed workflows named *The coffee workflow*.
+The empty selection means that no fireworks and no firework updates are selected.
+This query is started with the command::
+
+    lpad_-o yaml query -f query_sample_1.yaml
+
+Assuming there is one workflow with that name and it is completed, the output
+is::
+
+    - fws: []
+      metadata: {}
+      name: The coffee workflow
+
+Because no fireworks or updates are selected, the fws list is empty. The second
+query selects, for each returned document, the fireworks named *Brew coffee*::
+
+    selects:
+      - fw_name: Brew coffee
+
+The query again returns a list with one workflow (because the filter is the same),
+but this time with one firework (metadata and updates)::
+
+    - fws:
+      - created_on: '2019-12-05T13:42:42.535429'
+        id: 222
+        name: Brew coffee
+        parents:
+        - 221
+        state: COMPLETED
+        updated_on: '2019-12-05T13:42:52.854682'
+        updates:
+          pure coffee:
+          - top coffee selection
+          - workflowing water
+      metadata: {}
+      name: The coffee workflow
+
+If we now add the key ``add fw_spec`` to the selects::
+
+    selects:
+      - fw_name: Brew coffee
+        add fw_spec: true
+
+the returned data is supplemented with the specs of the selected fireworks::
+
+    - fws:
+      - created_on: '2019-12-05T13:42:42.535429'
+        id: 222
+        name: Brew coffee
+        parents:
+        - 221
+        spec:
+          _tasks:
+          - _fw_name: PyTask
+            func: auxiliary.print_func
+            inputs:
+            - coffee powder
+            - water
+            outputs:
+            - pure coffee
+          coffee powder: top coffee selection
+          water: workflowing water
+        state: COMPLETED
+        updated_on: '2019-12-05T13:42:52.854682'
+        updates:
+          pure coffee:
+          - top coffee selection
+          - workflowing water
+      metadata: {}
+      name: The coffee workflow
+
+The third example demonstrates a more complex query for a DFT
+calculation of a water molecule. The query includes a regular expression and
+workflow metadata in the filter section. Additionally, more than one firework is
+selected, and for one firework only specific updates (out of all updates) are selected.
+
+All queries are MongoDB queries written in pymongo syntax.
+
+
+Use FilePad to store files
+--------------------------
-- 
GitLab