Standard data types: Atoms
In the long term, we can use an entity-relationship model to describe the things in the domain and their inter-relations. Such a model provides the extensibility needed to adopt a domain ontology.
The current use case requires the definition of Atoms. This will be the first type to implement.
Some ideas
Basic data structure
Unlike in ASE, there may be no need to distinguish between Atoms and Atom: a single type, Atoms, can cover both cases. Atoms can be seen as a union (which can be implemented as a Tuple) of two Table objects: one with as many rows as there are atoms (think of a pandas DataFrame) and one with just one row (think of a Python dict).
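A minimal Python sketch of this two-table union, assuming plain dicts of lists stand in for the per-atom Table (t1) and a plain dict for the one-row Table (t2). The class name and its attributes are hypothetical, not part of any existing implementation. It shows that a single atom needs no separate Atom type: it is simply an Atoms object whose t1 has one row.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Atoms:
    """Sketch of Atoms as a union of two tables (no separate Atom type).

    per_atom:     column name -> list with one entry per atom
                  (plays the role of a pandas DataFrame, table t1)
    global_props: column name -> single value
                  (plays the role of a Python dict, one-row table t2)
    """
    per_atom: dict[str, list[Any]]
    global_props: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # All per-atom columns must have the same number of rows.
        lengths = {len(column) for column in self.per_atom.values()}
        if len(lengths) > 1:
            raise ValueError(f"per-atom columns differ in length: {sorted(lengths)}")

    @property
    def natoms(self) -> int:
        # Number of rows in t1; an empty t1 means zero atoms.
        return len(next(iter(self.per_atom.values()), []))


# A single atom is simply an Atoms object whose t1 has one row.
h = Atoms({"symbols": ["H"], "x": [1.0], "y": [2.0], "z": [3.0]},
          {"pbc": (False, False, False)})
print(h.natoms)  # 1
```

Validating column lengths on construction is one way to keep t1 rectangular; a pandas DataFrame would enforce the same invariant by itself.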
Examples
For a hydrogen atom, the first table, t1, looks like this:
((symbols: H), (x: 1), (y: 2), (z: 3), (tags: adsorbate))
A water molecule together with a separate H atom:
((symbols: H, H, O, H), (x: 1, 2, 3, 0), (y: 4, 5, 6, 0), (z: 7, 8, 9, 0),
(tags: molecule, molecule, molecule, single atom))
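The second example shows what the tags column is for: it groups atoms within one Atoms object. Here is a small Python sketch, assuming the column-wise table is stored as a dict of equal-length lists. The helper names rows and select are hypothetical and only illustrate how the water molecule can be picked out of the table.

```python
# Per-atom table t1 for the water molecule plus a single H atom,
# stored column-wise (hypothetical representation: dict of equal-length lists).
t1 = {
    "symbols": ["H", "H", "O", "H"],
    "x": [1, 2, 3, 0],
    "y": [4, 5, 6, 0],
    "z": [7, 8, 9, 0],
    "tags": ["molecule", "molecule", "molecule", "single atom"],
}


def rows(table):
    """Yield the table row by row as dicts (column name -> value)."""
    names = list(table)
    for values in zip(*(table[n] for n in names)):
        yield dict(zip(names, values))


def select(table, **criteria):
    """Return a new column-wise table with only the rows matching all criteria."""
    kept = [r for r in rows(table) if all(r[k] == v for k, v in criteria.items())]
    return {name: [r[name] for r in kept] for name in table}


molecule = select(t1, tags="molecule")
print(molecule["symbols"])  # ['H', 'H', 'O']
positions = list(zip(molecule["x"], molecule["y"], molecule["z"]))
print(positions)  # [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
```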
Further optional columns in the first Table, t1, can be masses, charges, momenta, forces, magmoms, isotopes and numbers:
- masses: the atomic masses, type float
- charges: the atomic charges, type float
- momenta: the momenta of the nuclei, type (float, 3)
- forces: the forces acting on the nuclei, type (float, 3) (this property is external, see below)
- magmoms: the electronic atomic magnetic moments, with types:
  - float for collinear spin systems
  - (float, 3) for non-collinear spin systems
- isotopes: optional isotopes, given as a Table with three columns:
  - mass number
  - atomic mass
  - fraction
  Type:
  - ((int, float, float), null): if custom isotopes are used; the number of rows equals the number of isotopes
  - null: if the standard isotopes are used
- numbers: the atomic numbers, type int; these are redundant to the symbols (element names)
The names follow the ASE conventions.
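The type notation in the list above ("float", "(float, 3)", two alternative forms for magmoms, a nullable isotopes table) can be checked mechanically. The following Python sketch is a hedged illustration under several assumptions. The schema dict and all function names are hypothetical. The atomic-number lookup covers only a few elements. The fraction column of isotopes is assumed to be an abundance, which makes a fraction-weighted mass meaningful.

```python
# Hypothetical type schema for the optional per-atom columns of t1.
# A type is a scalar Python type or a (scalar type, length) pair,
# mirroring the notation "float" and "(float, 3)" used in the list above.
SCHEMA = {
    "masses": float,
    "charges": float,
    "momenta": (float, 3),
    "forces": (float, 3),
    "numbers": int,
}

# magmoms allow two forms: float (collinear) or (float, 3) (non-collinear).
MAGMOM_TYPES = (float, (float, 3))


def matches(value, typ):
    """Check one entry against a scalar type or a (type, length) pair."""
    if isinstance(typ, tuple):
        scalar, length = typ
        return (isinstance(value, (tuple, list)) and len(value) == length
                and all(isinstance(v, scalar) for v in value))
    return isinstance(value, typ)


def validate_column(name, values):
    """Return True if every entry of a per-atom column has the expected type."""
    if name == "magmoms":
        # Assumption: all atoms use the same form (all collinear or all non-collinear).
        return any(all(matches(v, t) for v in values) for t in MAGMOM_TYPES)
    return all(matches(v, SCHEMA[name]) for v in values)


# numbers are redundant to symbols, so they can be derived from them.
# Only a few elements are listed here, for illustration.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def numbers_from_symbols(symbols):
    return [ATOMIC_NUMBERS[s] for s in symbols]


def mean_mass(isotopes):
    """Fraction-weighted atomic mass from (mass number, atomic mass, fraction) rows.

    Assumes 'fraction' is the abundance of each isotope; returns None when
    isotopes is None, i.e. when the standard isotopes are used.
    """
    if isotopes is None:
        return None
    total = sum(fraction for _, _, fraction in isotopes)
    return sum(mass * fraction for _, mass, fraction in isotopes) / total


print(numbers_from_symbols(["H", "H", "O"]))               # [1, 1, 8]
print(validate_column("magmoms", [0.5, -0.5]))             # True
print(validate_column("magmoms", [0.5, (0.0, 0.0, 1.0)]))  # False: mixed forms
```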
In the second table, t2, we can define, for example, these columns (each with one row):
- cell: a 3x3 matrix containing the cell vectors v1, v2, v3, each with components x, y, z (numpy array -> pandas DataFrame); type ((float, float, float), 3)
- pbc: the periodic boundary conditions, a 3-vector (numpy array -> pandas Series); type (bool, 3)
- charge: the total charge, type float
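A short Python sketch of the one-row table t2, assuming it is held as a plain dict. It checks the shapes and types listed above and computes the cell volume |v1 · (v2 × v3)| from the cell vectors. The function names and the example cell are hypothetical illustrations.

```python
def validate_t2(t2):
    """Check the shapes/types of the cell, pbc and charge columns of t2."""
    cell, pbc, charge = t2["cell"], t2["pbc"], t2["charge"]
    # cell: ((float, float, float), 3), i.e. three vectors with three float components
    cell_ok = (len(cell) == 3
               and all(len(v) == 3 and all(isinstance(c, float) for c in v) for v in cell))
    # pbc: (bool, 3)
    pbc_ok = len(pbc) == 3 and all(isinstance(p, bool) for p in pbc)
    return cell_ok and pbc_ok and isinstance(charge, float)


def cell_volume(cell):
    """Volume spanned by the cell vectors v1, v2, v3: |v1 . (v2 x v3)|."""
    (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) = cell
    cross = (b2 * c3 - b3 * c2, b3 * c1 - b1 * c3, b1 * c2 - b2 * c1)
    return abs(a1 * cross[0] + a2 * cross[1] + a3 * cross[2])


t2 = {
    "cell": ((10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 20.0)),
    "pbc": (True, True, False),  # e.g. a slab periodic in x and y only
    "charge": 0.0,
}
print(validate_t2(t2))          # True
print(cell_volume(t2["cell"]))  # 2000.0
```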
The union can be implemented in different ways. The simplest is a Tuple of the two tables: (t1, t2). Another possibility is a single Table in which t1 is nested as one column: (atoms: t1, cell: ..., charge: ..., pbc: ...)
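The two implementations of the union carry the same information, so converting between them is mechanical. The Python sketch below assumes dicts stand in for Tables, and the function names are hypothetical. The tuple form keeps the tables independent. The nested form gives named access to everything, at the cost of reserving the column name atoms.

```python
def tuple_to_table(t1, t2):
    """Nest t1 as the 'atoms' column next to the one-row columns of t2."""
    if "atoms" in t2:
        raise ValueError("'atoms' is reserved for the nested per-atom table")
    return {"atoms": t1, **t2}


def table_to_tuple(table):
    """Split the nested form back into the pair (t1, t2)."""
    t2 = {name: value for name, value in table.items() if name != "atoms"}
    return table["atoms"], t2


t1 = {"symbols": ["H"], "x": [1.0], "y": [2.0], "z": [3.0], "tags": ["adsorbate"]}
t2 = {"cell": ((10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 20.0)),
      "charge": 0.0,
      "pbc": (True, True, False)}
nested = tuple_to_table(t1, t2)
print(list(nested))  # ['atoms', 'cell', 'charge', 'pbc']
```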
Types of properties
"Internal" properties
Some properties are fully defined by the structure. They are attributes of Atoms objects and are stored in the second Table.
Example: the mean distance between adsorbed atomic species on a surface slab. Once the structure is given, this property does not depend on any computational method. The moments of inertia are another example.
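Because such a property depends only on the structure, it can be computed directly from t1 without any calculator. The Python sketch below computes the mean pairwise distance between atoms tagged as adsorbates. It assumes the dict-of-lists layout for t1 and the tag value "adsorbate" from the hydrogen example. It also ignores periodic images (no minimum-image convention), which a real implementation on a periodic slab would have to handle. The function name is hypothetical.

```python
import itertools
import math


def mean_adsorbate_distance(t1, tag="adsorbate"):
    """Mean pairwise distance between atoms carrying the given tag.

    Positions come from the x, y, z columns; periodic images are ignored.
    """
    pos = [(x, y, z) for x, y, z, t in zip(t1["x"], t1["y"], t1["z"], t1["tags"])
           if t == tag]
    pairs = list(itertools.combinations(pos, 2))
    if not pairs:
        raise ValueError("need at least two tagged atoms")
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


# Three adsorbates forming a 3-4-5 triangle above one slab atom.
t1 = {"x": [0.0, 3.0, 0.0, 0.0],
      "y": [0.0, 0.0, 4.0, 0.0],
      "z": [0.0, 0.0, 0.0, -2.0],
      "tags": ["adsorbate", "adsorbate", "adsorbate", "slab"]}
print(mean_adsorbate_distance(t1))  # 4.0
```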
"External" properties
Other properties are approximations, and the approximation must be described. They are tied to a specific parameter set (describing the equation, algorithm, external codes, etc.) and are not an integral part of the Atoms object. Example: the energy. It can be computed with many different approximations, depending on the computational method (classical force field, density functional theory, etc.) and its parameters. The same holds for the forces, the second derivatives of the energy and all properties derived from them.
Ontology
The advantages of the suggested ontology are: 1) it is a taxonomy (a tree of relations rather than a triangle of relations); 2) Structure has atoms but can also have molecules or other objects while keeping its other attributes; 3) Structure is independent of any method, equation or approximation. A Property has one Structure and one Calculator instance, both held as references.
Structure |
---|
atoms |
cell |
pbc |
... |

Calculator |
---|
name |
parameters |
... |

Property |
---|
name |
value |
... |
structure |
calculator |
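The three entities map naturally onto immutable records. The Python sketch below uses frozen dataclasses with the fields from the tables, leaving out the "..." columns. The field types, the calculator names and the parameter values are hypothetical. Only Property points to the other two entities, so the relations form a tree. The same Structure can be shared by reference between properties computed with different calculators.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Structure:
    # Method-independent description of the system (hypothetical field types).
    atoms: tuple   # per-atom rows, e.g. (symbol, x, y, z)
    cell: tuple    # three cell vectors
    pbc: tuple     # three booleans


@dataclass(frozen=True)
class Calculator:
    # Describes the method: equation, algorithm, external code and parameters.
    name: str
    parameters: tuple = ()  # (parameter name, value) pairs


@dataclass(frozen=True)
class Property:
    # An "external" property: meaningful only for one Structure + one Calculator.
    name: str
    value: Any
    structure: Structure    # reference to an existing Structure
    calculator: Calculator  # reference to an existing Calculator


struct = Structure(atoms=(("H", 1.0, 2.0, 3.0),),
                   cell=((10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)),
                   pbc=(False, False, False))
calc_a = Calculator("method A", (("p1", 1.0),))  # hypothetical names/values
calc_b = Calculator("method B", (("p1", 2.0),))

# value=None: nothing is actually computed in this sketch.
e_a = Property("total energy", None, struct, calc_a)
e_b = Property("total energy", None, struct, calc_b)
print(e_a.structure is e_b.structure)  # True: one Structure, shared by reference
print(e_a == e_b)                      # False: different calculators
```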
Instantiation
Instantiating one of the three types (Structure, Calculator and Property) from scratch requires new syntax that makes an explicit (syntactic) distinction from Tuple and Table objects. Alternatively, a semantic distinction can be made, either by the parser or later by the interpreter, by reserving the (many) names of the included series.
Another way is via intrinsic "functions", in which the top-level table is effectively "unpacked":
struct = Structure(atoms: (...), cell: (...), pbc: true)
calc = Calculator(name: ..., parameters: ((p1: ...), (p2: ...), ...))
e = Property(name: 'total energy', structure: struct, calculator: calc)
f = Property(name: 'forces', structure: struct, calculator: calc)
In this case, only the words Structure, Calculator and Property have to be reserved. Another, more verbose, syntax using from allows the reuse of Table objects, files and URLs:
struct = Structure from ((atoms: (...)), (cell: (...)), (pbc: true))
calc = Calculator from ((name: ...), (parameters: ((p1: ...), (p2: ...), ...)))
e = Property from ((name: 'total energy'), (structure: struct), (calculator: calc))
f = Property from ((name: 'forces'), (structure: struct), (calculator: calc))
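In Python terms, the two syntaxes correspond roughly to calling a constructor with keyword arguments and to building an object from an existing table value. The sketch below covers the Calculator case only. It assumes a one-row table is given as (column, value) pairs. The method name from_table and the parameter names p1 and p2 are hypothetical. Both routes yield equal objects; the second one lets the same table value be reused, or loaded from a file or URL first.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Calculator:
    name: str
    parameters: dict[str, Any]

    @classmethod
    def from_table(cls, table):
        """Build from a one-row table given as ((column, value), ...) pairs.

        Rough analogue of 'Calculator from ((name: ...), (parameters: ...))'.
        """
        data = dict(table)
        expected = {f.name for f in fields(cls)}
        unknown = set(data) - expected
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        return cls(**data)


params = {"p1": 1.0, "p2": "x"}  # hypothetical parameter set

# Like: calc = Calculator(name: ..., parameters: ...)
calc1 = Calculator(name="calc", parameters=params)

# Like: calc = Calculator from (...); the table is an ordinary, reusable value.
table = (("name", "calc"), ("parameters", params))
calc2 = Calculator.from_table(table)
print(calc1 == calc2)  # True
```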