Input file syntax

The input uses the YAML format, a user-friendly structured data format. In addition, Cuby checks the type of the data entered.

Simple input

In the simplest form, the input is just a list of keywords and their values:

job: energy
interface: turbomole
method: hf
basisset: cc-pVDZ
charge: 0

In addition to the requirements of a valid YAML format, Cuby performs two more checks. Firstly, only keywords defined in Cuby are allowed; the use of an unknown key produces an error (the keywords are not case-sensitive). Secondly, Cuby checks the type of the value. It is easy to write entries of the most common types, such as numbers or strings, but Cuby also uses more advanced types such as arrays and lists. Details on how to write them are provided here.
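The two checks described above can be pictured with a minimal Ruby sketch. The keyword table KNOWN_KEYWORDS and the function check_input are hypothetical and written only for illustration; they are not Cuby's actual implementation. The only real API used is the YAML parser from the Ruby standard library.

```ruby
require "yaml"

# Hypothetical keyword table for illustration only; Cuby's real keyword
# definitions are not reproduced here.
KNOWN_KEYWORDS = {
  "job"       => String,
  "interface" => String,
  "method"    => String,
  "basisset"  => String,
  "charge"    => Integer
}

# Raises on unknown keywords or on values of the wrong type; returns the
# input with lower-cased keys (keywords are not case-sensitive).
def check_input(text)
  YAML.safe_load(text).each_with_object({}) do |(key, value), checked|
    name = key.to_s.downcase
    expected = KNOWN_KEYWORDS.fetch(name) do
      raise ArgumentError, "Unknown keyword: #{key}"
    end
    unless value.is_a?(expected)
      raise TypeError, "Keyword #{name} expects #{expected}, got #{value.class}"
    end
    checked[name] = value
  end
end

input = <<~INPUT
  job: energy
  Interface: turbomole
  method: hf
  basisset: cc-pVDZ
  charge: 0
INPUT

settings = check_input(input)
```

Note that the YAML parser already assigns a type to each value (here, charge becomes an integer and the other values become strings), so a type check of this kind only has to compare the parsed value against the type the keyword expects.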

Default values

When a keyword used by the calculation is missing in the input, Cuby either stops with an error message or continues using a default value (optionally with a warning printed). This behavior, as well as the respective default value, is defined separately for each keyword.
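The three possible outcomes for a missing keyword can be sketched as follows. Everything in this block is an assumption made for illustration: the table KEYWORD_RULES, the function resolve, and the default values themselves are hypothetical and do not reflect the actual defaults in Cuby.

```ruby
# Hypothetical per-keyword rules, for illustration only. The defaults
# shown here are made up and are not Cuby's real default values.
KEYWORD_RULES = {
  geometry:  { when_missing: :error },
  charge:    { when_missing: :warning, default: 0 },
  maxcycles: { when_missing: :default, default: 200 }
}

# Returns the value from the input if present; otherwise applies the
# behavior defined for that keyword.
def resolve(settings, name, rules = KEYWORD_RULES)
  return settings[name] if settings.key?(name)

  rule = rules.fetch(name)
  case rule[:when_missing]
  when :error
    raise ArgumentError, "Keyword '#{name}' must be set in the input"
  when :warning
    warn "Keyword '#{name}' not set, using default #{rule[:default].inspect}"
    rule[:default]
  else
    rule[:default]
  end
end

settings = { method: "hf" }
charge = resolve(settings, :charge)       # warns, then uses the default
cycles = resolve(settings, :maxcycles)    # uses the default silently
```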

Comments

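YAML uses the character '#' to start a comment, either at the beginning of a line or within it. Within a line, the '#' must be preceded by whitespace to be treated as a comment, and a '#' inside a quoted string is kept as part of the value.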

# HF energy calculation
job: energy
method: hf # the method is set here!
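How the parser treats the '#' character can be checked directly with the YAML library from the Ruby standard library. The keywords title and note below are arbitrary names chosen for the demonstration and are not Cuby keywords.

```ruby
require "yaml"

parsed = YAML.safe_load(<<~INPUT)
  # HF energy calculation
  job: energy
  method: hf # the method is set here!
  title: a#b
  note: "keep # this"
INPUT

# Comments are dropped by the parser; only keys and values remain.
method_value = parsed["method"]
# A '#' without preceding whitespace does not start a comment.
title_value = parsed["title"]
# A '#' inside quotes is part of the string.
note_value = parsed["note"]
```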

Input blocks

More complex computational protocols consist of multiple calculations; the input for each of them is provided in a separate block of the input. Each block has a name, and the contents of the block are indented. The block ends when the indentation returns to the original level. The blocks can be nested arbitrarily. The blocks required or accepted by the individual interfaces and protocols are listed in their descriptions.

interface: mixer # Mixes two calculations
calculation_a: # Name of the block defining the first calculation
  method: HF # Contents of the block are indented
  modifiers: restraints
  modifier_restraint: # A nested block
    restraints_setup: yes # Contents of the nested block
  charge: 0 # This belongs to the block calculation_a, as defined by the indentation
calculation_b: # A second block at the root level
  method: MP2
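To see how indentation maps to structure, the example above can be parsed with the YAML library from the Ruby standard library: each block becomes a nested hash. This only illustrates the YAML side; how Cuby processes the blocks afterwards is not shown.

```ruby
require "yaml"

parsed = YAML.safe_load(<<~INPUT)
  interface: mixer
  calculation_a:
    method: HF
    modifiers: restraints
    modifier_restraint:
      restraints_setup: yes
    charge: 0
  calculation_b:
    method: MP2
INPUT

root_keys = parsed.keys                                  # keys at the root level
block_a   = parsed["calculation_a"]                      # a block is a nested hash
nested    = block_a["modifier_restraint"]                # a block inside a block
charge_a  = block_a["charge"]                            # belongs to calculation_a
```

The charge entry ends up inside calculation_a, not inside modifier_restraint, because its indentation matches that of the other entries of calculation_a.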

There is an important difference between Cuby 3 and Cuby 4. In Cuby 3, most of the information was read at the root level of the input and only some setup was read from the respective block (e.g. the charge of the molecule was listed only once at the root level and was used in all the child calculations). In Cuby 4, each calculation has to be completely defined in its own block; Cuby does not look elsewhere (e.g. in a job with multiple calculations, the charge has to be entered in each block defining a calculation). This requires some more typing, but it makes it possible to create very complex inputs systematically.

Multiple occurrences of a keyword

The YAML language specification does not allow the use of duplicate keys, but the Ruby parser does not complain when it encounters this problem and just overwrites the value. Therefore, the input

key: "ABC"
key: "xyz"

sets the value of the key to "xyz". Unfortunately, this is the behavior of the YAML parser itself, so it is not possible to warn the user when such a duplicate is encountered.
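The overwriting can be reproduced with the YAML library from the Ruby standard library. The second half of the sketch shows one possible way to check an input file for duplicates before running it, by inspecting the syntax tree returned by YAML.parse instead of the final hash. The function duplicate_root_keys is a hypothetical helper, not part of Cuby, and it only looks at the root level of the input.

```ruby
require "yaml"

text = <<~INPUT
  key: "ABC"
  key: "xyz"
INPUT

# The later entry silently replaces the earlier one.
parsed = YAML.safe_load(text)

# Hypothetical helper: lists keys that occur more than once at the
# root level, using the parser's syntax tree rather than the final hash.
def duplicate_root_keys(text)
  document = YAML.parse(text)
  return [] unless document && document.root.is_a?(Psych::Nodes::Mapping)

  # The children of a mapping node alternate between keys and values.
  key_nodes = document.root.children.each_slice(2).map { |key_node, _value| key_node }
  names = key_nodes.grep(Psych::Nodes::Scalar).map(&:value)
  names.select { |name| names.count(name) > 1 }.uniq
end

duplicates = duplicate_root_keys(text)
```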

Reusing parts of the input

In more complex inputs, parts of the setup can be reused in multiple calculations. Some protocols support a block 'calculation_common' that applies to all the calculations. The same can be achieved at the level of the YAML language where some parts of the input can be named and then reused.

# Definition of the shared settings
# In YAML, the name of the block can be arbitrary, but in Cuby input, we use
# a convention that it should start with prefix 'shared_'. Then, a label starting
# with '&' is used to name the block for further use.
shared_mopac: &mopac_setup
  interface: mopac
  method: pm6
  charge: 0

# Another block of shared settings, these can be mixed as needed
shared_job: &job_setup
  job: interaction


# The job itself
job: multistep
steps: methanol, methylamine

calculation_methanol:
  <<: *mopac_setup # This merges in the shared settings defined above
  <<: *job_setup   # This merges in the shared settings defined above
  geometry: S66:02 # Water ... MeOH

calculation_methylamine:
  job: interaction
  <<: *mopac_setup
  geometry: S66:03 # Water ... MeNH2

While it is not required in YAML, Cuby expects all the blocks of shared settings to be named 'shared_...' in order to distinguish them from other input blocks.
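The effect of the anchor ('&') and merge ('<<: *') syntax can be inspected with the YAML library from the Ruby standard library. This is a sketch of the YAML mechanics only. It assumes Ruby 2.6 or newer, where safe_load accepts the aliases option; aliases have to be enabled explicitly there.

```ruby
require "yaml"

text = <<~INPUT
  shared_mopac: &mopac_setup
    interface: mopac
    method: pm6
    charge: 0

  calculation_methanol:
    <<: *mopac_setup
    geometry: S66:02
INPUT

# Aliases must be allowed explicitly when using safe_load.
parsed = YAML.safe_load(text, aliases: true)

# The merged block contains the shared entries plus its own.
methanol = parsed["calculation_methanol"]

# The shared block itself is still present as an ordinary root-level key.
shared_present = parsed.key?("shared_mopac")
```

Because the shared block remains in the parsed input as a regular key, the 'shared_' prefix is what allows it to be told apart from the blocks that define calculations.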

Conditional input

It is possible to modify the input depending on a condition evaluated when the calculation is run (more precisely, when the computational protocol used for the calculation is being initialized). The conditional input has to be placed in a separate block named condition_..., and the condition must be defined in this block using the condition keyword. If the condition evaluates to true, the contents of the conditional block are copied one level higher, that is, into the job setup. Multiple conditional blocks can be present.

The condition is evaluated as Ruby code, in the context of the initialization of the computational protocol. Apart from any general code (e.g. testing for the presence of files), the current settings for the calculation are available as the @settings object. No more information on the calculation is available at this point because the condition should be able to overwrite the settings before they are used to build the calculation.

The following example shows a condition dependent on the interface used to perform the calculation. If the same input were used with another interface, the part of the input specific to MOPAC would not be used:

job: energy
interface: mopac
method: am1
geometry: A24:water
# A condition inserts a setup specific to MOPAC
condition_interface:
  condition: "@settings[:interface] == :mopac"
  mopac_precise: yes

It is also useful to apply conditions based on the geometry. Although the geometry is not yet available when the calculation is being prepared, it can be loaded just for the purpose of evaluating the condition, provided it is defined in the settings. In the following example, the number of optimization cycles is increased if the geometry has more than 6 atoms:

job: optimize
interface: mopac
method: pm6

maxcycles: 5

condition_molsize:
  condition: "Geometry.load_from_settings(@settings).size > 6"
  maxcycles: 100

geometry: S66:10