Protocol multistep

This protocol allows chaining several calculations (steps) in one input file. Compared to running a series of calculations from outside of cuby, there are multiple advantages:

Output

The output of a serial calculation and a parallel one (where steps are queued using the keyword step_queue) might differ in some cases. When the steps are executed in series, the step header is printed and all output from this step follows. When a step is queued, the step header and final result are printed only after it has finished. However, some of its output (e.g. from the preparation of the jobs, or from parts that must be executed serially, such as the individual cycles of an optimization) is printed earlier.
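
The queueing rule behind this difference is described in example 5: consecutive steps with step_queue set are collected, and the queue is executed when a step without the flag is reached or when no steps remain. It can be sketched in Ruby as follows. This is only an illustration of the ordering, not Cuby's actual code; the Step struct and the execution_batches function are hypothetical names introduced here.

```ruby
# Hypothetical description of a step: its name and whether step_queue is set.
Step = Struct.new(:name, :queued)

# Returns the batches in which the steps would run. Consecutive queued steps
# form one batch (candidates for parallel execution); a step that is not
# queued first flushes the pending queue and then runs on its own.
def execution_batches(steps)
  batches = []
  queue = []
  steps.each do |step|
    if step.queued
      queue << step.name
    else
      batches << queue unless queue.empty?
      queue = []
      batches << [step.name]
    end
  end
  # All steps have been read: flush whatever is still queued.
  batches << queue unless queue.empty?
  batches
end

# Same layout as example 5: three queued steps, then one serial step.
steps = [
  Step.new('pm6', true),
  Step.new('pm6-dh2', true),
  Step.new('pm6-dh+', true),
  Step.new('pm6-d3h4', false)
]

execution_batches(steps).each_with_index do |batch, i|
  puts "batch #{i + 1}: #{batch.join(', ')}"
end
```

With this layout, the first three steps end up in one batch and the last step in a batch of its own, which is why the headers and results of queued steps appear only once the whole batch has finished.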

Input structure

The protocol requires the following blocks in the input:

Optionally, the following blocks can be defined in the input:

Keywords used

Keywords specific to this protocol:

Other keywords used by this protocol:

Examples

The following examples, along with all other files needed to run them, can be found in the directory cuby4/protocols/multistep/examples

#===============================================================================
# Multistep protocol example 1 - using common setup
#===============================================================================

# The following multistep job calculates the interaction energy of the water
# dimer using PM6 with various versions of dispersion and H-bond corrections.

job: multistep

# Step names must be defined first
steps: pm6, pm6-dh2, pm6-dh+, pm6-d3h4

# A large part of the setup is the same for all the steps. It is put into
# the common block, which is used in each step (any setting here can be
# overridden in the step block)
calculation_common:
  job: interaction
  geometry: S66:01
  interface: mopac
  method: pm6
  charge: 0
  molecule_a:
    charge: 0
  molecule_b:
    charge: 0

# Each of the steps has its own block in the input; its name consists of
# the prefix "calculation_" and the name of the step.

# calculation_pm6:
# Subsection not needed, common setup is used

calculation_pm6-dh2:
  mopac_corrections: dh2

calculation_pm6-dh+:
  mopac_corrections: dh+

calculation_pm6-d3h4:
  # Any step can have a custom title:
  step_title: "PM6-D3H4 - not in Mopac, using modifiers in Cuby"
  # Default PM6 calculation from calculation_common is used,
  # augmented with the following corrections:
  modifiers: dispersion3, h_bonds4
  modifier_h_bonds4:
    h_bonds4_extra_scaling: {}

#===============================================================================
# Multistep protocol example 2 - conditions
#===============================================================================

# The steps can be executed conditionally, depending on the existence of a file.

# The following input simplifies a common task: optimization followed by
# calculation of vibrational frequencies. When the optimized geometry is found,
# the first step is skipped. Run the example twice to see the difference.

job: multistep

# Here, different syntax is used for the list of steps - YAML array
steps: 
  - opt
  - freq

# Common setup: computational method
calculation_common:
  interface: turbomole
  method: dft
  functional: b-lyp
  basisset: SV
  charge: 0
  density_fitting: none

# Optimization
calculation_opt:
  # Optimization can be skipped if optimized geometry is found:
  skip_step_if_file_found: optimized.xyz
  job: optimize
  opt_quality: 0.1
  geometry: A24:water # Water molecule


# Frequencies
calculation_freq:
  job: frequencies
  geometry: optimized.xyz

#===============================================================================
# Multistep protocol example 3 - composite result
#===============================================================================

# A single result composed from the results of all the steps can be calculated.

# Example: calculate the PM6-D3H4 energy difference before and after
# optimization at the PM6 level.

job: multistep
steps: energy1, opt, energy2

# The final result:
# Expression for calculation of the results in Ruby language
# The results of the steps are stored in a variable steps, indexed
# by step name, as instances of the Results class.
multistep_result_expression: "steps['energy2'].energy - steps['energy1'].energy"
# Arbitrary name of the result, optional
multistep_result_name: Energy difference

# Energy before optimization
calculation_energy1:
  job: energy
  method: pm6
  modifiers: dispersion3, h_bonds4
  geometry: A24:water # Water molecule

# Optimization
calculation_opt:
  job: optimize
  opt_quality: 0.1
  method: pm6
  geometry: A24:water # same as in the first step
  history_freq: 0 # do not write optimization history
  
# Energy after optimization
calculation_energy2:
  job: energy
  method: pm6
  modifiers: dispersion3, h_bonds4
  geometry: optimized.xyz

# Common setup
calculation_common:
  interface: mopac
  charge: 0

#===============================================================================
# Multistep protocol example 4 - custom processing of the results
#===============================================================================

# As an alternative to example 3, custom Ruby code can be run on the results
# to perform the final processing

multistep_result_eval: |
  puts "Greetings, human master! Let me serve the results in a pleasing manner."
  energy_difference = steps['energy2'].energy - steps['energy1'].energy
  puts "The energy difference you seek is #{'%.3f' % energy_difference} kcal/mol"
  `rm optimized.xyz` # Also, delete the optimized geometry

# The rest is the same as in previous example:
job: multistep
steps: energy1, opt, energy2

# Energy before optimization
calculation_energy1:
  job: energy
  method: pm6
  modifiers: dispersion3, h_bonds4
  geometry: A24:water # Water molecule

# Optimization
calculation_opt:
  job: optimize
  opt_quality: 0.1
  method: pm6
  geometry: A24:water # same as in the first step
  history_freq: 0 # do not write optimization history
  
# Energy after optimization
calculation_energy2:
  job: energy
  method: pm6
  modifiers: dispersion3, h_bonds4
  geometry: optimized.xyz

# Common setup
calculation_common:
  interface: mopac
  charge: 0
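
The result expression in examples 3 and 4 is Ruby code that sees a variable steps, indexed by step name. A minimal sketch of how such an expression could be evaluated is given below. It assumes a stand-in Result struct with made-up energies in place of Cuby's Results class; how Cuby itself evaluates the expression is not shown in this document.

```ruby
# Stand-in for Cuby's Results class (assumption): only the energy is modelled.
Result = Struct.new(:energy)

# Made-up energies in kcal/mol, for illustration only.
steps = {
  'energy1' => Result.new(-10.0),
  'energy2' => Result.new(-12.5)
}

# The same expression as in multistep_result_expression in example 3.
expression = "steps['energy2'].energy - steps['energy1'].energy"

# Evaluate the string in the current scope, where the local variable
# steps is visible to the expression.
difference = eval(expression, binding)

puts "Energy difference: #{'%.3f' % difference} kcal/mol"
```

Because the expression is ordinary Ruby, anything available on the result objects can be combined in it; the multistep_result_eval keyword of example 4 extends the same idea to a whole block of code.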

#===============================================================================
# Multistep protocol example 5 - parallelization
#===============================================================================

# This is a copy of Example 1 with added parallelization. Only the commented
# keywords were added.

# Steps that do not depend on each other can be calculated in parallel.
# The number of parallel processes is set globally:
cuby_threads: 4

job: multistep
steps: pm6, pm6-dh2, pm6-dh+, pm6-d3h4

calculation_common:
  job: interaction
  geometry: S66:01
  interface: mopac
  method: pm6
  charge: 0
  molecule_a:
    charge: 0
  molecule_b:
    charge: 0

calculation_pm6:
# The following keyword (applied also to the following steps) queues this step
# until 1) all steps are queued, or 2) a step which does not have this flag is 
# encountered. Then, the queue is executed and results are printed in the
# original order.
  step_queue: yes

calculation_pm6-dh2:
  mopac_corrections: dh2
  step_queue: yes # queue this step as well

calculation_pm6-dh+:
  mopac_corrections: dh+
  step_queue: yes # queue this step as well

calculation_pm6-d3h4:
  step_title: "PM6-D3H4 - not in Mopac, using modifiers in Cuby"
# Do not queue this step. When this step is encountered, all the previously
# queued steps are executed before this step is executed.
  step_queue: no # (this is the default value)
  modifiers: dispersion3, h_bonds4
  modifier_h_bonds4:
    h_bonds4_extra_scaling: {}