Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

19
Computer Modelling and Molecular Dynamics Simulation of Biomolecules

Maria Reif and Martin Zacharias

Physics Department T38, Technical University of Munich, James‐Franck‐Str. 1, 85748 Garching, Germany

19.1 Significance

Computational modelling and simulations have become standard tools in (bio)chemistry. It is important not to use these tools as ‘black box’ approaches but to understand the underlying physical and algorithmic principles. It requires also careful preparation of the simulation or modelling input by the user and careful analysis of the results. Critical analysis is essential because the computational study of a system always relies on a model representation of the real system and thus comes with approximations that may impact the results [1–3]. A user can, however, only critically analyse the results of a computer modelling or simulation procedure if he/she has sufficient knowledge concerning the limitations of the underlying model. The present chapter will provide a basis for understanding the principles of computer modelling and molecular dynamics (MD) simulation of biomolecules. Further reading suggestions are listed at the end of the chapter.

In general, computational modelling and simulation is widely applied in many areas of science, not only (bio)chemistry. It allows studying systems or processes that are elusive to experimental study, e.g. modelling the crash of a car or aeroplane. In (bio)chemistry, the most important advantage of computational modelling and simulation is that insight complementary to experiment may be provided. For instance, with MD simulations one can ‘zoom in’ to the dynamic behaviour of biomolecules at atomistic resolution and down to the picosecond‐time level and thus gain information that experiments cannot deliver to date. This additional information can be used to suggest novel experiments, e.g. mutations in an enzyme that improve substrate binding or modification of a drug for tighter binding to the receptor.

In 1957, the first MD simulation was performed [4]. The simulated system was very simple, merely consisting of two‐dimensional hard disks. In 1964, the first liquid was simulated, argon, for a period of 6 ps [5]. The first molecular liquid, water, was simulated in 1971 for a period of 2 ps [6]. Algorithmic innovations and the development of models representing biomolecules finally allowed the first simulation of a protein in vacuum in 1977 [7] and in water in 1982 for 25 ps [8].

From year to year, researchers are able to study larger (bio)chemical systems for longer simulated time periods, i.e. the time effort required by the computer programs to perform a certain study continues to decrease. This is to a great extent due to hardware advances. Manufacturers keep devising more powerful computer and graphics processing units. The continuous improvements in hardware were, until recently, captured remarkably well by Moore's law [9], stating that the number of transistors in an integrated circuit would double every two years (due to continuous shrinking of transistor size) and hence computer power would also double every two years. This development drove the digital revolution both in science and any other area of life but may have approached an end. Once the atomistic scale has been reached, transistor size cannot be decreased any further along the lines of currently employed technologies. Innovations such as quantum computing [10] or neuromorphic computing [11] are expected to keep advancing the power and influence of computers in science, technology and daily life. Especially concerning applications in (bio)chemistry, these new technologies appear to offer exciting opportunities [12,13]. The present chapter, however, deals with the state‐of‐the‐art of molecular modelling and classical MD simulation in (bio)chemistry and the theoretical foundations to provide the reader with an overview and to enable him/her to take further steps in this exciting and rapidly evolving area.

19.2 Theory and Principles

19.2.1 General Considerations

When constructing a computational model of a biomolecular system, four different aspects have to be considered [14]. First, the degrees of freedom (dof) of the system have to be defined. They depend on the spatial resolution at which the system is to be modelled and influence the timescale of motions occurring in the system, as well as, from a practical point of view, the simulation timescale and method. For example, if the chosen dof are atoms including their constituent nuclei and electrons, the fastest motions in the system occur on a timescale of femto‐ to picoseconds and the simulation method of quantum dynamics [15] may be applied to propagate the dynamics of the system. Given this choice of dof and simulation method, motions that occur on a much larger timescale may be inaccessible due to restrictions in compute power. If, on the other hand, a very coarse‐grained depiction of the system is chosen, e.g. the dof are captured by a continuum of velocities, motions in the system occur on a supramolecular to macroscopic timescale and the simulation method of fluid dynamics [16] may be applied to propagate the dynamics of the system. Intermediate between these two regimes is the definition of dof as atoms, small groups of atoms or monomers. Such a description is commonly chosen for bimolecular modelling. Motions in the system then occur on a molecular (picosecond to nanosecond) timescale, the properties of the system can be described with statistical mechanics [17] and its dynamics can be propagated via classical mechanics [18,19]. In case one is not interested in the dynamics, but in the structural features of a biomolecule, one may model it on the level of its building blocks (monomers, i.e. amino acids or nucleic acid bases) and construct the structure via statistical methods based on database examination (comparative modelling [20,21]) or ‘from scratch’ (ab initio modelling [22]).

Second, given a particular set of dof in the system, one has to define how these are interacting, i.e. an energy function has to be defined that returns the potential energy of the system as a function of the coordinates of the dof. The energy function usually used in biomolecular modelling is described in Section 19.3.1.

Third, one has to determine the spatial environment of the model (Figure 19.1). A biomolecule can, for instance, be modelled in vacuum. In most cases this is unrealistic because biomolecules are typically embedded in an aqueous or membrane‐like environment and do not preserve their native structure and behaviour in vacuum. Considering that in vacuum there is no dielectric screening of charges, salt bridges and hydrogen bonds are much stronger than in aqueous solution and hence the structure of a biomolecule can deviate significantly from the solvated structure. A biomolecule is more realistically modelled in solution. The solution phase is typically represented by a dielectric continuum of infinite extent (implicit solvation; Section Implicit Solvation) or as a multitude of individual solvent molecules, each modelled in terms of individual atoms or groups of atoms (explicit solvation). The explicit‐solvent environment may be represented as a system of finite extent (e.g. a droplet) or of infinite extent, as a periodically replicated array of computational unit boxes, referred to as periodic boundary conditions (PBC). A microscopically small droplet (containing, e.g., 10⁵ solvent molecules) gives a poor representation of the true environment of biomolecules in solution because the biomolecule is affected by the presence of the close‐by droplet surface. Therefore, biomolecules are typically modelled in a solvent that is represented implicitly or explicitly under PBC.

Image described by caption. — **Figure 19.1** Spatial environment of biomolecular models, illustrated for a protein depicted in cartoon representation. Biomolecules may be modelled (a) in vacuum, (b) using an implicit solvent of infinite extent, (c) using explicit solvent molecules in a droplet of finite size, or (d) using explicit solvent molecules under PBC.

Lastly, one has to decide how to generate coordinates (configurations) of the model. If one is interested in a static structure, one may obtain coordinates of a biomolecule from a database (e.g. protein databank [23]) or via comparative [ 20, 21] or ab initio [22] modelling. If one is interested in all configurations of a molecule that are possible at a given temperature, one can sample the entirety of possible configurations with a Monte Carlo algorithm [24–27]. This algorithm samples the configurational space of a molecule, such that each configuration occurs with a probability equal to its Boltzmann weight,

19.1

where β = (k _B T)⁻¹, k _B being Boltzmann's constant and T the absolute temperature, U(r ^N) the potential energy of the system as a function of the coordinates of all N particles r ^N and the term in the denominator (configurational integral) is a normalisation constant. A Monte Carlo algorithm to generate biomolecular configurations proceeds as follows. One starts with a molecule with coordinates r ^N,old. A random new configuration is explored, e.g. a randomly chosen dihedral angle is rotated by a certain amount. Thus, new coordinates r ^N,new are obtained and one computes the associated energy change ΔU = U(r ^N,new) − U(r ^N,old). The new configuration is accepted with probability

19.2

which means that it is immediately accepted if ΔU < 0 and accepted only with a certain probability, namely p _acc, if ΔU > 0. This acceptance criterion is called the Metropolis algorithm [27] and it can be shown that it generates configurations according to their Boltzmann probabilities. The entirety of such configurations is called a Boltzmann‐weighted ensemble. Concerning biomolecules, the Monte Carlo/Metropolis algorithm is often used in ab initio protein structure prediction methods [22] because it allows one to efficiently probe possible configurations. A disadvantage of the algorithm is that the obtained configurations are not related to each other via a dynamic sequence, i.e. one does not obtain a movie‐like depiction of the time behaviour of the molecule. MD simulation [25,28], in contrast to mere configurational sampling such as performed by a Monte Carlo algorithm, generates configurations in a time‐dependent manner. The dynamics of the system is propagated via an equation of motion (eom) that describes how the new coordinates r ^N(t + Δt) depend on the old coordinates a small time increment Δt earlier, r ^N(t), where t denotes time. In practice, this propagation is done with a numerical integrator (Section Integration of the Equation of Motion). The generated ensemble of configurations is Boltzmann‐weighted, just as in Monte Carlo sampling. Thus, Monte Carlo sampling performs ‘a random walk’ through configuration space whereas MD simulation delivers a trajectory in phase space.

The phase space of the system is the entirety of possible coordinates and velocities of its dof. In statistical mechanics, one set of coordinates and velocities is called a microstate and the ensemble of all microstates compatible with given thermodynamic conditions (pressure, temperature) is called a macrostate. The properties of all these configurations may be used to extract thermodynamic information of the system via statistical mechanics (Section 19.2.2). By virtue of involving particle velocities, i.e. dynamic information, MD simulation, in contrast to Monte Carlo sampling, does not only allow the investigation of thermodynamic, but also of dynamic system properties.

19.2.2 Statistical Mechanics: Partition Function and Ensemble Average

Statistical mechanics [17] is the key to the interpretation of the results of an MD simulation [29]. It establishes a relation between the configurations (microstates) sampled in the simulation and macroscopic thermodynamic properties under given thermodynamic conditions. The latter refer to which state variables are kept constant and characterise the ensemble of microstates. For instance, in the canonical ensemble (NVT), microstates are compatible with a constant number of particles (N), volume (V) and temperature (T). The alternate variables, chemical potential, pressure and energy fluctuate. Other ensembles are the microcanonical ensemble (NVE), which pertains to constant number of particles, volume and energy. Here, chemical potential, pressure and temperature fluctuate. In the grandcanonical ensemble (μVT), the chemical potential, volume and temperature are constant and the number of particles, pressure and energy fluctuate. Of most practical relevance is the isothermal‐isobaric ensemble (NPT) featuring constant number of particles, pressure and temperature and fluctuating chemical potential, volume and energy.

For simplicity, the following discussion refers to the canonical ensemble. The Boltzmann probabilities in this ensemble are

19.3

where h is Planck's constant, p ^N denotes the momenta associated with all dof, H(r ^N, p ^N) is the Hamiltonian of the system (the sum of kinetic and potential energy) and Q is the partition function for the canonical ensemble:

19.4

It involves an integral over all possible positions and momenta, i.e. over the whole phase space and hence cannot be calculated for large systems such as a biomolecule. However, ratios of partition functions are readily accessible via MD simulation (Section Free‐Energy Calculations).

The partition function is an important quantity in statistical mechanics because it allows the calculation of macroscopic observables A as ensemble averages <A> of the property A(r ^N, p ^N) over the entire phase space,

19.5

This equation expresses the link statistical mechanics provides between the properties of microstates (A(r ^N, p ^N)) and the corresponding macroscopic, experimentally measurable, observable. In an MD simulation, however, time averages are calculated,

19.6

where N _f is the number of simulation frames and t _i the time of the ith frame. This is a simple arithmetic average over the properties A(t _i) at each simulation frame, but is valid because a thermostatted (Section Thermo‐ and Barostatting) MD simulation already samples configurations (r ^N, p ^N) with the correct Boltzmann probabilities for the NVT ensemble. The ergodic hypothesis [30,31] states that in the limit of infinite sampling, i.e. an infinite number of simulation frames N _f, the ensemble average (Eq. (19.5)) is equivalent to a time average (Eq. (19.6) with N _f → ∞).

19.2.3 Classical Mechanics: Equations of Motion

MD simulations deliver a ‘movie’ of the simulated system by tracing the time evolution of the system according to an equation of motion (eom). An eom is a relation between particle positions, velocities and accelerations. Newton's eom, which is valid in Cartesian coordinates, is [ 18, 19]

19.7

where f ^N is a vector containing the forces on all N particles. In Eq. (19.7), it was assumed that the potential energy is not explicitly time‐dependent, i.e. the system is conservative, and that there are no velocity‐dependent forces, i.e. there is no friction. Equation ( 19.7) is a second‐order differential equation, which becomes apparent when writing the momentum–time derivative in terms of positions. To ease numerical integration in practice, usually a formalism involving two first‐order partial differential equations is used,

19.8

Here, subscripts i indicate properties referring to particle i. Integration of the eom, together with initial conditions for particle positions and momenta, yields a deterministic evolution of the particle positions and momenta over time. The corresponding series of sampled particle positions and momenta is called a trajectory. Throughout this trajectory, the total energy H(r ^N,p ^N) = K(p ^N) + U(r ^N) is conserved. The kinetic energy is a function of momenta only,

19.9

where m _i and v _i are mass and velocity, respectively, of particle i, and the potential energy only depends on particle positions.

Newton's eom does not contain friction. If such non‐conservative forces are to be included in the time evolution of the system, e.g. when certain dof are not represented explicitly but implicitly via the effect of their mean fluctuations, one can use Langevin's eom [32,33],

19.10

The force f(r _i) on particle i thus not only involves a conservative component but also a random and a dissipative one, f _i ^R and −m _i γ v _i, respectively, the magnitude of the latter being proportional to a friction coefficient γ. The random force is delta‐correlated, has zero mean and its variance is given by

19.11

where indices i and j denote particles, α and β denote Cartesian coordinate components, t and t′ denote time points, δ _ij is the Kronecker‐delta and δ(x) the Dirac‐delta function. The magnitude of the variance implies that the two non‐conservative forces are kept in balance by the fluctuation‐dissipation theorem [33].

Newton's eom is a special case of Lagrange's eom [ 18,34], when Cartesian coordinates are used. Lagrange's eom is particularly suited when holonomic (time‐independent) constraints on certain coordinates are to be considered in the time evolution of the system or when additional virtual coordinates (dof) are added to the system to generate an extended system. It is formulated in terms of generalised coordinates q ^N and conjugate momenta ,

19.12

where the Lagrangian L(q ^N, ) of the system is defined as the difference of kinetic and potential energy, the former of which now explicitly depends on the generalised coordinates,

19.13

19.3 Methodology

19.3.1 Molecular Mechanics: Biomolecular Force Fields

Biomolecules are commonly modelled at atomistic resolution using a classical energy function (force field) to describe the interaction between the individual atoms [35–38]. Sometimes, CH, CH₂ or CH₃ groups are modelled as single particles called united atoms. This reduces computational cost because fewer particles are involved and consequently fewer interactions have to be computed. Such a process is called ‘coarse‐graining’. Many ‘coarse‐grained’ force fields [39–42] exist where the elementary particles are even larger, e.g. entire amino acids, or (a number of) solvent molecules.

A classical force‐field description of the system implies several approximations. First, the Hamiltonian is considered time independent and the system is considered to be in the electronic ground state. Second, the Born–Oppenheimer approximation is used, which means that electrons relax instantaneously to the positions of the nuclei and in turn nuclear positions appear stationary to the electrons. Finally, at the core of the classical description is the assumption that the temperature is sufficiently high and that the modelled elementary particles are sufficiently heavy for their motion to be described by Newton's eom. Proton motion and high frequency bond‐vibrational motions are not adequately described by a classical force field.

Molecular mechanics describes the potential energy of a molecule with a classical interaction function that is a sum of individual covalent and non‐covalent energy terms. Covalent terms describe bond stretching, angle bending, torsional rotations around bonds (dihedral angles) and out‐of‐plane geometrical restraints. Non‐covalent terms describe non‐polar and electrostatic interactions. The entirety of the functional form of the individual terms and the involved empirical parameters is called a force field [ 35– 38] (Figure 19.2). Besides the above assumptions, the concept of a molecular‐mechanics force field also implies that the total potential energy of the system is captured by a sum of the individual terms. In the following, the physical covalent and non‐covalent interaction terms characterising an atomistically modelled molecule are explained. It is also possible to include non‐physical, artificial terms in a force field, e.g. to generate forces pushing apart or pulling together two atoms to achieve a user‐defined interatomic separation (Section Experiment‐Based Restraints).

Structures illustrating the physical covalent interaction indicating the bond stretching, angle bending, torsion (dihedral angle), and out-of-plane geometry in angle, height, and improper dihedral angle. — **Figure 19.2** Physical covalent interaction terms in a molecular‐mechanics force field.

Consider two atoms m, n with positions r _m, r _n connected by a covalent bond. The energy associated with stretching or compressing the bond with respect to the reference bond length b ₀ such that the actual bond length is b(r _m, r _n) is usually modelled as

19.14

where k ^b is a force constant. This is a harmonic potential, the first non‐zero term in a Taylor expansion of the energy around the minimum‐energy value b ₀. The first‐order term disappears because the force vanishes when the bond length is exactly b ₀. In principle, a harmonic functional form is sufficient to describe the vibrational behaviour of covalent bonds at equilibrium. Covalent bonds are relatively hard dof and only small deviations from the minimum‐energy value are possible at room temperature, e.g. around 0.006 nm for a bond between two sp³‐hybridised carbon atoms. One may still keep more terms beyond second‐order in the above Taylor expansion. This may be advantageous because anharmonicities are accounted for and hence the real vibrational geometries of a bond may be better reproduced, but the energy calculation is somewhat more computationally intensive and the parameterization is much more cumbersome. Note that Eq. (19.14) cannot describe bond breaking because infinite bond lengths are associated with an infinite energy. Bond breaking events can be modelled classically with other potentials [43] or quantum‐mechanically, e.g. with the quantum‐mechanical/molecular mechanical (QM/MM) method [44–46], where the reactive part of the system is treated with quantum mechanics and the rest with classical mechanics.

Now, consider three atoms l, m, n with positions r _l, r _m, r _n connected by two covalent bonds, i.e. forming an angle. The energy associated with bending the angle with respect to the reference angle θ ₀ such that the actual angle is θ(r _l, r _m, r _n), is usually modelled as

19.15

i.e. with a harmonic functional form and force constant k ^a. Akin to bond lengths, angles are relatively hard dof, but still somewhat softer than bonds. For example, the angle between three sp³‐hybridised carbon atoms can be bent up to around ±10° at room temperature.

Now, consider four atoms k, l, m, n with positions r _k, r _l, r _m, r _n connected by three covalent bonds, i.e. forming two intersecting planes (defined by the first three and last three atoms, respectively). The angle enclosed by the two planes is called a dihedral angle. It may also be envisioned as the torsional rotation of the outermost atoms around the central bond. The energy associated with a torsion is usually modelled as a cosine series,

19.16

where j is the multiplicity (number of minima over a rotation of 360°), δ is the phase shift of the cosine terms and the force constants k _j ^t, defined for each term j, determine the barrier height of the rotational potential. For instance, for ethane, HCH₂CH₂H, a torsion around the carbon–carbon bond is possible. The multiplicity is three, because there are three possible minima in the staggered geometry. To realistically model the situation in ethane, only the j = 3 term is non‐zero, i.e. the force constants pertaining to j = 1 and 2 vanish and the phase shift δ also vanishes. In contrast to bonds and angles, torsions are soft dof. In general, the whole range of dihedral angles from −180 to 180° can be sampled, except for special situations like double or peptide bonds.

Special geometric situations in a molecule may require the use of so‐called out‐of‐plane geometrical terms. These terms always involve four atoms and enforce planar, tetrahedral or chiral geometries. Consider, for example, the molecule acetone where atoms k, l with positions r _k, r _l, respectively, represent the carbonyl carbon and oxygen atoms, respectively, and atoms m, n with positions r _m, r _n, respectively, represent the two attached methyl carbon atoms. These four atoms are supposed to lie within a plane which can be achieved with a harmonic potential around a reference value χ ₀ for the dof χ describing the out‐of‐plane geometry,

19.17

where k ^oop is a force constant. The actual value χ(r _k, r _l, r _m, r _n) for the out‐of‐plane dof can be defined by many options. In the acetone example, one option is to use the angle formed by the carbonyl carbon atom, the oxygen atom and the projection of the carbonyl carbon atom in the plane formed by the three other atoms. Other options are to use the height of the carbonyl carbon atom above the latter plane, or the dihedral angle formed by the carbonyl carbon atom, the oxygen atom and the two other carbon atoms. Since this is not a genuine rotational torsion, such dihedral angles are denoted improper dihedral angles.

In summary, the contribution of covalent interactions to the potential energy of a molecule is

19.18

where sums over all bonds, angles, torsions and out‐of‐plane geometries in the system are performed. The parameters required in the covalent force‐field terms except for torsions are usually obtained from X‐ray crystallography (reference values) and IR spectroscopy (force constants). Parameters for the description of torsions are usually obtained from QM calculations.

The non‐polar contribution to non‐covalent interactions are van der Waals interactions [47]. They are composed of two contributions, an attractive and a repulsive one. The physical basis of the former are London dispersion forces [48]. These forces arise from fluctuations in the electron cloud of an atom and the corresponding instantaneous dipoles that can induce opposing dipoles in the electron clouds of atoms in the close neighbourhood. They are weak and of a short‐range nature. Since the electric field due to the inducing dipole varies as 1/r ³ and the dipole–dipole interaction varies as 1/r ³, where r is the interdipole distance, the attractive interaction decays as 1/r ⁶. The second, repulsive component accounts for the Pauli exclusion principle [49,50]. It has, in principle, an exponential distance dependence, but for computational convenience it is usually modelled with a power law. In the Lennard–Jones potential [51], an inverse‐twelve power is used because the twelfth power of the interatomic distance can be easily calculated from the sixth power occurring in the attractive term,

19.19

where C _12,mn and C _6,mn are constants specific to the interaction between the considered atom pair m, n and r _mn is the minimum‐image between the two atoms. The minimum‐image distance, under PBC, is the minimum distance between two atoms considering the presence of periodic system copies. Sometimes, the Lennard–Jones potential is written in terms of the constants ε _mn and σ _mn, which describe the well depth and zero of the potential, respectively (Eq. (19.19) and Figure 19.3). To ease parameterisation effort of the Lennard–Jones potential, usually the parameters are specific to certain atom types, for example sp³‐hybridised carbon atoms, carbonyl oxygen atoms, hydroxyl oxygen atoms, etc. In addition, combination rules [52] are used to deduce parameters for heteroatomic interactions based on homoatomic interaction parameters, i.e.

19.20

Image described by caption and surrounding text. — **Figure 19.3** The Lennard–Jones potential. Well depth ε _mn and zero σ _mn are highlighted.

where the function f is, for example, a geometric mean [53]. For biomolecular force fields, the parameterisation is usually based on optimisation with respect to thermodynamic condensed‐phase properties such as heats of vaporisation, densities and solvation free energies.

Electrostatic interactions between atoms m, n with partial charges q _m and q _n, respectively, are in principle given by Coulomb's law [54,55],

19.21

where r _mn is again the minimum‐image between the two atoms and ε ₀ the vacuum permittivity. Usually, the partial charges are parameterised for small groups; e.g. a hydroxyl group in a serine has the same partial charges as the hydroxyl group in a threonine. They are either obtained from QM calculations or optimization with respect to thermodynamic condensed‐phase properties such as heats of vaporisation, densities and solvation free energies. In practice, Coulomb's law is not used for simulations with explicit solvent under PBC because it is computationally intensive and gives rise to artefacts when used with a plain nearest‐image convention [56]. Other approaches commonly used instead are explained in Section Lattice‐Summed Electrostatic Interactions. Note also that the above formalism with fixed partial charges does not account for the effect a change in environment might have on the electron cloud of an atom. Such effects are better reproduced by so‐called polarisable force fields [57–61].

In summary, the contribution of non‐covalent interactions to the potential energy of a molecule is

19.22

where sums over all atom pairs are performed. Note the factors (1/2) to eliminate double‐counting of atom pairs.

There is a plethora of biomolecular force fields [ 35– 38]. Currently, the most common ones pertain to the AMBER (e.g. versions ff03 [62], ff14SB [63]), CHARMM (e.g. versions c27 [64], c36 [65]) and GROMOS (e.g. versions 54A7 [66], 54A8 [67]) families. Biomolecular force‐field development is in general a complex task because a balance has to be found between computational simplicity and as accurate as possible a description of the energetic and entropic effects governing the behaviour of the molecules [14]. Since free energies of interest are generally not above the order of 10 k _B T, and since they result from a summation over around 10⁷–10⁹ atom pairs, it follows that the individual force‐field terms have to be highly accurate to deliver simulation results of chemical accuracy (error ≤ 1 kJ/mol).

19.3.2 Molecular Dynamics Simulation of Biomolecules

19.3.2.1 Common Algorithms

19 Integration of the Equation of Motion

Typically, in a biomolecular MD simulation employing a classical molecular‐mechanics force field as outlined above, Newton's eom (Eq. (19.8)) is used to describe the dynamics of the particles. The two first‐order differential equations (Eq. (19.9)) are integrated numerically. Since Newton's eom is time‐reversible, a numerical integration algorithm should also be time‐reversible. The algorithm should also yield as accurate an integration as possible, which is not granted by default because the dynamics is propagated via finite time steps Δt and hence is approximate. The most commonly used integrators are the (Velocity‐)Verlet [68,69] and leapfrog integration algorithms [70]. Here, only the Velocity‐Verlet algorithm is explained. It involves a Taylor expansion of positions at time t + Δt based on the positions at the previous time step,

19.23

truncated after the second‐order term, i.e. there is a truncation error of third order. Here, v _i and a _i indicate velocities and accelerations of particle i, respectively. The velocities are not propagated as accurately as positions, namely only to first order and using a mid‐point approximation for the acceleration,

19.24

Despite the third‐order truncation error in positions, the Velocity‐Verlet algorithm is a second‐order integrator because the global error in positions, i.e. the cumulative error over many iterations, can be shown to be of order two [71]. An important beneficial property of this integrator is that it is symplectic. Symplectic integrators do not lead to systematic energy drifts over many iterations because they reproduce the exact dynamics for a slightly different (‘shadow’) Hamiltonian [72]. An overview of integration algorithms for MD simulations is given in reference [73].

19 Thermo‐ and Barostatting

By default, Newton's eom (Eq. ( 19.8)) delivers an ensemble at constant energy and volume. Typically, when simulating the behaviour of a biomolecule, one seeks conditions close to the experimental situation, i.e. constant temperature and constant pressure. This can be done by application of thermo‐ and barostatting algorithms. One can distinguish weak‐coupling, stochastic and extended‐system algorithms. Here, only one thermostatting algorithm of each kind and a weak‐coupling barostatting algorithm are presented, which adequately covers the most commonly employed approaches.

The definition of temperature in MD simulation is based on the equipartition theorem [17], which states that each dof is associated with a kinetic energy of half the thermal energy. Considering that there are N _df dof in the system, the average kinetic energy K is

19.25

Applying this equation at each instantaneous time step in an MD simulation in combination with the instantaneous kinetic energy expressed as a function of particle masses and velocities (Eq. ( 19.9)), the instantaneous temperature can be calculated.

The weak‐coupling thermostat [74] (also called the Berendsen thermostat) relies on the scaling of particle velocities v _i(t) with a time‐dependent scaling factor μ _T(t), to deliver scaled velocities v _i,scl (t) satisfying a kinetic energy corresponding to the target temperature T ₀,

19.26

It is defined that the temperature should approach the target temperature exponentially according to a relaxation constant τ _T,

19.27

With Eqs. ( 19.9) and (19.26), the temperature corresponding to the scaled velocities becomes

19.28

and the velocity scaling factor is

19.29

where it was assumed that Δt → 0. This algorithm is simple to apply but has the flaw of not producing an exact canonical distribution of states [75]; e.g. energy fluctuations are incorrect.

An example for a stochastic thermostat is to use Langevin's eom (Eq. (19.10)). Constant temperature is here achieved by stochastic collisions implied by the random force and the counteracting effect of friction, the exact balance being ascertained by the fluctuation‐dissipation theorem (Eq. (19.11)). Stochastic thermostats are naturally employed in implicit‐solvent simulations (Section Implicit Solvation). Due to the presence of random forces, they are not suited for the investigation of dynamic properties.

The extended‐system thermostat (also called the Nosé–Hoover thermostat) is based on the addition of a virtual ‘heat bath’‐like dof s (with an associated mass Q) to the system [76,77]. The entire system containing both physical, real particles and the virtual dof is the extended system. Its Lagrangian reads [ 76, 77]

19.30

The first two terms form the physical Lagrangian corresponding to the real particles in the system. The third and fourth terms are the kinetic energy and minus the potential energy of s, respectively. The key of the method lies in the fact that the form of the potential energy and the parameter g are chosen such that a canonical ensemble for the physical subsystem is generated while a microcanonical ensemble for the extended system is retained. It was shown [ 76, 77] that this is achieved with the choice g = 3N + 1, where N is the number of particles in the physical system. During the extended‐system simulation, the virtual dof s is propagated along with the physical particles. According to the fluctuation of s, heat flows in or out of the physical subsystem such that a constant temperature is maintained. In practice, the Nosé–Hoover thermostat was shown to have deficiencies for small and stiff systems or far from equilibrium (i.e. the target temperature). In the former situation, it is better to use the so‐called Nosé–Hoover chain algorithm [78], which involves successive thermostatting of virtual dof by multiple, rather than only one, virtual variables. An overview of thermostatting algorithms for MD simulations is given in reference [79].

The definition of pressure in MD simulation is based on the total virial W _tot,

19.31

which consists of the internal virial W and an energy contribution due to system confinement (external virial). The virial theorem [80] states equality of kinetic energy K and W _tot; hence

19.32

19.33

a sum of dot products of particle positions and forces acting on the particles, the instantaneous pressure can be calculated. In practice, under PBC, Eq. (19.33) is not used, but W(t) is reformulated in terms of interparticle distances and pairwise forces [29].

The weak‐coupling barostat, in analogy to the weak‐coupling thermostat, relies on the scaling of particle positions and system dimensions (usually, box‐edge lengths L) with a time‐dependent scaling factor μ _P(t) to deliver scaled quantities,

19.34

satisfying the target pressure. It is defined that the pressure should approach the target pressure exponentially according to a relaxation constant τ _P,

19.35

With Eq. ( 19.9) and the definition of isothermal compressibility κ _T, the pressure corresponding to scaled quantities becomes

19.36

and the scaling factor is

19.37

where it was assumed that Δt → 0. This algorithm is simple to apply but, similar to the weak‐coupling thermostat, has the flaw of not producing a correct isobaric ensemble; e.g. volume fluctuations are incorrect. Other barostatting algorithms are available, e.g. the Parrinello–Rahman barostat [81].

19 Bond‐Length Constraints

Bond‐length vibrations in molecules occur with frequencies larger than about 1000 cm⁻¹. This is much larger than k _B T/h at room temperature. Hence, bond vibrations are not excited at room temperature and are not adequately described by a classical harmonic oscillator (Eq. ( 19.14)). A remedy is to freeze the bond‐length vibrations, i.e. to model bond lengths as rigid rods with lengths b ₀, an approach referred to as bond‐length constraining. Constraining a dof amounts to eliminating this dof because there is no potential or kinetic energy associated with it. Bond‐length constraints have the advantage of increased computational efficiency. Since the maximal possible time step (to still have an accurate integration of the eom) in the numerical integrator is about a tenth of the smallest vibrational period and since bond‐length vibrations are typically the fastest vibrational motions in the system, constraining bond lengths allows the time step to be increased (usually from 0.5 to 2 fs).

A common algorithm to constrain bond lengths is the SHAKE algorithm [82]. Consider a molecule with N _c bond‐length constraints σ _k for bonds between atoms k ₁ and k ₂ with positions r _k1, r _k2, respectively,

19.38

These constraints are accounted for via N _c Lagrange multipliers λ _k in the eom (Eq. ( 19.8)),

19.39

The Lagrange multipliers need to be known to determine the constraint forces required to enforce fixed bond lengths. Based on Eqs. (19.38) and (19.39), a matrix equation Aλ = c can be derived, where the λ vector contains all the Lagrange multipliers and terms quadratic in the Lagrange multipliers were discarded [83]. Inversion of A solves for the λ vector, but this is not commonly done because matrix inversion is computationally expensive and the solution is not exact because quadratic terms were discarded. Only for very small molecules such as water (involving only three constraints), is an iterative procedure of matrix inversions used to solve for the Lagrange multipliers [83]. For larger molecules, instead of trying to simultaneously fulfil all constraints, one assumes decoupled constraints, i.e. that two atoms involved in constraint σ _k are not involved in any other constraint σ _k′. This means that one considers constraints between atoms k ₁ and k ₂ at a time and solves N _c decoupled equations for the corresponding λ _k(t) values. The resetting of particle positions according to the resulting constraint forces (again obtained from neglecting quadratic λ _k terms) leads to constrained positions (here, in the context of a Verlet integrator and with r _k1k2 = r _k1 − r _k2):

19.40

where superscripts ‘c’ or ‘uc’ on atom position vectors denote constrained and unconstrained positions, respectively. Thus, atoms k ₁ and k ₂ are reset along the direction of the old bond vector, in opposite directions, and proportional to the inverse mass. Because of the decoupling of the constraints, the procedure has to be repeated until all constraints are satisfied. In the case of very large forces, which cause large free‐flight steps of certain atoms, it might not be possible to fulfil bond‐length constraints via shifting along the old bond‐vector direction. In such a situation, the SHAKE algorithm fails.

It was stated above that angles in biomolecules are only slightly softer than bonds. Nevertheless, angles are never constrained in MD simulations because angle constraints were seen to cause artefacts in dynamics, e.g. fewer dihedral angle transitions [84] because the angles in a dihedral angle have to slightly open up for torsional rotation to occur in a normal fashion. In addition, mass‐metric tensor effects associated with angle constraints are not negligible [85]. This means that removal of the kinetic and potential energy associated with an angle dof significantly perturbs the probabilities of configurations in the constrained system.

19 Lattice‐Summed Electrostatic Interactions

There are numerous ways to calculate electrostatic interactions in a simulation. The reason is that the plain Coulomb potential given by Eq. (19.21) is not applicable in an MD simulation with PBC. It would be required to introduce a nearest‐image convention (i.e. each particle only interacts with the nearest neighbour of each other particle), but this leads to artefacts and is computationally expensive [56]. A common remedy is to truncate electrostatic interactions, possibly in combination with shifting or switching schemes, which make the electrostatic interaction between two particles smoothly vanish at the truncation distance [86,87]. A physically motivated shifting scheme is to include a reaction‐field correction that models the environment outside the cut‐off sphere of each particle as a dielectric continuum [88]. A commonly used electrostatic interaction scheme not involving a cut‐off truncation and constituting a correct calculation of Coulombic electrostatic interactions, albeit under PBC, is lattice summation, where one considers the Coulomb energy of an infinitely periodically replicated cubic unit cell of edge L containing N point charges q _i located at positions r _i,

19.41

where n = (n ₁, n ₂, n ₃), n _i being integers, are lattice vectors,

19.42

and the primed sum implies that i = j terms are omitted when n = (0, 0, 0). The problem is that, in general, the 1/r‐sum over unit cells in Eq. (19.41) is not absolutely convergent. Ewald introduced a splitting function f(r) for the 1/r potential [89],

19.43

which allows to solve for the electrostatic potential (Figure 19.4). Therefore, the original method to calculate electrostatic interactions via a lattice sum is also called Ewald summation.

Expressing the electrostatic potential Φ(r) in the above periodic system as a solution of the Poisson equation and using Ewald's splitting‐function approach,

19.44

where a periodic ‘screening’ charge density

19.45

depending on so‐called charge‐shaping functions ψ(r) was used to split the original periodic charge density

19.46

where δ(x) is the three‐dimensional Dirac‐delta function. Since the Laplace operator is linear, the total electrostatic potential in Eq. (19.44) can be split as well, i.e. Φ = Φ_R + Φ_K, with

19.47

As will be seen below, the potential Φ_R is conveniently evaluated via summation in real space. The potential Φ_K is due to a periodic charge distribution and is conveniently evaluated in reciprocal space (Figure 19.4).

In the original Ewald method, the charge‐shaping function is a Gaussian γ(r) of width (2α)^−1/2,

19.48

resulting in [25]

19.49

where the definition of the complementary error function (erfc(x)) was used. Since the latter decays exponentially, the sum over n may be restricted to n = (0, 0, 0) and truncation is possible at a certain cut‐off distance (called real‐space cut‐off). The energy associated with Φ_R is obtained from multiplying the charges with Φ_R at the charge locations,

19.50

The reciprocal‐space potential is long‐range, smooth and periodic in r. It can be expressed as a Fourier series with Fourier coefficients ,

19.51

where k are reciprocal‐space vectors, k = 2πn /L. Plugging the reciprocal‐space potential (Eq. (19.51)) and the screening‐charge distribution (Eq. (19.45)) into the Poisson equation (Eq. (19.47)) leads, for the above Gaussian, to [25]

19.52

from which one sees that k must not be zero (see below). The energy associated with Φ_K is obtained from multiplying the charges with Φ_K at the charge locations,

19.53

The problem with Eq. (19.53) is that the double sum contains the interaction of a continuous Gaussian charge distribution of charge q _i with a point charge q _i at its centre. This overcounts the electrostatic energy. To remove the spurious contribution, one considers the electrostatic potential ϕ(r) due to a spherically symmetric Gaussian charge cloud and the resulting value ϕ(r = 0) at its centre [25], giving the self‐energy

19.54

Furthermore, the k = 0 term in the reciprocal‐space sum (Eq. (19.52)) was omitted. This leads to the average of the reciprocal‐space potential being zero. However, the average of the real‐space potential is non‐vanishing. One hence adds a constant to fulfil the boundary condition of vanishing total average electrostatic potential, <Φ(r) > = 0 (referred to as the Ewald convention). To fulfil this convention, one evaluates the real‐space potential average over the unit cell,

19.55

The interaction of all charges with this potential is removed in the total energy. Note that the average electrostatic potential in Eq. (19.55) is non‐zero only for a non‐neutral box. Therefore, correcting for it via the Ewald convention may also be interpreted as introducing a homogeneous neutralising background charge that sets the average potential to zero.

The total lattice‐sum energy is the sum of the above four contributions,

19.56

In practice, in Ewald summation, both real‐ and reciprocal‐space sum are truncated. The parameter α is chosen such that contributions to the real‐space sum are negligible for terms separated by a distance larger than the real‐space cut‐off. Note that a large α entails a narrow Gaussian and a short‐range real‐space sum. Then the computational cost for the real‐space calculation is O(N), but the reciprocal‐space sum needs more terms to converge. In general, the reciprocal‐space sum scales as O(N ²), which is very inefficient for large systems. It has been shown that an optimal work balance between the two sums can be reached by cleverly adjusting α and the real‐space cut‐off, leading effectively to an O(N ^3/2) scaling of Ewald summation.

Nowadays, in practice, numerical implementation of lattice summation is used. The so‐called particle–mesh approaches exploit the fact that the Poisson equation can be solved efficiently on a grid using fast Fourier transforms (FFT). FFT scale as N log(N), N being the number of grid points in each dimension. Common particle–mesh approaches are particle‐mesh Ewald (PME) [90] or particle–particle particle–mesh Ewald (P3M) [91]. They rely on the same charge‐splitting idea as Ewald summation, but instead of directly calculating the Fourier transformation of the charge distribution, one maps the charges on a regular grid. FFT delivers the electrostatic potential on the grid points, which is transformed back to real‐space and interpolated to give the electrostatic potential at the actual charge positions.

Lattice summation is a widely used method to calculate electrostatic interactions. One has to keep in mind, however, that the resulting electrostatic interactions are periodic, which may give rise to so‐called artificial‐periodicity artefacts [92–94]. These are especially pronounced for small box‐edge lengths and may significantly perturb the properties of the simulated system. On the other hand, the above‐mentioned alternative approach of truncation of electrostatic interactions may also introduce problems because electrostatic interactions are of a very long‐range nature. In contrast to the rapidly decaying van der Waals interactions (Section 19.3.1 ), they decay only with the inverse of the interparticle separation. Therefore, any electrostatic interaction calculation scheme used in MD simulation to date is effective rather than correct and the involved approximations and possibly incurring artefacts must be well understood [95].

19.3.2.2 Special Topics

19 Free‐Energy Calculations

Free‐energy calculation is an intricate task, because free energies are not simple ensemble averages (Eq. ( 19.5)) but depend explicitly on the partition function, i.e. the Helmholtz free energy is the thermodynamic potential in the NVT ensemble,

19.57

and, similarly, the Gibbs free energy G is the thermodynamic potential in the NPT ensemble. Hence, absolute free energies cannot be calculated for realistic biomolecular systems in an explicit solvent. However, ratios of partition functions, i.e. free‐energy differences, are readily accessible. For simplicity, the following discussion only refers to the Helmholtz free energy.

There are numerous methods to calculate free energies from MD simulation. Reviews are, for example, provided by references [96–100]. Here, only three popular methods in the context of biomolecular simulation will be presented, thermodynamic integration (TI), free‐energy perturbation (FEP) and potential of mean force (PMF) calculations.

Suppose one wants to calculate the free‐energy difference between two states 0 and 1, with Hamiltonians H ₀ and H ₁, respectively. TI relies on a Hamiltonian parameterised as a function of a coupling parameter 0 ≤ λ ≤ 1 expressing the change between the two Hamiltonians, such that λ = 0 corresponds to state 0 and λ = 1 to state 1,

19.58

where n is an integer, n ≥ 1. This approach was introduced by Kirkwood [101]. It relies on the fact that free energy is a state function and the free‐energy difference between two states 0 and 1 corresponds to the reversible work done along any user‐defined pathway between the two states. With the Hamiltonian of Eq. (19.58), the free energy can be written as a function of λ, A(λ) = −k _B T ln(Q(λ)), and it follows that

19.59

where the definition of statistical‐mechanical ensemble averages (Eq. ( 19.5)) was used in the last step. Equation (19.59) implies that the free‐energy derivative at a given value of λ can be calculated as the ensemble average of the derivative of the Hamiltonian with respect to the coupling parameter λ, sampled at the given value of λ:

19.60

Thus, a free‐energy change can be calculated by integrating over λ:

19.61

In practice, a force field as presented in Section 19.3.1 delivers analytical expressions for ∂H/∂λ and the integration in Eq. (19.61) is done numerically. An example for the integrand is shown in Figure 19.5. The precision of the TI approach depends on how smooth the TI curve is. In general, the precision of the numerical integral can be increased by increasing the number of λ points (i.e. simulations with different Hamiltonians pertaining to different values of λ) used for the interpolation between the two states, especially in regions of high curvature.

Graph illustrating the typical TI curve depicted by a descending curve with circle markers lying on it. — **Figure 19.5** Illustration of a typical TI curve (integrand in Eq. ( 19.61)).

In FEP, one considers two states 0 and 1 differing by a small modification in the Hamiltonian ΔH,

19.62

The partition functions of the two states, Q₁ and Q₀, are related by

19.63

where the second step involved multiplication with Q ₀ in the numerator and denominator and the notation <^{▾ ▾ ▾}>₀ indicates an ensemble average over configurations sampled according to the Hamiltonian H ₀. Hence, the free‐energy difference between the two states is

19.64

This is an exact perturbation formula and was derived by Zwanzig [102]. It allows the calculation of a free‐energy difference from a single simulation. The method is therefore called single‐step perturbation. The ensemble average in Eq. (19.64) is, however, only reasonably accurate when ΔH is sufficiently small so that the exponential is sufficiently large, allowing the accumulation of significant contributions to the average. In other words, the configurations sampled with the Hamiltonian H ₀ have to be representative of state 1 (i.e. have high Boltzmann factors also when evaluated with the Hamiltonian H ₁), which implies that there has to be non‐negligible phase‐space overlap between the two states (Figure 19.6a). If there is insufficient phase‐space overlap between states 0 and 1, an artificial state R may be constructed whose phase space overlaps with those of both state 0 and 1 (Figure 19.6b). The perturbation formula (Eq. ( 19.64)) can then be applied twice to the configurations sampled with Hamiltonian H _R of state R to give the desired free‐energy difference as ΔA ₀₁ = ΔA _R1 − ΔA _R0, where

19.65

For example, if the free‐energy difference of two stereoisomers is sought, the reference state can be an artificial molecule exempt of improper dihedral angle potential at the stereocenter, i.e. it can flip between the two stereoisomers. After the simulation of this molecule, the energy of the sampled configurations is then re‐evaluated with force fields including the improper dihedral angle potentials for both stereoisomers, from which the free‐energy differences in Eq. (19.65) can be calculated [103].

The two latter approaches (single‐step perturbation and free‐energy perturbation based on a reference state) are computationally efficient because only a single MD simulation is involved. If it is not possible to find an adequate state R to bridge the phase spaces of state 0 and 1, the perturbation formula may be applied multiple times (Figure 19.6c). This method is called multistep perturbation and relies on the construction of N – 2 intermediate states between states 0 and 1 such that the phase‐space overlap for two consecutive states i and i + 1 is sufficiently large to allow accurate evaluation of ΔA _i,i+1:

19.66

Again, the intermediate states can be artificial. Often, the Hamiltonian is parameterised by a coupling parameter 0 ≤ λ ≤ 1, similar to TI. For instance,

19.67

The change in the Hamiltonian between two consecutive states i and i + 1 is (λ _i+1 −λ _i) ΔH, so

19.68

In general, the perturbation formula (Eq. ( 19.64)) is widely applied not only to calculate free energies, but also to reevaluate any property (e.g. the number of hydrogen bonds between two residues) for a different Hamiltonian than the one that was used for sampling configurations or to eliminate biasing potential‐energy contributions that were employed during a simulation [103]. This process is referred to as ‘reweighting’ or ‘unbiasing’.

TI and FEP are sometimes called ‘alchemical’ free‐energy calculation methods because they allow changes to be performed between molecules that are not feasible in reality. For example, a serine in a protein can be mutated into an alanine along a path involving artificial side chains that are a mixture of an alanine and a serine by presenting a vanishing hydroxyl group and a methylene group slowly turning into a methyl group. Technically, this means that the charges and Lennard–Jones parameters of the hydroxyl group are slowly fading out and that the Lennard–Jones parameters of the methylene group are slowly turning into those of a methyl group. Similarly, covalent parameters may change. Note that the number of particles in an MD simulation has to stay constant. Therefore, vanishing particles in an alchemical TI or FEP calculation does not genuinely eliminate particles but always means removal of corresponding non‐bonded parameters. The resulting atoms exempt of charges and Lennard–Jones parameters are called dummy atoms.

In the context of biomolecular simulation, the fact that TI and FEP allow alchemical changes, together with the fact that the free energy is a state function (meaning that around a thermodynamic cycle, the free‐energy change is zero), opens the way for computationally efficient approaches to asses relative ligand binding free energies, e.g. the free‐energy difference of a substrate binding to two different mutants of an enzyme (Figure 19.7a) or the free‐energy difference of two different drugs binding to a receptor (Figure 19.7b), in a thermodynamic cycle. These free‐energy changes are the differences of two absolute binding free energies (ΔA _bnd1 and ΔA _bnd2; vertical arrows in Figure 19.7a and b), which can in principle be calculated with a PMF calculation (see below). However, they can also be obtained as differences in alchemical free energies (horizontal arrows in Figure 19.7a and b). In the case depicted in Figure 19.7a, there is only one alchemical free‐energy difference, ΔA _mut, of mutating the protein when bound to the ligand, so

19.69

In the case depicted in Figure 19.7b, the horizontal arrows of the thermodynamic cycle are associated with the free‐energy differences ΔA _mut,svt and ΔA _mut,prot of mutating the first ligand into the second ligand, once in solvent and once in the bound state, respectively, so

19.70

Absolute binding free energies can be obtained via TI based on a thermodynamic cycle involving so‐called double decoupling (Figure 19.7c), which means that the ligand is mutated into a non‐interacting dummy molecule (a molecule corresponding to the real ligand, but with charges and Lennard–Jones parameters set to zero) both in solvent (free‐energy change ΔA _dum,svt) and in the bound state (free‐energy change ΔA _dum,prot). The absolute binding free energy ΔA _bnd is

19.71

where it was used that the binding free energy of a dummy molecule vanishes. In practice, mutations to dummy particles involve a complication because the singularity in the Lennard–Jones potential (Eq. ( 19.19)) gives rise to a divergence of the TI integrand for λ‐values approaching the dummy particle (λ = 1, in the case described here). This effect can be remedied with so‐called soft‐core potentials [104,105].

Absolute ligand binding free energies can also be calculated from pathway methods such as PMF‐based free‐energy calculations. A PMF is the free energy of a system as a function of a reaction coordinate R(r ^N), in general a function of the coordinates of user‐defined particles in the system. These coordinates may be internal or Cartesian, the former possibly implying contributions of the Jacobian of the coordinate transformation in the final PMF [106]. To express the free energy as a function of the reaction coordinate, one has to consider the probability that a particular value R′ of the reaction coordinate is adopted, the so‐called reaction‐coordinate probability

19.72

where the Dirac‐delta function δ(x) in the numerator selects only those configurations satisfying value R′ of the reaction coordinate and the denominator serves as normalisation. This fraction can be written in terms of corresponding partition functions Q _R(R′) and Q. The free energy associated with value R′ of the reaction coordinate is

19.73

Taking the negative gradient on both sides of Eq. (19.73) with respect to the reaction‐coordinate dof (e.g. an interparticle distance), one sees that the negative gradient of the PMF corresponds to an average force of all non‐reaction coordinate dof on the reaction coordinate [17].

In general, there are 12 possible ways to calculate a PMF [106], arising from whether (i) the simulation involves no bias, a restraint bias (also called an umbrella bias; see below) or a constraint bias (e.g. constraining a distance between two particles with the SHAKE algorithm; Section Bond‐Length Constraints), (ii) the PMF derives from averaging of a force or from the observed probability of states and (iii) whether the reaction coordinate is defined in terms of internal or Cartesian coordinates. In practice, the most frequently chosen approach is to use an umbrella bias on an internal coordinate, along with construction of the PMF from the sampled probability of states. Applying a biasing potential leads to more efficient sampling than in unbiased sampling. For instance, if a PMF along the distance between two molecules such as a ligand and a receptor is sought, an unbiased simulation approach requires simulation of the two molecules for a sufficiently long time until all possible ligand–receptor distances are sampled with their true phase‐space probabilities. This is inefficient because if, for example, the binding pocket is buried, it might take extremely long for the ligand to reversibly diffuse in and out of the pocket a sufficient number of times to yield a statistically correct result. Therefore, a biasing potential of the form

19.74

may be used to enforce certain values of the reaction coordinate, here a distance R(r ^N). This is a special potential‐energy term that is added to the physical force field and it penalises distances R(r ^N) deviating from a target distance R ₀ according to a harmonic potential with a force constant k ^res. The corresponding restrained reaction‐coordinate probability is

19.75

and it can be shown [107] that

19.76

Using the relation between the PMF and the reaction‐coordinate probability (Eq. ( 19.73)), the unrestrained PMF is

19.77

This approach is widely used and involves multiple simulations with the umbrella restraint bias potential centred at various positions R ₀ of the reaction coordinate (often referred to as ‘windows’). The probability distributions obtained from the different simulations should overlap to allow reconstruction of the PMF, because in each window, the PMF is only defined within a constant (Eq. (19.77)) and the final PMF is a concatenation of the free energies pertaining to the different windows. The assembly of the total PMF from the raw, biased probability distributions may, for example, be done with the weighted histogram analysis method [108]. Note that, for a given window spacing, smaller force constants k ^res imply a larger histogram overlap, because the probability distributions are broader, but the disadvantage is that longer simulation times are required to converge these broad distributions. An example for the biased probability distributions and the resulting PMF is displayed in Figure 19.8. Since the PMF is only defined within a constant it can be anchored arbitrarily, for instance such that the bound state corresponds to zero free energy. A binding free energy can then be obtained from such a PMF by taking the negative of the exponential average over the unbound state, as, for example, described in reference [109].

PMF calculations, although possibly not trivial because the choice of reaction coordinate might not be unambiguous and an unfortunate choice may lead to poor sampling and inaccurate results, offer the advantage of beyond‐thermodynamic insight because they may reveal barriers along the reaction coordinate and thus convey kinetic information. Because of this, they are not only used for the study of binding processes, but, for example, also for the study of conformational changes. Similar to the TI or multistep FEP method, PMF calculations involve multiple simulations that can be run in parallel. Since free energies pertaining to biomolecular systems are in general not easy to converge (see below), such simulations are often conducted in a Hamiltonian replica exchange framework [110], which enhances configurational sampling (Section Enhanced Sampling).

19 Enhanced Sampling

The power of MD simulation compared to other modelling methods lies in producing a Boltzmann‐weighted ensemble of molecular configurations. This is important because one is generally not interested in a biomolecule at zero Kelvin, but under ambient conditions where an ensemble of structures, rather than a single static structure, is representative of the molecule. The configurational phase space of a biomolecule is generally characterised by a potential‐energy surface that is rugged due to the presence of a vast number (around 10⁴–10⁶) of dof differing in their motional behaviour (harmonic, anharmonic, diffusive) and their length‐ and timescales (0.1–10 nanometres, femtoseconds to milliseconds). Considering the exponential weighting in the Boltzmann factor of the configurations (Eq. (19.1)), it follows that mostly low energy regions of the phase space will contribute significantly to the ensemble. However, in the case of a very high number of high energy configurations, a significant contribution to low free‐energy regions of the phase space can come from these high energy configurations because their vast number implies a low entropy. In summary, sampling problems in MD simulation may arise from (i) the ruggedness of the potential‐energy surface and hence the difficulty of exhaustive sampling of low energy regions, especially when the barriers separating minima are larger than the thermal energy, and (ii) the difficulty of exhaustive sampling of possibly present high energy/low entropy states [14].

A plethora of so‐called enhanced sampling methods has been developed to ensure the adequate sampling of the important regions of phase space [111–114]. In biomolecular simulations, often Hamiltonian replica exchange MD (HREMD) simulations [110] are used. One performs not one, but multiple, simulations differing in their Hamiltonian, e.g. different λ‐values in a TI‐based free‐energy calculation or different anchors R ₀ of umbrella biasing potentials in a PMF‐based free‐energy calculation (Figure 19.9). The different simulations are called replicas and any one replica can profit from the sampling achieved at any other replica because, at distinct time intervals, exchanges between neighbouring replicas are attempted according to a Metropolis criterion (Section 19.2.2 ). The exchange probability w for configurations x _i ^N and x _j ^N (involving positions r _i ^N, r _j ^N and corresponding momenta) between replicas i and j is

19.78

where the energy difference ΔU is

19.79

and the potential energy U _i corresponds to the Hamiltonian of replica i. Such a simulation approach is only efficient if the exchange of configurations is global, i.e. occurs throughout all replicas such that, for example, the first replica can ultimately also access configurations pertaining to the last replica. In other words, ‘diffusion’ of the replicas through Hamiltonian space has to be exhaustive.

An alternative replica‐exchange simulation scheme is to make the replicas not differing in Hamiltonian, but temperature (temperature replica exchange MD; TREMD) [115]. Typically, the first replica corresponds to the system at room temperature T ₀ and successive replica simulations are performed at continuously increasing temperatures T _i = T ₀ + nΔT, where n is an integer and ΔT the temperature separation between successive replicas. The idea is that the room‐temperature simulation can also access high energy regions of phase space or other low energy regions that are separated by high energy barriers through the mixing‐in of configurations only accessible in the high‐temperature replicas. However, in practice, TREMD simulations using an explicit‐solvent representation have been found to be inefficient because the exchange criterion requires low potential‐energy differences between neighbouring replicas, which in turn requires close temperature spacing. The latter is because the potential‐energy differences tend to be large because they are dominated by the large amount of solvent dof [116]. A large number of replicas, however, increases computational effort because long simulation times are needed to allow thorough mixing of replicas throughout the whole temperature space. Second, it has been found that exchanges in TREMD are mostly due to fluctuations in the solvent [117] and hence do not bring much benefit concerning conformational sampling in the biomolecule. Due to these deficiencies, TREMD is usually only advantageous in the context of implicit‐solvent simulations.

Besides the above multireplica approaches, another strategy to enhance sampling is to eliminate the ruggedness of the potential energy surface by filling up the local minima with artificial, positive potential‐energy contributions. In the local‐elevation (LE) method [118], this is done by progressive addition of biasing potential‐energy terms in frequently visited low energy regions (Figure 19.10). Thus, reaccessing those areas is penalised. There are numerous enhanced‐sampling methods relying on essentially the same approach, e.g. conformational flooding [119] or metadynamics [120]. In detail, in the LE method, a subset of the dof of the system is chosen whose sampling has to be enhanced, e.g. a side chain dihedral angle. The conformational space of the dof is then discretised and during the simulation one keeps track of when a specific bin is visited and adds a penalising potential‐energy term to that bin, e.g. a Gaussian. Clearly, this amounts to a time‐dependent Hamiltonian, which is why no equilibrium‐thermodynamic information can be gained from a plain LE simulation. However, the resulting smoothened potential energy surface can be used in a second simulation for enhanced configurational sampling, the efficiency being limited only by how fast the ‘diffusion’ over it occurs. In practice, this is done by using the final built‐up penalising potential as an umbrella biasing potential [121]. Note, finally, that the negative of the final built‐up penalising potential was shown to approximately correspond to the free‐energy surface of the system [122].

Top: Chevron diagram with labels system stuck in local minimum, build-up of penalizing Gaussians, system explores new configurations, etc. Bottom: 4 Graphs each are displaying fluctuating waveforms. — **Figure 19.10** Illustration of the build‐up of Gaussian penalties in a local‐elevation simulation to cross energy barriers and explore new regions of phase space.

19 Experiment‐Based Restraints

The biasing potential‐energy terms mentioned above in the context of umbrella sampling in PMF calculations or in the LE method are non‐physical potential‐energy terms that are added to the physical force‐field terms to serve a special user‐defined purpose. They are therefore also called special force‐field terms. There are numerous other special force‐field terms, mostly used to restrain a certain dof to a desired target value. This is commonly done via a harmonic potential, e.g. for property A,

19.80

allowing small deviations from the target value A ₀ according to a force constant k. Restraints, as opposed to constraints (as encountered in Section Bond‐Length Constraints in the context of bond‐length constraints) are thus associated with a non‐vanishing kinetic and potential energy of the dof.

Special restraining potential‐energy terms are frequently used to generate configurational ensembles compatible with given experimental data, for instance Nuclear Overhauser Effect (NOE)‐based distance thresholds [123], ³J‐coupling constants [124] or order parameters [125] from nuclear magnetic resonance (NMR) experiments. Common to this data is that they are primary experimental outcomes, i.e. do not involve assumptions or approximations. Also, in these cases, rather than restraining instantaneous values A at each simulation time step, time averages over the elapsed simulation time are restrained,

19.81

to account for the fact that the experimental results are also an average over time. In MD simulation, the time average in Eq. (19.81) is computed in a discretised fashion along with exponential dampening of long‐time history to keep the restraint responsive to the currently sampled configurations [ 123– 125].

It appears straightforward to construct special potential‐energy terms for given properties to establish a connection between ensembles sampled in MD simulations and observed in experiment, but there are several intricacies associated with it [126], e.g. there has to be sufficient sampling in the MD simulation such that all possible solutions fulfilling the experimental restraints are explored.

19 Implicit Solvation

Implicit solvation models [127–131] are an important tool in MD simulations to reduce computational cost (by about one order of magnitude) through elimination of the myriad of solvent dof and instead modelling the solvent as a dielectric continuum with a certain permittivity ε. Thus, longer simulation times and in principle more exhaustive sampling is possible. In addition, due to their efficiency, these tools are useful to simulate a huge number of systems, which is getting more and more important in a time where molecular‐level insight is increasingly used to perform research at the interface of structural biology and omics disciplines. Besides their computational efficiency, an advantage of implicit‐solvation models is that long‐range electrostatic interactions are represented realistically according to Coulomb's law. However, due to the neglect of explicit solvent structure, the results from MD simulations in implicit solvent may not be accurate and only be meaningful for qualitative investigations. In particular, problems are the lack of solute–solvent hydrogen bonding, of dielectric saturation and electrostriction effects around charged solute components, as well as possibly a spurious speeding‐up of dynamics.

The most important eom for implicit‐solvent simulations is Langevin's eom (Eq. ( 19.10)). The stochastic force represents random collisions of the solute with solvent molecules, the friction term represents a drag force (the magnitude being determined by the friction coefficient γ, which should be chosen as a function of solvent viscosity) and the physical force represents a mean force not only deriving from the molecular‐mechanics force‐field interactions within the biomolecule but also including mean solvent contributions. This is done with a potential energy

19.82

where the first contribution corresponds to covalent molecular‐mechanics force‐field terms (Section 19.3.1 ), together with intrasolute Lennard–Jones interactions and intrasolute Coulombic electrostatic interactions. The latter usually involve screening by the dielectric constant inside the solute ε _P,

19.83

The precise value of ε_P is ambiguous and typically chosen within a range of 4–10 [132]. The two last terms in Eq. (19.82) describe the electrostatic and non‐polar contribution to the solute solvation free energy in the implicit solvent. These are free energies, i.e. include entropic components, which is why the force deriving from the potential‐energy function in Eq. ( 19.82) is a mean force, accounting for the mean effect of the solvent. A simple way to describe the non‐polar solvation free energy is

19.84

where a _i is the solvent‐accessible area of atom i and p _i is an atom‐specific parameter capturing the non‐polar solvation free energy contribution per unit solvent accessible surface area of this atom [133]. The electrostatic solvation free‐energy contribution is typically obtained either from the Poisson–Boltzmann (PB) equation [130] or from Generalised Born (GB) models [127]. In the former,

19.85

where q _i is the partial charge of atom i and φ _slv(r _i) and φ _vac(r _i) are the electrostatic potentials at the location of atom i when the solute is solvated and in vacuum, respectively. These electrostatic potentials (in the following denoted φ(r)) are obtained from solving the PB equation

19.86

where ρ(r) is the solute charge distribution, ε(r) the position‐dependent dielectric permittivity, the sum runs over implicitly modelled ion species i of charge q _i and bulk concentration c _i and λ _i(r) is a parameter to describe ion accessibility. The equation is linearised by expanding the exponential term, which is valid for q _i φ(r) < k _BT, and keeping only the first‐order term. Hence,

19.87

The PB equation is usually solved on a grid via finite differences. To this end, the charge distribution, dielectric constant and ion accessibility are mapped on to the grid points and one solves for the electrostatic potential on the grid points along with interpolation of the electrostatic potential to the positions of the solute charges. Two calculations are performed, one in solvent (employing the solvent dielectric permittivity) and one in vacuum (employing a dielectric permittivity of one), finally allowing evaluation of Eq. (19.87).

A computationally more inexpensive method is to use a GB model,

19.88

where

19.89

is a function to account for the influence of the finite separation of the charges on their solvation free energies and R _i is the Generalised‐Born radius of atom i, an empirical parameter. Various parameterizations of GB models are available [134].

19.4 Applications

19.4.1 Comparative Protein Structure Modelling

Many proteins in a cell adopt well‐defined three‐dimensional (3D) structures that determine their biological function. Hence, knowledge of the protein 3D structure is of major biological importance. Experimental protein structure determination, e.g. by X‐ray crystallography, requires purification of large amounts of proteins and the ability to crystallise the proteins or protein complexes, which in many cases may be difficult or even impossible. In addition, NMR and cryo‐electron microscopy may be used for protein structure determination, but also have certain limitations. Computational modelling and MD simulation can be used to predict protein structure by directly mimicking the folding process. However, MD simulations at atomistic resolution and including surrounding solvent explicitly are still limited to the regime of microseconds, whereas the folding process of many proteins is in the millisecond to second regime. Nevertheless, with modern force fields and specialised computer hardware [135] it has become possible to directly follow the folding process at atomistic resolution for a number of fast folding proteins [136]. Due to the large compute resources and still time limitations of such simulations this approach is not useful as a general tool for protein structure prediction.

However, comparison of known experimental protein structures indicates reoccurring types of 3D conformations [137]. In particular, proteins with similar sequences are likely to adopt a similar folded structure. This observation forms the basis of molecular modelling approaches known as ‘comparative protein modelling’, ‘protein homology building’ or ‘template‐based 3D modelling’ [ 20, 21,138]. The similarity of two protein sequences is defined through a sequence alignment, i.e. an algorithm aligning the sequences in two rows such that the number of identical residues in each column is maximised (Figure 19.11). The ratio of the number of identical amino acids after optimal alignment relative to the total number of aligned amino acids defines the sequence identity.

The structural similarity of two proteins is typically measured as a root‐mean‐square deviation (RMSD) of corresponding atoms after best‐possible spatial superposition. As an example, the backbone structure of two proteins with a sequence identity of ∼55% and an RMSD of backbone atoms of ∼1 Å is shown in Figure 19.11. It is evident that in this particular example the two proteins adopt very similar 3D structures. As a rule of thumb, when comparing natural sequences, analysis of many cases indicates that if the sequence identity of two protein domains is >50% it is highly likely that both domains adopt the same fold with typically a backbone RMSD <1.5 Å. In the case of a sequence identity in the range 30–50% it is still very likely that the same type of fold is formed with RMSD of ∼1.5–2.5 Å. If the sequence identity drops to <30% only sometimes, the proteins adopt the same fold with RMSD >2.5 Å, but also often no or only little structural similarity is observed.

The expected close structural similarity for proteins of close sequence similarity can be used to obtain structural models via restrained MD simulations and energy minimization based on a known structure for one of the two proteins (termed the template protein). The observed spatial arrangement of homologous residues in the template protein is encoded in additional force‐field terms during a simulation of the homologous target protein. Typically, this can be included by using or slightly modifying force‐field terms already available in the original force field described in Section 19.3.1 . For example, contacts between residues can be included as additional bonds (distance restraints) using the typical form of a harmonic potential, but instead of defining a single minimum‐energy distance one defines a small range of allowed distances between pairs of homologous atoms beyond which a quadratic energy penalty is added to the force‐field description of the target protein. With a sufficient number of such restraints, rapid folding of the target protein to a possible structure in close agreement with the known template structure can be obtained [139]. The standard terms in a molecular‐mechanics force field prevent that sterically incorrect models (e.g. with atom overlaps or incorrect bonded geometry) are formed (Section 19.3.1 ). A modelling process including such spatial restraints is illustrated in Figure 19.12. The Modeller program [138] is one of the most common approaches for modelling proteins based on sequence similarity and satisfaction of spatial restraints.

Comparative protein modelling is widely used in molecular and structural biology. In case of close sequence similarity (e.g. ∼50%), accurate target structures with backbone RMSD ∼1–2 Å from the real structure can be achieved. Since for a given protein fold with a known structure typically many homologous sequences are available, structural information on these homologues can be readily obtained.

19.4.2 Alchemical Free‐Energy Simulations to Identify Key Residues of Protein–Protein Interaction

As already outlined in Section Free‐Energy Calculations, thermodynamic integration or free energy perturbation approaches allow one to extract free‐energy changes associated with processes not feasible in experiment, e.g. alchemical transformation of one chemical group into another. This option can, for example, be used to calculate the influence of mutations on the binding free energy of two interacting proteins and to identify key residues for the interaction. An example is the analysis of the role of hot‐spot residues at the interface of colicin E9 and Immunity protein 9 (Im9) [140]. The activity of the bacterial colicin E9 enzyme is controlled by the specific and high‐affinity binding of immunity protein Im9. This system is a well‐studied model for investigating protein–protein interactions [141]. The E9/Im9 interface structure is illustrated in Figure 19.13.

To identify key residues for the protein interactions and to study the mechanism of high‐affinity binding, several Im9 interface residues were alchemically transformed to alanine residues during TI calculations. The simulations followed closely the methodology described in Section Free‐Energy Calculations. To calculate the effect of such alchemical mutations of side chains to alanine on the binding affinity it is necessary to perform the mutations both in the complex and the isolated partner protein (for obtaining a full thermodynamic cycle of the transformation). The effect of the side chain mutation on binding can then be calculated as the difference in free energy for the transformation in the complex versus in the isolated unbound partner. Good agreement of the calculated free‐energy differences of binding and available experimental data was obtained (Figure 19.13). The simulation study also included analysis of the role of solvent and conformational flexibility of the partner proteins by comparing results in the absence or presence of solvent and with or without positional restraints on the protein partners. Restriction of the conformational flexibility of the proteins resulted in significant changes of the calculated free energies but of similar magnitude for the isolated Im9 and for the complex, and therefore in only modest changes of the relative binding free‐energy differences. The largest overall binding free‐energy change was obtained for the two Tyr‐Ala mutations. However, the physical origin appeared to be different, with solvation changes contributing significantly to the Tyr55Ala mutation and a loss of direct protein–protein interactions dominating the free‐energy change due to the Tyr54Ala mutation [140]. Hence, in addition to identifying the contribution of interface residues for protein–protein binding, the simulation studies also contribute to a better understanding of the molecular details of the interactions and of the physical origins of the contribution of each residue to protein–protein binding affinity.

19.5 Concluding Remarks

Starting in the 1960s and 1970s, applications of MD simulation and molecular modelling have grown dramatically. In particular, in the last 10 years the introduction of new powerful hardware as well as methodological advances have widened the applicability of MD simulation enormously to investigate many biophysical processes and effects on timescales up to microseconds. Simulation and modelling techniques are involved at many stages of biomolecular structure determination to translate X‐ray scattering, NMR or cryo‐electron microscopy data into biomolecular structures. It is also possible to apply simulation techniques to follow biomolecular structure formation and association processes.

As outlined above, free‐energy simulation methods play an important role to predict or to interpret the effect of chemical modifications on protein–protein and protein–ligand binding. In the field of drug design, molecular simulations and docking methods can help to predict the structure of drug–protein complexes and to estimate the ligand–protein binding affinity. However, it is important to keep in mind that the simulation results are limited by the maximum timescale of the simulation approach and in addition by the accuracy of the underlying force field. The force‐field description is an approximation of the true molecular interactions and requires frequent testing by comparison with experimental data and further improvement.

The intention of the present contribution was to give an introduction into the physical and methodological basis for molecular simulations based on a classical molecular‐mechanics force field. The description of applications was intended to present examples but cannot cover the entire range of tasks that can be tackled using molecular simulation techniques. In particular, a technique that is increasingly used but not covered in the present contribution is multiscale simulation [ 46,142,143], where different parts of the simulated system are described at different levels of resolution, e.g. combining a (sub)atomistic representation of a solute and its first two solvation shells with a coarse‐grained representation of the bulk solvent. A special case of multiresolution simulation is the QM/MM method [ 44– 46], where parts of the system (e.g. an enzyme active site) are modelled quantum‐mechanically and the rest of the system is modelled using molecular mechanics (possibly at different levels of resolution). The interested reader is referred to the references for further information.

References

1 Frenkel, D. (2013). Simulations: the dark side. Eur. Phys. J. Plus 128: 10/1–10/21.
2 Wong‐ekkabut, J. and Karttunen, M. (2016). The good, the bad and the user in soft matter simulations. Biochim. Biophys. Acta Biomembr. 1858: 2529–2538.
3 van Gunsteren, W.F., Daura, X., Hansen, N. et al. (2017). Validation of molecular simulation: an overview of issues. Angew. Chem. Int. Ed. 57: 884–902.
4 Alder, B.J. and Wainwright, T.E. (1957). Phase transition for a hard sphere system. J. Chem. Phys. 27: 1208–1209.
5 Rahman, A. (1964). Correlations in the motion of atoms in liquid argon. Phys. Rev. 136: A405–A411.
6 Rahman, A. and Stillinger, F.H. (1971). Molecular dynamics study of liquid water. J. Chem. Phys. 55: 3336–3359.
7 McCammon, J.A., Gelin, B.R., and Karplus, M. (1977). Dynamics of folded proteins. Nature 267: 585–590.
8 van Gunsteren, W.F. and Karplus, M. (1982). Protein dynamics in solution and in a crystalline environment: a molecular dynamics study. Biochemistry 21: 2259–2274.
9 Moore, G.E. (1965). Cramming more components onto integrated circuits. Electronics 38: 114–117.
10 Rieffel, E.G. and Polak, W.H. (2014). Quantum Computing: A Gentle Introduction. Cambridge, USA: The MIT Press.
11 Furber, S. (2016). Large‐scale neuromorphic computing systems. J. Neural Eng. 13: 051001/1–051001/14.
12 Lanyon, B.P., Whitfield, J.D., Gillett, G.G. et al. (2010). Towards quantum chemistry on a quantum computer. Nature Chem. 2: 106–111.
13 Kassal, I., Whitfield, J.D., Perdomo‐Ortiz, A. et al. (2011). Simulating chemistry using quantum computers. Annu. Rev. Phys. Chem. 62: 185–207.
14 van Gunsteren, W.F., Bakowies, D., Baron, R. et al. (2006). Biomolecular modeling: goals, problems, perspectives. Angew. Chem. Int. Ed. Engl. 45 (25): 4064–4092.
15 Micha, D.A. and Burghardt, I. (2007). Quantum Dynamics of Complex Molecular Systems. Heidelberg, Germany: Springer.
16 Wesseling, P. (2000). Principles of Computational Fluid Dynamics, 1e. Berlin, Germany: Springer.
17 McQuarrie, D.A. (2000). Statistical Mechanics. Sausalito, CA, USA: University Science Books.
18 Goldstein, H. (1980). Classical Mechanics, 2e. Reading Massachusetts, USA: Addison Wesley.
19 Newton, I. (1999). The Principia: Mathematical Principles of Natural Philosophy. (A new translation by I. B. Cohen and A. Whitman). Berkeley, USA: University of California Press.
20 Lushington, G.H. (2015). Comparative modeling of proteins. Methods Mol. Biol. 1215: 309–330.
21 Fiser, A. (2010). Template‐based protein structure modeling. Methods Mol. Biol. 673: 73–94.
22 Lee, J., Wu, S., and Zhang, Y. (2009). Ab initio protein structure prediction. In: From Protein Structure to Function with Bioinformatics (ed. D.J. Rigden), 3–25. Dordrecht, The Netherlands: Springer.
23 The protein data bank. www.rcsb.org.
24 Frenkel, D. (1993). Monte Carlo simulations: a primer. In: Computer Simulation of Biomolecular Systems, Theoretical and Experimental Applications (ed. W.F. van Gunsteren, P.K. Weiner and A.J. Wilkinson), 37–66. The Netherlands: ESCOM Science Publishers, B.V., Leiden.
25 Frenkel, D. and Smit, B. (2002). Understanding Molecular Simulation, 2e. San Diego: Academic Press, USA.
26 Metropolis, N. and Ulam, S. (1949). The Monte Carlo method. J. Am. Stat. Assoc. 44: 335–341.
27 Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N. et al. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21: 1087–1092.
28 van Gunsteren, W.F. (1993). Molecular dynamics and stochastic dynamics simulation: a primer. In: Computer Simulation of Biomolecular Systems, Theoretical and Experimental Applications (ed. W.F. van Gunsteren, P.K. Weiner and A.J. Wilkinson), 3–36. The Netherlands: ESCOM Science Publishers, B.V., Leiden.
29 Allen, M.P. and Tildesley, D.J. (1987). Computer Simulation of Liquids. New York, USA: Oxford University Press.
30 Boltzmann, L. (1871). Einige allgemeine Sätze über das Wärmegleichgewicht. Wien. Ber. 63: 679–711.
31 Gallavotti, G. (2016). Ergodicity: a historical perspective. Equilibrium and nonequilibrium. Eur. Phys. J. H 41: 181–259.
32 Huang, K. (2010). Introduction to Statistical Physics, 2e. Boca Raton: Chapman & Hall/CRC Taylor & Francis Group, USA.
33 Langevin, P. (1908). Sur la théorie du mouvement brownien. C. R. Avad. Sci. Paris 146: 530–533.
34 Lagrange, J.L. (1788). Mécanique Analytique. Pairs, France: Gauthier‐Villars.
35 MacKerell, D.A. Jr. (2004). Empirical force fields for biological macromolecules: Overview and issues. J. Comput. Chem. 25: 1584–1604.
36 Monticelli, L. and Tieleman, D.P. (2013). Force fields for classical molecular dynamics. Methods Mol. Biol. 924: 197–213.
37 Lopes, P.E.M., Guvench, O., and MacKerell, A.D. Jr. (2015). Current status of protein force fields for molecular dynamics simulations. Methods Mol. Biol. 1215: 47–71.
38 Hünenberger, P.H. and van Gunsteren, W.F. (1997). Empirical classical interaction functions for molecular simulations. In: Computer Simulation of Biomolecular Systems, Theoretical and Experimental Applications (ed. W.F. van Gunsteren, P.K. Weiner and A.J. Wilkinson), 3–82. Dordrecht, The Netherlands: Kluwer/Escom Science Publishers.
39 Kmiecik, S., Gront, D., Kolinski, M. et al. (2016). Coarse‐grained protein models and their applications. Chem. Rev. 116: 7898–7936.
40 Saunders, M.G. and Voth, G. (2013). Coarse‐graining methods for computational biology. Annu. Rev. Biophys. 42: 73–93.
41 Voth, G.A. (2008). Coarse‐Graining of Condensed Phase and Biomolecular Systems. New York, USA: Taylor & Francis, Inc.
42 Ingólfsson, H.I., Lopez, C.A., Uusitalo, J.J. et al. (2013). The power of coarse graining in biomolecular simulations. WIREs Comput. Mol. Sci. 4: 225–248.
43 Morse, P.M. (1929). Diatomic molecules according to the wave mechanics. II. Vibrational levels. Phys. Rev. 34: 57–64.
44 Ryde, U. (2016). QM/MM calculations on proteins. Methods Enzymol. 577: 119–158.
45 Senn, H.M. and Thiel, W. (2009). QM/MM methods for biomolecular systems. Angew. Chem. Int. Ed. Engl. 48: 1198–1229.
46 Pezeshki, S. and Lin, H. (2015). Recent developments in QM/MM methods towards open‐boundary multi‐scale simulations. Mol. Simul. 41: 168–189.
47 Stone, A.J. (2000). The Theory of Intermolecular Forces. Oxford, UK: Oxford University Press.
48 London, F. (1937). The general theory of molecular forces. Trans. Faraday Soc. 33: 8–26.
49 Griffiths, D.J. (2017). Introduction to Quantum Mechanics, 2e. Cambridge, UK: University Press.
50 Pauli, W. (1925). Über den Zusammenhang des Abschlusses der Elektronengruppen im Atom mit der Komplexstruktur der Spektren. Z. Physik 31: 765–783.
51 Lennard‐Jones, J.E. (1937). The equation of state of gases and critical phenomena. Physica 4: 941–956.
52 Halgren, T.A. (1992). Representation of van der Waals (vdW) interactions in molecular mechanics force‐fields: potential form, combination rules, and vdW parameters. J. Am. Chem. Soc. 114: 7827–7843.
53 Good, R.J. and Hope, C.J. (1970). New combining rule for intermolecular distances in intermolecular potential functions. J. Chem. Phys. 53: 540–543.
54 Jackson, J.D. (1999). Classical Electrodynamics, 3e. New York, USA: Wiley.
55 Coulomb, C.A. (1788). Sur l'électricité et le magnétisme, premier mémoire, construction et usage d'une balance électrique, fondée sur la propriété qu'ont les fils de métal, d'avoir une force de réaction de torsion proportionnelle à l'angle de torsion. Mém. Acad. Sci. 1785: 569–577.
56 Adams, D.J. (1979). Computer simulation of ionic systems: the distorting effects of the boundary conditions. Chem. Phys. Lett. 62: 329–332.
57 Rick, S.W. and Stuart, S.J. (2002). Potentials and algorithms for incorporating polarizability in computer simulations. Rev. Comp. Chem. 18: 89–146.
58 Yu, H. and van Gunsteren, W.F. (2005). Accounting for polarization in molecular simulation. Comput. Phys. Commun. 172: 69–85.
59 Warshel, A., Kato, M., and Pisliakov, A.V. (2007). Polarizable force fields: History, test cases, and prospects. J. Chem. Theory Comput. 3: 2034–2045.
60 Lopes, P.E., Roux, B., and MacKerell, A.D. Jr. (2009). Molecular modeling and dynamics studies with explicit inclusion of electronic polarizability: theory and applications. Theor. Chem. Accounts 124: 11–28.
61 Baker, C.M. (2015). Polarizable force fields for molecular dynamics simulations of biomolecules. WIREs Comput. Mol. Sci. 5: 241–254.
62 Duan, Y., Wu, C., Chowdhury, S. et al. (2003). A point‐charge force field for molecular mechanics simulations of proteins based on condensed‐phase quantum mechanical calculations. J. Comput. Chem. 24: 1999–2012.
63 Maler, J.A., Martinez, C., Kasavajhala, K. et al. (2015). ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 11: 3696–3713.
64 MacKerell, A.D., Feig, M., and Brooks, C.L. (2004). Extending the treatment of backbone energetics in protein force fields: limitations of gas‐phase quantum mechanics in reproducing protein conformational distributions in molecular dynamics simulations. J. Comput. Chem. 25: 1400–1415.
65 Best, R.B., Zhu, X., Shim, J. et al. (2012). Optimization of the additive CHARMM all‐atom protein force field targeting improved sampling of the backbone phi,psi and side‐chain chi1 and chi2 dihedral angles. J. Chem. Theory Comput. 8: 3257–3273.
66 Schmid, N., Eichenberger, A.P., Choutko, A. et al. (2011). Definition and testing of the GROMOS force‐field versions 54A7 and 54B7. Eur. Biophys. J. 40: 843–856.
67 Reif, M.M., Hünenberger, P.H., and Oostenbrink, C. (2012). New interaction parameters for charged amino acid side chains in the GROMOS force field. J. Chem. Theory Comput. 8: 3705–3723.
68 Verlet, L. (1967). Computer ‘experiments’ on classical fluids. I. Thermodynamical properties of Lennard‐Jones molecules. Phys. Rev. 159: 98–103.
69 Swope, W.C., Andersen, H.C., Berens, P.H., and Wilson, K.R. (1982). A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: Application to small water clusters. J. Chem. Phys. 76: 637–649.
70 Hockney, R.W. (1970). The potential calculation and some applications. Methods Comput. Phys. 9: 135–211.
71 Mazur, A.K. (1997). Common molecular dynamics algorithms revisited: accuracy and optimal time steps of Störmer‐Leapfrog integrators. J. Comput. Phys. 136: 354–365.
72 Toxvaerd, S. (1994). Hamiltonians for discrete dynamics. Phys. Rev. E 50: 2271–2274.
73 Bou‐Rabee, N. (2014). Time integrators for molecular dynamics. Entropy 16: 138–162.
74 Berendsen, H.J.C., Postma, J.P.M., van Gunsteren, W.F. et al. (1984). Molecular dynamics with coupling to an external bath. J. Chem. Phys. 81: 3684–3690.
75 Morishita, T. (2000). Fluctuation formula in molecular‐dynamics simulations with the weak coupling heat bath. J. Chem. Phys. 113: 2976–2982.
76 Hoover, W.G. (1985). Canonical dynamics: equilibrium phase‐space distributions. Phys. Rev. A 31: 1695–1697.
77 Nosé, S. (1984). A molecular dynamics method for simulations in the canonical ensemble. Mol. Phys. 52: 255–268.
78 Martyna, G.J., Klein, M.L., and Tuckerman, M. (1992). Nosé‐Hoover chains: the canonical ensemble via continuous dynamics. J. Chem. Phys. 97: 2635–2643.
79 Hünenberger, P.H. (2005). Thermostat algorithms for molecular dynamics simulations. Adv. Polym. Sci. 173: 105–149.
80 Marc, G. and McMillan, W.G. (1985). The virial theorem. Adv. Chem. Phys. 58: 209–361.
81 Parrinello, M. and Rahman, A. (1980). Crystal structure and pair potentials: a molecular‐dynamics study. Phys. Rev. Lett. 45: 1196–1199.
82 Ryckaert, J.P., Ciccotti, G., and Berendsen, H.J.C. (1977). Numerical integration of the Cartesian equations of motion of a system with constraints: molecular dynamics of n‐alkanes. J. Comput. Phys. 23: 327–341.
83 Kräutler, V., van Gunsteren, W.F., and Hünenberger, P.H. (2001). A fast SHAKE algorithm to solve distance constraint equations for small molecules in molecular dynamics simulations. J. Comput. Chem. 22: 501–508.
84 van Gunsteren, W.F. and Karplus, M. (1982). Effect of constraints on the dynamics of macromolecules. Macromolecules 15: 1528–1544.
85 van Gunsteren, W.F. (1980). Constrained dynamics of flexible molecules. Mol. Phys. 40: 1015–1019.
86 Steinbach, P.J. and Brooks, B.R. (1994). New spherical‐cutoff methods for long‐range forces in macromolecular simulation. J. Comput. Chem. 15: 667–683.
87 Linse, P. and Andersen, H.C. (1986). Truncation of Coulombic interactions in computer simulations of liquids. J. Chem. Phys. 85: 3027–3041.
88 Barker, J.A. and Watts, R.O. (1973). Monte Carlo studies of the dielectric properties of water‐like models. Mol. Phys. 26: 789–792.
89 Ewald, P.P. (1921). Die Berechnung optischer und elektrostatischer Gitterpotentiale. Ann. Phys. 369: 253–287.
90 Darden, T., York, D., and Pedersen, L. (1993). Particle mesh Ewald: An Nlog(N) method for Ewald sums in large systems. J. Chem. Phys. 98: 10089–10092.
91 Hockney, R.W. and Eastwood, J.W. (1988). Computer Simulation Using Particles, 2e. Bristol, UK: Institute of Physics Publishing.
92 Weber, W., Hünenberger, P.H., and McCammon, J.A. (2000). Molecular dynamics simulations of a polyalanine octapeptide under Ewald boundary conditions: influence of artificial periodicity on peptide conformation. J. Phys. Chem. B 104: 3668–3675.
93 Hünenberger, P.H. and McCammon, J.A. (1999). Effect of artificial periodicity in simulations of biomolecules under Ewald boundary conditions: a continuum electrostatics study. Biophys. Chem. 78: 69–88.
94 Reif, M.M., Kräutler, V., Kastenholz, M.A. et al. (2009). Explicit‐solvent molecular dynamics simulations of a reversibly‐folding beta‐heptapeptide in methanol: influence of the treatment of long‐range electrostatic interactions. J. Phys. Chem. B 113: 3112–3128.
95 Kastenholz, M.A. and Hünenberger, P.H. (2006). Computation of methodology‐independent ionic solvation free energies from molecular simulations: II. The hydration free energy of the sodium cation. J. Chem. Phys. 124: 224501/1–224501/20.
96 Christ, C.D., Mark, A.E., and van Gunsteren, W.F. (2010). Basic ingredients of free energy calculations: a review. J. Comput. Chem. 31: 1569–1582.
97 Hansen, N. and van Gunsteren, W.F. (2014). Practical aspects of free‐energy calculations: a review. J. Chem. Theory Comput. 10: 2632–2647.
98 Mobley, D.L. and Gilson, M.K. (2017). Predicting binding free energies: Frontiers and benchmarks. Annu. Rev. Biophys. 46: 531–558.
99 Chipot, C. and Pohorille, A. (2007). Free Energy Calculations. Theory and Applications in Chemistry and Biology. Springer Series in Chemical Physics. Springer Verlag.
100 Chipot, C. (2014). Frontiers in free‐energy calculations of biological systems. WIREs Comput. Mol. Sci. 4: 71–89.
101 Kirkwood, J.G. (1935). Statistical mechanics of fluid mixtures. J. Chem. Phys. 3: 300–313.
102 Zwanzig, R.W. (1954). High‐temperature equation of state by a perturbation method. I. Nonpolar gases. J. Chem. Phys. 22: 1420–1426.
103 Oostenbrink, C. (2012). Free energy calculations from one‐step perturbations. In: Computational Drug Discovery and Design (ed. R. Baron), 487–499. New York, USA: Humana Press (Springer).
104 Beutler, T.C., Mark, A.E., van Schaik, R. et al. (1994). Avoiding singularities and numerical instabilities in free energy calculations based on molecular simulations. Chem. Phys. Lett. 222: 529–539.
105 Zacharias, M., Straatsma, T.P., and McCammon, J.A. (1994). Separation‐shifted scaling, a new scaling method for Lennard‐Jones interactions in thermodynamic integration. J. Chem. Phys. 100: 9025–9031.
106 Trzesniak, D., Kunz, A.P.E., and van Gunsteren, W.F. (2007). A comparison of methods to compute the potential of mean force. Chem. Phys. Chem. 8: 162–169.
107 van Gunsteren, W.F., Beutler, T.C., Fraternali, F. et al. (1993). Computation of free energy in practice: choice of approximations and accuracy limiting factors. In: Computer Simulation of Biomolecular Systems, Theoretical and Experimental Applications, vol. 2, 315–367. The Netherlands: ESCOM Science Publishers, B.V., Leiden.
108 Kumar, S., Bouzida, D., Swendsen, R.H. et al. (1992). The weighted histogram analysis method for free‐energy calculations on biomolecules. I. The Method. J. Comput. Chem. 13: 1011–1021.
109 Doudou, S., Burton, N.A., and Henchman, R.H. (2009). Standard free energy of binding from a one‐dimensional potential of mean force. J. Chem. Theory Comput. 5: 909–918.
110 Sugita, Y., Kitao, A., and Okamoto, Y. (2000). Multidimensional replica‐exchange method for free‐energy calculations. J. Chem. Phys. 113: 6042–6051.
111 Yang, L., Liu, C.W., Shao, Q. et al. (2015). From thermodynamics to kinetics: enhanced sampling of rare events. Acc. Chem. Res. 48: 947–955.
112 Bernardi, R.C., Melo, M.C.R., and Schulten, K. (2015). Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim. Biophys. Acta 1850: 872–877.
113 Miao, Y. and McCammon, J.A. (2016). Unconstrained enhanced sampling for free energy calculations of biomolecules: a review. Mol. Simul. 42: 1046–1055.
114 Luitz, M., Bomblies, R., Ostermeir, K., and Zacharias, M. (2015). Exploring biomolecular dynamics and interactions using advanced sampling methods. J. Phys. Condens. Matter 27: 323101/1–323101/13.
115 Sugita, Y. and Okamoto, Y. (1999). Replica‐exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 134: 141–151.
116 Fukunishi, H., Watanabe, O., and Takada, S. (2002). On the Hamiltonian replica exchange method for efficient sampling of biomolecular systems: application to protein structure prediction. J. Chem. Phys. 116: 9058–9067.
117 Periole, X. and Mark, A.E. (2007). Convergence and sampling efficiency in replica exchange simulations of peptide folding in explicit solvent. J. Chem. Phys. 126: 014903/1‐014903/11.
118 Huber, T., Torda, A.E., and van Gunsteren, W.F. (1994). Local elevation: a method for improving the searching properties of molecular dynamics simulation. J. Comput. Aided Mol. Des. 8: 695–708.
119 Grubmüller, H. (1995). Predicting slow structural transitions in macromolecular systems: conformational flooding. Phys. Rev. E 52: 2893–2906.
120 Laio, A. and Parrinello, M. (2002). Escaping free‐energy minima. Proc. Natl Acad. Sci. USA 99: 12562–12566.
121 Hansen, H.S. and Hünenberger, P.H. (2010). Using the local elevation method to construct optimized umbrella sampling potentials: calculation of the relative free energies and interconversion barriers of glucopyranose ring conformers in water. J. Comput. Chem. 31: 1–23.
122 Engkvist, O. and Karlström, G. (1996). A method to calculate the probability distribution for systems with large energy barriers. Chem. Phys. 213: 63–76.
123 Torda, A.E., Scheek, R.M., and van Gunsteren, W.F. (1990). Time‐averaged nuclear Overhauser effect distance restraints applied to tendamistat. J. Mol. Biol. 214: 223–235.
124 Torda, A.E., Brunne, R.M., Huber, T. et al. (1993). Structure refinement using time‐averaged J‐coupling constant restraints. J. Biomol. NMR 3: 55–66.
125 Hansen, N., Heller, F., Schmid, N., and van Gunsteren, W.F. (2014). Time‐averaged order parameter restraints in molecular dynamics simulations. J. Biomol. NMR 60: 169–187.
126 Van Gunsteren, W.F., Allison, J.R., Daura, X. et al. (2016). Deriving structural information from experimentally measured data on biomolecules. Angew. Chem. Int. Ed. 55: 15990–16010.
127 Bashford, D. and Case, D.A. (2000). Generalized Born models of macromolecular solvation effects. Annu. Rev. Phys. Chem. 51: 129–152.
128 Kleinjung, J. and Fraternali, F. (2014). Design and application of implicit solvent models in biomolecular simulations. Curr. Opin. Struct. Biol. 25: 126–134.
129 Chen, J., Brooks, C.L., and Khandogin, J. (2008). Recent advances in implicit solvent‐based methods for biomolecular simulations. Curr. Opin. Struct. Biol. 18: 140–148.
130 Baker, N.A. (2004). Poisson‐Boltzmann methods for biomolecular electrostatics. Methods Enzymol. 383: 94–118.
131 Baker, N.A. (2005). Improving implicit solvent simulations: a Poisson‐centric view. Curr. Opin. Struct. Biol. 15: 137–143.
132 Mellor, B.L., Cortes, E.C., Busath, D.D., and Mazzeo, B.A. (2011). Method for estimating the internal permittivity of proteins using dielectric spectroscopy. J. Phys. Chem. B 115: 2205–2213.
133 Chen, J. and Brooks, C.L. 3rd (2008). Implicit modeling of nonpolar solvation for simulating protein folding and conformational transitions. Phys. Chem. Chem. Phys. 10 (4): 471–481.
134 Nguyen, H., Roe, D.R., and Simmerling, C. (2013). Improved generalized Born solvent model parameters for protein simulations. J. Chem. Theory Comput. 9: 2020–2034.
135 Shaw, D.E., Dror, R.O., Salmon, J.K. et al. (2009). Millisecond‐scale molecular dynamics simulations on Anton. In: SC '09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 1–11. New York, NY, USA: ACM.
136 Shaw, D.E., Maragakis, P., Lindorff‐Larsen, K. et al. (2010). Atomic‐level characterization of the structural dynamics of proteins. Science 330: 341–346.
137 Schaeffer, R.D. and Daggett, V. (2011). Protein folds and protein folding. Protein Eng. Des. Sel. 24: 11–19.
138 Webb, B. and Sali, A. (2014). Comparative protein structure modeling using MODELLER. Curr. Protoc. Bioinformatics 47: 5.6.1–5.6.32.
139 Smith‐Brown, M.J., Kominos, D., and Levy, R.M. (1993). Global folding of proteins using a limited number of distance constraints. Protein Eng. 6 (6): 605–614.
140 Luitz, M. and Zacharias, M. (2013). Role of tyrosine hot‐spot residues at the interface of colicin E9 and immunity protein 9: a comparative free energy simulation study. Proteins: Struct. Funct. Bioinf. 81: 461–468.
141 Meenan, N.A., Sharma, A., Fleishman, S.J. et al. (2010). The structural and energetic basis for high selectivity in a high‐affinity protein‐protein interaction. Proc. Natl Acad. Sci. USA. 107 (22): 10080–10085.
142 Shen, L. and Yang, W. (2016). Quantum mechanics/molecular mechanics method combined with hybrid all‐atom and coarse‐grained model: Theory and application on redox potential calculations. J. Chem. Theory Comput. 12: 2017–2027.
143 Meier, K., Choutko, A., Dolenc, J. et al. (2013). Multi‐resolution simulation of biomolecular systems: a review of methodological issues. Angew. Chem. Int. Ed. Engl. 52: 2820–2834.

Table of Contents for
19 Computer Modelling and Molecular Dynamics Simulation of Biomolecules

19
Computer Modelling and Molecular Dynamics Simulation of Biomolecules

19.1 Significance

19.2 Theory and Principles

19.2.1 General Considerations

19.2.2 Statistical Mechanics: Partition Function and Ensemble Average

19.2.3 Classical Mechanics: Equations of Motion

19.3 Methodology

19.3.1 Molecular Mechanics: Biomolecular Force Fields

19.3.2 Molecular Dynamics Simulation of Biomolecules

19.3.2.1 Common Algorithms

19 Integration of the Equation of Motion

19 Thermo‐ and Barostatting

19 Bond‐Length Constraints

19 Lattice‐Summed Electrostatic Interactions

19.3.2.2 Special Topics

19 Free‐Energy Calculations

19 Enhanced Sampling

19 Experiment‐Based Restraints

19 Implicit Solvation

19.4 Applications

19.4.1 Comparative Protein Structure Modelling

19.4.2 Alchemical Free‐Energy Simulations to Identify Key Residues of Protein–Protein Interaction

19.5 Concluding Remarks

References

Further Reading

Table of Contents for 19 Computer Modelling and Molecular Dynamics Simulation of Biomolecules

Create new playlist

Sign In

Sign Up

19.1 Significance

19.2 Theory and Principles

19.2.1 General Considerations

19.2.2 Statistical Mechanics: Partition Function and Ensemble Average

19.2.3 Classical Mechanics: Equations of Motion

19.3 Methodology

19.3.1 Molecular Mechanics: Biomolecular Force Fields

19.3.2 Molecular Dynamics Simulation of Biomolecules

19.3.2.1 Common Algorithms

19 Integration of the Equation of Motion

19 Thermo‐ and Barostatting

19 Bond‐Length Constraints

19 Lattice‐Summed Electrostatic Interactions

19.3.2.2 Special Topics

19 Free‐Energy Calculations

19 Enhanced Sampling

19 Experiment‐Based Restraints

19 Implicit Solvation

19.4 Applications

19.4.1 Comparative Protein Structure Modelling

19.4.2 Alchemical Free‐Energy Simulations to Identify Key Residues of Protein–Protein Interaction

19.5 Concluding Remarks

References

Further Reading

Table of Contents for
19 Computer Modelling and Molecular Dynamics Simulation of Biomolecules