Molecular graphics visualization tools let you explore molecular structures interactively. Most read PDB-formatted files, which we describe in Multiline Records. For example, Jmol (in the following graphic) is a Java-based, open source 3D viewer for these structures.
In a molecular visualizer, every atom, molecule, bond, and so on has a location in 3D space, usually defined as a vector, which is an arrow from the origin to where the structure is. All of these structures can be rotated and translated.
A vector is usually represented by x, y, and z coordinates that specify how far along the x-axis, y-axis, and z-axis the vector extends.
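As a rough illustration of this idea, a vector can be modeled in Python as a tuple of three floats. The names Vector, add_vectors, and length are invented for this sketch and do not come from any visualizer's API.

```python
import math
from typing import Tuple

# Hypothetical alias: a vector is an (x, y, z) tuple of floats.
Vector = Tuple[float, float, float]


def add_vectors(v: Vector, w: Vector) -> Vector:
    """Return the component-wise sum of v and w (a translation of v by w)."""
    return (v[0] + w[0], v[1] + w[1], v[2] + w[2])


def length(v: Vector) -> float:
    """Return the distance from the origin to the tip of v."""
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


position = (3.0, 4.0, 0.0)
print(length(position))                        # 5.0
print(add_vectors(position, (0.0, 0.0, 2.0)))  # (3.0, 4.0, 2.0)
```

Adding one vector to another component by component is all that translation amounts to, which is why it is such a cheap operation in a visualizer.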
Here is how ammonia can be specified in PDB format:
| COMPND AMMONIA |
| ATOM 1 N 0.257 -0.363 0.000 |
| ATOM 2 H 0.257 0.727 0.000 |
| ATOM 3 H 0.771 -0.727 0.890 |
| ATOM 4 H 0.771 -0.727 -0.890 |
| END |
In our simplified PDB format, a molecule is made up of numbered atoms. In addition to the number, an atom has a symbol and (x, y, z) coordinates. For example, one of the atoms in ammonia is nitrogen, with symbol N at coordinates (0.257, -0.363, 0.0). In the following sections, we will look at how we could translate these ideas into object-oriented Python.
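Before introducing any classes, it may help to see how one ATOM line from the simplified format breaks down into typed fields. The helper name parse_atom_line is hypothetical, and the sketch assumes whitespace-separated fields as shown above. Real PDB files use a fixed-column layout with more fields, so this would not parse them.

```python
from typing import Tuple


def parse_atom_line(line: str) -> Tuple[int, str, float, float, float]:
    """Split a simplified 'ATOM num symbol x y z' line into typed fields."""
    key, num, sym, x, y, z = line.split()
    if key != 'ATOM':
        raise ValueError('expected an ATOM record, got ' + key)
    # Convert the number and coordinates from strings to int and float.
    return (int(num), sym, float(x), float(y), float(z))


print(parse_atom_line('ATOM 1 N 0.257 -0.363 0.000'))
# (1, 'N', 0.257, -0.363, 0.0)
```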
We might want to create an atom like this using information we read from the PDB file:
| nitrogen = Atom(1, "N", 0.257, -0.363, 0.0) |
To do this, we’ll need a class called Atom with a constructor that creates all the appropriate instance variables:
| class Atom: |
|     """An atom with a number, symbol, and coordinates.""" |
| |
|     def __init__(self, num: int, sym: str, x: float, y: float, |
|                  z: float) -> None: |
|         """Create an Atom with number num, string symbol sym, and float |
|         coordinates (x, y, z). |
|         """ |
| |
|         self.number = num |
|         self.center = (x, y, z) |
|         self.symbol = sym |
To inspect an Atom, we’ll want to provide __repr__ and __str__ methods:
|     def __str__(self) -> str: |
|         """Return a string representation of this Atom in this format: |
| |
|           (SYMBOL, X, Y, Z) |
|         """ |
| |
|         return '({0}, {1}, {2}, {3})'.format( |
|             self.symbol, self.center[0], self.center[1], self.center[2]) |
| |
|     def __repr__(self) -> str: |
|         """Return a string representation of this Atom in this format: |
| |
|           Atom(NUMBER, "SYMBOL", X, Y, Z) |
|         """ |
| |
|         return 'Atom({0}, "{1}", {2}, {3}, {4})'.format( |
|             self.number, self.symbol, |
|             self.center[0], self.center[1], self.center[2]) |
We’ll use those later when we define a class for molecules.
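To see the difference between the two methods, here is a short sketch that repeats the Atom class in condensed form (docstrings trimmed) and prints one atom both ways. Python calls __str__ for print and __repr__ when an object is shown inside a container such as a list.

```python
class Atom:
    """An atom with a number, symbol, and coordinates."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def __str__(self) -> str:
        return '({0}, {1}, {2}, {3})'.format(
            self.symbol, self.center[0], self.center[1], self.center[2])

    def __repr__(self) -> str:
        return 'Atom({0}, "{1}", {2}, {3}, {4})'.format(
            self.number, self.symbol,
            self.center[0], self.center[1], self.center[2])


nitrogen = Atom(1, "N", 0.257, -0.363, 0.0)
print(nitrogen)        # (N, 0.257, -0.363, 0.0)
print(repr(nitrogen))  # Atom(1, "N", 0.257, -0.363, 0.0)
print([nitrogen])      # [Atom(1, "N", 0.257, -0.363, 0.0)]
```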
In visualizers, one common operation is translation, or moving an atom to a different location. We’d like to be able to write this in order to tell the nitrogen atom to move 0.2 units along the z-axis:
| nitrogen.translate(0, 0, 0.2) |
This code works as expected if we add the following method to class Atom:
|     def translate(self, x: float, y: float, z: float) -> None: |
|         """Move this Atom by adding (x, y, z) to its coordinates. |
|         """ |
| |
|         self.center = (self.center[0] + x, |
|                        self.center[1] + y, |
|                        self.center[2] + z) |
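The following sketch, which uses a condensed copy of the class, checks the effect of the call. Because tuples are immutable, translate does not modify the old center. It builds a new tuple and rebinds self.center to it, so a reference saved beforehand still shows the original position.

```python
class Atom:
    """Condensed Atom with only the constructor and translate."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def translate(self, x: float, y: float, z: float) -> None:
        """Move this Atom by adding (x, y, z) to its coordinates."""
        self.center = (self.center[0] + x,
                       self.center[1] + y,
                       self.center[2] + z)


nitrogen = Atom(1, "N", 0.257, -0.363, 0.0)
before = nitrogen.center       # keep a reference to the old tuple
nitrogen.translate(0, 0, 0.2)

print(before)                  # (0.257, -0.363, 0.0)
print(nitrogen.center)         # (0.257, -0.363, 0.2)
```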
Remember that we read PDB files one line at a time. When we reach the line containing COMPND AMMONIA, we know that we’re building a complex structure: a molecule with a name and a list of atoms. Here’s the start of a class for this, including an add method that adds an Atom to the molecule:
| class Molecule: |
|     """A molecule with a name and a list of Atoms.""" |
| |
|     def __init__(self, name: str) -> None: |
|         """Create a Molecule named name with no Atoms. |
|         """ |
| |
|         self.name = name |
|         self.atoms = [] |
| |
|     def add(self, a: Atom) -> None: |
|         """Add a to my list of Atoms. |
|         """ |
| |
|         self.atoms.append(a) |
As we read through the ammonia PDB information, we add atoms as we find them; here is the code from Multiline Records, rewritten to return a Molecule object instead of a list of lists:
| from molecule import Molecule |
| from atom import Atom |
| from typing import Optional, TextIO |
| |
| def read_molecule(r: TextIO) -> Optional[Molecule]: |
|     """Read a single molecule from r and return it, |
|     or return None to signal end of file. |
|     """ |
|     # If there isn't another line, we're at the end of the file. |
|     line = r.readline() |
|     if not line: |
|         return None |
| |
|     # Name of the molecule: "COMPND name" |
|     key, name = line.split() |
| |
|     # Other lines are either "END" or "ATOM num kind x y z" |
|     molecule = Molecule(name) |
|     reading = True |
| |
|     while reading: |
|         line = r.readline() |
|         # Also stop at end of file, in case the END line is missing. |
|         if not line or line.startswith('END'): |
|             reading = False |
|         else: |
|             key, num, kind, x, y, z = line.split() |
|             molecule.add(Atom(int(num), kind, float(x), float(y), float(z))) |
| |
|     return molecule |
If we compare the two versions, we can see that the code is nearly identical. The new version is just as easy to read as the old one, arguably more so, because it includes type information. Here are the __str__ and __repr__ methods for Molecule:
|     def __str__(self) -> str: |
|         """Return a string representation of this Molecule in this format: |
|           (NAME, (ATOM1, ATOM2, ...)) |
|         """ |
| |
|         res = '' |
|         for atom in self.atoms: |
|             res = res + str(atom) + ', ' |
| |
|         # Strip off the trailing comma and space. |
|         res = res[:-2] |
|         return '({0}, ({1}))'.format(self.name, res) |
| |
|     def __repr__(self) -> str: |
|         """Return a string representation of this Molecule in this format: |
|           Molecule("NAME", (ATOM1, ATOM2, ...)) |
|         """ |
| |
|         res = '' |
|         for atom in self.atoms: |
|             res = res + repr(atom) + ', ' |
| |
|         # Strip off the trailing comma and space. |
|         res = res[:-2] |
|         return 'Molecule("{0}", ({1}))'.format(self.name, res) |
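One possible alternative to building the string in a loop and slicing off the trailing separator is str.join, which inserts the separator only between items. The sketch below is a variation on the method above, not the version this section uses. It produces the same format and uses a stripped-down Atom that has only what __str__ needs.

```python
class Atom:
    """Minimal stand-in for Atom, with just a constructor and __str__."""

    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def __str__(self) -> str:
        return '({0}, {1}, {2}, {3})'.format(
            self.symbol, self.center[0], self.center[1], self.center[2])


class Molecule:
    """A molecule with a name and a list of Atoms."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.atoms = []

    def add(self, a: Atom) -> None:
        self.atoms.append(a)

    def __str__(self) -> str:
        # join puts ', ' only between items, so nothing needs stripping.
        atoms = ', '.join(str(atom) for atom in self.atoms)
        return '({0}, ({1}))'.format(self.name, atoms)


ammonia = Molecule("AMMONIA")
ammonia.add(Atom(1, "N", 0.257, -0.363, 0.0))
ammonia.add(Atom(2, "H", 0.257, 0.727, 0.0))
print(ammonia)
# (AMMONIA, ((N, 0.257, -0.363, 0.0), (H, 0.257, 0.727, 0.0)))
```

Both approaches give an empty pair of parentheses for a molecule with no atoms, so either can replace the other without changing the output.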
We’ll add a translate method to Molecule to make it easier to move:
|     def translate(self, x: float, y: float, z: float) -> None: |
|         """Move this Molecule, including all Atoms, by (x, y, z). |
|         """ |
| |
|         for atom in self.atoms: |
|             atom.translate(x, y, z) |
And here we’ll call it:
| ammonia = Molecule("AMMONIA") |
| ammonia.add(Atom(1, "N", 0.257, -0.363, 0.0)) |
| ammonia.add(Atom(2, "H", 0.257, 0.727, 0.0)) |
| ammonia.add(Atom(3, "H", 0.771, -0.727, 0.890)) |
| ammonia.add(Atom(4, "H", 0.771, -0.727, -0.890)) |
| ammonia.translate(0, 0, 0.2) |
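To tie the pieces together, here is a single-file sketch that reads the ammonia record, translates the molecule, and prints each atom's new position. It makes a few assumptions that are not in the text above: the classes are condensed into one file rather than imported from the atom and molecule modules, io.StringIO stands in for an open file, and read_molecule also stops at end of file if the END line is missing.

```python
import io
import math
from typing import List, Optional, TextIO


class Atom:
    def __init__(self, num: int, sym: str, x: float, y: float,
                 z: float) -> None:
        self.number = num
        self.center = (x, y, z)
        self.symbol = sym

    def translate(self, x: float, y: float, z: float) -> None:
        self.center = (self.center[0] + x,
                       self.center[1] + y,
                       self.center[2] + z)


class Molecule:
    def __init__(self, name: str) -> None:
        self.name = name
        self.atoms: List[Atom] = []

    def add(self, a: Atom) -> None:
        self.atoms.append(a)

    def translate(self, x: float, y: float, z: float) -> None:
        # Moving a molecule means moving every atom by the same amount.
        for atom in self.atoms:
            atom.translate(x, y, z)


def read_molecule(r: TextIO) -> Optional[Molecule]:
    """Read one molecule from r, or return None at end of file."""
    line = r.readline()
    if not line:
        return None
    key, name = line.split()
    molecule = Molecule(name)
    while True:
        line = r.readline()
        if not line or line.startswith('END'):
            return molecule
        key, num, kind, x, y, z = line.split()
        molecule.add(Atom(int(num), kind, float(x), float(y), float(z)))


PDB_TEXT = """COMPND AMMONIA
ATOM 1 N 0.257 -0.363 0.000
ATOM 2 H 0.257 0.727 0.000
ATOM 3 H 0.771 -0.727 0.890
ATOM 4 H 0.771 -0.727 -0.890
END
"""

reader = io.StringIO(PDB_TEXT)   # behaves like a file opened for reading
ammonia = read_molecule(reader)
ammonia.translate(0, 0, 0.2)

for atom in ammonia.atoms:
    print(atom.number, atom.symbol, atom.center)

print(read_molecule(reader))     # None: no more molecules to read
```

Because floating-point addition is not always exact, the z-coordinates are best compared with math.isclose rather than with == when checking the result.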