A genetic programming example

CnvKit is also available, but it is a command-line tool and is not easy to use. PyCogent, which was developed by researchers at NCBI from the National Institutes of Health (NIH), is another useful tool, although it is not easy to use either. We will use a package called Bio (https://github.com/biopython/biopython/tree/master/Bio), which is part of Biopython, together with libraries from Python Programming for Biology.

In bioinformatics, the sequence is the key object in nearly every experiment, research project, or study. As a mathematician, I picture a sequence as a string with certain patterns (such as ATAGCATATGCT). To begin with, here is a simple example that shows a sequence, its GC ratio, and its codons:

from Bio.Seq import Seq
from Bio.Alphabet import IUPAC   # Bio.Alphabet exists only in Biopython releases before 1.78
from Bio.SeqUtils import GC      # newer Biopython releases offer gc_fraction instead

def DNACodons(seq):
    # Ignore trailing bases that do not fill a complete codon
    end = len(seq) - (len(seq) % 3)
    codons = [seq[i:i+3] for i in range(0, end, 3)]
    return codons

my_seq = Seq('GGTCGATGGGCCTAGCAGCATATCTGAGC', IUPAC.unambiguous_dna)
print("GC Result==> %s" % GC(my_seq))

DNACodons(my_seq)
[Seq('GGT', IUPACUnambiguousDNA()),
 Seq('CGA', IUPACUnambiguousDNA()), 
 Seq('TGG', IUPACUnambiguousDNA()), 
 Seq('GCC', IUPACUnambiguousDNA()), 
 Seq('TAG', IUPACUnambiguousDNA()), 
 Seq('CAG', IUPACUnambiguousDNA()), 
 Seq('CAT', IUPACUnambiguousDNA()), 
 Seq('ATC', IUPACUnambiguousDNA()), 
 Seq('TGA', IUPACUnambiguousDNA())]

GC Result==> 58.6206896552
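
The same two computations can also be sketched in plain Python, without Biopython, which may help when the Bio.Alphabet module is not available in the installed release. This is only an illustrative sketch: the function names gc_percent and dna_codons are hypothetical and are not part of Biopython, and the sequence is treated as an ordinary string rather than a Seq object.

```python
def gc_percent(seq):
    """Return the percentage of G and C bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def dna_codons(seq):
    """Split a DNA string into complete three-base codons.

    Trailing bases that do not fill a whole codon are dropped.
    """
    end = len(seq) - (len(seq) % 3)
    return [seq[i:i + 3] for i in range(0, end, 3)]


if __name__ == "__main__":
    my_seq = "GGTCGATGGGCCTAGCAGCATATCTGAGC"
    print("GC Result==> %.2f" % gc_percent(my_seq))
    print(dna_codons(my_seq))
```

For the 29-base sequence used above, 17 bases are G or C, which gives the same GC percentage as before, and the two leftover bases at the end (GC) are not reported as a codon.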

Let's consider two molecular structures, collect certain atoms from each, and plot their positions in terms of their Phi and Psi angles. The allowed molecular structures are DNA, RNA, and protein. Using the Modelling and Maths modules from the PythonForBiology library, we will plot these structures side by side:

[Figure: Ramachandran plots of the two structures, side by side]

The two plots use data from two files: testTransform.pdb and 1A12.pdb. The latter contains the human regulator of chromosome condensation (RCC1). The plots are produced by the following code:

# bio_1.py
#
import matplotlib.pyplot as plt
from phipsi import getPhiPsi
from Modelling import getStructuresFromFile

def genPhiPsi(fileName):
  struc = getStructuresFromFile(fileName)[0]

  phiList = []
  psiList = []
  for chain in struc.chains:
    for residue in chain.residues[1:-1]:
      phi, psi = getPhiPsi(residue)
      phiList.append(phi)
      psiList.append(psi)

  return phiList, psiList

if __name__ == '__main__':

  # Phi and Psi angles for each of the two structures
  phiList, psiList = genPhiPsi('examples/testTransform.pdb')
  phiList2, psiList2 = genPhiPsi('examples/1A12.pdb')

  # plt.subplots creates the figure itself, so a separate plt.figure call
  # would only open an extra empty window
  f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(12,9))

  ax1.scatter(phiList, psiList, s=90, alpha=0.65)
  ax1.axis([-160,160,-180,180])
  ax1.set_title('Ramachandran Plot for Two Structures')
  ax2.scatter(phiList2, psiList2, s=60, alpha=0.65, color='r')
  plt.show()

The library used in this example is available with the code examples in a file called PythonForBiology.zip. You can extract it and run this code from the command line, assuming that you have numpy and matplotlib installed.
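
To make the Phi and Psi values less of a black box, here is a minimal sketch of how a backbone torsion (dihedral) angle can be computed from four atom positions. It is not the getPhiPsi implementation from the PythonForBiology library, which is not shown here. The function name dihedral and the coordinate tuples are assumptions made for illustration. By the standard definition, Phi is the torsion angle over the atoms C (previous residue), N, CA, C, and Psi is the torsion angle over N, CA, C, N (next residue).

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in degrees about the p1-p2 bond.

    Hypothetical helper: for Phi pass (C_prev, N, CA, C) and
    for Psi pass (N, CA, C, N_next).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)   # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)   # normal of the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points, not taken from either PDB file
    print(dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                   (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)))
```

Because each angle needs an atom from a neighbouring residue, the first residue of a chain has no Phi and the last has no Psi. That is presumably why genPhiPsi iterates over chain.residues[1:-1].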
