Hsueh‐Fen Juan
Department of Life Science, Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
The term ‘proteome’ was coined by Marc Wilkins and Keith L. Williams in 1994 at the first Siena meeting on ‘2D Electrophoresis: from protein maps to genomes’ [1] and defined in 1996 as ‘the entire proteins in a cell, a tissue or an organism’ [2]. The term was not adopted immediately because genomics researchers preferred to call the study of the proteome functional genomics. In ‘A Sydney proteome story’, Williams recalled that describing proteins as ‘functional genomes’ did not really work for his group, so they stayed with the name ‘proteome’ [3]. One year later, the term ‘proteomics’ was coined for the study of the proteome, including protein identification, protein modification and protein–protein interaction (PPI). Proteins are functional molecules and the workhorses of the cell [3], and hence profiling the proteome helps us to understand cellular processes.
In the 1980s, researchers used Edman sequencing to obtain protein sequence information for the abundant proteins after applying two‐dimension gel electrophoresis (2DE) to separate proteins by size and charge. Edman sequencing was a laborious process, however, and new techniques were sought. In the 1990s, genome sequencing techniques were improved dramatically, and genome databases were built fast. Not only genome sequencing techniques but also mass spectrometry (MS) techniques enhanced proteomics progress rapidly because MS has high sensitivity for identification of proteins if the corresponding protein sequences are in the genome databases. Hence, both developments of genome sequencing and MS technologies were essential for enabling effective proteomic studies [4].
MS allows us to rapidly and reliably identify a protein in a sample on the basis of its peptide masses and has become a vital tool for current proteomics. In 2002, John B. Fenn and Koichi Tanaka were awarded the Nobel Prize in Chemistry for developing soft desorption ionisation methods for mass spectrometric analyses of biological macromolecules; their methods make molecules hover as free ions through spraying and blasting, respectively (www.nobelprize.org). John B. Fenn pioneered the use of a strong electric field to spray the sample and produce small, charged, freely hovering ions. This method came to be called ‘electrospray ionisation (ESI)’. Koichi Tanaka was the first person to use an intense laser pulse to release large molecules such as proteins as free ions under suitable chemical conditions. This phenomenon was called ‘soft laser desorption’.
MS always produces a very complex spectrum, which makes the analysis of the data difficult. Matthias Mann, now the director of the Max Planck Institute of Biochemistry, when a graduate student in John B. Fenn's group, quickly worked out a couple of algorithms to simplify the spectrum and make the MS data analysis easier. Matthias Mann and his colleague developed a useful bioinformatics tool, MaxQuant [5], to analyse the enormous amounts of data generated by MS‐based proteomics. They provided detailed protocols about what data types can be analysed in this software and how to use it [6].
Since MS and proteomics produce a large amount of data, bioinformatics has become an essential tool. There are four major bioinformatics topics in proteomics: identification, quantification, functional analysis, and data sharing [7]. First, we need tools, such as MaxQuant, to identify and quantify proteins and post‐translational modification (PTM) sites from the mass spectral data. If proteins are separated by 2DE, image analysis software tools are required to quantify the intensities of protein spots. After identification and quantification of proteins and PTM sites, we need to annotate the data with existing biological knowledge, using resources such as Gene Ontology (GO) Enrichment Analysis (http://geneontology.org/page/go‐enrichment‐analysis) and functional networks such as STRING (https://string‐db.org). Finally, sharing the data using online repositories, such as ProteomeXchange (http://www.proteomexchange.org), is also important in the proteomics community.
Proteomics has been applied to solve biological and medical problems, such as disease prognosis and diagnosis, drug‐induced mechanism studies, as well as plant science and paleography. The accurate quantification in proteomics becomes an important requirement for clinical applications and biological research. In this chapter, the principles of major proteomics techniques and methods, including sample preparation, quantitation and data analysis, will be introduced and, finally, their applications will also be reviewed.
The major high throughput proteomics techniques include MS and proteome microarrays. There are two MS‐based techniques: 2DE and shotgun methods (Figure 5.1). After generating high throughput data, bioinformatics becomes the key issue to obtain worthy biological explanations. Here I will also briefly introduce the bioinformatics tools and resources.
For protein identification, the proteins need to be digested into peptides, which can be sequenced by MS. Before applying the peptide samples to MS, various ionisation methods need to be considered. Although many ionisation methods exist, two are most suitable for peptides: ESI and matrix‐assisted laser desorption ionisation (MALDI). MALDI is suitable for simple samples, while ESI is suitable for complex samples because it can be combined with liquid chromatography (LC‐MS). Figure 5.2 illustrates these two methods.
ESI uses an electrospray at high voltage to produce ions from macromolecules such as proteins. ESI can produce multiply charged ions, which extends the effective mass range of the analyser. ESI is one of the two so‐called ‘soft ionisation’ techniques since it ionises biomolecules with little fragmentation. ESI can be coupled with MS (electrospray ionisation mass spectrometry [ESI‐MS]). In principle, solutes are sprayed from a capillary and the droplets break down and evaporate. Peptides are multiply charged: +1, +2, +3, +4, +5, +6, etc.
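To illustrate why multiple charging extends the usable mass range, the following minimal sketch computes the mass‐to‐charge ratio (m/z) of a protonated ion [M + zH]z+ for several charge states. The function name and the example mass are hypothetical and chosen only for illustration.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da

def mz_for_charge(neutral_mass, charge):
    """Return the m/z of the [M + zH]z+ ion for a neutral mass given in Da."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge

# Hypothetical 17 kDa protein: higher charge states bring m/z into a lower range.
example_mass = 17000.0
for z in (1, 5, 10, 20):
    print(z, round(mz_for_charge(example_mass, z), 3))
```

The higher the charge state, the lower the m/z at which the same molecule is detected, so large proteins can be measured on analysers with a limited m/z range.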
In 1984, Masamichi Yamashita and John B. Fenn first reported the ESI technique [8] and later the analysis of biological macromolecules [9]. John B. Fenn was awarded the Nobel Prize in Chemistry in 2002 with Koichi Tanaka. He reviewed the history in the development of ESI MS in his laboratory [10] and described the work of Malcolm Dole, who was the first scientist to use ESI with MS in 1968 [11].
MALDI is an ionisation method that utilises a matrix with laser energy absorbing ability to create ions from large molecules such as proteins and peptides with minimal fragmentation [12]. This method is also called a ‘soft ionisation’ technique and is suitable for samples after 2DE separation. Firstly, the sample is mixed with a matrix that absorbs ultraviolet light. Secondly, the laser is fired to irradiate the sample. Finally, the samples are ionised and then fly in the presence of an electric field to the detector. The detector records a time‐of‐flight spectrum, which can be converted to mass since the flight time is proportional to the square root of the mass‐to‐charge ratio (m/z) of the ion.
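The relation between flight time and mass can be sketched with the idealised linear time‐of‐flight model, in which an ion accelerated through a voltage V drifts over a field‐free length L, so that t = L·sqrt(m / (2zeV)). The function names, the 20 kV voltage and the 1 m drift length below are assumptions for illustration; real instruments are calibrated empirically rather than with this formula alone.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per Da

def flight_time(mz, voltage, length):
    """Flight time (s) of an ion with mass-to-charge ratio mz (Da per charge)."""
    return length * math.sqrt(mz * DALTON / (2 * ELEMENTARY_CHARGE * voltage))

def mz_from_time(time, voltage, length):
    """Invert the idealised model: recover m/z (Da per charge) from a flight time."""
    return 2 * ELEMENTARY_CHARGE * voltage * (time / length) ** 2 / DALTON

# Assumed instrument settings: 20 kV acceleration, 1 m drift tube.
t = flight_time(1000.0, 20000.0, 1.0)
print(t, mz_from_time(t, 20000.0, 1.0))
```

In this model, an ion with four times the m/z takes twice as long to reach the detector.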
Many matrices are used now, e.g. alpha‐cyano‐4‐hydroxy‐cinnamic acid (CHCA), which is used for peptides <10 kDa, and 2,5‐dihydroxybenzoic acid (DHB), which is used for proteins >10 kDa. In the 1980s, however, scientists were still looking for a matrix suitable for large biomolecules. Fortunately, in February 1985, Koichi Tanaka mistakenly used a glycerin‐UFMP (ultrafine metal powder) mixture and found that this matrix was better than cobalt UFMP at absorbing photons (UV laser) and forming ions. For this finding, he was awarded the Nobel Prize in Chemistry in 2002.
The proteome microarray, also known as a protein microarray or protein chip, is a high throughput proteomics method used in addition to MS‐based methods and has emerged as a promising approach. Proteome microarrays are miniaturised, parallel assay systems that use immobilised coded proteins of an organism in a high density format placed on a microscope slide using a contact spotter or a non‐contact microarrayer [13,14]. The construction of defined sets of overexpressed cloned genes for high throughput expression and purification of recombinant proteins is essential for a proteome microarray. As with DNA microarrays, printing proteins as microarrays is a challenging technique in the proteome microarray field. To detect proteins on a proteome chip, molecular probes are labelled with fluorescent, affinity, photochemical or radioisotope tags [14]. The most widely used microarray label is fluorescence, which offers high sensitivity and can be read out on a confocal DNA microarray scanner, thereby in principle enabling the production of both qualitative and quantitative data [15].
Proteome microarrays have many applications, including protein–protein [16], protein–phospholipid [17], protein–small molecule and protein–DNA/RNA interactions, biomarker discovery and identification of enzyme (e.g. protein kinase) substrates [13–15]. In general, there are three types of proteome microarrays according to their applications: analytical, functional and reverse‐phase microarrays [14]. One of the model analytical microarrays is the antibody microarray, which contains antibodies immobilised on a chip; the targeted proteins can be detected either by direct labelling or by using a reporter antibody in a sandwich assay format similar to the enzyme‐linked immunosorbent assay (ELISA), except that the reaction happens on a chip, not in solution. So‐called ‘functional microarrays’ focus on revealing protein functions and are broadly applied to protein–enzyme, protein–protein, protein–lipid and protein–DNA interactions, and so on [14, 16, 17]. The third type comprises reverse‐phase protein microarrays, which differ from classic protein microarrays in that many different lysate samples, such as tissue or cell lysates, are printed on the chip and specific proteins are then identified with suitable probes [18].
As noted earlier, proteomics approaches generate large amounts of data, and therefore bioinformatics tools and resources are essential for biological interpretation. Using MS‐based methods, hundreds of thousands of spectra can be obtained and used to identify proteins. ExPASy is the SIB Bioinformatics Resource Portal, which provides access to scientific databases and software tools including proteomics tools and resources (https://www.expasy.org/proteomics). Among the tools listed in ExPASy, Mascot (http://www.matrixscience.com/search_form_select.html) is useful for protein identification from MS data. Another software package, MaxQuant (http://www.biochem.mpg.de/5111795/maxquant), is widely used for quantitative proteomics with large‐scale mass‐spectrometric data derived from Thermo Fisher Scientific, Bruker Daltonics, AB Sciex and Agilent Technologies MS systems. It supports the main labelling techniques such as stable isotope labelling with amino acids in cell culture (SILAC), di‐methyl, TMT, and isobaric tags for relative and absolute quantitation (iTRAQ), as well as label‐free quantification [5, 6]. Since 2015, MaxQuant has provided an updated component, termed the Viewer, which visualises high resolution proteomics data.
To explain the functions of biological molecules, network biology is core to understanding how protein–protein, DNA/RNA–protein and lipid–protein interactions occur and to determining the functions of these molecules. Many software packages can be used for network analysis, including commercial packages, e.g. Ingenuity Pathway Analysis (IPA) and Pathway Studio, and free open tools, e.g. Cytoscape (http://www.cytoscape.org). All of these tools offer functional analysis not only for multiple gene expression data but also for proteomics and metabolomics experimental data. Most biological networks are generated by searching databases of curated literature‐sourced interactions, such as Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg) and IntAct (www.ebi.ac.uk/intact). Gene Set Enrichment Analysis (GSEA) is a widely used computational method for omics data analysis that determines whether a defined set of genes is over‐represented and shows statistically significant differences between two biological states [19,20]. PTM proteome analysis is sometimes different from global proteome analysis. For example, phosphoproteome data allow us to infer the phosphorylation motifs and upstream kinases. Hence, we developed a software application focusing on phosphoproteome data, named DynaPho (http://dynapho.jhlab.tw). The tool includes five analytical modules: phosphosite clustering, biological pathways, kinase activity, dynamics of interaction networks and the predicted kinase–substrate associations [21].
The methods for proteomics comprise MS‐based and chip‐based methods. Here I focus on MS‐based methods. I will first explain how to prepare samples from cells, tissues and blood and then describe several methods for proteome quantification.
Samples can be taken from cells, tissues and blood. Given the recent trend of research on single‐cell omics profiling, I will also describe how to isolate the single cells from cell population and tissues. Figure 5.3 briefly explains the preparation methods for different types of samples.
For cell sample preparation [22,23], in general, cells need to be cultured, harvested and then lysed using a lysis buffer containing 1% (v/v) sodium dodecyl sulfate, 50 mM Tris‐HCl, 10% (v/v) glycerol and a protease inhibitor cocktail. The cells are homogenised on ice using a homogeniser for two to three minutes. The cell lysate is centrifuged at 12 500g at 4 °C for 30 minutes. The supernatant containing the protein extract is collected and the protein concentration is measured using, for example, a Pierce BCA Protein Assay Kit. To digest proteins into peptides, gel‐assisted protein digestion is useful to reduce the sample volume and the reagents that interfere with mass spectrometry [23]. Peptides can be extracted from the gel with 0.1% (v/v) trifluoroacetic acid (TFA), 50% acetonitrile (ACN)/0.1% (v/v) TFA and 100% ACN sequentially by vigorous vortexing [23]. The extracted peptide solution has to be dried using a centrifugal evaporator. The peptide samples are then ready for identification or for tag labelling such as iTRAQ and dimethyl labelling.
For phosphoproteome experiments [23–25], the protein extract should be reduced with 10 mM dithiothreitol at room temperature for 30 minutes and carbamidomethylated with 55 mM iodoacetamide at room temperature in the dark for 30 minutes. Alkylated proteins can be digested with endopeptidase Lys‐C (1 : 100 w/w) for two hours followed by sequencing grade modified trypsin (1 : 100 w/w) overnight at room temperature. The trypsin reaction can be inactivated by acidifying the peptide solution to a pH < 3 using TFA. To remove detergent, the acidified peptide solution should be combined with an equal volume of ethyl acetate and agitated vigorously for one minute, followed by centrifugation at 15 700g for two minutes to separate the aqueous and organic phases. The sample from the aqueous phase is dried using a centrifugal evaporator and then subjected to desalting using Styrenedivinylbenzene Empore disk membranes (SDB‐XC) StageTips (catalogue no. 2340; 3M) and eluted in a buffer containing 0.1% (v/v) TFA and 80% (v/v) acetonitrile (ACN) [26]. Before MS analysis, phosphopeptide enrichment is recommended so that more phosphopeptides can be identified. Many strategies have been developed to enrich phosphopeptides, e.g. Ni‐NTA, titanium dioxide (TiO2) enrichment and hydroxy acid‐modified metal oxide chromatography (HAMMOC) [23,24].
Single‐cell omics has become important for understanding the functional differences of various cells contributing to health and disease [27]. Before performing single‐cell omics profiling, cell isolation is a key issue. Many methods for single‐cell isolation have been developed, including mouth pipette, serial dilution, fluorescence‐activated cell sorting (FACS), robotic micromanipulation and microfluidic platforms [28]. However, proteome profiling in single cells is a longstanding challenge since proteins cannot be amplified in the way that nucleic acids are amplified by the polymerase chain reaction (PCR). Recently, scientists developed a few methods to measure proteins in single cells based on the conjugation of antibodies with oligonucleotides, for example, the ribonucleic acid expression and protein sequencing assay (REAP‐seq), which is similar to standard flow cytometry methods but with antibodies conjugated to DNA barcodes instead of fluorophores [29]. Alternatively, proteins can be probed using a homogeneous affinity‐based proximity extension assay (PEA) with pairs of antibodies conjugated with oligonucleotides after single‐cell isolation by FACS [30]. Zhu et al. developed a nanoPOTS (nanodroplet processing in one pot for trace samples) platform that used serial dilution to obtain single cells for further proteomics analysis [31]. Using this method, they reported that they could identify >3000 proteins from as few as 10 cells [31]. The nanoPOTS method can also be applied to tissue samples.
Tissues are excised and collected for proteome and phosphoproteomic analyses. Tissue is first ground into a fine powder using a mortar and pestle in liquid nitrogen. Liquid nitrogen must be added to the mortar frequently to ensure that the tissues do not thaw during grinding [32].
For global proteomics experiments, tissue powder is then suspended in lysis buffer containing 1% (v/v) sodium dodecyl sulfate (SDS), 50 mM Tris‐HCl, 10% (v/v) glycerol, and protease inhibitor. The amount of lysis buffer added is based on the amount of tissue powder. The solution containing the tissue powder is re‐suspended by pipetting until there is almost no visible pellet. The sample solution should be homogenised on ice using a homogeniser, as in cell sample preparation [23]. The subsequent steps are the same as for cell sample preparation.
For phosphoproteomics experiments, the protein extract of each sample is denatured in an 8 M urea solution. Proteins are then reduced, carbamidomethylated and diluted five times with 50 mM triethylammonium bicarbonate (TEAB) for trypsin digestion. Tryptic peptides are then processed by labelling and phosphopeptide enrichment, and then analysed by LC‐MS/MS [24].
Blood, in the form of serum or plasma, is the predominant sample used for biomarker studies [33,34] and clinical diagnosis [35,36]. However, MS‐based plasma/serum proteomics is very challenging because of the extremely large dynamic range of protein abundances [35, 36]. Removal of abundant proteins is one approach used in discovery proteomics before applying LC‐MS/MS. Many methods can be used for this purpose, including immunoaffinity‐based depletion, fractionation by chromatography and electrophoresis (Figure 5.4).
Immunoaffinity‐based depletion is a method that uses antibodies to remove abundant proteins specifically. For fractionation, chromatography such as reversed‐phase (RP) or strong cation exchange (SCX) is often used to remove abundant proteins. The principles of RP and SCX chromatography are based on protein hydrophobicity and charge, respectively. Electrophoresis is used for protein separation by size (1D) or by 2DE, which combines two separation criteria, isoelectric point (pI) and size. Only non‐abundant proteins are subjected to further MS identification.
To quantify proteins, two major methods can be used, gel‐based and gel‐free. The popular gel‐based method is two‐dimensional gel electrophoresis, which can be used to separate proteins by their sizes and charges, as shown in Figure 5.4. Gel‐free quantitative methods comprise label‐free, chemical labelling, such as dimethyl labelling [37] and iTRAQ [38], and metabolic labelling, such as SILAC [39]. Figure 5.5 briefly explains the concept for these three gel‐free labelling methods.
The label‐free quantitative method can be used to quantify proteins. The advantage of the label‐free method is cost‐saving, but the corresponding peptide identification is not a trivial task. Therefore, powerful software is required in label‐free proteomics. One of the label‐free quantification methods is to calculate spectrum counts, inferring protein abundance from the number of times a peptide is observed and the number of distinct peptides observed from a given protein [40]. Another label‐free method is quantification by comparing the intensity of the mass spectrometric signal of each peptide from a given protein [40].
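The spectrum‐counting idea can be sketched as follows. The input format, a list of (protein, peptide) pairs with one pair per identified spectrum, and the function name are assumptions made for this illustration; dedicated software additionally handles shared peptides and normalisation.

```python
from collections import Counter, defaultdict

def spectral_counts(psms):
    """Summarise peptide-spectrum matches per protein.

    psms: iterable of (protein, peptide) pairs, one pair per identified spectrum.
    Returns {protein: (number_of_spectra, number_of_distinct_peptides)}.
    """
    spectra = Counter()
    peptides = defaultdict(set)
    for protein, peptide in psms:
        spectra[protein] += 1           # every spectrum counts once
        peptides[protein].add(peptide)  # distinct peptide sequences only
    return {p: (spectra[p], len(peptides[p])) for p in spectra}

# Hypothetical identifications: protein P1 is observed more often than P2.
example = [("P1", "AAK"), ("P1", "AAK"), ("P1", "CCR"), ("P2", "DDK")]
print(spectral_counts(example))
```

Comparing these counts for the same protein across samples gives a rough estimate of relative abundance.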
Dimethyl labelling is a fast, inexpensive and easy labelling method. This method can be applied to global proteomic studies or to PTM studies such as phosphoproteomics. After trypsin digestion of proteins into peptides, dimethyl labelling can be used for comparison of two (Figures 5.5 and 5.6) or three protein samples (Figure 5.7). When labelling tryptic peptides, most peptides have a 4 or 8 Da mass difference (Figure 5.6). When the protein is cleaved after an arginine residue, only the N terminus of the peptide is labelled, so there is a 4 Da difference; when the protein is cleaved after a lysine residue, both the N terminus and the lysine residue are labelled, so there is an 8 Da difference.
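The 4 and 8 Da differences follow from counting the labelled amine groups, as in the minimal sketch below. It assumes the nominal 4 Da shift per labelled site stated above, that every peptide N terminus and every lysine side chain is labelled, and it ignores special cases such as blocked N termini; the function names and peptide sequences are hypothetical.

```python
SHIFT_PER_SITE = 4  # nominal Da difference per labelled amine (H- versus D-label)

def labelled_sites(peptide):
    """Number of labelled amines: the N terminus plus every lysine (K) side chain."""
    return 1 + peptide.upper().count("K")

def heavy_light_difference(peptide):
    """Nominal mass difference (Da) between the D- and H-labelled forms of a peptide."""
    return SHIFT_PER_SITE * labelled_sites(peptide)

# Hypothetical tryptic peptides ending in arginine and in lysine.
for sequence in ("SAMPLER", "SAMPLEK"):
    print(sequence, heavy_light_difference(sequence))
```

A peptide with a missed cleavage after lysine carries an additional labelled site and therefore a correspondingly larger mass difference.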
In a previous study [24], the experimental method is described as follows.
The peptide samples are first mixed with 6 µl of 4% formaldehyde‐H2 (Sigma‐Aldrich) and 4% formaldehyde‐D2, respectively, and then immediately 6 µl of freshly prepared 0.6 M sodium cyanoborohydride is added to each mixture. Each mixture is vigorously mixed and the reaction is allowed to proceed for 60 minutes at room temperature. Ammonium hydroxide (1%, 24 µl) is added to stop the reaction by reacting with the excess formaldehyde. Formic acid (10%, 30 µl) is then added to end the labelling reaction and acidify the samples. Finally, the H‐ and D‐labelled samples are combined at a 1 : 1 ratio and then desalted using SDB‐XC StageTips. The samples are then subjected to LC‐MS/MS or phosphopeptide enrichment.
The iTRAQ method is one of the most popular chemical labelling methods. The term ‘isobaric’ describes the characteristic that the different iTRAQ reagents (114, 115, 116 and 117) used to label different samples have equal mass. The iTRAQ reagent is amine reactive and can link to the N‐terminus and lysine side chains of peptides [38]. The concept for iTRAQ is shown in Figure 5.8.
To perform iTRAQ, first the peptides need to be re‐suspended in iTRAQ dissolution buffer. For the duplicate experiment and small‐scale experiment, 5 mg of peptides from each sample are required for iTRAQ labelling. For the large‐scale experiment, 150 mg of peptides from each sample are required [32]. Equal amounts of peptides from different samples are labelled by adding iTRAQ Reagent 114, iTRAQ Reagent 115, iTRAQ Reagent 116 or iTRAQ Reagent 117 and vortexing at room temperature for one hour. Labelled peptides are combined and dried with a centrifugal evaporator. The labelled peptide samples are now ready either for direct LC‐MS/MS analysis or for fractionation by SCX chromatography followed by LC‐MS/MS analysis.
SILAC [39] is a powerful metabolic labelling approach and a popular method for quantitative proteomics. The cells are cultured in media containing stable 13C or 15N isotope‐labelled arginine and lysine, so the labelled amino acids are incorporated into each protein in the cells. The protein sample preparation and peptide digestion for SILAC labelling quantification are similar to the methods described in cell sample preparation above. The SILAC method is illustrated in Figure 5.9: the cells are cultured with 13C isotope‐labelled arginine and lysine, and the labelled proteins are purified for further digestion into peptides and LC‐MS/MS analysis.
Several software packages including MaxQuant, Mascot, and Proteome Discoverer can be used to identify proteins from MS spectrum data. MaxQuant is more popular than the other two since it is free and can be run on many kinds of systems such as Microsoft Windows, Mac, and Linux. Setting the parameters appropriately is critical for protein identification. Here I briefly introduce the parameter settings based on our team's experience.
If using MaxQuant, raw MS spectra are processed for peak detection and quantitation, and peptide identification is performed by using the Andromeda search engine and the Swiss‐Prot database. Search criteria can be set, such as trypsin specificity, fixed modification of carbamidomethyl, variable modifications of oxidation and phosphorylation, and up to two missed cleavages. A minimum of six amino acids in the peptide is required. The precursor mass tolerance is 3 ppm and the fragment ion tolerance is 0.5 Da. By using a decoy database strategy, peptide identification is based on the posterior error probability with a false discovery rate (FDR) of 1%. The FDR is the expected proportion of false positives (type I errors) among the accepted identifications when many hypotheses are tested. A detailed description of the FDR is available on the MaxQuant website (http://www.biochem.mpg.de/5111795/maxquant). Precursor intensities of already identified peptides are further searched and recalculated by using the ‘match between runs’ option in MaxQuant [24].
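The two numerical criteria above, a mass tolerance in ppm and a decoy‐based FDR, can be sketched generically as follows. This is not MaxQuant's implementation, which works with posterior error probabilities; the function names, the generic score and the simple decoys/targets estimate are assumptions for illustration.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=3.0):
    """True if the observed precursor m/z lies within the ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

def score_threshold_at_fdr(psms, max_fdr=0.01):
    """Lowest score cut-off at which decoys/targets stays within max_fdr.

    psms: list of (score, is_decoy) pairs; a higher score means a better match.
    Returns None if no cut-off satisfies the requested FDR.
    """
    targets = decoys = 0
    threshold = None
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among everything accepted down to this score.
        if targets and decoys / targets <= max_fdr:
            threshold = score
    return threshold
```

With a 3 ppm tolerance, a precursor observed at m/z 500.001 matches a theoretical m/z of 500.000 (2 ppm), whereas one observed at m/z 500.005 (10 ppm) does not.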
If using Mascot for peptide identification, the search criteria can be set as follows: trypsin specificity allowing up to two missed cleavages, fixed modification of carbamidomethyl (C) and variable modifications of oxidation (M) and phosphorylation (ST), (Y), (D) and (H) [41]. Peptides are considered to be identified if the Mascot score yielded a confidence limit above 99% based on the significance threshold (p < 0.01) and if at least three successive y‐ or b‐ions with an additional two and more y‐, b‐ and/or precursor‐origin neutral loss ions are observed, based on the error‐tolerant peptide sequence tag concept [41].
If using Proteome Discoverer, the MS/MS spectral information is submitted to the software and the data files are combined and searched against the Swiss‐Prot human (or other species) database, allowing a maximum of two missed cleavage sites. Search criteria need to be set, such as trypsin specificity and variable modifications of carbamidomethyl (C), oxidation (M), iTRAQ4plex (K) and iTRAQ4plex (N‐term) if using iTRAQ. The precursor mass tolerance is set to 10 ppm and the fragment mass tolerance is set to 50 mmu to prevent precursor interference. The strict target FDR of the decoy database search is set at 0.01 and the relaxed target FDR is set at 0.05. Only peptides satisfying all the following criteria are considered as qualified peptides and subjected to further analyses: (i) the peptide is labelled with iTRAQ tags, (ii) the peptide is considered as confidently identified (FDR < 0.01) and (iii) the peptide is unique for protein identification [23].
In this case study, I introduce how to use multiple proteomics approaches to discover a drug target and further decipher the molecular mechanism of the drug's action.
As described previously, 2DE coupled with MS can be also used to identify differential proteins from different samples such as normal and tumour tissues. In the study [42], we performed 2DE and MALDI‐TOF‐MS to identify the tumour‐specific protein expression in breast carcinoma. A list of upregulated proteins in cancerous tissue might be promising drug targets. Among these upregulated proteins, ATP synthase β‐subunit was found to be expressed at high levels in the tumour tissue. Similar results showed that ATP synthase β‐subunit in lung cancer tissue was upregulated compared to adjacent normal tissues from patients.
ATP synthase is a ubiquitous multimeric protein complex that catalyses the synthesis of ATP, the common ‘energy currency’ of living cells. This molecular machine consists of two moieties, a transmembrane portion (Fo), the rotation of which is induced by the proton gradient, and a globular catalytic moiety (F1) that synthesises ATP [43]. In 1966, Peter D. Mitchell proposed the chemiosmotic hypothesis that a proton‐motive force across the inner mitochondrial membrane is the immediate source of energy for ATP synthesis [44]. Paul D. Boyer proposed the ‘binding‐change hypothesis’, a detailed elucidation of the mechanism by which ATP synthase catalyses the synthesis of ATP [45,46]. John E. Walker determined the DNA sequence of the genes encoding the proteins in ATP synthase [47–49] and the first X‐ray structure of F1, the soluble fraction of ATP synthase [50]. Mitchell was awarded the Nobel Prize in Chemistry in 1978, and Boyer and Walker received it in 1997. In 2013, Martin Karplus was awarded the Nobel Prize in Chemistry for the development of multiscale models for complex chemical systems. His research was also related to the smallest biological rotatory motor, F1‐ATPase.
In general, ATP synthase is localised to the mitochondrial inner membrane. Recent studies showed that ATP synthase was also found on the extracellular surface of endothelial cells in some cancer tissues, lymphocytes, hepatocytes, paraganglioma, proliferating cell lines, and breast and lung cancer cells [42, 51–57]. Because it faces the outside of the cell, this kind of ATP synthase is called ‘ectopic ATP synthase’. Ectopic ATP synthase functions not only as an energy generator but also as a proton channel and a receptor for various ligands, which are involved in numerous biological processes including the mediation of intracellular pH, cholesterol homeostasis, the regulation of the proliferation and differentiation of endothelial cells, and the recognition of immune responses of tumour cells [58–60]. Ectopic ATP synthase, together with the whole respiratory chain, is localised on C6 glioma and lung cancer cell surfaces [56,61]. Ectopic ATP synthase is expressed on the surfaces of various cancer cells, but not on normal or normal‐like cells; therefore, researchers suggest that ectopic ATP synthase is a potential molecular target for anti‐tumour and anti‐angiogenesis therapies [24, 32, 42, 51, 53, 56, 57, 61–63]. Inhibition of ectopic ATP synthase showed inhibitory effects on cell proliferation in various cancer cells, suggesting an oncogenic role of ectopic ATP synthase in tumorigenesis.
To investigate the effects of targeting ectopic ATP synthase on protein expression in lung cancer, comprehensive time‐course protein expression profiles were analysed using 2DE‐based proteomics after treatment with the ectopic ATP synthase inhibitor citreoviridin [56, 57]. The intensities of the protein spots were quantified using ImageMaster and proteins were identified using MS; 49 proteins were successfully identified and further mapped to PPI networks. The constructed PPI networks were analysed by gene ontology (GO) functional enrichment. The results indicated that protein folding, negative regulation of ubiquitin‐protein ligase activity involved in mitotic cell cycle and mRNA processing were the major functions affected by citreoviridin. Further biological experiments such as rescue experiments and western blotting showed that citreoviridin indeed induced a reactive oxygen species (ROS) dependent unfolded protein response (UPR). A similar 2DE‐based method was applied to a breast cancer study [57] and gave the same results as those in lung cancer.
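Functional enrichment of this kind is commonly assessed with a hypergeometric (one‐sided Fisher) test, which asks how likely it is to see at least the observed number of proteins annotated with a GO term by chance. The sketch below illustrates that general test only; it is not necessarily the procedure used in the cited studies, and the function name and all numbers in the example are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """Probability of observing at least `hits` annotated proteins by chance.

    population: number of proteins in the background set
    annotated:  background proteins carrying the GO term
    sample:     number of proteins in the list being tested
    hits:       proteins in the list that carry the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total

# Hypothetical example: 8 of 49 identified proteins carry a term that
# annotates 300 of 20000 background proteins.
print(enrichment_p_value(20000, 300, 49, 8))
```

In practice, the resulting P values are corrected for multiple testing because many GO terms are tested at once.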
Protein phosphorylation is a reversible, ubiquitous and fundamental PTM that plays a significant role in a wide range of physiological processes in both eukaryotic and prokaryotic organisms and is one of the major modes of regulation in signal transduction, growth control and metabolism [24, 41, 64–66]. Dimethyl labelling of peptides, a quantitative labelling method described above, was used for this study. Phosphopeptides were enriched by HAMMOC [67,68] and analysed on an LTQ‐Orbitrap XL (Thermo Scientific, Bremen, Germany) equipped with a nanoACQUITY UPLC system (Waters Corp., Milford, MA). MS raw data were analysed by MaxQuant software version 1.3.0.5 [5]. Phosphorylation motif and clustering analyses were performed by the Motif‐X algorithm [69] and the fuzzy c‐means algorithm implemented in the ‘Mfuzz’ package for R [70], respectively. The results showed that after treatment with citreoviridin, 41 motifs containing 33 serine (Ser) and 8 threonine (Thr) phosphosites were overrepresented. To associate the enriched motifs with specific kinases, 300 kinase recognition motifs were obtained from the PhosphoNetworks database [71] and the similarities between the enriched motifs and these kinase recognition motifs were examined. To elucidate citreoviridin‐induced signalling pathways, we also performed mathematical modelling using the time‐series phosphoproteome data to construct the response network with citreoviridin treatment. The clustering results and the constructed response network revealed the temporal relation of phosphorylated HSP90 and MAPK1 in citreoviridin treatment [24]. Additionally, site‐directed mutagenesis was used to change HSP90 phosphosite Ser255 to E255 and A255, and western blotting analysis was performed to measure the protein expression levels of the MAPK/ERK pathway. The results showed that phosphosite Ser255 of HSP90AB1 is crucial for MAPK/ERK1 signalling.
As described previously, iTRAQ is one of the most popular chemical labelling methods. Therefore, in case study 2, two iTRAQ quantitative proteomic applications for the study of molecular mechanisms are introduced.
As described in case study 1, targeting ectopic ATP synthase using citreoviridin is a potential therapeutic strategy [32]. To investigate the in vivo molecular mechanism of citreoviridin in lung cancer, we employed a xenograft mouse model, in which 5 × 10⁶ CL1‐0 cells resuspended in 0.1 ml Hanks' balanced salt solution (HBSS) were mixed with matrigel and injected subcutaneously into NOD.CB17‐Prkdcscid female mice (four to five weeks old) for tumour growth. After tumour volumes reached 100 mm³, animals were randomly assigned to two groups: those receiving intraperitoneal injection of the vehicle control (DMSO) and those receiving the ATP synthase inhibitor (citreoviridin). Tumours (control and treatment) were excised for iTRAQ quantitative proteomics. Tumour tissues were ground into a fine powder using the method described in the previous section ‘sample preparation’. After reduction, alkylation and digestion of proteins, the peptides were resuspended in iTRAQ dissolution buffer and processed following the method described in the ‘iTRAQ method’. Peptide samples from the control tumours (C1 and C2) and the citreoviridin‐treated tumours (T1 and T2) were labelled with iTRAQ 114, 115, 116 and 117 tags, respectively. In small‐scale profiling, only 277 proteins were identified, with an FDR of 3.51%. Therefore, we performed fractionation by SCX chromatography and obtained 2659 proteins with an FDR of 2.22%.
Because of measurement errors in the experiments and individual variation among biological replicates, normalisation after protein identification is necessary for accurate protein quantitation. In this study, we compared and evaluated seven different normalisation methods and found that the method ‘the sum of intensities in protein quantitation’ performed best. The iTRAQ signature ion intensities of the peptides matching a protein were summed, and the protein abundance ratio was calculated by dividing one sample's summed intensities by another sample's summed intensities. This is a weighted calculation because peptides with larger intensities contribute more to the protein abundance ratio. For each protein, we calculated four protein abundance ratios: T1/C1, T2/C2, T2/C1 and T1/C2. The R value of each protein, which represents the relative abundance of the protein, was calculated from all four ratios. The S value of each protein, which represents the error of the protein abundance ratios, was calculated from the two ratios T1/C1 and T2/C2. Each protein had one S value, and the distribution of S values can be considered as the distribution of errors. Assuming that the errors follow a normal distribution, a deviation of 1.96 times the standard deviation (1.96 SD) of the S values is statistically significant (P < 0.05) and can be taken as the cut‐off value. Proteins with R values larger than the positive cut‐off value (+1.96 SD of S values) can be taken as upregulated proteins, and proteins with R values smaller than the negative cut‐off value (−1.96 SD of S values) can be taken as downregulated proteins. These proteins are considered ‘differentially expressed proteins’, which can be further analysed for functional enrichment and pathway mapping. Here, in the large‐scale experiment, 144 of the 2659 proteins were differentially expressed and were used for further analysis.
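The quantitation and cut‐off steps described above can be sketched in a few lines. The text does not give the exact formulas for R and S, so the definitions below are assumptions made purely for illustration: ratios are log2‐transformed, R is the mean of the four log2 ratios, S is the difference between log2(T1/C1) and log2(T2/C2), and the cut‐off is applied symmetrically at ±1.96 SD of the S values. All function names and the example numbers are hypothetical.

```python
import math
from statistics import stdev

def summed_ratio(numerator, denominator):
    """Protein abundance ratio from summed reporter-ion intensities.

    Summing first weights each peptide by its intensity, so intense
    peptides contribute more than weak ones.
    """
    return sum(numerator) / sum(denominator)

def r_and_s(c1, c2, t1, t2):
    """Return (R, S) for one protein from its per-peptide intensities.

    Assumed definitions (not taken from the source):
      R = mean of log2(T1/C1), log2(T2/C2), log2(T2/C1), log2(T1/C2)
      S = log2(T1/C1) - log2(T2/C2)
    """
    log_ratio = {
        "T1/C1": math.log2(summed_ratio(t1, c1)),
        "T2/C2": math.log2(summed_ratio(t2, c2)),
        "T2/C1": math.log2(summed_ratio(t2, c1)),
        "T1/C2": math.log2(summed_ratio(t1, c2)),
    }
    r = sum(log_ratio.values()) / 4
    s = log_ratio["T1/C1"] - log_ratio["T2/C2"]
    return r, s

def classify(r_s_by_protein):
    """Label proteins as up, down or unchanged using 1.96 SD of S as cut-off."""
    cutoff = 1.96 * stdev([s for _, s in r_s_by_protein.values()])
    calls = {}
    for name, (r, _) in r_s_by_protein.items():
        if r > cutoff:
            calls[name] = "up"
        elif r < -cutoff:
            calls[name] = "down"
        else:
            calls[name] = "unchanged"
    return calls, cutoff

if __name__ == "__main__":
    # Hypothetical reporter-ion intensities (two peptides per protein).
    values = {
        "protein_A": r_and_s([100, 300], [120, 280], [400, 900], [350, 1000]),
        "protein_B": r_and_s([500, 200], [480, 230], [510, 190], [470, 240]),
        "protein_C": r_and_s([800, 600], [820, 610], [210, 140], [190, 160]),
    }
    calls, cutoff = classify(values)
    print(round(cutoff, 3), calls)
```

Working on a log scale keeps up‐ and downregulation symmetric around zero, which is why a single SD‐based cut‐off can be mirrored for both directions.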
DAVID Bioinformatics Resources [72,73] were used for functional enrichment of gene ontology biological process terms. The top two GO biological process clusters were related to glucose metabolism and the protein ubiquitination process. MetaCore was used for pathway mapping. The top enriched pathway map was the glycolysis and gluconeogenesis pathway map, which is related to glucose metabolism. Furthermore, western blotting validated that the seven glucose‐metabolism‐related proteins were all upregulated in citreoviridin‐treated tumour samples, confirming the results of the proteomic analysis. The results indicated that citreoviridin may reduce the glycolytic intermediates available for macromolecule synthesis and thereby inhibit cell proliferation and tumour growth in vivo.
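The idea behind functional enrichment can be shown with a plain hypergeometric test: given a background of identified proteins, how surprising is the overlap between the differentially expressed proteins and the proteins annotated with one GO term? The sketch below is not DAVID's own scoring, and the function name is hypothetical. The background size (2659) and list size (144) are taken from the text; the term size and overlap in the example are invented.

```python
from math import comb

def enrichment_p_value(n_background, n_term, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: all identified proteins
    n_term:       background proteins annotated with the GO term
    n_selected:   differentially expressed proteins
    n_overlap:    selected proteins that carry the annotation
    """
    total = comb(n_background, n_selected)
    upper = min(n_term, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_background - n_term, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Hypothetical term: 60 annotated proteins, 12 of them in the selected list.
    print(enrichment_p_value(2659, 60, 144, 12))
```

Because many GO terms are tested at once, the resulting P values would normally be corrected for multiple testing before any term is reported as enriched.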
Infrared (IR) light is divided into three major regions based on wavelength: near‐infrared (NIR, 0.76–1.5 µm), middle‐infrared (MIR, 1.5–5.6 µm) and far‐infrared (FIR, 5.6–1000 µm) [22]. NIR and FIR have been used therapeutically, for example for pain relief, muscle fatigue and inflammatory osteoarthritis [74,75]. In this study, we found that MIR inhibited cell growth and altered the morphology of breast cancer cells. To investigate the networks perturbed by MIR, we performed iTRAQ quantitative proteomics in breast cancer MDA‐MB‐231 cells.
The experimental design was similar to that in Section 5.4.2.1. Peptides from the control were labelled with iTRAQ reagents 114 and 115; peptides from the MIR treatment were labelled with iTRAQ reagents 116 and 117. The same normalisation method as in Section 5.4.2.1 was used. In this study, we did not select differentially expressed proteins for functional enrichment; instead we used GSEA, which computes a running‐sum statistic for each gene (or protein) set over a ranked list of all the available expression values [19]. In MIR‐treated MDA‐MB‐231 cells, the proteasome pathway, p53‐dependent DNA damage, anaphase‐promoting complex (APC) and Cdc2/CDK1‐mediated degradation, mitotic pathway, tumour necrosis factor (TNF) pathway, insulin glucose pathway, and integrin and cell‐to‐cell pathways were enriched, indicating that MIR might regulate cell cycle progression, induce DNA damage, alter nucleotide metabolism and affect cell adhesion related functions. We further validated the enriched functions using several biological experiments. Indeed, MIR induced cell cycle arrest at the G2/M phase and DNA damage, caused cytoskeleton rearrangement and reduced the mobility and invasion ability of MDA‐MB‐231 cells.
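The running‐sum statistic at the heart of GSEA can be sketched as follows. This is a minimal illustration of the commonly described weighted form, not the GSEA software itself and not necessarily the exact settings used in the study: walking down the ranked list, the sum increases at each member of the set (in proportion to its score raised to a power p) and decreases at each non‐member, and the enrichment score is the largest deviation from zero. The function name, the default p = 1 and the example data are assumptions.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Running-sum enrichment score for one set.

    ranked:   list of (name, score) pairs sorted from highest to lowest score
    gene_set: names belonging to the set of interest
    p:        weight exponent; p = 0 gives an unweighted running sum
    """
    hits = [name in gene_set for name, _ in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("set must contain some, but not all, ranked items")
    norm = sum(abs(score) ** p for (_, score), hit in zip(ranked, hits) if hit)
    if norm == 0.0:
        raise ValueError("all set members have zero weight")

    running, best = 0.0, 0.0
    for (_, score), hit in zip(ranked, hits):
        # Step up at set members, step down at everything else.
        running += abs(score) ** p / norm if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

if __name__ == "__main__":
    # Hypothetical log2 ratios (treatment vs control), already sorted.
    ranked = [("P1", 2.4), ("P2", 1.8), ("P3", 0.9), ("P4", 0.1),
              ("P5", -0.4), ("P6", -1.2), ("P7", -2.0)]
    print(enrichment_score(ranked, {"P1", "P2", "P4"}))
```

Significance is normally assessed by recomputing the score on permuted data, which is omitted here. Because every protein contributes to the ranked list, no cut‐off for differential expression is needed, in contrast to the approach in Section 5.4.2.1.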
In the post‐genome era, proteomics has become essential to understanding the functions of genes and proteins in living cells and organisms. Over the past decades, proteomics has opened our eyes to a global view of complicated biological systems and to the dynamic interactions within them. Many high‐throughput and sensitive methods have been developed for proteome profiling, e.g. MS and proteome microarrays. Although data‐independent acquisition (DIA) MS produces a vast amount of data, the techniques in proteomics still have a long way to go before they reach maturity.
Proteome data quality and quantity depend on sample preparation, so how protein samples are collected and prepared, and how peptides are labelled, are critical issues. As with the analysis of genome data, bioinformatics is necessary for proteome data analysis. Extracting peptide sequences accurately from raw MS data and working out their biological meaning, including pathways and signalling, rely on bioinformatics tools and software. Therefore, analysis tools should incorporate informatics techniques such as artificial intelligence and machine learning for a true understanding of the proteome data.
Proteomics can be applied not only to the discovery of clinical biomarkers and drug targets and the investigation of molecular mechanisms but also to plant science and palaeontology. Proteomics provides insight for disease diagnosis and therapy. The journey from a single protein to the proteome to PPIs and functional modules is challenging but exciting.
This work was supported by the Ministry of Science and Technology (MOST 105‐2320‐B‐002‐057‐MY3 and MOST 106‐2320‐B‐002‐053‐MY3). The author thanks Dr. Chantal Ho Yin Cheung for drawing the figures and Professor Hsuan‐Cheng Huang for proofreading the draft of this chapter.