    A Guide for Deep Sequencing of Human Genomes

    August 26th, 2011

    The incredible throughput of current second-generation sequencing platforms makes it possible to sequence a complete human genome to high coverage, with a single instrument run, in less than 2 weeks. As whole-genome sequencing becomes more routine, it is increasingly important to understand the accuracy of sequence-level analyses, such as SNP detection, and its relationship to overall sequence depth. Enter a recent study from the lab of Elliott Margulies at NHGRI. As part of the NIH Undiagnosed Diseases Program, the authors generated over 380 gigabases of sequence data from the blood sample of a male patient. This is an astonishing amount of sequence for one sample, roughly 126-fold theoretical redundancy genome-wide.

    Perhaps just as importantly, the dataset comprised four runs on two different but related platforms: the Illumina GAIIx, and the Illumina HiSeq2000. Here is a brief summary of the dataset.

    Dataset             Total Gbp   Map Rate   Dup. Rate   Mapped Depth   % Genome Callable
    GAIIx (14 lanes)    118         95.3%      3.9%        34.2x          88.82%
    HiSeq A (8 lanes)   122         94.0%      13.7%       32.7x          90.99%
    HiSeq B (8 lanes)   144         92.6%      8.7%        40.4x          93.10%
    All (30 lanes)      384         93.9%      13.6%       102x           95.88%

    With this impressive dataset in hand, the authors undertook a detailed examination of the technical aspects of sequence analysis (coverage uniformity, platform comparisons, genotyping accuracy, etc.) and sought to answer two questions:

    1. Given a specific amount of sequencing data, what fraction of the genome is “callable”?
    2. How many SNVs can be accurately identified?

    The results, I think, will be critically important in the near future, as whole-genome sequencing becomes routine and widely accessible to investigators.

    Coverage Versus Callability

    The authors correctly note that while many studies report “coverage” of genomes or exomes in terms of minimum depth achieved (1x, 5x, 10x, etc.), this metric alone is insufficient. Namely, it may not reflect alignment and quality filters, or the requirements of genotype calling algorithms. A better approach might be to report the fraction of the genome/exome that is “callable”, meaning positions where genotypes can be determined at a specified confidence threshold when all filters are applied. This term is roughly equivalent to what the 1000 Genomes Project calls the “accessible” portion of the genome. In this study, the authors calculate callability by:

    1. Starting with reads that pass the Illumina chastity filter
    2. Further removing reads with <32 Q20 bases
    3. Mapping reads to the reference sequence using BWA
    4. Removing duplicates (using SAMtools rmdup)
    5. Considering only bases with quality >= 20.
    6. Requiring a genotype probability score of 10.

    The last metric refers to the score from the group’s Bayesian genotype calling algorithm, Most Probable Genotype (MPG). An MPG score of 10 is a log-scaled value indicating a 1/e^10 (that’s 1/22026) theoretical probability of being incorrect. By these criteria, 88.82% of the genome was callable in the GAIIx dataset (34.2x mapped depth) and 90.99% was callable in the HiSeq-A dataset (32.7x).
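
    To make the score-to-probability conversion concrete, here is a minimal Python sketch. It assumes only what is stated above, namely that the score is a natural-log-scaled value so that the error probability is 1/e^score; the function name is made up for illustration and is not part of MPG.

```python
import math


def mpg_error_probability(score: float) -> float:
    """Theoretical probability that a genotype call is wrong.

    Assumes the score is natural-log scaled, i.e. p = 1 / e^score,
    as described in the text. The function name is hypothetical.
    """
    return math.exp(-score)


p = mpg_error_probability(10)
print(f"error probability: {p:.2e} (about 1 in {round(1 / p):,})")
```

    Under this reading, every additional point of score shrinks the theoretical error probability by a factor of e.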

    You may notice that the GAIIx platform had more mapped bases but yielded a lower callability than HiSeq-A, and wonder, how could this be? It has long been observed that coverage is non-uniform across the genome and follows a Poisson distribution, influenced by factors such as read length, region mappability, and GC content. Although the amount of sequence data was similar, HiSeq platforms achieved a more uniform coverage than GAIIx, yielding more callable bases genome-wide.
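
    For intuition, the sketch below computes what an idealized, purely Poisson model of coverage would predict for the fraction of sites reaching a given depth. This is a simplification of my own, not the authors' analysis: it ignores read length, mappability, and GC content, so real datasets should be expected to fall short of it.

```python
import math


def poisson_fraction_at_least(mean_depth: float, min_depth: int) -> float:
    """P(X >= min_depth) for X ~ Poisson(mean_depth).

    Idealized model: every site is sampled independently at the same rate.
    """
    below = sum(
        math.exp(-mean_depth) * mean_depth ** k / math.factorial(k)
        for k in range(min_depth)
    )
    return 1.0 - below


# Mean depth taken from the GAIIx row of the table above.
print(poisson_fraction_at_least(34.2, 10))
```

    The gap between an idealized figure like this and what is actually observed is one rough way to think about how much non-uniformity a platform introduces.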

    GAIIx vs HiSeq Coverage of the Genome and Exome

    To enable some direct comparisons, the authors normalized the HiSeq2000 data into a set of equivalent size to the GAIIx dataset (34.2x average mapped depth), then assessed coverage of the genome as well as the exome (here defined as ~34 Mbp of non-redundant coding sequence from the UCSC Known Genes). Here’s a plot of the Q20 coverage for GAIIx and HiSeq values from Supp. Table 1.

    On both platforms, around 97% of the genome was covered by at least one read. At 10x coverage, however, GAIIx covers 89.4% of the genome whereas HiSeq covers 92.2%. These differences were even more pronounced in the exome, where GAIIx and HiSeq covered 67.4% and 76.2% of the exome at 10x, respectively. Since both platforms performed unbiased whole-genome sequencing, the authors conclude that HiSeq’s superior coverage comes from a better representation of high-GC-content sequences, which tend to have higher gene density.

    Filters for Accurate Genotype Calling

    The authors next undertook a careful experiment to establish appropriate filters for SNV calling genome-wide. Pooling all Illumina data together, they generated two equal-sized datasets with an average mapped coverage of 50x by random read sampling. Next, they compared genotype calls at all bases that were “callable” with MPG score >=10. Among the 2.8 billion positions (98.3% of the genome) that met these criteria in both datasets, there were 46,580 discordant genotypes. Many of these, unsurprisingly, arose from sequence reads that were improperly aligned (misplaced, or locally mis-aligned). To address this, the authors removed reads with mapping quality <30 from both datasets. This mapping quality filter reduced the comparison set to 93.6% of the genome, but removed 81% of discordant calls.
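
    The comparison described above can be sketched in a few lines of Python. The data layout (a dict keyed by position, holding a genotype and a score) and the function name are assumptions made for illustration; the study's own file formats and tooling are not described here.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]   # (chromosome, coordinate); hypothetical layout
Call = Tuple[str, float]     # (genotype, MPG-style score); hypothetical layout


def compare_calls(
    calls_a: Dict[Position, Call],
    calls_b: Dict[Position, Call],
    min_score: float = 10.0,
) -> Tuple[int, List[Position]]:
    """Return (number of positions callable in both sets, discordant positions)."""
    shared = [
        pos
        for pos in calls_a.keys() & calls_b.keys()
        if calls_a[pos][1] >= min_score and calls_b[pos][1] >= min_score
    ]
    discordant = sorted(pos for pos in shared if calls_a[pos][0] != calls_b[pos][0])
    return len(shared), discordant


a = {("chr1", 100): ("AA", 30.0), ("chr1", 200): ("AG", 12.0), ("chr1", 300): ("GG", 5.0)}
b = {("chr1", 100): ("AA", 28.0), ("chr1", 200): ("GG", 15.0), ("chr1", 300): ("AG", 40.0)}
n_shared, mismatches = compare_calls(a, b)
print(n_shared, mismatches)
```

    In this toy example the third position is excluded because it falls below the score threshold in one dataset, so only positions callable in both sets count toward the comparison.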

    Among the 8,710 remaining discordant positions, the authors observed consistently lower MPG scores than were seen among concordant positions, particularly at high coverage sites. They made perhaps one of the most useful inferences of this study: that genotype accuracy can be improved by requiring higher probability scores at higher sequence depths. Basically, they required that, for a given position, the ratio of MPG score to Q20 coverage be at least 0.5. The confidence-by-depth filter removed 61.5% of discordant positions but reduced callability by just 0.02%.
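
    A minimal sketch of the confidence-by-depth rule, assuming it is a simple per-position ratio test as described above. The function and parameter names are hypothetical.

```python
def passes_confidence_by_depth(
    mpg_score: float, q20_depth: int, min_ratio: float = 0.5
) -> bool:
    """Keep a call only if score / Q20 depth is at least min_ratio."""
    if q20_depth <= 0:
        return False  # no usable bases, so nothing to call
    return mpg_score / q20_depth >= min_ratio


print(passes_confidence_by_depth(10, 15))  # ratio ~0.67
print(passes_confidence_by_depth(10, 40))  # ratio 0.25
print(passes_confidence_by_depth(25, 40))  # ratio ~0.63
```

    The effect is that a score of 10 is enough at modest depth, but a deeply covered site has to earn a proportionally higher score before its genotype is trusted.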

    Finally, the authors employed the widely used strategy of removing SNV calls within 10 bp of called indels. This indel-nearby filter removed 26% of the remaining discordant positions, while reducing callability by 0.43%. Thus, by applying three filters aimed at reducing false positives, the authors removed 96.4% of discordant positions and maintained callability across 93.13% of the genome.
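
    The indel-nearby filter can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: positions are on a single chromosome, each indel is represented by one coordinate, and "within 10 bp" is treated as inclusive.

```python
from bisect import bisect_left
from typing import Iterable, List


def filter_snvs_near_indels(
    snv_positions: Iterable[int],
    indel_positions: Iterable[int],
    window: int = 10,
) -> List[int]:
    """Return the SNV positions that are more than `window` bp from every indel."""
    indels = sorted(indel_positions)
    kept = []
    for pos in snv_positions:
        # First indel at or after (pos - window); it is "near" if it is
        # also at or before (pos + window).
        i = bisect_left(indels, pos - window)
        near = i < len(indels) and indels[i] <= pos + window
        if not near:
            kept.append(pos)
    return kept


print(filter_snvs_near_indels([85, 90, 95, 110, 111, 200], [100]))
```

    Sorting the indels once and using a binary search keeps the check fast even with millions of SNV calls.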

    How Many Variants Can Be Detected?

    The next experiment was quite interesting: the authors pooled all Illumina data, and progressively added reads to create datasets of 5x, 10x, 15x mapped coverage, all the way up to 100x. In each dataset, they applied their variant calling with all filters, then reported the number of SNVs that were identified. I’ve generated a plot of the number of SNVs called genome-wide by dataset:

    At 30x, which might be considered a de facto standard, around 3 million variants were identified. Each new depth adds perhaps 10,000 variants, but at 50x the discovery power is nearly saturated (3.32 million, or 95% of the total). Very little is gained going from 50x to 105x, although, if the relationship between genes, GC content, and callability holds true, many of these could be coding variants. In summary, deep resequencing of a sample to 105-fold coverage tells us that a typical human genome contains around 3.5 million SNPs. That’s very close to estimates from the personal genomes that have already been published (~3.1 m to 4.1 m SNPs), which I find reassuring. It would be informative to see a similar experiment on a sample of African origin, where the number might be closer to 4.5 million.
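
    To put the diminishing returns in rough numbers, the sketch below computes the average number of SNVs gained per additional fold of coverage between two depths. The inputs are the rounded figures quoted above ("around 3 million" at 30x, 3.32 million at 50x, "around 3.5 million" at 105x), so the results are only approximate.

```python
def snvs_gained_per_fold(
    depth_a: float, snvs_a: int, depth_b: float, snvs_b: int
) -> float:
    """Average SNVs gained per extra 1x of mapped coverage between two depths."""
    return (snvs_b - snvs_a) / (depth_b - depth_a)


gain_30_to_50 = snvs_gained_per_fold(30, 3_000_000, 50, 3_320_000)
gain_50_to_105 = snvs_gained_per_fold(50, 3_320_000, 105, 3_500_000)
print(round(gain_30_to_50), round(gain_50_to_105))
```

    With these rounded inputs, the yield drops from about 16,000 SNVs per extra fold between 30x and 50x to roughly 3,300 per extra fold beyond 50x.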

    The Sweet Spot of Coverage and Callability

    Based on these experiments and their callability calculations, the authors estimate that generating 50x mapped coverage (60x before read mapping/filtering are applied) renders ~95% of the genome and ~81% of the exome callable. Intriguingly, however, the authors note that they’d sequenced an unrelated sample using the latest HiSeq chemistry and basecalling software, achieving the same level of callability with just 35x mapped coverage. If anything, this emphasizes that, as the authors suggest, a “callability” metric is far more informative than raw coverage when describing the resequencing of human genomes.

     

    References
    Ajay SS, Parker SC, Ozel Abaan H, Fuentes Fajardo KV, & Margulies EH (2011). Accurate and comprehensive sequencing of personal genomes. Genome Research. PMID: 21771779

    NGS and the Hallmarks of Cancer

    January 28th, 2011

    Massively parallel sequencing will be applied to hundreds or thousands of tumor genomes this year. Catalogues of somatic alterations in human cancers (e.g. COSMIC) will grow, perhaps as exponentially as dbSNP did in the past decade. Perhaps more importantly, we will begin to see cases where whole-genome or whole-exome sequencing of a patient’s tumor guides his or her treatment. Bridging the gaps between mutation discovery, biological interpretation, and clinical action, however, will be a substantial challenge. Hence the theme of this month’s posts, cancer biology and pathology.

    Just over a decade ago, Douglas Hanahan and Robert A. Weinberg published the landmark article “The Hallmarks of Cancer” in the journal Cell. At the time, nearly a quarter-century of rapid advances had revealed a wealth of knowledge about this deadly disease. Although more than 100 subtypes of cancer had been described, Hanahan and Weinberg described six principal cellular traits shared by virtually all forms of human cancers. Collectively, these essential alterations in cell physiology dictate tumor development and growth.

    [Figure: hallmarks of cancer]

    Credit: Hanahan and Weinberg, Cell (2000) 100:57-70

    Each of these six acquired capabilities – evasion of apoptosis, self-sufficiency in growth signals, insensitivity to growth inhibition signals, limitless replicative potential, sustained angiogenesis, and tissue invasion/metastasis – represents the successful circumvention of inherent anticancer defense mechanisms of cells and tissues.

    Genomic Instability and Driver Mutations

    Most of these acquired capabilities arise from somatic alterations – mutations, structural events, and epigenetic changes – which presents something of a dilemma. Thanks to a swath of fastidious DNA monitoring and repair enzymes, mutation is a rare event, and altering the critical genes to successfully acquire each capability is inefficient. For a single cell to achieve all of them in the span of a human lifetime is, well, statistically improbable. The authors suggest a seventh principle, not a hallmark of cancer but a universally enabling characteristic, to explain the means by which these six biological endpoints are reached: genomic instability.

    Mutations that cause genomic instability are likely critical, early events in tumorigenesis. Studies have shown that DNA repair genes (e.g. ATM, RAD51, CHEK1) and, more recently, components of DNA methylation pathways (DNMT3A/DNMT3B) are recurrently mutated in human cancers, suggesting an important functional role in disease development.

    Hallmark 1: Self-sufficiency in Growth Signals

    Autonomous growth signaling was the first hallmark to be defined by cancer researchers, due in part to the large number of oncogenes that modulate it. Three common molecular strategies are used by tumors to provide self-sufficient growth stimulation:

    1. Modulation of extracellular growth signals, for example, the production of PDGF and TGF-alpha by glioblastomas and sarcomas, respectively.
    2. Alteration of the trans-cellular signal transducers (surface receptors), e.g. the up-regulation of EGFR in stomach/brain/breast tumors and HER-2 in mammary tumors.
    3. Deregulation of the intracellular signaling pathways linked to transmembrane receptors, such as the Ras/Raf/mitogen activated protein kinase (MAPK) cascade.

    The authors suspected that growth signaling pathways suffer deregulation in all human tumors. Ten years later, we know that this is largely true. Large-scale sequencing efforts have revealed that mutations in Ras-family genes (KRAS, NRAS, HRAS, etc.) and MAP kinase genes are frequent events in human cancers. In breast cancer, for example, PI3 kinase genes are among the most highly mutated, suffering alterations in as many as 40% of tumors.

    Hallmark 2: Insensitivity to Antigrowth Signals

    In normal tissue, soluble factors and matrix-embedded inhibitors cooperate to maintain homeostasis by blocking cell growth. Much like growth stimulation, these signals are transduced to cells via transmembrane receptors and then into complex intracellular circuits. At the molecular level, most or all anti-proliferative signals are funneled through retinoblastoma (Rb) related proteins. When hypophosphorylated, Rb sequesters and alters the functions of E2F transcription factors, which normally serve to activate a number of genes required for transition from G1 to S-phase.

    The best documented modulator of Rb signaling is TGFB, a soluble signaling molecule that suppresses cell growth. TGFB prevents the phosphorylation that inactivates Rb, thereby blocking the advance through G1. Tumor cells disrupt TGFB signaling by a number of mechanisms, including downregulation or alteration of its cellular receptor, or mutation of the key transducer of TGFB signaling, Smad4. One way or another, the anti-growth circuit converging on Rb is disrupted in a vast majority of human malignancies, virtually “defining the concept” of tumor suppressor loss in cancer.

    Hallmark 3: Evasion of Apoptosis

    I won’t dwell much on this topic, since much of it was covered in my post on Cancer Versus the Immune System. Simply put, evasion of apoptosis is a hallmark of many and perhaps all human cancers.

    Hallmark 4: Limitless Replication Potential

    Most types of mammalian cells carry intrinsic programs that limit their replication. Once cells reach a certain number of divisions, they stop growing, or senesce. In culture, human fibroblasts can be forced to keep dividing (by knocking out p53 and pRb tumor suppressors) beyond this point. These cells eventually enter a crisis state characterized by karyotypic disarray and massive cell death. A small fraction of cells, however, continue to grow and divide without limit, a trait known as immortalization.

    The limit for most normal cell types is 60-70 divisions, after which they enter senescence. Obviously tumor cells surpass this limit, managing to grow and progress even while undergoing massive apoptosis. One key rate-limiting mechanism in cell division is the length of chromosome telomeres, which decreases by 50 to 100 base pairs with each consecutive cell division. At some point, telomere loss introduces massive genetic instability, and crisis ensues.
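
    As a back-of-the-envelope illustration, the two ranges quoted above can be combined to estimate total telomere erosion by the time a cell reaches its division limit. This arithmetic is mine, not the review's, and it assumes a constant loss per division, which is a simplification.

```python
def telomere_loss_bp(divisions: int, loss_per_division_bp: int) -> int:
    """Total telomere shortening, assuming a constant loss per division."""
    return divisions * loss_per_division_bp


low = telomere_loss_bp(60, 50)     # fewest divisions, smallest loss
high = telomere_loss_bp(70, 100)   # most divisions, largest loss
print(f"{low:,} to {high:,} bp lost")
```

    Under these assumptions, a cell lineage would shed somewhere between 3,000 and 7,000 base pairs of telomere before senescence.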

    Many, if not all, tumor cells address this issue by up-regulating telomerase, the enzyme that extends telomeres. This maintenance operation is a key component for enabling limitless replication potential in tumor cells.

    Hallmark 5: Sustained Angiogenesis

    Here’s something I didn’t know: virtually all cells must remain within 100 µm of a capillary blood vessel to get the oxygen and nutrients they need to survive. You’d think that because of this limitation, rapidly proliferating cells must have an intrinsic ability to induce angiogenesis (blood vessel growth). It turns out, not so much. The ability to stimulate angiogenesis is not inherent in normal cells or developing neoplasias, and represents an acquired capability that successful tumors must achieve.

    Like the other biological processes discussed in this review, angiogenesis is encouraged or prevented by a complex network of signaling molecules. Soluble factors, cell surface receptors, integrins, and cell adhesion molecules all play a role in the counter-balancing of blood vessel growth. Vascular endothelial growth factor (VEGF) and fibroblast growth factors (FGF1/FGF2), for example, are molecules that stimulate angiogenesis initiation. Thrombospondin-1 is an important inhibitor of this process.

    Tumor cells encourage blood vessel invasion/growth by up-regulating inducers and suppressing inhibitors, often at the level of transcription. Loss of TP53, for example, causes thrombospondin-1 levels to fall, thereby reducing the latter’s inhibitory potential. Similarly, Ras activation and loss of the VHL tumor suppressor induce an up-regulation of the gene encoding VEGF. More recently, we’ve come to realize that the transmembrane receptors for angiogenesis-stimulating molecules (VEGFR, FGFR) are commonly mutated in human cancers.

    Hallmark 6: Tissue Invasion and Metastasis

    Eventually, tumor cells venture out from the primary lesion to colonize and grow in other, often distant parts of the body. Ultimately, it is these metastases that account for 90% of cancer deaths. Several families of proteins involved in tethering cells to their surrounding tissue are altered during this process. Perhaps the best-known of these are cell-cell adhesion molecules (CAMs) and integrins, which mediate cell-cell interactions and cell-matrix interactions, respectively. One example of the communication between cell and environment is offered by E-cadherin, which is expressed on the surface of epithelial cells. Bridging of E-cadherin receptors between adjacent cells triggers anti-growth signals within the cell (Lef/Tcf transcription factor activation) via cytoplasmic B-catenin. A number of epithelial cancers block this pathway, either by mutational inactivation of E-cadherin or B-catenin genes, transcriptional repression, or proteolysis of the extracellular E-cadherin domain.

    Integrins have dozens of subtypes with distinct substrate preferences. Successful colonization by tumor cells at a distant site requires adaptation, which is often achieved through shifts in the spectrum of integrin alpha- and beta-subunits displayed by migrating cells. Support for this idea can be seen even in cell culture, where forcing expression of different integrin subunits can induce invasive and metastatic behavior. This aspect of tumor progression will be especially challenging to characterize, because there are large numbers of integrin genes and many, many unique heterodimeric receptors that can be generated by differential subunit expression.

    Summary

    It was clear even a decade ago that the innate defense mechanisms of cells to prevent transformation and metastasis are diverse and complex, and the processes by which tumor cells subvert those defenses are equally so. Nevertheless, Hanahan and Weinberg postulated that 10-20 years after the time of writing this review, diagnosis of virtually all somatic lesions within a tumor would be a routine procedure, as would comprehensive gene expression analysis. With such knowledge in hand, it would be possible to definitively test whether all tumor types behave according to a set of common rules like the ones outlined above. We aren’t quite able to provide those answers just yet, but given the rapid advances of next-gen sequencing, that day is soon coming.

    References
    Hanahan, Douglas, & Weinberg, Robert (2000). The Hallmarks of Cancer. Cell, 100 (1), 57-70. DOI: 10.1016/S0092-8674(00)81683-9

    Cancer Versus the Immune System

    January 21st, 2011

    The human immune system is an incredible success story of evolution. It defends against a constant barrage of external threats – bacteria, viruses, and other pathogens – and, as I’ve recently learned, protects against an intrinsic threat: cancerous cells. In their review “Natural and Adaptive Immunity to Cancer”, Vesely and colleagues draw from recent mouse models of cancer and human clinical data to describe how cells, effector molecules, and pathways of the immune system act to suppress and control tumor cells. It’s not all good news, however. Apparently, certain immune system pathways (e.g. inflammation) instead serve to promote tumor growth.

    The Immune System Strikes: Senescence and Apoptosis

    Cells already have an array of intrinsic defense mechanisms that halt the transformation process. Numerous cellular proteins detect DNA damage and induce senescence, a permanent change of state characterized by morphological and gene expression changes. The activation of oncogenes, too, can trigger senescence. In fact, the hijacking of Ras signaling to escape senescence and proliferate is a key requirement for cell transformation. Alternatively, cells that sense injury or loss of mitochondrial integrity may undergo programmed cell death (apoptosis). This process may also be initiated externally by the ligation of tumor necrosis factor (TNF) family ligands to their corresponding receptors: TNF, TNF-related apoptosis-inducing ligand (TRAIL), and Fas ligand (FasL). There are still other, non-apoptotic paths to cell death (necrosis, autophagy, mitotic catastrophe) that are gaining attention as barriers to transformation.

    How the Immune System Prevents Cancer

    The immune system has three key responsibilities when it comes to preventing cancer:

    • Suppression of viral infections, which when unchecked can induce certain kinds of tumors
    • Timely elimination of pathogens, to reduce the extent and duration of inflammation, which often promotes tumorigenesis
    • Immunosurveillance, in which transformed cells are identified and destroyed before they can establish malignancy.

    The idea that the immune system might recognize and destroy tumor cells was conceived 50-100 years ago. This concept of “immunosurveillance” remained controversial, and saw little progress until the 1990s. Does this story sound familiar? It’s much like the story of cancer and metabolism, which also saw a long period of general ignorance before its “rediscovery” in the 1990s. Mice get the credit for rekindling interest in the immune system’s tumor suppressor potential: specifically, mice that were immunocompromised after loss of interferon (IFN) signaling or T-cell function. Such animals were significantly more susceptible to sarcomas after exposure to methylcholanthrene (MCA), implicating a role for the immune system in preventing these tumors in healthy mice.

    Over the last 10 years, work from many labs (including the authors’) has demonstrated how the immune system works to prevent outgrowth of many types of primary and transplanted tumors. The RAG2-knockout mouse, which is deficient in T-cells, B-cells, and natural killer (NK) cells, develops more spontaneous cancer lesions and is also more susceptible to MCA-induced sarcoma. Interestingly, a significant portion (40%) of tumors that develop in RAG2-knockout mice are rejected when transplanted to immunocompetent (wild-type) mice, demonstrating that normal immune system function successfully suppresses these cells. Sarcomas induced in wild-type mice (with MCA), however, grow unrestricted when transplanted to other mice. These observations suggest a dual role for the immune system: in wild-type mice, it protects against tumor development, but also edits the immunogenicity of developing tumors, allowing them to grow unimpeded when transplanted to healthy mice.

    The Three E’s: Elimination, Equilibrium, and Escape

    The authors have come to view immunoediting as a dynamic process with three distinct phases:

    Credit: Strausberg, Genome Biol. (2005) 6:211

    1. Elimination, when innate and adaptive immune cells work together to identify and destroy tumor cells before a malignancy can form.
    2. Equilibrium, a phase when the immune system contains tumor outgrowth but does not eliminate transformed cells entirely.
    3. Escape, in which tumor cells grow unrestricted by the immune system, and develop into clinically apparent disease.

    Both elimination and equilibrium might be considered satisfactory clinical endpoints for a patient, because tumor cells are either destroyed entirely or held in check to prevent outgrowth of disease.

    The transition from equilibrium to escape is facilitated, at least in part, by the micro-evolution of the tumor cells during equilibrium. The selective pressure of immune recognition and destruction selects for tumor cells that are less immunogenic. Also aiding tumor escape is the breakdown of the immune system, either naturally (as a person ages) or as a direct result of immunosuppression (often induced by the tumor).

    The Mouse Evidence: Knockout and Induced Tumors

    Humans and mice have similar immune systems, with a largely overlapping repertoire of immune cells and effector molecules. The development of mouse strains deficient for specific genes, and the induction of tumors by carcinogens MCA (sarcoma) and DMBA/TPA (papilloma) have demonstrated that NK cells and cytotoxic lymphocytes (CTLs) suppress tumor initiation and growth in vivo. Interferon signaling also plays a key role in immunosurveillance, as demonstrated by the increased tumor susceptibility in mice lacking perforin, IFN-γ, IFNGR1, TRAIL, IL-12, TNF-α, and DNAM-1.

    Numerous cytokine molecules and receptors have also been implicated in controlling induced tumors. Mice deficient in IL-12, for example, develop more papillomas than wild-type mice. Interestingly, mice lacking IL-23 or IL-17A are resistant to tumor development, suggesting a tumor-promoting role for these cytokines. In mice lacking the TRAIL receptor, DMBA/TPA exposure did not affect the number of induced tumors, but did increase the rate of metastasis to lymph nodes (compared to wild-type mice), indicating a role for TRAIL-R in suppressing metastasis.

    Aging Studies and Spontaneous Tumor Development

    The incidence of spontaneous tumors in normal mice is very low, possibly because they have long telomeres. Many immunodeficient strains fail to develop tumors even after two years of observation. Aging studies in knockout mice, however, have elucidated the roles of certain genes, effector molecules, and immune cells in the defense against spontaneous tumors. This is an elegant type of experiment that requires some patience; one simply removes specific components of the murine immune system and monitors the animals for spontaneous tumor development. One striking discovery highlighted in this review was the incidence of immunogenic B-cell lymphomas, which increases from 0-6% in wild-type mice to 40-60% in mice lacking perforin, a cytolytic protein used by NK cells and T-lymphocytes. Penetrance of lymphomas in these mice is even higher when they also lack MHC class I accessory molecules (B2M) or IFN-γ. These observations support the importance of “cytotoxic” immune cells in protecting against spontaneous tumors.

    Aging experiments have also been performed in mice lacking specific immune cell types. RAG-2 knockout mice, for example, develop significantly more epithelial tumors (35% gastrointestinal, 15% lung), even when raised on broad-spectrum antibiotics in a pathogen-free facility. RAG-2 knockouts that also lack STAT1, a key player in interferon I/II signaling, develop an earlier and broader spectrum of malignancy, including colon and mammary adenocarcinomas.

    Loss of Equilibrium

    The equilibrium phase, in which the immune system holds tumors in check but fails to eliminate them entirely, is an interesting phenomenon. Here we observe a dynamic balance between a powerful immune system response and a genetically heterogeneous population of tumor cells that can persist for a number of years. It has become clear that adaptive immunity, and not innate immunity, takes the lead in controlling tumor outgrowth. This has been demonstrated by experiments in which healthy mice are subjected to low levels of carcinogen exposure (which tends to induce few tumors) and later depleted for CD4+/CD8+ T-cells and/or IFN signaling. As many as 50% of apparently tumor-free mice develop sarcomas at the injection site upon this depletion, suggesting that micro-tumors were present but held in check by adaptive immunity. Granted, the tumors that arise after immunodepletion tend to be highly immunogenic; when transplanted to healthy mice, 40% are rejected by the competent immune response. In contrast, sarcomas obtained from mice that were not immunodepleted tend to grow progressively when transplanted.

    The Human Evidence: Immunodeficiency and Immunosuppression

    Although we have fewer experimental liberties with human subjects, clinical and epidemiological data have proven useful. Human patients with specific perforin mutations, for example, not only develop familial hemophagocytic lymphohistiocytosis as adults, but have recently been shown to also develop leukemia and lymphoma. Surveillance of human patients with AIDS has shown an increased frequency of several malignancies due to the immunodeficiency. Most often, these tumors are induced by pathogens, such as Epstein-Barr virus (lymphoma), herpesviruses (Kaposi’s sarcoma), and human papilloma virus (cervical cancer), that fail to be eliminated by the deficient immune system.

    Intentional immunosuppression in the recipients of organ transplants can also increase the risk of cancer. Patients receiving kidney transplants, for example, exhibit a three-fold increase in overall malignancy. Most of these, too, are virus-associated tumors, though there’s also an increased risk for colon, lung, pancreas, and other non-infectious cancers. Renal transplant patients are a dramatic example; these individuals have a 200-fold (yes, two hundred) increased risk for non-melanoma skin cancers, highlighting the importance of immunosurveillance in tumors induced by exposure to UV radiation. Further, the duration of pharmacology-induced immunosuppression and incidence of cancer are positively correlated; that is, the longer the immune system is suppressed, the more likely a tumor will form. Taken together, these observations support the importance of immunosurveillance in preventing human cancers.

    Further evidence of the immunity-cancer relationship, particularly the equilibrium phase, is offered by the occasional organ recipients who develop cancer that originated from the organ donor. I’m horrified to hear that this can happen, but it does. Often, the donors had died of other causes and bore no signs of clinically-detectable disease, suggesting that their immune systems had held cancerous cells in check. The combination of a naive immune system, and immunosuppressive therapies required for successful engraftment, allows these tumors to grow without restriction in the unfortunate recipient.

    Miracles Happen: Spontaneous Tumor Regression

    Perhaps the most compelling evidence for the anti-cancer role of the immune system is the spontaneous regression of melanoma tumors accompanied by T-cell clonal expansion. This phenomenon suggests the ability of CD4+ and CD8+ T-cells to identify tumor-specific antigens and destroy cancerous cells. As many as 100 tumor-associated antigens (TAAs) generate an antibody response in patient serum, though only 8 have been observed in multiple studies. This suggests that TAAs, much like somatic mutations, are largely unique to individual tumors. T-cell responses vary from antigen to antigen; for example, responses to MAGE family antigens are rare, whereas responses to melanocyte differentiation antigen (MART/Melan-A) are seen in >50% of healthy individuals.

    More studies are needed here to catalogue TAAs and quantify their antigenicity across patient populations. Here, too, high-throughput sequencing of tumor genomes might offer useful information. Knowledge of the full set of protein-coding mutations in a tumor might shed light on its immunogenic potential, or vice-versa, thereby leading to better informed prognoses and treatment decisions.

    Tumor-Infiltrating Lymphocytes and Disease Prognosis

    Even without complete tumor regression, the presence and quality of tumor-infiltrating lymphocytes (TILs) – NK cells, T-cells, and NKT cells – are associated with a favorable prognosis for numerous tumor types. This correlation was first observed in melanoma, where patients with high CTL infiltration of their tumors survived longer. A “landmark” study in ovarian cancer found that 38% of patients with high TIL numbers survived longer than 5 years, compared to 4.5% of patients with low TIL numbers. Studies in colon and lung cancers have found that the type and density of TILs were a more powerful prognostic indicator than the clinical stage of the tumor.

    There is, of course, a downside to tumor-infiltrating immune cells: when they’re macrophages or regulatory T cells. High numbers of these are associated with a poorer prognosis, possibly due to their immunosuppressive functions.

    Inflammation and Tumor Development

    Chronic inflammation can contribute to cancer by inducing genotoxic stress, cell proliferation, angiogenesis, and even enhancing tissue invasion. Even so, the tumor-promoting activities of inflammation and the tumor-suppressing actions of the immune system are not mutually exclusive. In the authors’ mouse model of MCA sarcoma, for example, tumor development requires several inflammatory molecules (MyD88, IL-10, IL1B, and IL-23), but these factors also induce the host-protective immune response (IFN and T-cells) that destroys the tumors. In other primary carcinogen models, MyD88 and IL1B promote tumor development, but also facilitate the recognition of dying tumor cells that leads to anti-tumor immunity.

    Inflammation also plays an important role in the transition from equilibrium to escape, when inflammatory and regulatory immune cells are recruited to the tumor and then subverted to dampen anti-tumor immunity, allowing cancer progression. Indeed, the authors suggest that the pro-inflammatory transcription factors NF-κB and STAT3 may be valuable therapeutic targets, whose inhibition may facilitate the transition from tumor-promoting inflammation to tumor-suppressing immunity.

    References
    Vesely MD, Kershaw MH, Schreiber RD, & Smyth MJ (2010). Natural Innate and Adaptive Immunity to Cancer. Annual Review of Immunology. PMID: 21219185


    The Year of the Exome

    December 29th, 2010

    Next-generation sequencing technologies have dramatically altered the landscapes of genetics and genomics. There has been considerable interest in applying NGS platforms to selected regions of the human genome. Targeted sequencing of just the coding regions of the human genome — the exome — is of particular interest, because these regions presumably harbor the lion’s share of relevant genetic variation. In 2010, low-cost, high-throughput exome sequencing was made possible.

    Two companies emerged as the titans of exome capture for sequencing:

    • Nimblegen, whose SeqCap EZ Exome kit claims to yield >10x coverage for 90% of the exons from 18,000 genes.
    • Agilent, whose SureSelect All Exon kits target 40-50 megabases of CCDS exons.

    So which one is better? That’s a difficult question to answer, particularly because most groups with early access to these technologies are bound by non-disclosure agreements. Based on information in the public domain, such as the 20+ articles this year that employed exome sequencing, there is no clear winner. Some studies used Nimblegen, some used Agilent, and all of them achieved some kind of scientific success, or else we wouldn’t have read them. Clearly both companies are working hard to improve their products and to incorporate the suggestions and requests of customers. Both platforms saw a “version 2” release this year with a larger target space and other improvements. At Personal Genomes I saw at least two posters for studies where 1,000 or more exomes would be (or have been) sequenced. One thing is clear: Agilent and Roche/Nimblegen are selling exome kits like crazy.

    Many fruits of exome sequencing have already come to market. A search for publications with ‘exome’ in the title turned up dozens of entries – two thirds of which were research articles on exome sequencing, and the other third, news briefs or reviews discussing its potential. A significant portion of these were low-hanging fruit: rare diseases of suspected genetic origin for which the causal gene(s) had not been identified. You can recognize these because they often have “syndrome” in the name: Fowler syndrome [13], Miller syndrome [17], Kabuki syndrome [16], Sensenbrenner syndrome [7], and Brown-Vialetto-van Laere syndrome [9] were all figured out (genetically) by exome sequencing this year. Mutations in a number of genes were linked to other rare inherited disorders:

    • WDR62 (severe brain malformations) [2]
    • GPSM2 (nonsyndromic hearing loss) [23]
    • STIM1 (fatal classic Kaposi sarcoma) [5]
    • ACAD9 (complex I deficiency) [8]
    • VCP (familial ALS) [10]
    • ADIPOQ (insulin resistance atherosclerosis) [4]
    • PIGV (hyperphosphatasia mental retardation) [12]
    • ANGPTL3 (familial combined hypolipidemia) [15]
    • TGM6 (spinocerebellar ataxias) [24]
    • FADD (autoimmune lymphoproliferative syndrome) [3]

    You might think that given how rare these diseases are, the impact of such findings is not very significant. But to an investigator who’s spent his or her life studying a rare disease (or the family that has it), the possibility of finding the disease-causing gene in a single experiment is simply irresistible. Though the sample numbers are small, the ramifications of these discoveries are not. They enable everyone in the world with a rare disease, even if this only totals a handful of patients, to be efficiently genotyped for causal mutations. They shed light on new and unanticipated mechanisms of disease pathogenesis. They’ve even justified having CXXorfXX genes in the set of human genes (C20orf54 was shown to cause Brown-Vialetto-van Laere syndrome [9]).

    Larger studies of more common, more complex phenotypes are already beginning to pop up. A collaboration between the University of Copenhagen (Denmark) and BGI (Shenzhen) has sequenced the exomes of at least 250 individuals. A subset of these (n=50) were used to study adaptation to high altitude [26], while another 200 were the subject of a recent Nature Genetics paper [14] entitled “Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants” (whose inline title could have just been “Duh”).

    Thus, exome sequencing has already enabled significant advances in the understanding of [rare] human diseases. In the coming year, I expect we’ll see a dramatic scale-up as exome sequencing is applied to thousands of patients with cancer, diabetes, autism, and other common diseases. Who knows? Maybe 2011 will be the year of exome sequencing as well.

    References

    1. Bainbridge, M. N., M. Wang, et al. “Whole exome capture in solution with 3 Gbp of data.” Genome Biol 11(6): R62.
    2. Bilguvar, K., A. K. Ozturk, et al. “Whole-exome sequencing identifies recessive WDR62 mutations in severe brain malformations.” Nature 467(7312): 207-10.
    3. Bolze, A., M. Byun, et al. “Whole-exome-sequencing-based discovery of human FADD deficiency.” Am J Hum Genet 87(6): 873-81.
    4. Bowden, D. W., S. S. An, et al. “Molecular basis of a linkage peak: exome sequencing and family-based analysis identify a rare genetic variant in the ADIPOQ gene in the IRAS Family Study.” Hum Mol Genet 19(20): 4112-20.
    5. Byun, M., A. Abhyankar, et al. “Whole-exome sequencing-based discovery of STIM1 deficiency in a child with fatal classic Kaposi sarcoma.” J Exp Med 207(11): 2307-12.
    6. Cirulli, E. T., A. Singh, et al. “Screening the human exome: a comparison of whole genome and whole transcriptome sequencing.” Genome Biol 11(5): R57.
    7. Gilissen, C., H. H. Arts, et al. “Exome sequencing identifies WDR35 variants involved in Sensenbrenner syndrome.” Am J Hum Genet 87(3): 418-23.
    8. Haack, T. B., K. Danhauser, et al. “Exome sequencing identifies ACAD9 mutations as a cause of complex I deficiency.” Nat Genet 42(12): 1131-4.
    9. Johnson, J. O., J. R. Gibbs, et al. “Exome sequencing in Brown-Vialetto-van Laere syndrome.” Am J Hum Genet 87(4): 567-9; author reply 569-70.
    10. Johnson, J. O., J. Mandrioli, et al. “Exome sequencing reveals VCP mutations as a cause of familial ALS.” Neuron 68(5): 857-64.
    11. Kozlowski, P., M. de Mezer, et al. “Trinucleotide repeats in human genome and exome.” Nucleic Acids Res 38(12): 4027-39.
    12. Krawitz, P. M., M. R. Schweiger, et al. “Identity-by-descent filtering of exome sequence data identifies PIGV mutations in hyperphosphatasia mental retardation syndrome.” Nat Genet 42(10): 827-9.
    13. Lalonde, E., S. Albrecht, et al. “Unexpected allelic heterogeneity and spectrum of mutations in Fowler syndrome revealed by next-generation exome sequencing.” Hum Mutat 31(8): 918-23.
    14. Li, Y., N. Vinckenbosch, et al. “Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants.” Nat Genet 42(11): 969-72.
    15. Musunuru, K., J. P. Pirruccello, et al. “Exome sequencing, ANGPTL3 mutations, and familial combined hypolipidemia.” N Engl J Med 363(23): 2220-7.
    16. Ng, S. B., A. W. Bigham, et al. “Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome.” Nat Genet 42(9): 790-3.
    17. Ng, S. B., K. J. Buckingham, et al. “Exome sequencing identifies the cause of a mendelian disorder.” Nat Genet 42(1): 30-5.
    18. Otto, E. A., T. W. Hurd, et al. “Candidate exome capture identifies mutation of SDCCAG8 as the cause of a retinal-renal ciliopathy.” Nat Genet 42(10): 840-50.
    19. Rosenfeld, J. A., A. K. Malhotra, et al. “Novel multi-nucleotide polymorphisms in the human genome characterized by whole genome and exome sequencing.” Nucleic Acids Res 38(18): 6102-11.
    20. Summerer, D., N. Schracke, et al. “Targeted high throughput sequencing of a cancer-related exome subset by specific sequence capture with a fully automated microarray platform.” Genomics 95(4): 241-6.
    21. Teer, J. K. and J. C. Mullikin “Exome sequencing: the sweet spot before whole genomes.” Hum Mol Genet 19(R2): R145-51.
    22. Tennessen, J. A., J. Madeoy, et al. “Signatures of positive selection apparent in a small sample of human exomes.” Genome Res 20(10): 1327-34.
    23. Walsh, T., H. Shahin, et al. “Whole exome sequencing and homozygosity mapping identify mutation in the cell polarity protein GPSM2 as the cause of nonsyndromic hearing loss DFNB82.” Am J Hum Genet 87(1): 90-4.
    24. Wang, J. L., X. Yang, et al. “TGM6 identified as a novel causative gene of spinocerebellar ataxias using exome sequencing.” Brain 133(Pt 12): 3510-8.
    25. Worthey, E. A., A. N. Mayer, et al. “Making a definitive diagnosis: Successful clinical application of whole exome sequencing in a child with intractable inflammatory bowel disease.” Genet Med.
    26. Yi, X., Y. Liang, et al. “Sequencing of 50 human exomes reveals adaptation to high altitude.” Science 329(5987): 75-8.
    27. Zhao, Q., E. F. Kirkness, et al. “Systematic detection of putative tumor suppressor genes through the combined use of exome and transcriptome sequencing.” Genome Biol 11(11): R114.