
    Disease-causing Mutations Discovered by NGS in 2011

    December 29th, 2011

    The number of human genetic diseases unraveled by next-generation sequencing skyrocketed this year. Several factors contributed to this growth, two of which were the ever-increasing throughput of sequencing instruments and the widespread availability of commercial exome platforms. A number of large-scale initiatives to discover disease genes by exome sequencing, particularly for Mendelian disorders, got off the ground. I’d also argue that the rapid pace of discovery is aided by a growing acceptance of sequencing as a clinical tool.

    A PubMed search restricted to the keywords “exome” and “sequencing” and the year 2011 returned over 100 publications, of which more than 60 were studies linking genetic variation to human disease. I’ve whittled the list down to around 40 and (after consulting a medical dictionary for most) divided them into rough disease categories.

    Developmental Disorders

    The largest category was what I call “developmental disorders”: mental retardation, dysplasia (abnormal growth), dyskinesia (impaired movement), and the like. There were at least 14 gene-disease associations published this year, many of them in the American Journal of Human Genetics.

    Developmental Disorders
    SMOC2: Major dental developmental defects (Bloch-Zupan et al., AJHG)
    SYT14: Spinocerebellar ataxia with psychomotor retardation (Doi et al., AJHG)
    TECR: Non-syndromic mental retardation (Caliskan et al., Hum Mol Genet)
    PRRT2: Paroxysmal kinesigenic dyskinesia (Chen et al., Nat Genet)
    SERPINF1: Osteogenesis imperfecta (Becker et al., AJHG)
    KIF22: Spondyloepimetaphyseal dysplasia with joint laxity (Min et al., AJHG)
    KAT6B: Say-Barber-Biesecker syndrome (Clayton-Smith et al., AJHG)
    POP1: Novel skeletal dysplasia (Glazov et al., PLoS Genet)
    CCDC8: 3-M syndrome (Hanson et al., AJHG)
    SLCO2A1: Primary hypertrophic osteoarthropathy (Zhang et al., AJHG)
    WDR62: Recurrent polymicrogyria (Murdock et al., Am J Med Genet A)
    FAM20A: Amelogenesis imperfecta (O’Sullivan et al., AJHG)
    SHROOM3: Heterotaxy (Tariq et al., Genome Biol)
    MCT8: X-linked leucoencephalopathy (Tsurusaki et al., J Med Genet)

    Familial Cancer Syndromes

    Sequencing of individuals with hereditary cancer syndromes enabled the identification of some new cancer susceptibility genes. This category will undoubtedly explode in the coming year as thousands of cancer patients have their genomes or exomes sequenced.

    Hereditary Cancer Syndromes
    MAX: Hereditary pheochromocytoma (Comino-Mendez et al., Nat Genet)
    RET: Familial medullary thyroid carcinoma (Qi et al., PLoS One)

    Metabolic Disorders

    Next up, metabolic disorders. Interestingly, a study by Vissers and colleagues linked somatic mutations in IDH1 (a gene recurrently mutated in leukemia, glioblastoma, and other cancers) to “metaphyseal chondromatosis”, a rare disorder of severe bone dysplasia, neurodevelopmental problems, and strongly increased excretion of D-2-hydroxyglutaric acid.

    Metabolic Disorders
    ACSF3: Combined malonic and methylmalonic aciduria (Alfares et al., J Med Genet; Sloan et al., Nat Genet)
    MTHFD1: Novel inborn error of folate metabolism (Watkins et al., J Med Genet)
    IDH1: Metaphyseal chondromatosis with aciduria (Vissers et al., Am J Med Genet A)

    Blood and Lymphatic Deficiencies

    Several inherited deficiencies of the blood and lymphatic system were linked to causal mutations. What I liked about this category was that half of the publications appeared in “non-genome” journals (Blood and Haematologica), indicating that medical specialists in the field recognize the importance of (and in some cases, are already applying) exome sequencing to study such diseases.

    Blood and Lymphatic Disorders
    NBEAL2: Gray platelet syndrome (Albers et al., Nat Genet)
    GATA2: Dendritic cell, monocyte, B and NK lymphoid deficiency (Dickinson et al., Blood)
    MPL: Familial aplastic anemia (Walne et al., Haematologica)
    GJC2: Primary lymphoedema (Ostergaard et al., J Med Genet)

    Neurological Diseases

    Neurological disorders win the prize for making me look up the layman’s term for virtually every disorder whose causal gene was pinpointed by sequencing this year. These include such conditions as lipofuscinosis (excessive accumulation of lipopigments), paraparesis (weakness or partial paralysis of the lower limbs), and dystonia (abnormal muscle tone leading to movement and posture problems).

    Neurodegenerative Disorders
    DNAJC5: Adult neuronal ceroid-lipofuscinosis (Benitez et al., PLoS One)
    KIF1A: Hereditary spastic paraparesis (Erlich et al., Genome Res)
    GCDH: Early-onset generalized dystonia (Marti-Masso et al., Hum Genet)
    FA2H: Fatty acid hydroxylase-associated neurodegeneration (Pierson et al., Eur J Hum Genet)
    AFG3L2: Spastic ataxia-neuropathy syndrome (Pierson et al., PLoS Genet)
    BANF1: Hereditary progeroid syndrome (Puente et al., AJHG)
    DYNC1H1: Dominant axonal Charcot-Marie-Tooth disease (Weedon et al., AJHG)

    Myopathies

    New disease genes were identified for several muscle fiber diseases (myopathies), including cardiomyopathy (disease of the heart muscle, which can be fatal) and ophthalmoplegia, in which the muscles that control eye movement are paralyzed. Interestingly, both mitochondrial cardiomyopathy genes reported (MRPL3 and AARS2) encode products required for mitochondrial translation: MRPL3 encodes a protein of the large mitochondrial ribosomal subunit, while AARS2 encodes the mitochondrial alanyl-tRNA synthetase.

    Myopathies
    MRPL3: Mitochondrial cardiomyopathy (Galmiche et al., Hum Mutat)
    AARS2: Infantile mitochondrial cardiomyopathy (Gotz et al., AJHG)
    RRM2B: Progressive external ophthalmoplegia (Takata et al., Genome Biol)
    BAG3: Dilated cardiomyopathy (Norton et al., AJHG)

    Vision-loss Disorders

    The last disease category I’ll mention is vision-loss disorders. A number of new disease-causing genes were identified this year, mostly by exome sequencing. Two studies were particularly interesting. First, Bowne and colleagues (including myself) identified a mutation in the RPE65 gene causing autosomal dominant retinitis pigmentosa (RP). This gene had previously been associated only with autosomal recessive RP; finding that it acts in a dominant fashion suggests previously unknown routes of disease pathogenesis and new therapeutic possibilities. Second, Shi et al. linked mutations in the ZNF644 gene to high myopia (severe nearsightedness), a common cause of blindness. Have you ever heard of a ZNFxxx gene that actually does something? Most of the time, you look these up and the annotation says “May be involved in transcriptional regulation.” It’s good to know that at least one of them serves a purpose, namely, keeping most of us from virtual blindness.

    Vision Disorders
    RPE65: Retinitis pigmentosa with choroidal involvement (Bowne et al., Eur J Hum Genet)
    MAK: Retinitis pigmentosa (Ozgul et al., AJHG)
    ZNF644: High myopia (Shi et al., PLoS Genet)
    MAK: Retinitis pigmentosa (Tucker et al., PNAS)
    ALMS1, IQCB1, CNGA3, MYO7A: Leber congenital amaurosis (Wang et al., Hum Mutat)
    DHDDS: Retinitis pigmentosa (Zuchner et al., AJHG)

    And there you have it. The genetic basis of dozens of inherited disorders, pinpointed by next-generation sequencing. There is simply no plausible way to deny the importance of next-generation sequencing to advancing human health and medicine. One can only imagine what we’ll know by next December, as large federally funded initiatives ramp up their efforts to systematically apply exome and whole-genome sequencing to inherited disorders.



    Recurrent splicing mutations in MDS and leukemia

    December 15th, 2011

    Myelodysplastic syndrome (MDS, also called preleukemia) is a blood disorder characterized by ineffective production of myeloid blood cells. The disorderly and ineffective production of blood cells from stem cells in the bone marrow results in low blood counts, or cytopenias. As many as 30% of MDS cases progress to full-blown, chemotherapy-resistant secondary acute myeloid leukemia (sAML). This week in Nature Genetics, two studies report recurrent mutations in splicing-related genes in blood tumors.

    MDS Cells (Wikipedia)

    First, Tim Graubert and colleagues describe the whole-genome sequencing of an MDS-derived secondary AML tumor and a matched normal (skin) sample. They detected and validated 507 somatic single-nucleotide variants in the tumor, nearly all of which (505) were also detected in the MDS sample. Among these were 30 coding SNVs, one of which was a missense mutation in the U2AF1 gene. The same codon of U2AF1 was also mutated in two other MDS cases evaluated by WGS, flagging it as a candidate recurrently mutated gene. The authors then sequenced U2AF1 exons in 150 MDS cases and found that 8.7% had mutations at Ser34.

    Characterization of Recurrent U2AF1 Mutations

    The authors undertook deep genomic resequencing, cDNA sequencing, and other experiments to characterize the nature of the U2AF1 mutations, finding that:

    • Mutant allele frequencies were 40-50%, suggesting that the mutation was present in most or all tumor cells.
    • SNP arrays and WGS indicated no large deletions or uniparental disomy spanning the U2AF1 locus.
    • Deep cDNA sequencing demonstrated that both wild-type and mutant alleles were expressed.
    • There were no apparent differences in the amount of U2AF1 mRNA between wild-type and mutated samples.
    • In the 150 cases examined, no position in the gene other than residue 34 was mutated.

    Taken together, these observations suggest that U2AF1 alteration was an early, initiating event and likely represents a gain-of-function mutation.
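    The inference in the first bullet is simple arithmetic: a heterozygous point mutation at a diploid locus accounts for one of the two alleles in each carrier cell, so the fraction of tumor cells carrying it is roughly twice the mutant allele frequency. The sketch below assumes diploid copy number in both tumor and contaminating normal cells (consistent with the absence of deletions or uniparental disomy at the locus) and a user-supplied tumor purity; it illustrates the reasoning and is not the authors' analysis.

```python
def cell_fraction_from_vaf(vaf: float, purity: float = 1.0) -> float:
    """Estimate the fraction of tumor cells carrying a heterozygous point mutation.

    Assumes a diploid locus in both tumor and contaminating normal cells, so the
    expected variant allele frequency (VAF) is purity * cell_fraction / 2.
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("VAF must be between 0 and 1")
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    fraction = 2.0 * vaf / purity
    # Sampling noise (or an unrecognized copy-number change) can push this past 1.
    return min(fraction, 1.0)


for vaf in (0.40, 0.45, 0.50):
    print(f"VAF {vaf:.0%}: ~{cell_fraction_from_vaf(vaf):.0%} of tumor cells")
```

    Under these assumptions, VAFs of 40-50% correspond to roughly 80-100% of tumor cells, which is why the mutation looks like an early, clonal event.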

    U2AF1 and Splicing Factors

    U2AF1 encodes a small regulatory subunit of the U2AF splicing factor. It binds the 3′ AG splice acceptor dinucleotide of the pre-mRNA target intron and forms a heterodimer with U2AF2, which binds the adjacent polypyrimidine tract. U2AF1 is highly conserved, and loss of both copies is lethal in many species. Although it’s not known which domain of U2AF1 binds the pre-mRNA, the Ser34 mutation occurs in a zinc-finger motif that may have RNA-binding activity. Interestingly, in vitro reporter assays revealed that the Ser34 mutation causes an increase in splicing activity and more exon skipping relative to wild-type U2AF1. Further, an analysis of differentially expressed genes (by microarray) between samples with and without U2AF1 mutations revealed that three of the top functional categories for down-regulated genes were splicing- or RNA-recognition-motif-related genes. This observation may reflect a compensatory response to the increased splicing activity of U2AF1 mutants.

    Recurrent Mutation of SF3B1 in Chronic Lymphocytic Leukemia

    A second study in Nature Genetics, led by Victor Quesada and colleagues, employed exome sequencing to identify recurrent mutations in chronic lymphocytic leukemia (CLL), the most common form of adult leukemia in Western nations. The authors sequenced the exomes of tumor samples and matched controls from 105 patients with CLL, 60 of whom had mutated IGHV regions (a common alteration in CLL) and 45 of whom did not. They reported ~45 somatic mutations per case and observed more protein-altering mutations in IGHV-mutated samples (12.8 ± 0.7) than in IGHV-unmutated samples (10.6 ± 0.7). Comparing these results with their previous work (WGS of 4 CLL cases), the authors identified several new recurrently mutated genes, including:

    • SF3B1, a subunit of the spliceosomal U2 snRNP;
    • POT1, a nuclear protein involved in telomere maintenance;
    • CHD2, which regulates gene expression by modification of chromatin structure;
    • LRP1B, which has recently been defined as a tumor suppressor in different malignancies.

    The authors focused on SF3B1, which was altered by somatic point mutations in ~10% of cases. Systematic screening of 279 cases by capillary (ABI 3730) sequencing revealed that 9.7% of CLL tumors harbored SF3B1 mutations, making it the most frequently mutated gene in CLL identified to date. The protein encoded by SF3B1 is involved in the binding of the U2 snRNP to the branch point near 3′ splice sites. It interacts with at least two proteins near the branch point, the early 3′ splice-site recognition factor U2AF65 and the branch point-binding protein SF3B14, as well as with the RNA sequences there.

    SF3B1 Mutations. Credit: Quesada et al., Nat. Genet., 2011

    RNA-seq of SF3B1-mutated cases revealed some patterns of aberrant splicing, most of which paired a known 5′ donor site with a new, abnormal 3′ acceptor site. An analysis of splicing target genes revealed truncated versions of SLC23A2, a vitamin C transporter, and TCIRG1, one of whose gene products is a T-cell immune regulator. Another altered gene was FOXP1, known to be dysregulated in diffuse large B-cell lymphoma; the altered transcript lacked two PEST sequences normally required for protein degradation.
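    Spotting the “known 5′ donor, new 3′ acceptor” pattern comes down to comparing splice junctions observed in RNA-seq against annotated ones. The sketch below is not the authors' pipeline: it assumes junctions are given as 1-based first and last intronic bases plus strand, and the coordinates in the example are hypothetical.

```python
from typing import NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int   # first intronic base (1-based, assumed convention)
    end: int     # last intronic base (1-based, assumed convention)
    strand: str  # "+" or "-"


def splice_sites(j: Junction):
    """Return the (donor, acceptor) sites of a junction as (chrom, pos, strand)."""
    left = (j.chrom, j.start, j.strand)
    right = (j.chrom, j.end, j.strand)
    # On the + strand the 5' donor is the intron's left edge; on - it is the right edge.
    return (left, right) if j.strand == "+" else (right, left)


def classify(junction: Junction, annotated: set) -> str:
    """Compare an observed RNA-seq junction with a set of annotated junctions."""
    if junction in annotated:
        return "annotated"
    donors = {splice_sites(a)[0] for a in annotated}
    acceptors = {splice_sites(a)[1] for a in annotated}
    donor, acceptor = splice_sites(junction)
    known_donor, known_acceptor = donor in donors, acceptor in acceptors
    if known_donor and known_acceptor:
        return "novel pairing of known sites"  # e.g., exon skipping
    if known_donor:
        return "known donor, novel acceptor"   # most common pattern in SF3B1-mutated cases
    if known_acceptor:
        return "novel donor, known acceptor"
    return "novel donor and acceptor"


# Hypothetical annotation: two introns of a + strand gene
annotated = {Junction("chr1", 1000, 1999, "+"), Junction("chr1", 3000, 3999, "+")}
print(classify(Junction("chr1", 1000, 2049, "+"), annotated))  # -> known donor, novel acceptor
```

    Keying sites on strand as well as position keeps a donor on one strand from matching an acceptor on the other.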

    Role of Splicing in Tumor Development and Progression

    Most adult tumors harbor hundreds or thousands of somatic mutations, only a fraction of which are likely to drive tumor development and growth. Recurrence of mutations in the same gene or pathway remains the best way to separate these “driver” mutations from background passenger events. These two studies, and a handful of others published late this year, suggest an important role for aberrant splicing in the early development of hematologic malignancies such as MDS/sAML and CLL. What’s particularly important is that these appear to be gain-of-function mutations, which opens the door to new targeted therapies. It’s one step closer to personalized medicine for cancer patients, brought to you by next-generation sequencing.
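    To see why recurrence is such a strong signal, compare the U2AF1 result (8.7% of 150 MDS cases, i.e. about 13 tumors mutated at Ser34) with what chance alone would produce. The sketch assumes a hypothetical, uniform background rate of one somatic mutation per megabase per tumor and asks how likely it is that 13 or more of 150 tumors would be hit at the same 3-bp codon. Real driver-detection methods model gene- and context-specific rate variation, so treat this as a back-of-the-envelope check only.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical background: ~1 somatic mutation per megabase per tumor, so a
# 3-bp codon is hit by chance with probability ~3e-6 in any one tumor.
BACKGROUND_PER_BASE = 1e-6
codon_hit_prob = 3 * BACKGROUND_PER_BASE

# U2AF1 Ser34: 8.7% of 150 MDS cases is about 13 tumors.
p_chance = prob_at_least(13, 150, codon_hit_prob)
print(f"P(>= 13 of 150 tumors hit at one codon by chance) ~ {p_chance:.1e}")
```

    Summing the upper tail directly, rather than computing one minus the lower tail, avoids losing vanishingly small probabilities to floating-point cancellation.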

    References

    Graubert TA, Shen D, Ding L, Okeyo-Owuor T, Lunn CL, Shao J, Krysiak K, Harris CC, Koboldt DC, Larson DE, McLellan MD, Dooling DJ, Abbott RM, Fulton RS, Schmidt H, Kalicki-Veizer J, O’Laughlin M, Grillot M, Baty J, Heath S, Frater JL, Nasim T, Link DC, Tomasson MH, Westervelt P, Dipersio JF, Mardis ER, Ley TJ, Wilson RK, & Walter MJ (2011). Recurrent mutations in the U2AF1 splicing factor in myelodysplastic syndromes. Nature Genetics. PMID: 22158538

    Quesada V, Conde L, Villamor N, Ordóñez GR, Jares P, Bassaganyas L, Ramsay AJ, Beà S, Pinyol M, Martínez-Trillos A, López-Guerra M, Colomer D, Navarro A, Baumann T, Aymerich M, Rozman M, Delgado J, Giné E, Hernández JM, González-Díaz M, Puente DA, Velasco G, Freije JM, Tubío JM, Royo R, Gelpí JL, Orozco M, Pisano DG, Zamora J, Vázquez M, Valencia A, Himmelbauer H, Bayés M, Heath S, Gut M, Gut I, Estivill X, López-Guillermo A, Puente XS, Campo E, & López-Otín C (2011). Exome sequencing identifies recurrent mutations of the splicing factor SF3B1 gene in chronic lymphocytic leukemia. Nature Genetics. PMID: 22158541


    Somatic Mutation Detection in Whole Genome Sequencing Data

    December 8th, 2011

    A paper online at Bioinformatics describes our flagship algorithm for detecting somatic point mutations in whole-genome sequencing of tumor samples. This freely available software package, called SomaticSniper, performs a Bayesian comparison of the genotype likelihoods in tumor and normal samples at every covered position in the genome.
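    To make the idea of a Bayesian tumor-normal comparison concrete, here is a toy version. It is not SomaticSniper's actual model: it assumes a biallelic site with three genotypes (RR, RA, AA), a fixed per-read error rate, a made-up genotype prior, and independent tumor and normal posteriors. The score is the Phred-scaled probability that both samples share a genotype (that is, that the site is not somatic), so higher scores mean stronger evidence for a somatic change.

```python
import math

GENOTYPES = ("RR", "RA", "AA")  # reference/alternate alleles at a biallelic site
HYPOTHETICAL_PRIOR = {"RR": 0.998, "RA": 0.001, "AA": 0.001}  # illustrative only


def alt_read_prob(genotype: str, error: float) -> float:
    """Chance that one read shows the alternate allele, given the genotype."""
    return {"RR": error, "RA": 0.5, "AA": 1.0 - error}[genotype]


def genotype_posterior(ref_reads, alt_reads, prior, error=0.01):
    """Posterior P(genotype | reads), computed in log space to avoid underflow."""
    log_post = {}
    for g in GENOTYPES:
        p = alt_read_prob(g, error)
        log_post[g] = (math.log(prior[g]) + alt_reads * math.log(p)
                       + ref_reads * math.log(1.0 - p))
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


def somatic_score(tumor, normal, prior=HYPOTHETICAL_PRIOR, error=0.01, cap=255.0):
    """-10 * log10 P(tumor and normal share a genotype).

    `tumor` and `normal` are (ref_reads, alt_reads) tuples.
    """
    pt = genotype_posterior(*tumor, prior, error)
    pn = genotype_posterior(*normal, prior, error)
    # Simplification: treat the two samples' posteriors as independent.
    p_same = sum(pt[g] * pn[g] for g in GENOTYPES)
    if p_same <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_same))


print(somatic_score((15, 15), (30, 0)))  # tumor looks het, normal looks hom-ref
print(somatic_score((30, 0), (30, 0)))   # identical samples
```

    Under these assumptions, a site with 15 reference and 15 variant reads in the tumor but 30 reference reads in the normal gets a large score, while a site that looks the same in both samples scores near zero.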


    The study includes a detailed investigation of common sources of false positive mutation calls (usually from sequencing- or alignment-related artifacts) and describes a filtering strategy to remove them from mutation callsets.

    Inception: First Cancer Genomes

    Like many bioinformatics algorithms, SomaticSniper reached publication after a long and colorful history. It began in 2008 when we sequenced the first cancer genome, AML1. At the time, we were generating fragment-end, 32 bp reads on early Illumina GA instruments. It took over a hundred lanes to achieve ~30-fold coverage on each sample (tumor and normal). We were in dire need of a short read aligner that could handle this amount of data, and Maq answered the call (see my Maq Top Ten).
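    The lane count follows from basic coverage arithmetic: mean coverage C = N × L / G for N reads of length L over a genome of size G. The sketch assumes a genome size of about 3.1 Gb and ignores unmapped and duplicate reads, so real requirements are higher; it only shows why 32-bp reads demand billions of reads per sample.

```python
GENOME_SIZE = 3.1e9  # approximate haploid human genome length in bp (assumption)


def reads_needed(target_coverage: float, read_length: int,
                 genome_size: float = GENOME_SIZE) -> float:
    """Reads required for a mean coverage C = N * L / G."""
    return target_coverage * genome_size / read_length


def coverage(n_reads: float, read_length: int,
             genome_size: float = GENOME_SIZE) -> float:
    """Mean coverage produced by n_reads reads of the given length."""
    return n_reads * read_length / genome_size


for length in (32, 100):
    n = reads_needed(30, length)
    print(f"{length}-bp reads: ~{n / 1e9:.2f} billion reads for 30x")
```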

    In addition to serving as one of the most widely used short read aligners, Maq included a probabilistic genotype calling model for detecting germline SNPs in a single genome. Dave Larson (the lead author) and others from our group developed an algorithm to compare genotype likelihoods between tumor and normal and compute the probability that a site is not somatic given the sequence data. Putative somatic mutations receive a somatic score, a Phred-scaled value representing the quality of the call. Here’s something interesting: during the data generation phase for AML1, the number of candidate mutations went down as we added more sequence. This is because only a tiny fraction of variants in a tumor genome are somatic; the vast majority are germline variants also present in the normal. As coverage improved, more and more candidates turned out to be germline. In the end, the tumor genome of AML1, a cytogenetically normal leukemia, harbored just ten somatic coding mutations. A lot of people were flabbergasted. Ten little changes, and a woman got leukemia.
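    Why would more sequence shrink the candidate list? A heterozygous germline variant looks somatic whenever the matched normal happens to contain no reads supporting it. If each normal read carries the variant allele with probability 0.5, that chance is 0.5^d at depth d; if depth is Poisson-distributed with mean λ, it averages exp(-λ/2). The number of heterozygous sites below is a hypothetical round figure, and the model ignores sequencing error, uncovered sites, and the depth needed to call the variant in the tumor.

```python
import math


def p_missed_fixed_depth(depth: int, allele_fraction: float = 0.5) -> float:
    """P(no variant-supporting reads) at a germline het site sequenced to `depth`."""
    return (1.0 - allele_fraction) ** depth


def p_missed_poisson(mean_depth: float, allele_fraction: float = 0.5) -> float:
    """Same probability when depth D ~ Poisson(mean_depth).

    Uses the Poisson generating function E[s**D] = exp(mean_depth * (s - 1))
    with s = 1 - allele_fraction, i.e. exp(-mean_depth * allele_fraction).
    """
    return math.exp(-mean_depth * allele_fraction)


HYPOTHETICAL_HET_SITES = 2_000_000  # illustrative order of magnitude, not measured

for mean_depth in (5, 10, 20, 30):
    p = p_missed_poisson(mean_depth)
    expected = HYPOTHETICAL_HET_SITES * p
    print(f"normal at {mean_depth:>2}x: P(miss) = {p:.1e}, "
          f"~{expected:,.0f} germline hets with no variant reads in the normal")
```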

    More Genomes, Better Algorithm

    This algorithm became the core of our cancer whole-genome sequencing analysis pipeline, evolving and improving over the course of the second cancer genome (AML2, published in the New England Journal of Medicine), a breast cancer genome (BRC1), and others. Among other findings, it detected mutations in IDH1 and DNMT3A that we and others showed to be recurrent across many tumors. The algorithm’s name changed a few times before settling on SomaticSniper. It’s now a lean and hungry animal, capable of processing high-coverage whole-genome sequence pairs in a matter of hours.

    Filtering Out the Noise

    No matter how good the mutation caller, there are going to be some false positives. This is because you’re looking for a one-in-a-million event, a true somatic mutation. Raw SomaticSniper calls therefore undergo a series of Maq-inspired filters. Sites are retained if they meet these criteria:

    • Covered by at least 3 reads
    • Consensus quality of at least 20
    • Called a SNP in the tumor sample with SNP quality of at least 20
    • Maximum mapping quality of at least 40
    • No high-quality predicted indel within 10 bp
    • No more than 2 other SNVs called within 10 bp
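    The six criteria above translate directly into a predicate. The record type and its field names are hypothetical (they are not SomaticSniper's output format), and the sketch assumes the per-site values have already been computed from the alignments.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical per-site summary; field names are illustrative only."""
    depth: int                  # reads covering the site
    consensus_quality: int
    tumor_snp_quality: int      # SNP quality of the tumor call (0 if not called)
    max_mapping_quality: int
    hq_indel_within_10bp: bool  # high-quality predicted indel within 10 bp
    other_snvs_within_10bp: int


def passes_basic_filters(c: Candidate) -> bool:
    """Apply the Maq-style thresholds listed above."""
    return (
        c.depth >= 3
        and c.consensus_quality >= 20
        and c.tumor_snp_quality >= 20
        and c.max_mapping_quality >= 40
        and not c.hq_indel_within_10bp
        and c.other_snvs_within_10bp <= 2
    )


print(passes_basic_filters(Candidate(30, 60, 50, 60, False, 0)))
```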

    Sites passing these criteria are subjected to two additional filters: a screen against germline variants from dbSNP (remove the site if it matches the position and allele of a known non-cancer dbSNP variant) and an LOH filter (remove the site if the normal is heterozygous and the tumor is homozygous for the same variant allele). Sites removed by the former are probably inherited variants under-sampled in the matched normal, while sites removed by the latter likely reflect large-scale structural changes (e.g., deletions) that cause the loss of one allele. Finally, filter-passed mutations are classified as high-confidence (HC) if the somatic score is at least 40 and the mapping quality is at least 40 (for BWA) or 70 (for Maq).
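    The dbSNP and LOH screens and the high-confidence rule can be sketched the same way. The inputs are hypothetical: genotypes as two-letter strings, and known germline variants as a set of (chromosome, position, allele) tuples. The "non-HC" label is only this sketch's name for filter-passed sites that miss the HC thresholds.

```python
MIN_MAPQ_FOR_HC = {"bwa": 40, "maq": 70}


def classify_site(chrom, pos, alt, normal_gt, tumor_gt, somatic_score,
                  mapping_quality, known_germline, aligner="bwa"):
    """Return 'dbsnp', 'loh', 'HC' or 'non-HC' for a site that passed the basic filters."""
    if (chrom, pos, alt) in known_germline:
        return "dbsnp"  # probably an inherited variant under-sampled in the normal
    normal_het = len(set(normal_gt)) == 2 and alt in normal_gt
    tumor_hom_alt = tumor_gt == alt * 2
    if normal_het and tumor_hom_alt:
        return "loh"    # normal het, tumor homozygous for the same variant allele
    min_mq = MIN_MAPQ_FOR_HC[aligner.lower()]
    if somatic_score >= 40 and mapping_quality >= min_mq:
        return "HC"
    return "non-HC"


print(classify_site("1", 100, "T", "CC", "CT", 45, 50, set()))
print(classify_site("1", 100, "T", "CC", "CT", 45, 50, set(), aligner="maq"))
```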

    Frequent Sources of False Positives

    Even sites that pass the filters above are vulnerable to certain sequencing and alignment artifacts that produce false positive calls. A detailed study revealed (as many in the field already know) a few common sources of false positives: strand bias, homopolymer sequences, paralogous reads (reads that derive from a paralogous region of the genome but are mapped to the wrong region, usually carrying three or more substitutions), and the read position of the predicted variant. The last of these was something new: variants seen only near the “effective” 3′ end of reads (the start of soft-trimmed bases, or the actual end of the read if untrimmed) were more likely to be false positives. This may reflect a combination of sequencing error, which is higher at the 3′ end of reads, and alignment bias favoring mismatches over gaps near the ends of reads. In any case, false positives arising from these common causes tend to have characteristic properties that allow them to be identified and removed while maintaining sensitivity for true mutations.
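    Two of these signatures, strand bias and variant position relative to the effective 3′ end, can be computed from the variant-supporting reads alone. The sketch below is illustrative only: its input format and both thresholds are hypothetical, and it leaves out the homopolymer and paralogous-read checks.

```python
def artifact_flags(reads, min_strand_fraction=0.1, max_rel_3p=0.1):
    """Flag strand-bias and 3'-end signatures for one candidate site.

    `reads` holds (strand, offset, effective_length) tuples for variant-supporting
    reads: `offset` is the 0-based position of the variant counted from the read's
    5' end in sequencing order, and `effective_length` excludes soft-trimmed bases.
    Both thresholds are hypothetical.
    """
    if not reads:
        raise ValueError("need at least one variant-supporting read")
    flags = []
    forward = sum(1 for strand, _, _ in reads if strand == "+")
    minor_strand = min(forward, len(reads) - forward) / len(reads)
    if minor_strand < min_strand_fraction:
        flags.append("strand bias")
    # Distance from the effective 3' end as a fraction of read length (0 = last base).
    rel_3p = [(length - 1 - offset) / length for _, offset, length in reads]
    if max(rel_3p) <= max_rel_3p:  # every supporting read has the variant near its 3' end
        flags.append("near 3' end")
    return flags


print(artifact_flags([("+", 95, 100), ("+", 97, 100), ("+", 99, 100)]))
```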

    SomaticSniper adds to the growing arsenal of tools developed by our group to address the significant challenges presented by next-generation sequencing data analysis.

    References

    Larson DE, Harris CC, Chen K, Koboldt DC, Abbott TE, Dooling DJ, Ley TJ, Mardis ER, Wilson RK, & Ding L (2011). SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. DOI: 10.1093/bioinformatics/btr665
