Archives for August 2009

WUCGI: WashU Cancer Genomics Initiative

August 27, 2009 by Dan Koboldt


Yesterday afternoon was the kickoff party for WashU’s Cancer Genomics Initiative (CGI), whose headline goal is to sequence 150 cancer genomes in the coming year.

Cancer Sequencing Ramps Up

Under the leadership of Rick Wilson, Tim Ley, and Elaine Mardis, our group sequenced the first cancer genome, from a woman who had died of AML (M1), and published the results in Nature last fall. Three weeks ago came the sequel: in the New England Journal of Medicine, we published the complete genome of another M1 leukemia, this time from a man who was treated and remains in full remission. In less than a year, the number of Illumina runs required to sequence a tumor genome dropped by more than 80%, from 98 runs for AML1 to just 16.5 runs for AML2.
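As a quick check of the reduction in runs per tumor genome, the arithmetic can be sketched in a few lines. Keep in mind that runs are not identical units: AML1 used 36 bp single-end reads, while most AML2 runs were 2×75 bp paired-end, so the yield per run also changed.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percent decrease going from `before` to `after`."""
    return 100.0 * (before - after) / before

# Illumina runs per tumor genome, as reported in the post.
aml1_runs = 98
aml2_runs = 16.5

print(f"{percent_reduction(aml1_runs, aml2_runs):.1f}% fewer runs")  # 83.2% fewer runs
print(f"{aml1_runs / aml2_runs:.1f}-fold fewer runs")                # 5.9-fold fewer runs
```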

It’s not just the sequencing throughput that makes WUCGI a realistic effort. Many groups have Illumina sequencers, some even more than we do. Some of the most critical advances have taken place behind the scenes – for example, the variant detection pipelines developed by David Larson, Ken Chen, Chris Harris, and others. Sequencing on this scale would not be possible without the IT and informatics infrastructure, built under the leadership of David Dooling and Gary Stiehr, that gives us the computational firepower to run whole-genome analyses.

Two Genomes Down, 150 To Go

With two genomes published, the center leadership has set an ambitious goal: To sequence 150 cancer genomes in the coming year. Obviously, these will include more AML samples, hopefully some with therapy-related changes or abnormal cytogenetics. In collaboration with Matt Ellis and others at the Siteman Cancer Center, we’ll be tackling breast cancer as well. No doubt we’ll be revisiting lung cancer, for which we sequenced candidate genes as part of the Tumor Sequencing Project (TSP) consortium. As part of the Cancer Genome Atlas (TCGA) consortium, we’re working on glioblastoma multiforme (brain cancer) and ovarian cancer. Also, intriguingly, I hear rumors that there will be some sequencing of less common, largely unexplored cancers like multiple myeloma.

As Tim Ley said yesterday, it’s thrilling to be a part of this. We truly are entering the golden age of cancer genomics.

Genetics of Human Longevity

August 19, 2009 by Dan Koboldt

A new study in PLoS ONE resequenced candidate genes in a cohort of the “healthy oldest-old”: individuals aged 85 or older who are healthy and have never been diagnosed with cancer, cardiovascular disease, Alzheimer’s, pulmonary disease, or diabetes. The idea is that these robust old-timers harbor genetic variants that reduce susceptibility to, or even protect against, the common age-related disorders that tend to shorten lifespans. Demographic data suggest that fewer than 36% of people in western nations will live to see 85, and only a third of those (12% overall) will do so while remaining in good health. Since longevity is moderately heritable (~25%), it stands to reason that genetics play a significant role.

Tortoise Still Winning the Race

Intriguingly, despite all of humanity’s technological achievements in agriculture, sanitation, medicine, and so on, our maximum observed lifespan (122 years) is far from the longest in the animal kingdom. Indeed, the authors point out that rougheye rockfish, bowhead whales, red sea urchins, and Galapagos tortoises easily outlive us, with lifespans of 150-200 years. We can extend the lifespan of other animals (mice by putting them on reduced-calorie diets, C. elegans by inhibiting expression of the insulin/IGF receptor gene daf-2), but we can’t seem to extend our own.

Rounding Up the Usual Suspects (Genes)

The authors selected 24 candidate genes known to be involved in age-related processes. These included genes implicated in dietary restriction (SIRT1/3, UCP2/3, PPARG), autophagy (FRAP1, BECN1), stem cell activation (NOTCH1, DLL1), progeria syndromes (LMNA, ZMPSTE24, KL), tumor suppression (TP53, ING1, CDKN2A), and DNA methylation (TRDMT1, DNMT3A/B). Also included were the human homologs of several genes known to be differentially expressed in long-lived daf-2 mutant worms: IGF1R (growth factor receptor), SCD and APOB (lipid metabolism), and CRYAB and HSPB2 (heat shock proteins). Such a diverse gene list allowed the authors to screen for variants across a wide range of gene functions and biological pathways that might contribute to longevity.

Ye Olde Candidate Gene Resequencing

The authors designed 716 PCR amplicons to cover the exons, 5′ and 3′ UTRs, 1.5 kbp promoter regions, intron-exon junctions, and selected conserved noncoding sequences (CNSs) of each of the 24 genes. Altogether, ~360 kbp of DNA was sequenced bidirectionally, producing a grand total of ~35 million high-quality (phred > 20) bases.
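For readers outside the capillary-sequencing world: a phred score encodes the estimated probability that a base call is wrong, Q = -10·log10(P), so “phred > 20” means an estimated error rate below 1 in 100. A minimal sketch of the conversion and of a threshold filter follows; the quality list is made up for illustration.

```python
import math

def phred_to_error(q: float) -> float:
    """Estimated probability that a base call is wrong, given phred quality Q = -10*log10(P)."""
    return 10 ** (-q / 10)

def error_to_phred(p: float) -> float:
    """Phred quality corresponding to an error probability p."""
    return -10 * math.log10(p)

def count_high_quality(quals, threshold=20):
    """Count base calls whose phred quality is strictly above `threshold` ("phred > 20")."""
    return sum(1 for q in quals if q > threshold)

print(phred_to_error(20))                        # about 0.01, i.e. 1 error in 100 calls
print(count_high_quality([12, 20, 21, 35, 40]))  # 3 (made-up qualities)
```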

Variant detection with phred/phrap/polyphred and Mutation Surveyor identified 935 sequence variants (848 SNPs and 87 small indels), 59% of which were already in dbSNP. Unsurprisingly, most of the variants mapped to introns or conserved noncoding regions. About 50 novel coding SNPs were identified; the authors point out that these were far rarer (average MAF 1.6%) than the 80 or so previously known coding SNPs (average MAF 19%).
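The minor allele frequency (MAF) of a SNP is the frequency of its less common allele, and an “average MAF” is simply the mean over a set of SNPs. A minimal sketch computing both from diploid genotype counts; the SNP names and counts below are hypothetical, not data from the study.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """MAF of a biallelic SNP from diploid genotype counts."""
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    alt_freq = (n_het + 2 * n_hom_alt) / n_alleles
    # The minor allele may be either the reference or the alternate allele.
    return min(alt_freq, 1 - alt_freq)

# Hypothetical (hom-ref, het, hom-alt) counts in 100 people.
snps = {"snp_a": (95, 5, 0), "snp_b": (60, 35, 5), "snp_c": (0, 10, 90)}
mafs = {name: minor_allele_frequency(*counts) for name, counts in snps.items()}
average_maf = sum(mafs.values()) / len(mafs)
print(f"average MAF = {average_maf:.3f}")  # average MAF = 0.100
```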

Tag SNPs: Leveraging the HapMap Resource

Here the authors took a rather puzzling turn, seeking to compile a set of longevity tag SNPs by combining their data with the findings of the International HapMap Project. Only 12% of the combined variant set was shared between HapMap and the resequencing dataset, but that’s hardly surprising: HapMap variants were selected on the basis of high frequency (MAF > 5%), whereas many of the novel variants identified in this study were rare (novel coding SNPs averaged a MAF of just 1.6%). Thus the two SNP sets are very likely to complement one another.

The authors selected 682 tag SNPs representing 1,550 non-redundant variants from the combined datasets (using an LD threshold of r² > 0.8 for HapMap SNPs and r² = 1 for resequencing SNPs). These tag SNPs were used to genotype a larger cohort (493 healthy oldest-old and 439 random controls), but unfortunately, the data were not shown. How disappointing! If the authors had found any significant association between their tag SNPs and longevity, that would have been an important result.
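Tag SNP selection is typically some variant of a greedy set cover: pick the SNP that captures the most not-yet-tagged SNPs at or above an r² threshold, then repeat. The sketch below shows a generic greedy pairwise scheme; it is not necessarily the procedure or software the authors used, and the SNP names and r² values are hypothetical.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs until every SNP is tagged by at least one of them.

    snps      -- list of SNP identifiers
    r2        -- dict mapping frozenset({a, b}) to pairwise r-squared; missing pairs count as 0
    threshold -- minimum r-squared for one SNP to tag another
    """
    def captured_by(tag):
        # A SNP always tags itself, plus every SNP in LD with it at or above the threshold.
        return {s for s in snps
                if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold}

    untagged = set(snps)
    tags = []
    while untagged:
        # Pick the SNP covering the most still-untagged SNPs (ties go to the earliest in `snps`).
        best = max(snps, key=lambda s: len(captured_by(s) & untagged))
        tags.append(best)
        untagged -= captured_by(best)
    return tags

# Hypothetical pairwise r-squared values for four SNPs (not data from the study).
r2 = {
    frozenset(("snp1", "snp2")): 0.90,
    frozenset(("snp2", "snp3")): 0.85,
    frozenset(("snp1", "snp3")): 0.50,
}
print(greedy_tag_snps(["snp1", "snp2", "snp3", "snp4"], r2))  # ['snp2', 'snp4']
```

A single `threshold` does not reproduce the mixed rule described above (a looser cutoff for HapMap SNPs, r² = 1 for resequencing SNPs); handling that would require a per-SNP threshold.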

Common vs. Rare Variants: Is HapMap Enough?

One conclusion that was perhaps over-emphasized was that HapMap SNPs are inadequate to capture rare variation in the study population. Some 264 of the 935 variants identified by resequencing were singletons (present in just one individual), and only about 2.5% of these could be captured by HapMap tag SNPs at an r² threshold of 0.8. The authors conclude that “This shows that HapMap tagSNPs generally do not adequately represent, private re-sequencing SNPs. This analysis highlights a major challenge for genetic association studies. Using only HapMap SNPs, effects due to uncommon variants would often be missed.” Well, yes, but also, duh. HapMap was intended to represent common, not rare, variation. It would have been far more compelling if the authors had found rare variants actually associated with their phenotype of healthy aging. But alas…
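Why singletons escape common tag SNPs follows from the definition of r² = D² / (p_A(1−p_A)·p_B(1−p_B)), where D = p_AB − p_A·p_B. Even when a rare allele (frequency p) always sits on haplotypes carrying the other SNP’s minor allele (frequency q), r² can reach only p(1−q) / (q(1−p)). A minimal sketch with a hypothetical sample of 100 haplotypes shows a singleton reaching r² ≈ 0.04 with a tag SNP of frequency 0.2, nowhere near 0.8.

```python
def r_squared(haplotypes, i, j):
    """Pairwise LD r^2 between biallelic sites i and j.

    haplotypes -- list of phased haplotypes, each a sequence of 0/1 alleles
    """
    n = len(haplotypes)
    p_i = sum(h[i] for h in haplotypes) / n  # frequency of allele 1 at site i
    p_j = sum(h[j] for h in haplotypes) / n  # frequency of allele 1 at site j
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ij - p_i * p_j                     # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return 0.0 if denom == 0 else d * d / denom  # r^2 is undefined for a monomorphic site

# Hypothetical 100 haplotypes (50 people). Site 0 is a common SNP (frequency 0.2);
# site 1 is a singleton carried on one of the common-allele haplotypes.
haplotypes = [(1, 1)] + [(1, 0)] * 19 + [(0, 0)] * 80
print(round(r_squared(haplotypes, 0, 1), 3))  # 0.04
```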

The authors do raise a fair point: association studies cannot rely on HapMap alone. Obtaining the complete picture of the genetic variation underlying a phenotype of interest requires a hybrid strategy that includes both common and rare variants. At some point this will require whole-genome resequencing of affected individuals, and for that, we’ll need something more than the ABI 3730 capillary sequencer.

References
Halaschek-Wiener, J., Amirabbasi-Beik, M., Monfared, N., Pieczyk, M., Sailer, C., Kollar, A., Thomas, R., Agalaridis, G., Yamada, S., Oliveira, L., Collins, J., Meneilly, G., Marra, M., Madden, K., Le, N., Connors, J., & Brooks-Wilson, A. (2009). Genetic Variation in Healthy Oldest-Old. PLoS ONE, 4(8). DOI: 10.1371/journal.pone.0006641

Second Cancer Genome in New England Journal

August 6, 2009 by Dan Koboldt

Today our group published the second cancer genome, AML2, in the New England Journal of Medicine. In this study, we sequenced the complete genomes of tumor cells and matched normal (skin) cells from a patient with cytogenetically normal de novo FAB M1 AML. This is an exciting publication for many reasons, the foremost of which may be the venue: with an impact factor of 52.59, the NEJM is almost certainly the most widely read biomedical journal in the world.

Diagnosed with Leukemia: It Could Happen to You

The story begins three years ago, with a previously healthy 38-year-old man of European ancestry who went to his doctor complaining of fatigue and a persistent cough. After blood work showed an elevated white blood cell count, his physician ordered a bone marrow biopsy, which revealed 90% cellularity and 86% blasts. Diagnosis: leukemia.

The patient received induction chemotherapy with cytarabine (7 days) and daunorubicin (3 days). Five weeks later he had achieved a morphologically complete remission with recovered blood counts. Now, three years later, he remains in complete remission. According to an oncologist I spoke with, this kind of happy ending is not very common in leukemia: most patients are diagnosed at an advanced age and don’t do as well.

Acute myelogenous leukemia cells. Credit: Univ. of Virginia

Moving Beyond Cytogenetics

At the time of diagnosis, routine cytogenetic analysis of the patient’s tumor cells showed a normal 46,XY karyotype. Bone marrow and skin samples were banked for whole-genome sequencing with the patient’s informed consent, in accordance with our institutional review board (IRB) protocol. There was no family history of leukemia, though the patient’s mother had developed breast cancer and later non-Hodgkin lymphoma, and the mother’s half-sister had also developed breast cancer. The field for discovering the mutations underlying this AML was wide open.

Whole Genome Sequencing with Illumina

We sequenced the genomes of tumor cells and matched normal (skin) cells to high depth (23.3x and 21.3x, respectively) on the Illumina/Solexa platform. The tumor sample required just 16.5 runs (most of them 2×75 bp paired-end) to reach 98% diploid coverage. That’s a dramatic improvement over our first cancer genome, AML1, which took 98 runs (36 bp single-end) to achieve 91% diploid coverage. At current rates, we really can sequence a genome a week. As any bioinformatician knows, however, the analysis usually takes a bit longer.
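To see why depth matters for detecting both alleles at a heterozygous site, here is an idealized Poisson model in the Lander-Waterman spirit. It assumes reads land uniformly at random and each allele’s read count is an independent Poisson(depth / 2). Real data have mapping and GC-related coverage bias, so this model overestimates what actual runs achieve, and the paper’s measured “diploid coverage” may be defined differently. The `min_reads_per_allele` cutoff is a hypothetical parameter.

```python
import math

def poisson_sf(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    cdf = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

def fraction_covered(depth: float, min_reads: int = 1) -> float:
    """Idealized expected fraction of positions with at least `min_reads` reads."""
    return poisson_sf(min_reads, depth)

def both_alleles_seen(depth: float, min_reads_per_allele: int = 1) -> float:
    """Idealized chance that a heterozygous site shows each allele at least
    `min_reads_per_allele` times, with each allele sampled as an independent
    Poisson(depth / 2) (Poisson thinning of the total read count)."""
    return poisson_sf(min_reads_per_allele, depth / 2) ** 2

for depth in (5, 10, 23.3):
    print(depth, round(both_alleles_seen(depth, min_reads_per_allele=2), 4))
```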

Dave Larson in my group really deserves the credit for the whole-genome variant detection pipeline applied to AML2. With direction from Elaine Mardis, Rick Wilson, Tim Ley, and others, Dave created a pipeline for automated variant calling, somatic scoring, and tiered classification of variants in cancer genomes (see Figure 1 of the paper). We identified 3.87 million single nucleotide variants (SNVs) in the tumor genome, of which 97.5% were also present in the skin genome and another 1.7% had been previously described (in dbSNP). That left 20,256 putative somatic variants, which we classified as follows:

Tier 1 variants were coding variants that alter the amino acid sequence, such as nonsynonymous, nonstop, and splice-site mutations.
Tier 2 variants were those in evolutionarily conserved sequences or sequences with regulatory potential.
Tier 3 variants were the remaining variants in non-repetitive regions of the genome.
Tier 4 variants were the remaining variants in repetitive regions of the genome.
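The filtering and tiering rules above can be sketched as a small classifier. This is a toy illustration, not the actual pipeline, which also performed somatic scoring that is not modeled here. The `Variant` fields and consequence labels are hypothetical annotations, and applying the rules in priority order (tier 1 before 2, then 3/4) is an assumption based on the “remaining variants” wording.

```python
from dataclasses import dataclass

# Hypothetical consequence labels that place a variant in tier 1.
TIER1_CONSEQUENCES = {"nonsynonymous", "nonstop", "splice_site"}

@dataclass
class Variant:
    consequence: str               # hypothetical label, e.g. "nonsynonymous", "intergenic"
    conserved_or_regulatory: bool  # in a conserved or regulatory-potential region
    repetitive: bool               # in repetitive sequence
    in_normal: bool                # also called in the matched normal (skin) genome
    in_dbsnp: bool                 # previously described in dbSNP

def is_putative_somatic(v: Variant) -> bool:
    """Drop variants seen in the normal genome or already known in dbSNP."""
    return not v.in_normal and not v.in_dbsnp

def assign_tier(v: Variant) -> int:
    """Apply the tier rules in priority order."""
    if v.consequence in TIER1_CONSEQUENCES:
        return 1
    if v.conserved_or_regulatory:
        return 2
    return 4 if v.repetitive else 3

calls = [
    Variant("nonsynonymous", False, False, False, False),
    Variant("intergenic", True, False, False, False),
    Variant("intergenic", False, True, False, False),
    Variant("intronic", False, False, True, False),  # present in skin, so filtered out
]
print([assign_tier(v) for v in calls if is_putative_somatic(v)])  # [1, 2, 4]
```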

Validation and Deep 454 Read Counts

We used capillary (ABI 3730) sequencing to validate somatic variants in tiers 1 and 2. In all, 62 mutations were validated, 10 of which were tier 1 (amino acid-altering) mutations. We also validated two somatic indels: one in NPM1, which had been previously described, and an insertion in the CEP170 gene predicted to add a leucine residue to the encoded protein.

In the absence of true functional validation, there are at least two approaches to evaluating whether a somatic mutation is a driver (a mutation that confers some advantage and drives tumor development) or a passenger (a background mutation that’s just along for the ride). First, driver mutations should be present in most tumor cells, since the dominant clone will be the most “fit” in the tumor population. To assess mutation frequencies in our patient’s tumor cells, we applied 454 sequencing to mutation-containing amplicons from tumor DNA, tumor cDNA, and skin DNA. Deep read counts for somatic events on the X and Y chromosomes, which are present in a single copy in this male patient, showed variant allele frequencies of around 98%, consistent with nearly all cells in the bone marrow sample belonging to the malignant clone. For the rest of the somatic mutations, variant frequencies hovered near the 50% mark, as expected for heterozygous mutations present in nearly every tumor cell, with a few exceptions. The CEP170 indel had a reduced frequency (~35%) in tumor DNA, suggesting that it may not be a driver mutation.
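The expected allele frequencies follow from simple arithmetic. A minimal sketch, assuming no copy-number change at the locus (every cell, mutant or not, carries the same number of copies): a heterozygous autosomal mutation in 98% of cells gives ~49%, a single-copy X/Y mutation gives ~98%, and, under the same idealized model, a ~35% frequency would correspond to a heterozygous mutation in roughly 70% of cells. That last reading is one interpretation under these assumptions, not a result from the paper.

```python
def expected_vaf(cell_fraction: float, mutant_copies: int = 1, total_copies: int = 2) -> float:
    """Expected variant allele fraction when `cell_fraction` of cells carry the
    mutation and every cell has `total_copies` of the locus (no copy-number change)."""
    return cell_fraction * mutant_copies / total_copies

def cell_fraction_from_vaf(vaf: float, mutant_copies: int = 1, total_copies: int = 2) -> float:
    """Invert expected_vaf: fraction of cells implied by an observed VAF."""
    return vaf * total_copies / mutant_copies

print(expected_vaf(0.98))                                   # heterozygous autosomal: ~0.49
print(expected_vaf(0.98, mutant_copies=1, total_copies=1))  # single-copy X/Y in a male: ~0.98
print(cell_fraction_from_vaf(0.35))                         # a ~35% VAF: ~0.70 of cells
```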

Recurrence of Mutations in Other AMLs

The other measure of a somatic mutation’s importance is recurrence in other tumors of the same type. We therefore screened a panel of 187 additional leukemia patients for the validated somatic mutations to see whether any were recurrent. Most, unfortunately, were not. However, two variants were found in other samples, suggesting that they may play an important role in the development of AML. One was a conserved noncoding mutation (tier 2) on chromosome 10, which was detected in one other sample. Recurrence in just one other sample might not seem impressive, but by our estimate, the probability of such an event happening by chance is 1.1 × 10^-9. Thus, we may have uncovered a noncoding functional mutation that contributes to leukemogenesis via an as-yet-unknown mechanism.
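The 1.1 × 10^-9 estimate comes from the paper’s own model, which is not described here. As a generic illustration only: if each of n independent samples has some small probability p of carrying the same mutation by chance, the probability that at least one does is 1 − (1 − p)^n, which is approximately n·p for tiny p. In the sketch below, `per_sample_prob` is a hypothetical input you would have to estimate; `log1p`/`expm1` keep the result accurate when p is very small.

```python
import math

def prob_recurrence_by_chance(per_sample_prob: float, n_samples: int) -> float:
    """P(at least one of n independent samples carries the same mutation by chance),
    i.e. 1 - (1 - p)**n, computed via log1p/expm1 to preserve precision for tiny p."""
    if not 0.0 <= per_sample_prob <= 1.0:
        raise ValueError("per_sample_prob must be in [0, 1]")
    if per_sample_prob == 1.0:
        return 1.0  # log1p(-1) is undefined, but the answer is trivially 1
    return -math.expm1(n_samples * math.log1p(-per_sample_prob))

# Hypothetical per-sample probability, screened across 187 additional samples.
print(prob_recurrence_by_chance(1e-8, 187))
```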

The other was a nonsynonymous (tier 1) mutation at residue 132 of IDH1. Sixteen of the 187 other leukemia samples carried mutations at the same residue, suggesting an important role for this gene in the development of AML. Somatic mutations in IDH1 were recently characterized in glioblastoma (GBM) by our friends at Johns Hopkins, but this is the first time IDH1 mutations have been described in AML.
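To put the 16-of-187 recurrence rate in context, a confidence interval on the proportion helps. The sketch below uses the Wilson score interval, a standard choice for binomial proportions swapped in here for illustration; the post does not say how, or whether, an interval was computed.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score confidence interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# IDH1 residue-132 mutations among the additional samples, as reported in the post.
carriers, screened = 16, 187
low, high = wilson_interval(carriers, screened)
print(f"{carriers / screened:.1%} (95% CI {low:.1%} to {high:.1%})")
```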

Conclusions: Lots of Passengers, Not Many Drivers

After sequencing the complete tumor genomes of two AML patients, we estimate that these cancers carry roughly 750 somatic events. Most such events are likely background passenger mutations, acquired in the progenitor cell before it became cancerous. Admittedly, that means there’s much more work to do to fully characterize the sequence changes underlying the development of AML and other cancers. Our group is eager for the challenge. With the ever-growing throughput of the Illumina platform and our automated pipelines for whole-cancer-genome analysis, we hope to sequence at least a hundred more cancers in the coming year.

References

Mardis, E., Ding, L., Dooling, D., Larson, D., McLellan, M., Chen, K., Koboldt, D., Fulton, R., Delehaunty, K., McGrath, S., Fulton, L., Locke, D., Magrini, V., Abbott, R., Vickery, T., Reed, J., Robinson, J., Wylie, T., Smith, S., Carmichael, L., Eldred, J., Harris, C., Walker, J., Peck, J., Du, F., Dukes, A., Sanderson, G., Brummett, A., Clark, E., McMichael, J., Meyer, R., Schindler, J., Pohl, C., Wallis, J., Shi, X., Lin, L., Schmidt, H., Tang, Y., Haipek, C., Wiechert, M., Ivy, J., Kalicki, J., Elliott, G., Ries, R., Payton, J., Westervelt, P., Tomasson, M., Watson, M., Baty, J., Heath, S., Shannon, W., Nagarajan, R., Link, D., Walter, M., Graubert, T., DiPersio, J., Wilson, R., & Ley, T. (2009). Recurring Mutations Found by Sequencing an Acute Myeloid Leukemia Genome. New England Journal of Medicine. DOI: 10.1056/NEJMoa0903840