Marco Island Meeting Preview

February 22, 2010 by Dan Koboldt

The Advances in Genome Biology and Technology (AGBT) meeting begins this week at Marco Island. I’ll be there to present a poster on our somatic mutation detection pipeline, and also to learn about what’s to come in next-generation and next-next-generation sequencing.

Some of the companies are already ramping up. Last week Pac Bio announced the initial members of their partnership program to provide complete solutions for single molecule real-time sequencing. Microfluidics company Caliper Life Sciences formed a scientific advisory board for next-gen sequencing that included WashU’s own Vince Magrini. Other companies – Illumina, Complete Genomics, and RainDance Technologies, for example – are hosting workshops or other events at AGBT.

AGBT Sessions Not To Miss

Day 1 of the meeting will be very strong, with opening remarks from Len Pennacchio (JGI), followed by talks from Kelly Frazer (UCSD) on genomic enrichment, Mike Snyder (Stanford) on paired-ends for SVs/assembly, and Barbara Wold on ChIP-Seq. On Day 2, Stacey Gabriel of the Broad Institute will discuss applications of new sequencing technology to medical and cancer genetics. Carlos Bustamante of Stanford will present the complete genome sequencing and analysis of African-American and Mexican-American individuals. WashU’s David Wang will give a talk on metagenomic approaches to pathogen discovery.

Some friends of mine are giving talks later that evening. Jeff Reid (Baylor College of Medicine) has what looks to be a very interesting talk on miRNA precursor variants in schizophrenia. Daniel MacArthur, of Sanger and Genetic Future fame, will present “Loss-of-Function Mutations in Healthy Human Genomes,” likely based on his work with the 1000 Genomes Project.

Cancer Genomics and Sequencing

I’m very excited about an entire session devoted to cancer genomics. Elliott Margulies (NHGRI) will discuss the sequencing and analysis of a melanoma genome. In what may be the first application of single-molecule sequencing to cancer, the sequencing of Ewing’s Sarcoma on a Heliscope instrument will be presented by Timothy Triche of Childrens Hospital Los Angeles. Two speakers from BC Cancer Agency will discuss rearrangements in follicular lymphoma and capture/transcriptome sequencing in lung cancer.

Whole Genome Sequencing

There are to be big-picture sequencing talks as well. Genome center co-director Elaine Mardis will present “Single Molecule Sequencing to Detect and Characterize Somatic Mutations in Cancer Genomes.” Stan Nelson of UCLA will give a talk, presumably on his group’s recent publication – whole genome sequencing of a glioblastoma cell line on ABI SOLiD.

I’ll be there, and posting regular updates, as the latest and greatest in sequencing technologies unfolds at Marco Island.

Cancer Genomics Meeting in St. Louis

December 3, 2009 by Dan Koboldt

The Genome Center at Washington University is currently hosting a remarkable two-day event focused on the study of cancer genomics. Yesterday there was a symposium on the School of Medicine campus, featuring speakers from major genome centers around the world, who delivered an excellent series of talks on recent advances in cancer genome research. Here were the highlights.

Mutational Signatures in Lung Cancer (Peter Campbell, WTSI)

First up was Peter Campbell from Wellcome Trust Sanger Institute, who presented their first cancer genome – a small cell lung cancer cell line called NCI-H209. Using the ABI SOLiD platform, they sequenced NCI-H209 and a matched B-cell sample from the same individual to 30-40x coverage. Extensive PCR-based validation yielded almost 23,000 somatic substitutions and over a hundred structural events (indels and rearrangements). As expected, the mutational spectrum was enriched for G->T and C->A changes associated with adduct formation on guanine nucleotides induced by benzopyrene, the chemical mutagen found in tobacco smoke. Dr. Campbell also described some of the complex rearrangements observed in the paired-end sequencing data, which were particularly convincing when overlaid with spectral karyotyping images.
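To make the idea of a mutational spectrum concrete, here is a minimal Python sketch of how one might tally substitution classes from a list of somatic calls. It is not the Sanger pipeline; the function names and the toy calls are my own assumptions. The only real trick is that G->T on one strand is C->A on the other, so the two are counted as a single class.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def canonical_class(ref, alt):
    """Collapse a substitution onto the pyrimidine strand (ref becomes C or T).

    G>T and C>A describe the same event seen from opposite strands,
    so both are reported as C>A.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(substitutions):
    """Return the fraction of calls falling in each of the six classes."""
    counts = Counter(canonical_class(ref, alt) for ref, alt in substitutions)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Toy calls (hypothetical, for illustration only).
calls = [("G", "T"), ("C", "A"), ("G", "T"), ("A", "G"), ("C", "T")]
spectrum = mutation_spectrum(calls)
for cls, fraction in sorted(spectrum.items()):
    print(f"{cls}\t{fraction:.2f}")
```

A tobacco-associated spectrum like the one described for NCI-H209 would show up here as an unusually large C>A fraction.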

Next-Gen Sequencing Strategies to Study Cancer Genomes (Elaine Mardis, WU)

Next was our own Elaine Mardis, who gave an excellent overview of the strategies developed here to apply NGS to cancer genomes. She described five key elements to success in this arena:

Genomic characterization prior to sequencing. For example, at WashU we type tumor and normal samples on genome-wide SNP arrays, which yield tumor purity/ploidy estimates, LOH information, and a dense set of SNPs for tracking the coverage of genomes by Illumina sequencing.
Resource characterization. The tissue preservation method, DNA/RNA quality and quantity, and pathology information are all critical components. Also important are high-quality clinical data (diagnosis, chemotherapy/radiation protocols, and outcome), informed consent, IRB approval, and additional cases of the same cancer subtype for recurrence screening.
Data production capacity. US genome centers seem to have this, either in the form of Illumina (WashU and Broad) or ABI SOLiD (Baylor). It’s not just the throughput of the machines, either – it’s the ability to construct sequencing libraries from ever-shrinking DNA inputs. Tumor samples are precious, and the ability to use only a tiny amount of DNA or RNA while achieving informative results is one of the key areas of focus of tech development groups.
Informatics and bioinformatics. We have entire groups devoted to LIMS, pipeline automation, medical genomics, and sequence data submission. Other important elements of bioinformatics that Elaine touched on were data display interfaces for collaborators and high-end data storage and computational infrastructure.
Validation and recurrent site screening. This is the essential coup de grace for tumor genome characterization, in which we validate somatic mutations and identify those that are recurrent in other samples of the same subtype, the best indication that we currently have of pathological relevance.

Elaine also discussed the rapid scaling up of TCGA (which is adding 20 tumor types thanks to ARRA funds) and other projects, which will only exacerbate the challenges of scale that NGS platforms have already presented.

Integrating Genomics with Biology (Richard Gibbs, Baylor)

Richard Gibbs gave an action-packed talk on some relevant work going on at Baylor, both for cancer and inherited diseases. They are applying an intriguing if controversial multiple-platform strategy for whole genome sequencing: deep (20-30x) coverage on ABI SOLiD and light (6-10x) coverage on 454. “We’re just telling people that if you do it twice, you’ll get it right,” Dr. Gibbs said. One interesting project is an investigation of Charcot-Marie-Tooth (CMT) syndrome, a recessive inherited disorder where the locus is unknown. Whole-genome sequencing of an affected individual on ABI SOLiD identified a few dozen novel missense mutations; among them lurked the causal variant, which was found to segregate with the disease in a family cohort.

Dr. Gibbs also gave an overview of their investigations into heritable variants in pediatric cancers (in collaboration with MD Anderson). There’s also a lot of work under way for TCGA, not just the 6K capture project, but also adjunct analyses of gene expression, DNA copy number, microRNA, and DNA methylation data being generated on TCGA samples.

Insights into Rare Tumors (Steven Jones, BC Cancer Agency)

Steven Jones from BC Cancer Agency retold the story of the rare tongue adenocarcinoma that I heard at AGBT 2009. What I didn’t know about BCCA is that under the Canadian universal healthcare system, they see all of the cancer patients in the surrounding population of over 4 million citizens. One of these was a rare one – an 80-year-old man with adenocarcinoma of the tongue. It was removed surgically, of course, but in a short time metastasized to the lungs. The clinician prescribed erlotinib, an EGFR inhibitor, but unfortunately the patient did not respond. To help the patient, and also make some advances in tech development, Jones and his colleagues performed whole-genome sequencing and RNA-Seq of the tumor samples and matched normals. There were just four somatic mutations: two in known cancer genes and two in zinc finger proteins (these remain unexplained). Transcriptome and copy number analysis showed that the tumor had loss of PTEN and down-regulation of SMAD4. Unfortunately, it had recently been shown that tumors lacking PTEN and TP53 don’t respond to TK inhibitors like erlotinib. However, this particular tumor showed an amplification of Ret, and as it happened, the drug bank had a single drug, sunitinib, that was known to inhibit Ret. The patient’s response, initially, was quite dramatic – all of the metastases vanished. Sadly, several months later they turned up again, and this time were resistant even to sunitinib. Still, the results of this effort were promising, because genomic information was used to keep cancer at bay, if only for a short time.

Genomic Medicine in Pediatric Brain Tumors (Ching C. Lau, Baylor)

Ching Lau of Baylor presented genomic studies of medulloblastoma (MBM), which accounts for 20% of all brain tumors and has a 60% survival rate. Classification of MBM patients in the past was relatively crude – based on the amount of residual tumor post-surgery and metastatic status. Using gene expression profiling, Lau and colleagues identified 4-5 distinct clusters. Two clusters were associated with known cancer pathways – SHH signaling and WNT activation. The same four clusters could also be isolated by unsupervised miRNA clustering. Also, gene expression analysis showed that ERBB2 expression correlates with outcome (higher expression = poor prognosis).

Finally, Dr. Lau mentioned some future directions for targeted cancer therapy. One of these that I readily admit I don’t understand: cytotoxic T-cells with Chimeric TCRs. Evidently these are T-cells that recognize and attack cancer cells in the body. There was a short movie, courtesy of Dr. Lau’s collaborators, in which we saw these specially programmed immune cells recognizing and attacking a tumor that was roughly four times their size. It was like watching ants swarm a piece of fruit on the sidewalk, and very compelling.

Evolution of a Breast Cancer Tumor (Samuel Aparicio, BC Cancer Agency)

Dr. Aparicio presented a study recently published in Nature and already discussed on Massgenomics. However, he did discuss the continuing challenge of mutation heterogeneity in tumors – we can no longer refer to mutations as present or absent, but instead should report their frequency, which represents the proportion of clones with each mutation. The question of how deep we need to sequence to find the very rare variants has yet to be answered.
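The depth question can at least be framed with a back-of-the-envelope binomial calculation. The sketch below is my own illustration, not anything presented in the talk: it treats each read as an independent draw that carries the variant with probability equal to the variant frequency, and it ignores sequencing error, which matters a great deal for truly rare variants. The function names and the three-read threshold are assumptions.

```python
from math import comb


def variant_allele_frequency(variant_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return variant_reads / total_reads


def detection_probability(depth, freq, min_reads):
    """P(at least `min_reads` variant reads) under a simple binomial model."""
    p_too_few = sum(
        comb(depth, i) * freq**i * (1 - freq) ** (depth - i)
        for i in range(min_reads)
    )
    return 1 - p_too_few


def min_depth(freq, min_reads, power, max_depth=100_000):
    """Smallest depth at which the detection probability reaches `power`."""
    for depth in range(min_reads, max_depth + 1):
        if detection_probability(depth, freq, min_reads) >= power:
            return depth
    raise ValueError("power not reached within max_depth")


# Hypothetical scenario: a variant present in 5% of reads, requiring
# at least 3 supporting reads and a 95% chance of seeing them.
print(variant_allele_frequency(5, 100))
print(min_depth(freq=0.05, min_reads=3, power=0.95))
```

Lowering `freq` in the last call shows how quickly the required depth grows as the subclone of interest gets rarer.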

Breast Cancer Genomics (Matthew Ellis, Siteman Cancer Center)

Matthew Ellis, our collaborator from the Siteman Cancer Center, presented very recent work we’ve done on a basal subtype breast cancer. A quartet of samples was sequenced in this study – the primary breast tumor, the matched normal tissue, the brain metastasis (from which the patient died), and finally, a mouse xenograft model developed in “humanized” NOD/SCID mice. We validated some 50 tier 1 mutations, all of which were detected (at some level) in all four samples. Deep read counts for these mutations in each sample revealed some interesting stories about the progression of the cancer from tumor to metastasis.

Genomic Signatures and Cancer (Todd Golub, Broad / Dana Farber)

Todd Golub of the Broad Institute and Dana Farber Cancer Center presented his group’s work on Hepatocellular Carcinoma (liver cancer), which is the fifth most common cancer worldwide. It’s a disease of growing concern on the African and Asian continents, and presents numerous challenges. Molecular classification “is a mess,” Dr. Golub said, and recurrence is common. The problem is that there are few frozen samples with long-term outcome information. Thus, Dr. Golub and his group applied the Illumina DASL assay – which enables very small, highly multiplexed, locus-specific PCR – to perform expression profiling in formalin-fixed paraffin-embedded (FFPE) samples. They achieved up to 90% success across 6,000 genes in samples that were 25 years old. Doing so opened up a vast bank of viable samples for gene expression profiling, from which Dr. Golub and colleagues made some interesting findings.

The AML Genome (Tim Ley, WashU and Siteman Cancer Center)

Tim Ley gave the last talk, which highlighted the work that he and colleagues at WashU began around a decade ago on the disease acute myeloid leukemia (AML). Our goal, he said, was to find 95% of the mutations that occur in at least 5% of AMLs. To do so will require whole genome sequencing of at least 30 genomes, according to statistics from my colleague Mike Wendl. Two of these (AML-1 and AML-2) are already done and published, and a number of others are currently under way. One intriguing bit of work that Dr. Ley described was on the “Mouse APL” project, a knock-in mouse with the PML-RARA gene fusion backcrossed 10+ generations to CBL/BL6 mice. This yielded inbred strains of mice, some of which developed AML after ~6 months, presumably after acquiring “cooperative” mutations. One mouse was sequenced to 15x coverage, and among the handful of somatic nonsynonymous mutations found, one was recurrent, not only in the APL mice, but also in the same gene in human tumors.

HITS-CLIP Unravels microRNA-mRNA Interactions

September 3, 2009 by Dan Koboldt

Micro-RNAs (miRNAs) are short (18-26 nt) sequences that act as post-transcriptional repressors of gene expression. Over 700 miRNAs have been reported in the human genome; each is believed to bind directly to many mRNAs to regulate their translation or stability. Thus, miRNAs represent a key regulatory mechanism affecting numerous cellular activities, and are of particular interest in cancer research. Understanding the complex relationships between miRNAs and mRNAs remains challenging, however, and computational approaches alone have been largely unsuccessful.

HITS-CLIP: Isolation and Sequencing of Argonaute-miRNA-mRNA Complexes

Enter HITS-CLIP, a new approach that applies high throughput sequencing of RNAs isolated by crosslinking immunoprecipitation. Essentially, it’s a method by which radiation is used to cross-link protein-RNA complexes and stringently purify them. Then, massively parallel sequencing yields all of the RNA “tags” bound by the protein of interest.

Ago-miRNA-mRNA Complex (Image Credit: Nature 460: 479-486, 2009)

In a recent Nature paper, Chi et al. used HITS-CLIP to isolate RNA bound by the Argonaute protein (Ago), which mediates miRNA-mRNA interaction (see figure). The purified complexes showed two different modal sizes (110 kDa and 130 kDa), suggesting that Ago (97 kDa) was crosslinked to two different RNA species – hopefully, miRNAs (small) and the mRNAs that they were targeting (large).

The authors applied Illumina high-throughput sequencing to characterize Ago-bound miRNAs and the mRNA “tags” to which they were linked. With relatively straightforward bio-informatics approaches, it was possible to cross-reference expressed miRNAs with complementary sequences of mRNA tags. The resulting “ternary map” of miRNA-mRNA interaction sites yields a wealth of information about this post-transcriptional regulatory mechanism.
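For readers who want a feel for what that cross-referencing might look like, here is a minimal Python sketch. It is not the authors’ code: it assumes the simplest possible rule, an exact match in the mRNA tag to the reverse complement of miRNA positions 2-8 (the “seed”), and the function names and toy sequences are hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna):
    """Reverse complement of an RNA string."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))


def seed_site(mirna, start=2, end=8):
    """mRNA sequence that would pair with miRNA positions start..end (1-based)."""
    seed = to_rna(mirna)[start - 1:end]
    return reverse_complement(seed)


def find_seed_matches(mirnas, tags):
    """Return (miRNA name, tag name, 0-based offset) for every exact seed site."""
    hits = []
    for mir_name, mir_seq in mirnas.items():
        site = seed_site(mir_seq)
        for tag_name, tag_seq in tags.items():
            tag_rna = to_rna(tag_seq)
            pos = tag_rna.find(site)
            while pos != -1:
                hits.append((mir_name, tag_name, pos))
                pos = tag_rna.find(site, pos + 1)
    return hits


# Toy input (hypothetical names and sequences, for illustration only).
mirnas = {"mirA": "UAAGGCACGCGGUGAAUGCC"}
tags = {"tag1": "CCAGUGCCUUAA", "tag2": "AAAAAAAAAAAA"}
print(find_seed_matches(mirnas, tags))
```

Real target pairing is messier than an exact 7-mer match, which is one reason the crosslinked tags are valuable: they restrict where such matches need to be considered at all.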

Decoding miRNA-mRNA Interaction

The authors identified 454 unique miRNAs crosslinked to Ago in the mouse brain; mir-30e was the most abundant species, representing 14% of all miRNA tags. In silico clustering and normalization of the messenger RNA tags yielded 1,463 robust clusters from 829 different brain transcripts.

Locations of Ago-bound mRNA tags (Image Credit: Nature 460: 479-486, 2009)

When these tags were overlaid with gene annotations, several patterns emerged. As expected, a substantial portion (40%) of Ago-bound tags were in 3′ UTRs where miRNA activity is known to have high efficacy. Some 8% (one-fifth of the 40%) were actually outside of the UTR but <10kb downstream, regions likely to harbor unannotated 3′ UTRs.

Unsurprisingly, very few Ago-bound tags were in 5′ UTRs. However, a substantial fraction of tags fell in coding sequences (25%), introns (12%), and non-coding RNAs (4%), suggesting that miRNA activity occurs in these regions as well. Another 6% of tags were in intergenic regions, possibly in as-yet-unannotated transcripts. These unexpected locations of miRNA binding may offer additional insights into the mechanisms of miRNA regulation.

Next, the authors sought to define the Ago-mRNA “footprint” in which the majority of tags were contained. The distribution of tags in a defined cluster, at least in their figure, looks like a bell curve, with a sharp peak in the middle. About 95% of the time, Ago bound within 45-62 nucleotides of this peak, so the authors defined this region as the average Ago-miRNA footprint. Linear regression analysis of all 6-8 base motifs in clusters yielded numerous “enriched” seed sequences; the most prevalent corresponded to the binding site of miR-124, a well known brain-specific miRNA. Indeed, Ago-mRNA footprints were rich in miRNA binding sites, suggesting that this approach may predict active sites with far better specificity than other methods.
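One simple way to think about a footprint is as the narrowest window around a cluster’s peak that still contains most of its tags. The Python sketch below illustrates that idea only; it is not the procedure used in the paper. It assumes each tag has been reduced to a single coordinate (say, its midpoint), and the function name and toy positions are hypothetical.

```python
from collections import Counter


def footprint_width(positions, coverage=0.95):
    """Return (peak, width) of the narrowest peak-centered window
    that contains at least `coverage` of the tag positions."""
    if not positions:
        raise ValueError("need at least one tag position")
    counts = Counter(positions)
    # Peak = most frequent position; ties go to the smallest coordinate.
    peak = min(counts, key=lambda p: (-counts[p], p))
    needed = coverage * len(positions)
    half = 0
    # Widen the window one base on each side until enough tags fall inside.
    while sum(n for p, n in counts.items() if abs(p - peak) <= half) < needed:
        half += 1
    return peak, 2 * half + 1


# Toy cluster (hypothetical): most tags pile up at 100, two stragglers.
tag_midpoints = [100] * 10 + [99] * 4 + [101] * 4 + [90, 110]
print(footprint_width(tag_midpoints))
```

Because the window is forced to be symmetric, a couple of stragglers can widen it considerably, which is worth keeping in mind when comparing footprints across clusters.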

HITS-CLIP Implications

By reducing the search space for miRNA binding sites to a 45-62-nucleotide Ago footprint, HITS-CLIP offers a powerful complementary approach to bioinformatic methods for miRNA binding site prediction. Computational approaches alone are known to have high false positive rates, whereas the authors estimate FP rates of just 15% for HITS-CLIP. The new method offers dramatic improvement for transcripts with highly conserved 3′ UTRs, which often have many “predicted” miRNA binding sites because so many computational methods rely on conservation. Analysis of the HITS-CLIP ternary map revealed that real miRNA-mRNA binding events are very specific, with an average of just 2.6 Ago-mRNA clusters per regulated transcript. Despite the thousands of predicted binding sites, each miRNA bound an average of 655 targets. These results suggest that miRNA selectivity is much higher than previously believed. Yet Ago-mRNA clusters seemed to show no apparent sequence preference (data not shown), so it’s likely that other RNA-binding proteins are involved.

Thus, this study sets the stage for large scale genome-wide RNA-protein maps that include other proteins, tissues, and species, which should yield an unprecedented new level of understanding of this complex regulatory process.

References
Chi SW, Zang JB, Mele A, & Darnell RB (2009). Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature, 460 (7254), 479-486. PMID: 19536157