AGBT: Sequencing’s Debutante Ball or Déjà vu

February 15, 2013 by Dan Koboldt

It’s less than a week until the Advances in Genome Biology & Technology meeting (#AGBT2013) in Marco Island, Florida. Historically, this meeting has been like a debutante ball for the sequencing community — the place where new technologies are showcased, promised, and occasionally delivered upon.

The trends of past meetings often set the tone for the year that followed:

AGBT 2010 highlighted emerging sequencing technologies like those from Pacific Biosciences and IonTorrent.
AGBT 2011 was all about smaller, more affordable benchtop sequencers like the MiSeq and PGM, culminating in a huge guy literally carrying in a prototype IonTorrent PGM.
AGBT 2012 focused on sequencing’s clinical applications and saw the stunning announcement of Oxford Nanopore’s jump drive sequencers.

AGBT 2013 Anticipation

This year, AGBT is a bit later in February than usual — good news for the 15 minutes of beach time managed by most attendees — and I won’t be attending myself. Looking at the meeting agenda confused me for a minute. It felt like déjà vu … PacBio talking about structural variation, Liz Worthey on clinical applications, my friend Aaron Quinlan presenting a cleverly-named informatics tool (this year it’s “LUMPY”). I read it and wondered, “Am I looking at an old agenda?”

If we take the agenda as a whole and do some basic text mining, the recurrent themes of most presentations are quick to emerge:

count    phrase
21    sequencing
18    genome
11    cancer
10    analysis
9    whole, human, genomic
8    clinical
7    single
6    gene, cell
5    rna, mutations, discovery
4    medicine, mechanisms, exome, complex
3    tumors, somatic, de novo, metagenomic
3    ngs, genetic, expression, applications
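
For readers curious how a tally like this could be produced, here is a minimal sketch. It is not the script behind the table above: the stopword list is a hypothetical stand-in, and the input is just three of the talk titles quoted later in this post rather than the full agenda, so the counts will not match.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "on", "by", "with", "from"}

def count_terms(titles):
    """Count how often each non-stopword term occurs across talk titles."""
    counts = Counter()
    for title in titles:
        # Lowercase and keep runs of letters (apostrophes stay inside words).
        words = re.findall(r"[a-z']+", title.lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return counts

# Three titles quoted in this post; the real agenda is much longer.
titles = [
    "Multiplexed Detection of Low Abundance, Tumor Related Nucleic Acids in the Plasma of Cancer Patients",
    "Clinical Cancer Sequencing and Integrated Analysis of Whole Genomes, Exomes and Transcriptomes",
    "Overcoming Today's Limitations in Sequencing Technology for Human Medical Genetics",
]

counts = count_terms(titles)
for term, n in counts.most_common(5):
    print(n, term)
```

Grouping terms that share a count onto one row, as the table does, would be a small extra step on top of `counts`.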

Sequencing, of personal genomes and cancer genomes, is the time-tested mainstay of agenda topics from recent years. There’s a substantial interest in getting sequencing into the clinic, much like we saw last year. Popular applications of next-gen sequencing, like RNA-seq and cancer profiling, will have their say. Interestingly, exome sequencing seems under-represented… one gets the feeling that this community, at least, is beginning to consider it routine (ho hum). I’m pleased to see the term “metagenomic” a few times, which suggests a growing appreciation of the importance of the human microbiome and similar topics.

Following AGBT Live

Thanks to Twitter and some dedicated “tweeps” (people who tweet things live), this meeting is easy to follow. I’ll have a steady feed of the #AGBT13 hashtag, which isn’t nearly as good as being there, but the best I can manage. There are some good talks to look forward to:

Eric Boerwinkle (Univ. of Texas) is giving a talk on “Whole Genome and Exome Sequencing and Analysis of Large Numbers of Deeply-phenotyped Individuals Reveal the Genetic Architecture of Complex Traits: the CHARGE Consortium”.
Matthew Wiggin of Boreal Genomics will present “Multiplexed Detection of Low Abundance, Tumor Related Nucleic Acids in the Plasma of Cancer Patients”.
Malachi Griffith, of the Genome Institute at Washington University, has a talk on “Clinical Cancer Sequencing and Integrated Analysis of Whole Genomes, Exomes and Transcriptomes”.
Mark DePristo of the Broad Institute (a lead contributor to GATK) will discuss “Overcoming Today’s Limitations in Sequencing Technology for Human Medical Genetics”, and hopefully commercial attendees won’t have to pay $300K to listen.

The Twitter hashtag is usually cluttered with commercial stuff, so you might want to pick some prolific Twitter-aholics to follow instead, like Nick Loman and Aaron Quinlan.

Update: AGBT 2013 Wordle Tag Cloud

My friends at @nextgenseq made a Wordle tag cloud that nicely illustrates the theme of this year’s AGBT meeting.

Next-Gen Sequencing in 2010

March 9, 2010 by Dan Koboldt

On the shuttle from Marco Island to the airport last week, I happened to sit next to a very nice gentleman from Illumina. We got to talking, of course, and I asked him if they saw a threat from any of the new sequencing platforms presented at AGBT. I’m aware that Illumina currently enjoys a greater-than-50% share of the next-gen sequencing market, so I was curious about his impressions.

“We definitely see a segmentation of the market,” he admitted.

Something had been bothering me about the sequencing-company presentations this year, and I finally realized what it was. During AGBT 2009, every player was gunning to take over the world. This year it seems like every sequencing platform has a niche in mind.

General Sequencing: Illumina vs. Life Technologies

Illumina’s HiSeq2000 and Life Tech’s SOLiD 4 are after the general sequencing market – whole genome, transcriptome, and targeted (capture) sequencing. It’s a constant game of one-upmanship in throughput and claimed accuracy. In February this year, Illumina launched the HiSeq2000 with an expected throughput of 200 Gb per run. Life Technologies launched SOLiD 4 with 100 Gb per run, but promised 300 Gb per run later this year. On the read length front, Illumina remains the clear winner – 2×100 is in production at many genome centers, and even longer reads have been promised. Life Tech, to their credit, is pushing the SOLiD 4 platform pretty hard.

When Length Matters: 454

Roche/454 has wisely backed away from large-scale sequencing, and instead seems to be targeting applications where longer (450 bp) reads are a requirement. At AGBT, Henry Erlich (Roche) gave an interesting talk about genotyping and haplotyping human HLA regions to improve donor matching for organ transplants. Here’s a key challenge of modern medicine where sequencing can offer tangible benefits. Here at the genome center, we use 454 runs for validation and for small-scale targeted sequencing. There are many applications where relatively inexpensive long-read sequencing runs are ideal; full-length cDNA sequencing, for example, comes to mind.

Complete Genomics: Sequencing as a Service

The business model of Complete Genomics seems a bit of a gamble to me. They aim to be the provider of relatively inexpensive, start-to-finish sequencing services. No technology or reagent sales for these guys. Instead, they want to take your samples and give you back the SNPs. In the coming years, they hope to build as many as 10 facilities throughout the world that provide these services. I’m a bit leery of Complete Genomics, not only because their proprietary technology lags behind others (currently it’s at 2×35 bp), but because they’ll need to do something like 10,000 genomes a year just to stay in business. I don’t think we’re ready for that.

Sequencing for the Masses: IonTorrent

Many of us were impressed by IonTorrent this year at AGBT. The incredibly low cost of their instrument ($50K) and sequencing runs ($300-500) mean that nearly any lab could write a grant around this technology. The sample prep, accuracy, and throughput are still a grey area, but if they prove to be good enough, high-throughput sequencing will suddenly be available to just about everyone.

Single Molecule Applications: PacBio and Oxford Nanopore

The true single-molecule sequencing platforms that are close to market are certainly getting everyone excited. In the next few years, however, it’s unlikely that Pacific Biosciences, Oxford Nanopore, mystery-Chinese-platform, or other companies will displace massively parallel sequencing. No, I think Illumina and SOLiD will remain the “workhorses” for discovery, certainly at major genome centers. Where SMS technologies can excel, however, is ultra-long reads – think about PacBio’s strobe sequencing to resolve structural variation or finish assemblies – and lots of molecule-kinetics stuff that I don’t understand.

I think that 2010 will be an exciting and telling time for all of these platforms. In a year’s time, we should have results in hand from HiSeq, SOLiD4, PacBio, and even IonTorrent, and be able to distinguish between marketing claims and sequencing reality.

AGBT: PacBio Somewhat Unveiled

February 27, 2010 by Dan Koboldt

Yesterday the Pacific Biosciences commercial instrument was at last unveiled to a packed room of conference attendees. The road to this third-generation sequencer’s release has been paved with nearly $300 million of investment capital since the technology left a basement at Cornell University. PacBio, in addition to becoming something of a media darling, has quietly swelled to a several-hundred-employee company.

Since last year, PacBio claims to have achieved read lengths of up to 10.3 kbp, although I haven’t spoken to anyone outside the company who has seen reads that long. Even so, a few vignettes presented in the workshop told of how PacBio has been applied to influenza strain identification and detection of structural variants (SVs).

Strobe Sequencing in Real Time

Of particular interest is the “strobe sequencing” mode of the instrument, in which the detection laser is turned off for precise amounts of time to generate mate-pair-like reads spanning large fragments. This feature relies on real-time sequencing, which occurs at a very consistent per-base rate. In fact, it’s possible to infer sequence insertions and deletions as spikes or dips (respectively) in the time required to sequence a template of known size.
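
The arithmetic behind that last point can be sketched in a few lines. This is only an illustration of the idea, not PacBio's method: the function names are hypothetical, and the rate of 2 bases per second is an assumed round number rather than a published figure.

```python
def expected_seconds(template_bp, bases_per_second):
    """Time a template of known size should take at a constant per-base rate."""
    return template_bp / bases_per_second

def infer_indel_bp(observed_seconds, template_bp, bases_per_second):
    """Estimate the apparent size change from the timing difference.

    Positive = extra time (apparent insertion), negative = less time
    (apparent deletion). Assumes a perfectly constant rate, which is a
    simplification made for this sketch.
    """
    extra = observed_seconds - expected_seconds(template_bp, bases_per_second)
    return round(extra * bases_per_second)

RATE = 2.0  # bases per second; hypothetical value for illustration only

# A 1,000 bp template should take 500 s at this rate.
print(infer_indel_bp(550.0, 1000, RATE))  # took longer: apparent insertion
print(infer_indel_bp(475.0, 1000, RATE))  # took less time: apparent deletion
```

In practice the polymerase rate varies from molecule to molecule, so any real inference would need a noise model rather than a single subtraction.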

Kinetic Variation Applications

The kinetics of real-time sequencing add an informative new dimension to the PacBio data. In a talk today, Eric Schadt of PacBio showed that the kinetics of sequencing vary significantly for “modified” bases, i.e. methylated residues. In a collaboration with Carrie Harwood (UW), PacBio is sequencing the genomes and transcriptomes of 132 isolates of a hydrogen-producing species of Rhodopseudomonas. It turned out that kinetic variation exists at many bases as a “mixture” of sequencing times; by mining these, they identified thousands of methylated bases that caused up to 12-fold variation in sequencing kinetics.
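
To make "fold variation in sequencing kinetics" concrete, here is a toy sketch. It is not the analysis described in the talk, which mined mixtures of times at each base. Instead it assumes a simpler setup that the talk did not describe: per-position incorporation times from a native sample compared against an unmodified control. The function names, the threshold, and all numbers are hypothetical.

```python
def fold_changes(native_seconds, control_seconds):
    """Per-position ratio of native to control incorporation time."""
    return [n / c for n, c in zip(native_seconds, control_seconds)]

def flag_slow_positions(native_seconds, control_seconds, min_fold=2.0):
    """Indices where the native template is at least min_fold slower."""
    ratios = fold_changes(native_seconds, control_seconds)
    return [i for i, ratio in enumerate(ratios) if ratio >= min_fold]

# Made-up mean times (seconds) at four consecutive positions.
native = [0.5, 0.5, 6.0, 0.5]
control = [0.5, 0.5, 0.5, 0.5]

print(fold_changes(native, control))         # position 2 is 12-fold slower
print(flag_slow_positions(native, control))  # candidate modified positions
```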

Burning Questions Unanswered

Personally, I was not entirely satisfied with the PacBio workshop. When it opened for questions, I asked the first: whether PacBio had made any progress on the “dark bases” that go undetected in single-molecule sequencing. The presenter, Stephen Turner of PacBio, first gave me a nice 2-minute lecture on why there is no such thing as “dark bases” on PacBio’s sequencing platform due to its inherent awesomeness (sarcasm mine). There is still a problem with “missed bases,” but Turner was almost comically evasive (as Daniel MacArthur put it) in stating how often they occur. The next question concerned read lengths, a second topic on which Turner refused to provide concrete information.

Thus, I find myself cautious in my excitement about this new platform, and will reserve judgment until later this year, when the first of the golden-ticket early access partners begin generating data on their own PacBio SMRT sequencers.

AGBT: Cancer Genomics at St. Jude, Harvard, WashU

February 26, 2010 by Dan Koboldt

Today’s plenary session included some great talks on cancer genomics. Keynote speaker Jim Downing of St. Jude Children’s Research Hospital gave a talk on acute leukemia, in which he openly admitted that he would show no next-gen sequencing data. Instead, he gave a very nice overview of the four biological processes that are dysregulated in acute leukemia:

Self renewal. With a few exceptions, pre-leukemic cells have only limited self-renewal capacity. AML1-ETO is often altered to overcome this limitation.
Response to growth factor signals. The BCR-ABL gene fusion is a classic example of an alteration that lets cells grow in the absence of growth factors.
Differentiation. Leukemic cells block this process via alterations in PML-RARA, PAX5, EBF, BTLA, and others.
Apoptosis. This normal pathway of cell death is circumvented in leukemia via alterations in CDKN2A/B, BT6, and the RB pathway.

Non-NGS Molecular Profiling

Dr. Downing’s group uses several molecular techniques to characterize pediatric leukemias, including Affy SNP-chip (for copy number alterations), cytogenetics/FISH, and targeted sequencing in a handful of genes. In a study of 242 pediatric acute lymphoblastic leukemia (ALL) tumors with matched controls, a surprisingly small number of copy number alterations were observed.

There were a few significantly altered genes, however. PAX5 was deleted or amplified in 30% of B-cell ALLs; some apparent 3′ deletions proved to be fusion events with ETV6, FOXP1, or other genes. Another gene, IKZF1, was deleted in 83.7% of ALLs that were BCR-ABL positive. These and other findings convinced the audience, I think, that there is much to be learned, even about the best-characterized human cancer, and even without next-generation sequencing technologies.

Cancer Genomes and Translational Oncology

Levi Garraway of Harvard Medical School spoke about how next-generation sequencing can be applied to translational oncology. He offered a clinical perspective to cancer genomics, which has somewhat different requirements from basic research:

Targeted. The mutations and genes to be assessed in clinical samples must already be known and well-characterized.
Resource-efficient. To minimize costs, clinicians are interested in tests that make efficient use of sample and equipment resources.
Actionable. Only mutations and biomarkers that give actionable information, i.e., “the patient has X mutation, so we should administer drug Y” are valuable in a clinical setting.

A resource compiled by Dr. Garraway and others, called OncoMap, offers a database of known oncogenic mutations that can be tested (on frozen or FFPE samples) for just $200 per patient. Granted, it includes only 46 mutations from 34 cancer genes, but each provides a validated, actionable course in regard to treatment.

The speaker admitted that ideally, a systematic mutational profiling method would have high sensitivity and specificity, testing both oncogenes and tumor suppressors. It would also detect multiple alteration types (SNVs, CNAs, etc.) and be able to use either DNA or RNA, or both. And it would have an “acceptable” turnaround time, say 2 weeks. This is what clinicians want, and hybrid capture approaches may offer the best solution. More on that in another post.

Elaine Mardis: Single Molecule Sequencing in Cancer

My favorite talk of the day (obviously) was by genome center co-director Elaine Mardis, who presented WashU’s pipeline for detecting and validating somatic mutations from whole-genome sequencing. Our pipeline has evolved over the course of AML1, AML2, and other cancer whole-genome sequencing projects, and now has the highly automated capacity to handle the coming 600 tumor-normal pairs to be sequenced for the Pediatric Cancer Genome Project (PCGP).

Dr. Mardis also discussed our methods for systematically assessing the prevalence of somatic mutations (within a tumor population) as well as their recurrence in tumors of the same or other types. Prevalence is important because the greater the fraction of tumor cells that share a mutation, the more likely it is to have occurred early during progression. By similar reasoning, assessing the recurrence of mutations in a tumor type provides a measure of their importance for disease development.
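
As a rough illustration of the two measures, here is a minimal sketch. It is not the WashU pipeline, which the talk did not spell out at this level. The prevalence estimate assumes a heterozygous mutation in a diploid region with no copy number change, and the function names, gene labels, and read counts are all hypothetical.

```python
def tumor_cell_fraction(variant_reads, total_reads, purity=1.0):
    """Estimate the fraction of tumor cells carrying a mutation.

    Assumes a heterozygous mutation in a diploid, copy-neutral region, so
    a mutation present in every tumor cell shows up in about half the reads
    from a pure tumor sample. `purity` is the tumor cellularity (0 to 1].
    """
    vaf = variant_reads / total_reads  # variant allele fraction
    return min(1.0, 2 * vaf / purity)

def recurrence(mutated_genes_per_tumor, gene):
    """Fraction of tumors in a cohort with a mutation in `gene`."""
    hits = sum(1 for genes in mutated_genes_per_tumor if gene in genes)
    return hits / len(mutated_genes_per_tumor)

# Prevalence: 40 of 100 reads carry the variant in a pure tumor sample.
print(tumor_cell_fraction(40, 100))             # about 0.8 of tumor cells
# Same idea at 50% cellularity: 10 of 100 reads.
print(tumor_cell_fraction(10, 100, purity=0.5)) # about 0.4 of tumor cells

# Recurrence: toy cohort of four tumors with made-up gene labels.
cohort = [{"GENE_A", "GENE_B"}, {"GENE_A"}, {"GENE_C"}, set()]
print(recurrence(cohort, "GENE_A"))             # 2 of 4 tumors
```

The cap at 1.0 simply keeps sampling noise from producing a fraction above 100%.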

The Importance of Recurrence Testing

IDH1 demonstrates this principle well. Initially identified as a key cancer gene in glioma by Bert Vogelstein’s group at Johns Hopkins, the isocitrate dehydrogenase 1 (IDH1) gene was also mutated in AML2, and, in a screen of hundreds of AML samples, proved to be recurrent. At least two large-scale studies of AML have since replicated the common incidence of IDH1 mutations in AML and other cancers.

Third Generation Sequencing in Cancer

Finally, the speaker presented some recent experiments that we’ve performed using the Pacific Biosciences Single Molecule Real Time sequencer on in-house cancer samples. In work that’s part of a manuscript in submission, the accuracy and sensitivity of the SMRT sequencer were assessed on GBM and AML tumor samples that had already been characterized by whole genome sequencing. In general, the results were promising – 25 of 25 known somatic mutations were identified in SMRT sequencing of PCR products, although 6 were detected at lower-than-expected prevalence.

Somatic mutations from AML2 were also used to create mixed PCR libraries of various tumor cellularities from 50% to 100%. It was apparent that “tier 1” somatic coding mutations were more reliably detected on PacBio than tier 2 and tier 3 mutations, and that there’s a slight bias against detecting C to T mutations. That said, the ability of SMRT sequencing to detect somatic mutations even at low tumor cellularities is promising.