December 2008

Like many areas of research, the biomedical sciences have seen numerous recent advances that will no doubt have an impact on human health. For many researchers, however, 2008 may be best remembered as the year of the cancer genome. Several key studies this year offered an unprecedented view of the genetic profiles of tumors, and demonstrated the power of genomic analysis for studying the molecular mechanisms that play a role in carcinogenesis.

Two Landmark Studies on Brain Cancer

In September, two groups reported large-scale resequencing efforts in glioblastoma multiforme (GBM), the most common form of brain cancer. The first study – published online in the journal Nature – was the pilot project of the Cancer Genome Atlas (TCGA) research network, a coordinated effort involving more than a dozen U.S. research institutions with funding from the NCI/NHGRI divisions of NIH. TCGA researchers sequenced several hundred known or suspected cancer genes in tumor samples and matched controls from 91 patients. This work expanded upon and complemented a second study – published in the journal Science – in which a consortium led by Johns Hopkins studied sequence aberrations from 21 GBM tumors. The Hopkins study implicated a new gene, IDH1, in which mutations were associated with longer survival among GBM patients. The TCGA report identified three genes with significant roles in GBM:

NF1, a gene previously identified as the cause of neurofibromatosis 1, a rare, inherited disorder characterized by uncontrolled tissue growth along nerves
ERBB2, a gene that is well-known for its involvement in breast cancer
PIK3R1, a gene that affects activity of an enzyme called PI3 kinase that is deregulated in many cancers

Insight into Chemotherapy Resistance

TCGA researchers also made another very exciting finding: a possible mechanism for chemotherapy resistance. It was already known that patients with methylation (inactivation) of the MGMT gene respond better to temozolomide, an alkylating chemotherapy drug commonly used to treat brain cancer. Some patients who receive this treatment later relapse with GBM that doesn’t respond to temozolomide. By integrating methylation and sequence-mutation data, TCGA researchers found that this may be due to mutations in DNA mismatch-repair genes that are induced by the alkylating therapy. It may be that a combined treatment could mitigate cancer’s ability to persist in these patients.

Next Up: Lung Adenocarcinoma

Another important study, this one of lung adenocarcinoma, appeared in the print edition of Nature in October. This was the report of the Tumor Sequencing Project (TSP), another consortium, which sequenced 623 genes in 188 patients with the most common form of lung cancer. The TSP study identified over 1,000 somatic mutations across the samples, and implicated 26 genes that were frequently altered in lung adenocarcinoma. Another interesting finding was that patients who smoked had four times the number of mutations and a lower chance of survival than non-smokers. Just a friendly reminder that yes, smoking can kill you.

AML: The First Complete Cancer Genome

Finally, in November, the masterpiece: in a study published in the journal Nature, scientists at Washington University in St. Louis reported whole-genome sequencing of a patient with acute myelogenous leukemia (AML). It was two accomplishments, really: the first cancer genome, and the first whole-genome sequence from a woman. Over a period of nine months and at a cost of around $1 million, we sequenced DNA from tumor cells (to ~33x) and normal skin cells (to ~14x) using Illumina/Solexa technology. I can’t begin to tell you the sheer amount of data this represents. We’re talking around 98 billion bases in short (36 bp) reads from around 100 runs on the Solexa platform. And the number of validated, somatic coding mutations? Ten. It seems like a lot of effort for just 10 mutations, but the AML study demonstrated that second-generation sequencing platforms are a powerful approach to studying cancer genomics. Next year, we’ll probably see 50 complete cancer genomes sequenced. But as the peanut butter jar principle tells us, first is best!
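To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The 3-billion-base genome size is an assumed round number, not a figure from the study, and the per-run yield is simply the total divided evenly across runs. Under that assumption, 98 billion bases works out to roughly 33x, which lines up with the tumor depth quoted above.

```python
# Back-of-the-envelope check of the sequencing figures quoted in the post.
GENOME_SIZE_BP = 3.0e9   # ASSUMPTION: approximate haploid human genome size
TOTAL_BASES = 98e9       # "around 98 billion bases" (from the post)
READ_LENGTH_BP = 36      # short-read length (from the post)
RUNS = 100               # "around 100 runs" (from the post)


def fold_coverage(total_bases, genome_size):
    """Average number of times each genome position is covered."""
    return total_bases / genome_size


def read_count(total_bases, read_length):
    """Number of reads needed to produce total_bases at a fixed read length."""
    return total_bases / read_length


coverage = fold_coverage(TOTAL_BASES, GENOME_SIZE_BP)
reads = read_count(TOTAL_BASES, READ_LENGTH_BP)
bases_per_run = TOTAL_BASES / RUNS  # assumes every run yields the same amount

print(f"Approximate coverage: {coverage:.1f}x")
print(f"Approximate read count: {reads / 1e9:.2f} billion")
print(f"Approximate yield per run: {bases_per_run / 1e9:.2f} Gb")
```

In other words, something on the order of a few billion individual reads, at about a gigabase per run, to arrive at ten validated coding mutations.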

It seems like everyone is looking at structural variant detection these days.

We recently had a visit from Ben Raphael, a friend of the genome center whom we tried to recruit years ago when he was a postdoc. Now he heads a group at Brown University, where (by his own admission) they basically tap into some of the large datasets out there (like TSP and TCGA) and develop and apply their own algorithms. Ben gave a talk on structural variation in human and cancer genomes, in which he presented some of the work that he and colleagues have pioneered in End Sequence Profiling (ESP).

Who is this guy?

Ben’s main background is in mathematics and computer science. The cancer research came later, when (in 2003) a group at UCSF approached him with a cancer genome sequence that had seen massive rearrangement. They developed a way to reconstruct the tumor genome architecture and published the results in Bioinformatics in 2004. Incidentally, this work of Ben’s was profiled when he was named one of Tomorrow’s PIs by Genome Technology. The GT article came out in late 2006, a time when I was very interested in SV, and I remember thinking “who is this guy?”

End Sequence Profiling in Cancer

When I think of ESP, I tend to think of the Tuzun et al. 2004 paper, as many people in the field do. There was, however, a study published a year earlier (in 2003) on ESP as an approach for sequence-based analysis of rearranged genomes. The idea is to sequence 500 bp at each end of clones (100-250 kbp in size) and then apply a geometric clustering algorithm to look for rearrangements. Ben Raphael’s group applied this method to BRCA cell lines as well as primary tumors (breast, prostate, ovarian, and brain cancers). The principal goal was to identify fusion genes (like the one created by the widely known Philadelphia chromosome). In studies published this year, Ben’s group did find rearrangements that created fusion genes, though none appeared to be transcribed.
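To make the ESP idea a little more concrete, here is a minimal Python sketch. It is not the geometric clustering algorithm from Ben's papers; it swaps in a much simpler greedy grouping to illustrate the principle. The `EndPair` record, the function names, and the clustering window are all hypothetical, and only the 100-250 kbp clone size range comes from the description above. End pairs that map the way an intact clone should are set aside, and the remaining "invalid" pairs are grouped when they appear to support the same rearrangement.

```python
from dataclasses import dataclass

MIN_CLONE_BP = 100_000  # clone size range quoted in the post
MAX_CLONE_BP = 250_000


@dataclass(frozen=True)
class EndPair:
    """Mapped positions of the two sequenced ends of one clone (hypothetical record)."""
    chrom_a: str
    pos_a: int
    strand_a: str  # "+" or "-"
    chrom_b: str
    pos_b: int
    strand_b: str


def is_valid(pair):
    """True if both ends map to one chromosome, point toward each other,
    and are separated by a plausible clone length."""
    if pair.chrom_a != pair.chrom_b:
        return False
    left, right = sorted([(pair.pos_a, pair.strand_a), (pair.pos_b, pair.strand_b)])
    if (left[1], right[1]) != ("+", "-"):
        return False
    return MIN_CLONE_BP <= right[0] - left[0] <= MAX_CLONE_BP


def same_event(p, q, window=MAX_CLONE_BP):
    """ASSUMPTION: two invalid pairs support the same rearrangement if they share
    chromosomes and orientations and both ends lie within `window` bp.
    Ends a and b are assumed to be recorded in a consistent order."""
    return (
        (p.chrom_a, p.strand_a, p.chrom_b, p.strand_b)
        == (q.chrom_a, q.strand_a, q.chrom_b, q.strand_b)
        and abs(p.pos_a - q.pos_a) <= window
        and abs(p.pos_b - q.pos_b) <= window
    )


def cluster_invalid_pairs(pairs):
    """Greedy grouping: each invalid pair joins the first cluster containing a
    compatible member, else starts a new one. Clusters are never merged."""
    clusters = []
    for pair in pairs:
        if is_valid(pair):
            continue
        for cluster in clusters:
            if any(same_event(pair, member) for member in cluster):
                cluster.append(pair)
                break
        else:
            clusters.append([pair])
    return clusters


# Made-up coordinates, purely for illustration.
pairs = [
    EndPair("chr1", 1_000_000, "+", "chr1", 1_150_000, "-"),  # looks like an intact clone
    EndPair("chr1", 5_000_000, "+", "chr8", 2_000_000, "-"),  # ends on different chromosomes
    EndPair("chr1", 5_040_000, "+", "chr8", 2_030_000, "-"),  # nearby, same signature
    EndPair("chr3", 7_000_000, "+", "chr3", 9_000_000, "-"),  # ends too far apart
]
clusters = cluster_invalid_pairs(pairs)
for cluster in clusters:
    print(len(cluster), "supporting pair(s):", cluster[0].chrom_a, "->", cluster[0].chrom_b)
```

A cluster supported by several independent clones is a more convincing rearrangement candidate than a singleton, which could just as easily be a chimeric clone or a mapping artifact.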

ESP compared to CGH

Ben’s group compared their findings to comparative genomic hybridization (CGH) array results and found a “statistically significant” amount of overlap in rearrangements predicted by both methods (Agilent 244K CGH arrays and 150K ESPs). This past summer, they snagged some of our TCGA glioblastoma data and did the same comparison. In the case of GBM, Ben noted that they found far too many SVs for them all to be somatic; more likely, most of them are germline variants. As many as 5-20% of them were known inversion polymorphisms, which also seemed high. Nevertheless, I think the audience was impressed by their methods, and my guess would be that invitations to join in the next round of TCGA analysis may be forthcoming.

2008: Year of the Cancer Genome

Everybody was SV Detecting…
