Archives for April 2009

DNA Day 2009

April 25, 2009 by Dan Koboldt

Yesterday I participated in the genome center’s DNA Day Ambassadors program, where employees visit area schools to spread the wonders of DNA to students. My school was Our Lady of Lourdes, a Catholic school whose 8th grade class I previously visited to talk about my work in genomics.

We brought them a real DNA experiment, some genome center pencils, and hopefully, some excitement about DNA and science in general. Thanks to the Tech D boys Kevin and Devin, I could show off a couple of very cool-looking Illumina flowcells and pass them around. My best prop for the visit, however, was a guy named Vince Magrini.

Vince Engages the Class

I somehow managed to recruit the genome center rising star and leader of our Technology Development group to join me on the visit to Lourdes. He was a natural, engaging with the class right away. It also proved fortunate because he’d developed the protocols for the strawberry DNA extraction experiment that we brought for them to do.

Intro to DNA

Before the experiment, the class tolerated my 15 or so slides on DNA, the history of the double helix, and Watson & Crick. Not that they cared, but with some searching I was able to find a scan of the original “Photo 51” that Rosalind Franklin took by X-ray diffraction of the crystalline structure of DNA. I also found an early hand-drawn sketch by Francis Crick, his first drawing of the famous double helix structure. We talked about some of the things that are inherited, too, like eye color and hair color. DNA Day has its responsibilities, after all.

Smashing Strawberries

Filtering the Solution

Then Vince took over, and within minutes, the entire room smelled of strawberries. There were around forty kids divided into eight or nine groups, and all of them were eager to participate in the fruit smashing. Vince guided them through cell lysis (soap), precipitation (salt), and then separation (ethanol). There were varying levels of scientific rigor among the groups, and thus various levels of success. To be perfectly honest, we weren’t as precise as we could have been when adding salt and ethanol to their mixtures. But most of the groups at least precipitated some DNA, and all seemed to have a good time. They all enjoyed putting on the gloves and getting to work – and thanks to the foresight of the outreach department, we had diapers down to absorb most of the mess. Pam Nangle from the genome center came along, and it was she who took all of these great pictures.

We wrapped up with some discussions about genetics while Vince prepared an even more impressive demonstration – a flash gel of their strawberry DNA isolations (as well as some human DNA controls and a ladder) that he ran and put under a blacklight. They came up in groups to see the gel and get a 5-minute lesson from Vince. Then suddenly the hour and fifteen minutes was over, and we were packing up to head home.

Vince's Flash Gel

HapMap Continues to Bear Fruit

April 22, 2009 by Dan Koboldt

You might have thought that the 1,000 Genomes Project would render the International HapMap obsolete. But just yesterday I heard a talk about how some groups are still leveraging the HapMap resource in numerous ways to better understand the relationship between genotype and phenotype. The speaker was Wei Zhang, a postdoc at the University of Chicago who’s published an astonishing 25 papers in the last 2 years.

One key advantage of the HapMap samples is the availability of transformed cell lines for all samples at Coriell. This allows researchers to assess various phenotypes with cell-based assays (e.g. gene expression, drug toxicity) and then mine the rich HapMap genotype dataset to perform genotype-phenotype associations. In a collaboration with Affymetrix, Zhang and his colleagues measured gene expression in 87 CEU samples and 89 YRI samples using the Human Exon 1.0 ST array, which captures ~1.4 million annotated exons from ~18,000 transcript clusters in the human genome. The data are available in the SCAN Database hosted at the University of Chicago.

Differentially Expressed Genes and SNP Association

The researchers found ~9,100 expressed genes in the CEU and YRI samples, including 383 that were differentially expressed between the populations (247 had higher expression in YRI than CEU, 136 had higher expression in CEU than YRI). Next, they used sample-level data in each population to correlate expression of those 383 genes with SNP genotypes. They successfully identified 75 genes with significant expression-genotype correlations, 11 of which were in cis (same chromosome within 2.5 Mb) and 64 of which were in trans.
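For readers who want to see what an expression-genotype correlation looks like in practice, here is a minimal sketch. It assumes an additive coding of each SNP (0, 1, or 2 copies of one allele) and a plain Pearson correlation; the talk did not specify which statistical test the Chicago group used, and the genotypes and expression values below are made up. The `is_cis` helper applies the cis definition given above (same chromosome, within 2.5 Mb), using a single gene coordinate as a simplification.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_cis(snp_chrom, snp_pos, gene_chrom, gene_pos, window=2_500_000):
    """True if SNP and gene share a chromosome and lie within `window` bp."""
    return snp_chrom == gene_chrom and abs(snp_pos - gene_pos) <= window


# Hypothetical data: one SNP genotyped in six samples, with each sample's
# expression level for one gene (arbitrary units).
genotypes = [0, 0, 1, 1, 2, 2]
expression = [5.1, 4.9, 6.0, 6.2, 7.1, 6.9]

r = pearson_r(genotypes, expression)
label = "cis" if is_cis("chr1", 1_000_000, "chr1", 3_000_000) else "trans"
print(f"r = {r:.3f} ({label})")
```

A real analysis would repeat this over every SNP-gene pair and correct for the very large number of tests.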

Isoform Variation

Isoform variation was also detectable in the exon array data – by examining expressed genes with 3 or more exons, the researchers could compare probe intensities for each exon to see if any were differentially expressed. They identified a number of genes with differential isoform expression between YRI and CEU populations, and when they performed GO analysis, the most enriched gene category was, interestingly, genes that encode splicing factors.

SNPs, Gene Expression, and Pharmacogenetics

The Chicago group also performed a number of cell-based assays on the HapMap samples to measure toxicity induced by a number of anti-cancer drugs. In this case their phenotype was IC50, the drug concentration at which cell growth was inhibited by 50%. Such a drug study seems ideal for the HapMap samples since they happen to be transformed (i.e. continuously proliferating) cells. They measured IC50 for several types of anti-cancer agents (6 total), including DNA antimetabolites, platinating agents, and topoisomerase II (TopoII) inhibitors.
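As a rough illustration of how an IC50 can be read off a dose-response series, the sketch below interpolates on a log-concentration scale between the two doses that bracket 50% growth. This is only one simple approach and an assumption on my part; the talk did not describe how the Chicago group fit their curves, and the function name and inputs are hypothetical.

```python
from math import log10


def estimate_ic50(concentrations, percent_growth):
    """Estimate IC50 by log-linear interpolation.

    concentrations: ascending, strictly positive drug concentrations.
    percent_growth: growth at each concentration, as a percentage of the
                    untreated control.
    Returns None if growth never crosses 50% within the tested range.
    """
    points = list(zip(concentrations, percent_growth))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= 50 >= g_hi and g_lo != g_hi:
            # Fraction of the way from the lower dose to the higher dose
            frac = (g_lo - 50) / (g_lo - g_hi)
            log_ic50 = log10(c_lo) + frac * (log10(c_hi) - log10(c_lo))
            return 10 ** log_ic50
    return None


# Made-up example: growth falls from 75% to 25% between doses of 1 and 100,
# so the estimate lands halfway between them on the log scale.
print(estimate_ic50([1, 100], [75, 25]))
```

In practice a sigmoidal curve fit over all doses is usually preferred to two-point interpolation, but the idea is the same.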

First, using the HapMap trio (mother-father-child) information in the CEU panel, Zhang and colleagues determined the “heritability” of IC50, which proved to be high (values in the 0.3-0.4 range) for all of the drugs. This provides more evidence for what seems to be an accepted fact: pharmaceutical response is a phenotype with a significant inheritable genetic component.

What they did next was very interesting: they performed an integrated analysis of HapMap genotypes, gene expression, and drug response to identify predictors of drug-induced toxicity. Zhang described their method as a “triangle approach”: first, SNPs were associated with drug response, then those SNPs were analyzed with the expression data to determine if any were also associated with gene expression. The correlated genes were then compared back to the response data, to see if any were also associated with drug response. As a result, they’re able to identify SNPs that influence gene expression, which in turn influences response to the drug. Genotype-mechanism-phenotype. I like it.
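The logic of the triangle can be sketched as three successive filters. This is my own simplified reading of the approach as described in the talk, not the group's actual pipeline: it assumes the three sets of association p-values have already been computed, and the 0.05 cutoff, the function name, and the SNP and gene labels are all placeholders.

```python
def triangle(snp_drug_p, snp_gene_p, gene_drug_p, alpha=0.05):
    """Return (snp, gene) pairs supported on all three sides of the triangle.

    snp_drug_p:  {snp: p-value}          SNP genotype vs. drug response
    snp_gene_p:  {(snp, gene): p-value}  SNP genotype vs. gene expression
    gene_drug_p: {gene: p-value}         gene expression vs. drug response
    alpha is an illustrative threshold with no multiple-testing correction.
    """
    # Side 1: SNPs associated with drug response
    response_snps = {snp for snp, p in snp_drug_p.items() if p < alpha}

    # Side 2: of those SNPs, keep the ones also associated with expression
    candidate_pairs = [
        (snp, gene)
        for (snp, gene), p in snp_gene_p.items()
        if snp in response_snps and p < alpha
    ]

    # Side 3: keep pairs whose gene expression is associated with response
    return sorted(
        (snp, gene)
        for snp, gene in candidate_pairs
        if gene_drug_p.get(gene, 1.0) < alpha
    )


# Hypothetical p-values for three SNPs and three genes
snp_drug_p = {"snpA": 0.001, "snpB": 0.2, "snpC": 0.01}
snp_gene_p = {
    ("snpA", "gene1"): 0.002,
    ("snpB", "gene1"): 0.001,
    ("snpC", "gene2"): 0.03,
    ("snpA", "gene3"): 0.5,
}
gene_drug_p = {"gene1": 0.004, "gene2": 0.6}

print(triangle(snp_drug_p, snp_gene_p, gene_drug_p))  # [('snpA', 'gene1')]
```

Only snpA and gene1 survive all three filters: snpB fails the first side, and gene2 fails the third.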

As an example of their findings, Zhang presented a SNP in GALNTL4 that was associated with response to Cisplatin, which I presume is a platinating agent. SNP genotypes were correlated with expression of GALNTL4, and that in turn was correlated with IC50 to Cisplatin. But here’s what I liked most about this example: the SNP they presented was intronic. It’s another reminder that it’s time to look outside the exons, people!

Future Directions: miRNA and Methylation

Efforts are currently under way at the University of Chicago to measure two more cell phenotypes on the HapMap samples. One is micro-RNA (miRNA) expression, which they’re assessing with something called the Exiqon miRCURY platform. The other is DNA methylation, as measured by ChIP-chip assays with CpG antibodies. I seem to recall that another group has already identified methylation-associated SNPs using HapMap data, but even so, I look forward to what Zhang and his colleagues will find.

Bowtie Comes of Age

April 7, 2009 by Dan Koboldt

The short read aligner Bowtie has gone legitimate, with a publication last month in Genome Biology and a mention by GT’s Daily Scan. While it has yet to supplant Maq as the de facto standard for Illumina/Solexa processing, Bowtie remains one of my favorite short read aligners. It was the first tool (to my knowledge) to implement Burrows-Wheeler Transform indexing, a method fast enough that it was soon adopted by the makers of SOAP and Maq. With last week’s paper, I finally got an idea of how BWT works:

[Figure: Figure 1 from the Bowtie paper, illustrating the Burrows-Wheeler transform]
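To make the transform concrete, here is a minimal Python sketch of the Burrows-Wheeler transform and its inverse. It uses the naive sorted-rotations construction, which is fine for a toy string but is not how Bowtie builds or searches its index; the function names are illustrative only.

```python
def bwt(text, terminator="$"):
    """Burrows-Wheeler transform via sorted rotations (naive, for small inputs).

    The terminator must not occur in text and must sort before every other
    character, which holds for "$" with lowercase DNA letters.
    """
    assert terminator not in text
    s = text + terminator
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    # The transform is the last column of the sorted rotation matrix
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed, terminator="$"):
    """Invert the transform by repeatedly prepending the column and sorting."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The row ending in the terminator is the original text plus terminator
    original = next(row for row in table if row.endswith(terminator))
    return original[:-1]


print(bwt("acaacg"))           # gc$aaac
print(inverse_bwt("gc$aaac"))  # acaacg
```

The transform is reversible and tends to group identical characters together, which is what makes a compact, searchable index possible.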

How Much Faster is Bowtie than Maq?

Based on my experience, I’d say it’s orders of magnitude faster, with only a slight hit in sensitivity. Here’s some data to back me up: CPU time to align a full flowcell (by lane) of 36-bp fragment-end reads to Hs36.

Figure 1. CPU time to align 1 flowcell of Illumina/Solexa 36-bp SE reads, by lane

On average, Maq took ~8 hours per lane to align the reads to Hs36, whereas Bowtie took just 54 minutes per lane, roughly a ninefold difference.

New Paired-End Functionality

Also exciting is yesterday’s Bowtie update, which includes the much anticipated paired-end alignment mode. Paired-end mode is not only important for placing more reads, but also makes detection of structural variation with Bowtie all the easier. While I haven’t yet evaluated this feature, if it’s done well, then Bowtie has become a serious player in the short read alignment game.