Archives for May 2010

Outsourced Sequencing and Analysis

May 21, 2010 by Dan Koboldt

A company in Malaysia is offering to map whole-genome sequencing data and call variants in one week’s time for $4,000.

I readily admit that I have not taken sequencing-as-a-service companies very seriously. The idea of sending precious samples off to a third party and getting back the sequence and variants doesn’t appeal to me for a number of reasons. Outsourcing just the analysis of sequence data is even more anathema. Why would anyone want to do that? Analysis is the best part! Then again, I’m fairly biased in this matter because (1) I work at a major genome center with significant in-house sequencing resources, and (2) sequence analysis and variant detection are among my job responsibilities. Obviously I don’t want those to go away.

That said, there seems to be a growing interest in outsourcing sequencing and/or analysis in the wider research community. Complete Genomics had a strong presence at Marco Island this year, and has a growing customer list that includes (perhaps surprisingly) at least two genome centers. Beijing Genomics Institute (BGI) announced a purchase of 128 Illumina HiSeq2000 instruments in January; a month later in Science magazine I saw a full-page ad indicating that they’re open for business as a sequencing provider. No big deal, they’re half a world away, right? So I thought, until I heard whispers of a BGI facility in San Francisco.

Second- and third-generation sequencing technologies are bringing about volatile changes in the fields of genetics and genomics. Throughput continues to skyrocket, while the costs of sequencing plummet. It’s now possible to sequence a complete human or mammalian genome to high coverage on a single instrument run at ~$20,000. This has had three effects on the research community:

Genomes abound. At least a dozen individual human genomes have been published, but NGS technologies are being applied to a wide range of studies – exomes, transcriptomes, model organisms, you name it.
Everyone wants to sequence. Thanks to a lot of press and some high-profile publications, massively parallel sequencing is known to every corner of the biomedical research world. Suddenly every clinician with a patient cohort wants in, because if they don’t find the disease-causing genes, someone else will.
Not everyone can buy an NGS instrument. Commercially available sequencers currently cost a quarter to a half million dollars or more each, which is a significant purchase even for labs flush with ARRA funding. This means that a lot of small labs will not be looking to buy a machine, but rather to rent space from someone who has one. Music, no doubt, to the ears of BGI and Complete Genomics.

One thing is clear. These new sequencers and service providers are going to put high-throughput sequencing into the hands of many investigators. Investigators, I might add, who likely have never dealt with NGS data. I think that’s potentially very exciting, and I hope that the experiences of major genome centers will help newcomers address the challenges of massively parallel sequencing.

A Formula for Dosing Humans with Rat Poison

May 11, 2010 by Dan Koboldt

In 1920, a mysterious epidemic broke out in the cattle populations of the United States and Canada. It was a severe disease of internal hemorrhaging that struck quickly and inexplicably; ranchers were soon distraught at the losses to their herds. Two years later, Frank Schofield connected the disease to sweet clover hay, which had been widely used as cattle fodder since the beginning of the century. But the agent behind hemorrhagic sweet clover disease remained elusive.

The turning point came in 1933, when a farmer drove to the University of Wisconsin with a truckload of spoiled hay and blood from a cow that had died after eating some of it. The farmer’s plight caught the interest of Karl Link, an associate professor of agricultural chemistry. Seven years later, Link and his colleagues announced the purification and synthesis of dicumarol, the hemorrhagic agent in spoiled sweet clover hay. It seems that a series of wet summers had led to the infection of sweet clover fields by mold. In response, the sweet clover plants produced coumarin, a natural compound that defends against fungal infection. With the support of the Wisconsin Alumni Research Foundation (WARF), Link and his colleagues synthesized over 100 analogues based on dicumarol’s structure. In 1946 they developed the highly potent form that was patented by the WARF organization. It was a compound that smelled, appropriately, like freshly mown hay. It was a toxin deadly enough to be used as a rat poison. They named it warfarin.

Blood Thinner or Rat Poison?

At first, warfarin was considered too toxic for human use. It was marketed as a rodenticide, and became a popular rat poison. In 1951, a navy recruit took a large dose of warfarin to attempt suicide. Surprisingly, he lived, and clinical trials soon thereafter showed that warfarin could be administered safely to humans. The idea of warfarin therapy became widely known in 1955, when it was given to President Eisenhower after a heart attack. Today, warfarin is the most frequently prescribed oral blood thinner, and the eleventh most-prescribed drug overall. It’s given to patients where unwanted clotting is a risk: after surgery, stroke, pulmonary embolism, or deep-vein thrombosis (DVT). Unfortunately, warfarin has a narrow therapeutic range. Too little, and it has no effect on clotting. Too much, and the patient could suffer internal hemorrhaging. To further complicate things, the correct warfarin dose is influenced by a number of factors – clinical ones (weight, age, INR), diet, heritage, etc.

Warfarin Pharmacogenetics and Clinical Trial

It became apparent that genetic factors play a critical role in the effective dose of warfarin. Two genes in particular have been demonstrated to modulate warfarin response: VKORC1, which encodes a component of the vitamin K epoxide reductase (VKOR) complex that is targeted by warfarin; and CYP2C9, the cytochrome P450 enzyme primarily responsible for metabolizing the drug. Numerous other genes have been implicated as well, though none have proven more informative than VKORC1 and CYP2C9 genotypes. The clear genetic component, and the as-yet-unraveled complexities of correct dosing, are probably why warfarin has become the poster-child for pharmacogenetics.

Last month, a team led by Brian Gage at Washington University in St. Louis published an elegant formula for warfarin dosing that takes clinical and genetic factors into consideration, in conjunction with a web site (www.WarfarinDosing.org) where clinicians can use it to calculate and track patient doses. This month, the National Heart, Lung, and Blood Institute (NHLBI) announced a five-year, $3.7 million clinical trial to assess warfarin risks and benefits. The Genetics InFormatics Trial of Warfarin (GIFT), to be led by Brian Gage and his colleagues, will enroll knee- and hip-replacement patients at our own Barnes-Jewish Hospital to improve upon the warfarin dosing formula.

It’s the most interesting story I’ve heard that begins with a farmer, a cow, and the state of Wisconsin. Strange how the mysterious sweet clover disease, described as “an insidious hemorrhagic disease” by the Merck Veterinary Manual, would yield a compound so valuable for human health.

References
Lenzini P, Wadelius M, Kimmel S, Anderson JL, Jorgensen AL, Pirmohamed M, Caldwell MD, Limdi N, Burmester JK, Dowd MB, Angchaisuksiri P, Bass AR, Chen J, Eriksson N, Rane A, Lindh JD, Carlquist JF, Horne BD, Grice G, Milligan PE, Eby C, Shin J, Kim H, Kurnik D, Stein CM, McMillin G, Pendleton RC, Berg RL, Deloukas P, & Gage BF (2010). Integration of genetic, clinical, and INR data to refine warfarin dosing. Clinical pharmacology and therapeutics, 87 (5), 572-8 PMID: 20375999

GWAS and the Genetics of Human Disease

May 4, 2010 by Dan Koboldt

An essay published last week in Cell dismissed the findings of genome-wide association studies (GWAS) and questioned their value to the study of human disease. In their article Genetic Heterogeneity in Human Disease, McClellan and King argue that because common diseases exhibit a high degree of allelic, locus, and phenotypic heterogeneity, their causality “can almost never be resolved by large-scale association studies.” Instead, the authors believe that rare mutations underlie most of the disease-relevant genetic variation in humans, and as such, their causal relationships can only be uncovered by sequencing-based approaches. The article as a whole comes off as uninformed and misleading. Thankfully, genomics bloggers have taken them to task: p-ter at the Gene Expression blog explains how noncoding variants influence disease risk, and Kai Wang guest-posts at Genetic Future with a full-on criticism of the essay.

GWAS Overload

I am tempted to agree with McClellan and King in some respects, particularly in their concern that the myriad of GWAS publications often fail to advance our understanding of many common diseases. And I do mean myriad. Once upon a time, Nature Genetics was my favorite journal for cutting-edge genetics and genomics discoveries. Take, for example, the summer of 2006 when three high-profile papers revealed the presence of extensive structural variation in the human genome. In recent months, however, I find myself underwhelmed by the content of this particular journal, as it seems saturated with GWAS, GWAS, and more GWAS.

In fact, when I looked at the ~70-80 research articles published in 2010 in Nature Genetics, more than half (46) were association studies, or worse, meta-analyses of association studies. It’s like every investigator in the world with a disease cohort got a hold of an Affy or Illumina SNP array. When I scan the titles each month in my RSS reader, my eyes begin to glaze over with each new title that reads “Common variants associated with…” or “Genome-wide association study identifies…” Unless you happen to be an investigator studying the phenotype or disease of interest, these cookie-cutter papers probably hold little interest for you.

That said, I took issue with much of what was written in the McClellan and King essay. Specifically:

Their disparagement of the value of GWAS based upon the observation that most associations come from intergenic regions. As my colleagues in the blogosphere have pointed out, the aim of high-density SNP arrays is not to pinpoint the causal SNP; in fact, a high-frequency variant is more likely to be included than a rare nonsynonymous SNP simply because the former is more informative as a genetic marker.
Their blanket dismissal of most GWAS findings as artifacts of “cryptic population stratification.” The authors suggest that although outliers based on population substructure may be excluded, “hypervariable polymorphisms remain vulnerable to stratification.” As Kai Wang points out in a guest post on Genetic Future, the methods to account for hidden population structure are well established in the GWAS community.
Their apparent misunderstanding of how genome-wide association studies work. They write: “Had sickle cell anemia been investigated among affected individuals worldwide, the number of responsible mutations would be far greater and hence no one allele at any SNP would be consistently associated with the disease.” This is flat-out wrong. Although there are hundreds of known mutations in HBB (the gene that encodes the beta subunit of hemoglobin and, when mutated, causes sickle-cell anemia), most cases are caused by a single amino acid change (glutamic acid -> valine). Sickle-cell is autosomal recessive, so it’s rather preposterous to assume that a worldwide study would fail to associate the homozygous variant with the disease.
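To make the sickle-cell point concrete, here is a minimal sketch of a recessive-model association test. The counts are invented for illustration and do not come from any study: I simply assume that 90% of cases are homozygous for the one common mutation, the other 10% carry assorted rarer mutations, and no controls are homozygous. The function name is my own, and the statistic is a plain Pearson chi-square on a 2x2 table with no continuity correction.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Hypothetical counts, made up for this example only.
cases_hom, cases_other = 900, 100         # affected: homozygous for the common mutation vs. not
controls_hom, controls_other = 0, 1000    # unaffected

stat = chi_square_2x2(cases_hom, cases_other, controls_hom, controls_other)
print(f"chi-square = {stat:.1f}")

# 3.84 is the usual 1-d.f. critical value at p = 0.05.
print("associated" if stat > 3.84 else "not associated")
```

Under these assumptions the statistic dwarfs any conventional significance threshold, even though a tenth of the cases are explained by other mutations. Allelic heterogeneity dilutes the signal; it does not erase it when one allele accounts for most cases.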

Common Disease, Common Variants

The authors seem convinced that the common disease, common variant theory no longer holds because (according to them) not many have been found. Rather, McClellan and King believe that “the overall magnitude of human genetic variation, the high rate of de novo mutation, the range of mutational mechanisms that disrupt gene function, and the complexity of biological processes underlying pathophysiology all predict a substantial role for rare severe mutations in complex human diseases.” Do humans have a high rate of de novo mutation? That’s news to me.

Unfortunately, the difficulties of associating common variants with complex disease are also faced by rare variants: namely, picking out causal relationships among complex networks of interactions between many genes and environmental factors. The observation that few such relationships have been elucidated, if true, does not mean that we are looking at the wrong variants. An important fact that seems to have been overlooked by the authors is that the vast majority of human genetic variation *is* shared. From the dozen or so individual genomes published so far, it is clear that perhaps 10% of variants are novel; as databases like dbSNP continue to grow, this will shrink even further. I am reluctant to believe that this small fraction of “rare” mutations accounts for the numerous prevalent human diseases.
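For readers who have not computed a "novel" fraction before, the bookkeeping is just a set difference between the variants called in a genome and those already catalogued. The sketch below uses toy data that I made up; the (chromosome, position, ref, alt) key and the function name are my own assumptions, not the format of dbSNP or of any particular variant caller.

```python
def novel_fraction(called, known):
    """Fraction of called variants that are absent from the known set."""
    called = set(called)
    if not called:
        return 0.0
    return len(called - set(known)) / len(called)


# Toy data: each variant is keyed as (chromosome, position, ref, alt).
# Positions and alleles are invented for illustration.
known = {("1", 1000 * i, "A", "G") for i in range(1, 10)}   # nine catalogued variants
called = known | {("2", 5000, "C", "T")}                     # the same nine plus one new call

print(novel_fraction(called, known))
```

As the catalogue grows, the same calls yield a smaller novel fraction, which is why that figure should keep shrinking.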

A Time to Sequence

Strangely, the emphasis on rare variation seems to indicate that the authors would make a strong case for sequencing. Yet the issue does not even come to light until the last 3/4 of a page in a section entitled “A Time to Sequence – With an appreciation to Maynard Olson.” Surely, I thought, they’ll wow us with the capabilities of next-generation sequencing technologies and their promise for studying complex disease. Not so. Instead, the authors vaguely hint that “new sequencing technologies provide conceptual and practical advantages over current approaches (Olson, 1995).” Why are they citing a fifteen-year-old article to support the advantages of new sequencing technologies? Where are the citations of landmark sequencing/WGS papers? The only citation related to NGS that I see is McKernan 2009, and you know how I feel about that one.

This ending is unfortunate, because sequencing ultimately will provide us with many of the answers. I’m tired of seeing Yet-Another-GWAS that concludes with a table of loci and p-values, or at most, a list of genes. Comprehensive, convincing studies of genetic association should have a strong sequencing component, in which the regions implicated by genotyping are exhaustively sequenced to identify all putative causal variants. Such variants could then be analyzed computationally and experimentally to characterize their effects on gene structure or regulation. Thus, I find myself reluctantly agreeing with McClellan and King on this point: genetic association is not enough.

References
McClellan J, & King MC (2010). Genetic heterogeneity in human disease. Cell, 141 (2), 210-7 PMID: 20403315