September 2011

Whole-genome sequencing and clinical annotation

Next-generation sequencing has immense transformative potential for medicine in the coming decade. Rapid, economical whole-genome sequencing can provide a wealth of information useful for diagnosis, treatment, and even prevention of disease. Very soon (if not already), generating whole-genome sequencing data will be routine. The challenges will lie in accurate variant calling, phasing, annotation, and clinical interpretation.

A new study in PLoS Genetics reports the whole-genome sequencing and detailed genetic risk assessment of a family quartet with a history of familial thrombophilia. There’s a lot to like about this paper, but let me give you the highlights.

Construction of and alignment to an ethnicity-specific major allele reference sequence yielded improved alignment and more accurate genotyping, especially at disease-associated loci.
Mendelian inheritance state analysis in the family structure enabled identification and removal of >90% of variants arising from sequencing errors.
Per-trio phasing, inheritance state of adjacent variants, and population-level linkage disequilibrium data were integrated to provide long-range phased haplotypes.
By fine-mapping recombination events to sub-kilobase resolution, the authors were able to perform sequence-based human lymphocyte antigen (HLA) typing.
A curated database of genotype-phenotype correlations made it possible to construct comprehensive genetic risk profiles, including multigenic risk of inherited thrombophilia, common disease susceptibility, and pharmacogenomics.

Advantages of an Ethnically-Concordant Reference Sequence

The human reference sequence is a composite, assembled using pooled sequence data from about 20 individuals. Several groups have reported that the current reference harbors a number of biases – some alleles represented are the minority of those present in world populations, and insertions are better represented than deletions. Using SNP genotype data from the 1000 Genomes Project (~6-10 million loci), the authors of this study developed three ethnicity-specific reference sequences for the CEU (Western Europe), YRI (Sub-Saharan Africa), and CHB/JPT (Han Chinese / Tokyo Japanese) populations. They did so by determining the major allele in each population, and swapping it in when the NCBI reference base differed. This resulted in ~1.6 million substitutions for each population reference:

Credit: Dewey et al, PLoS Genetics 2011.

There were almost 800,000 positions where the reference allele was not the major allele in all three populations. Thus, at roughly 10% of SNP positions examined, the NCBI reference sequence contained a minor allele relative to European, African, and Asian populations.
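The swap-in procedure described above can be sketched in a few lines of Python. This is only a toy illustration, not the authors' pipeline: the function names, the 0-based coordinates, and the allele frequencies are all made up for the example.

```python
from typing import Dict, List, Tuple


def major_allele(freqs: Dict[str, float]) -> str:
    """Return the allele with the highest population frequency."""
    return max(freqs, key=freqs.get)


def build_major_allele_reference(
    reference: str, site_freqs: Dict[int, Dict[str, float]]
) -> Tuple[str, List[int]]:
    """Swap in the population major allele wherever the reference base differs.

    reference  -- reference sequence as a plain string
    site_freqs -- 0-based position -> {allele: frequency} (hypothetical input format)
    Returns the new sequence and the list of substituted positions.
    """
    bases = list(reference)
    substituted = []
    for pos, freqs in sorted(site_freqs.items()):
        major = major_allele(freqs)
        if bases[pos] != major:
            bases[pos] = major
            substituted.append(pos)
    return "".join(bases), substituted


# Toy data: three SNP sites with invented frequencies for one population.
reference = "ACGTACGT"
population_freqs = {
    1: {"C": 0.30, "T": 0.70},  # reference C is the minor allele, so swap to T
    4: {"A": 0.85, "G": 0.15},  # reference A is already the major allele
    6: {"G": 0.45, "A": 0.55},  # reference G is the minor allele, so swap to A
}

new_reference, changed = build_major_allele_reference(reference, population_freqs)
print(new_reference)  # ATGTACAT
print(changed)        # [1, 6]
```

Running the same function once per population (with that population's frequencies) would give one reference per population, which is the general idea behind the three references described here.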

Self-reported ethnicity of the parents in the quartet was northern/western European, a claim largely confirmed by principal component analysis (PCA). The authors therefore aligned all genomes to the CEU major allele reference, resulting in a small increase (0.1%) in the fraction of reads mapped by BWA. This seems like a small fraction, but it works out to around 6 million reads across the four samples. Presumably, more reads were mapped because the population-matched reference reduces allele-specific mapping bias (ASMB) against non-reference bases. Next, the authors compared variants to an internally-curated database of genotype-phenotype correlations, identifying 9,389 correlated variants in the family quartet. This number would have been 10,396 if the NCBI reference were used, indicating that roughly 10% of the disease-associated markers called against the standard reference are in fact major population alleles, which are less likely to contribute to inter-individual variation in disease susceptibility.

The ethnicity-matched reference also enabled a more accurate estimation of population mutation rate (7.8 x 10^-4). Using the NCBI reference, this rate was 9.2 x 10^-4, indicating that a standard reference sequence yields inflated population mutation rates.
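For anyone who wants to check the arithmetic behind the numbers in this section, here is a quick back-of-the-envelope calculation. The inputs are the figures quoted above; the implied read totals are my own rough inference from "0.1%" and "around 6 million", not values reported in the paper.

```python
# Disease-associated variants called against each reference (figures quoted above).
variants_ncbi = 10_396
variants_ceu = 9_389
fraction_removed = (variants_ncbi - variants_ceu) / variants_ncbi
print(f"Variants dropped with the CEU reference: {fraction_removed:.1%}")  # 9.7%

# If a 0.1% gain in mapped reads is about 6 million reads, the total is roughly:
extra_reads = 6_000_000
mapping_gain = 0.001  # 0.1% expressed as a fraction
implied_total_reads = round(extra_reads / mapping_gain)
implied_reads_per_sample = implied_total_reads // 4
print(f"Implied total reads: {implied_total_reads:,}")            # 6,000,000,000
print(f"Implied reads per sample: {implied_reads_per_sample:,}")  # 1,500,000,000

# Population mutation rate estimated against each reference.
theta_ncbi = 9.2e-4
theta_ceu = 7.8e-4
inflation = theta_ncbi / theta_ceu - 1
print(f"Inflation from using the NCBI reference: {inflation:.1%}")  # 17.9%
```

So the "10%" is closer to 9.7%, and the standard reference inflates the mutation rate estimate by a little under one fifth.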

Mendelian Inheritance and Long-Range Haplotyping

Whole-genome sequencing of a “nuclear” family (mother, father, son, daughter) has a number of advantages:

It enables comprehensive Mendelian inheritance analysis, to facilitate the removal of false-positive variants, isolate putative de novo mutations, and even identify regions of structural variation based on blocks of Mendelian inconsistencies.
Meiotic crossover sites can be comprehensively surveyed, in this case to sub-kilobase resolution.
Trio information (each child compared to both parents) helps to phase the variants, in other words, to determine which variants are on the paternal chromosome, and which are on the maternal chromosome. This is especially useful for identifying compound heterozygotes for recessive traits.
Paired with population linkage information from the HapMap and 1000 Genomes Project, this information can be used to infer long-range haplotypes. On chromosome 6, the authors used haplotype and population information to accurately determine HLA genotypes for every sample.
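To make the first and third points concrete, here is a minimal per-site sketch of a Mendelian consistency check and trio-based phasing. It is a simplification under my own assumptions (biallelic SNPs, unordered genotype tuples, hypothetical function names); the authors' inheritance state analysis works across blocks of adjacent variants and is considerably more sophisticated.

```python
from itertools import product
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles, e.g. ("A", "G")


def is_mendelian_consistent(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child's genotype can be built from one maternal and one paternal allele."""
    target = sorted(child)
    return any(sorted((m, f)) == target for m, f in product(mother, father))


def phase_child(child: Genotype, mother: Genotype, father: Genotype) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) when parental origin is unambiguous.

    Returns None if the site is Mendelian-inconsistent or if more than one
    assignment fits (for example, when all three individuals are heterozygous).
    """
    target = sorted(child)
    options = {(m, f) for m, f in product(mother, father) if sorted((m, f)) == target}
    return options.pop() if len(options) == 1 else None


# One toy site in a quartet: the daughter's G/G call cannot come from an A/A mother.
site = {
    "mother": ("A", "A"),
    "father": ("A", "G"),
    "son": ("A", "G"),
    "daughter": ("G", "G"),
}

for child in ("son", "daughter"):
    ok = is_mendelian_consistent(site[child], site["mother"], site["father"])
    phased = phase_child(site[child], site["mother"], site["father"])
    print(child, "consistent" if ok else "INCONSISTENT", phased)
# son consistent ('A', 'G')
# daughter INCONSISTENT None
```

Sites that fail the check in either child are candidates for sequencing errors (or, more rarely, de novo mutations). Sites where everyone is heterozygous come back as None from `phase_child`, which is exactly where adjacent variants and population linkage data are needed to finish the job.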

The family information also made possible this fascinating mosaic of chromosomal inheritance:

Credit: Dewey et al, PLoS Genetics 2011.

There are obviously key benefits to having sequence data for everyone in the family. In the future, when clinical sequencing is commonplace, don’t forget to bring your parents along.

Synonymous But Not the Same

One downstream analysis that I particularly enjoyed was that of synonymous coding variants. These variants are often ignored in studies of human genetics, despite a growing body of evidence that they can have translational effects via codon usage bias, mRNA stability, and splice site alteration. The authors developed an algorithm to evaluate these effects for 186 rare, novel synonymous SNPs found in the family. One of these, in the gene ATP6V0A4, is predicted to significantly affect mRNA secondary structure by disrupting a stable “tetraloop” – likely reducing mRNA stability. This is relevant because homozygous loss-of-function variants in this gene have been associated with distal renal tubular acidosis (a disease in which the kidneys don’t remove enough acid into the urine).
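As a reminder of what "synonymous" means at the codon level, here is a small sketch that classifies a single-base change using the standard genetic code. It does not attempt the authors' analysis of codon usage, mRNA structure, or splicing; the function name and the example codons are mine.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(ref_codon: str, offset: int, alt_base: str) -> str:
    """Classify a single-base substitution at position `offset` (0-2) of a codon."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-lost"
    return "missense"


print(classify_coding_snp("CTG", 2, "A"))  # synonymous (Leu -> Leu)
print(classify_coding_snp("GAG", 1, "T"))  # missense   (Glu -> Val)
print(classify_coding_snp("TGG", 2, "A"))  # nonsense   (Trp -> stop)
```

The point of the paper's analysis is that the first category, usually filtered out at this step, can still matter: the protein sequence is unchanged, but the transcript is not.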

Clinical Annotation and Interpretation

The authors build on their previous work to comprehensively annotate clinically-relevant variants in all family members. There’s an extensive amount of work done here, much of it hinging on the authors’ internally-developed, hand-curated database of 16,400 SNPs associated with disease traits. An analysis of rare variants bolstered with evolutionary conservation data highlighted variants in two genes related to thrombophilia: one in F5 (the factor V Leiden variant), which increases the risk of thrombophilia, and another in MTHFR (love that gene symbol), which predisposes carriers to hyperhomocysteinemia.

Looking ahead to the probable treatment of family members with blood-thinning medication, the authors next undertook a pharmacogenetic analysis. Perhaps the best-known example of pharmacogenetics is warfarin (coumadin), an oral anticoagulant given to patients at risk for stroke or deep vein thrombosis (DVT). Warfarin was the fifth-most prescribed drug in the U.S. the last time I checked, but it has a narrow therapeutic window. Too little, and it has no anticoagulant effect. Too much, and it can cause internal bleeding. Variants in a number of genes have been associated with warfarin dosing, but two are predominant: CYP2C9, the primary metabolizing enzyme for the drug, and VKORC1, the drug target. In this family, all four members were homozygous for the CYP2C9*1 allele, associated with normal dose, but heterozygous for VKORC1-1639, associated with “therapeutic prolongation” of warfarin response at low doses. Based on these genotypes and patient clinical data, the authors applied the International Warfarin Dosing Algorithm to determine the appropriate dose.

All told, this is an interesting study that clearly involved a substantial amount of work (the pre-print PDF totaled more than 100 pages). Undoubtedly, many of the strategies presented here will be useful as whole-genome sequencing moves into the clinic.

References

Frederick E. Dewey, Rong Chen, Sergio P. Cordero, Kelly E. Ormond, Colleen Caleshu, Konrad J. Karczewski, Michelle Whirl-Carrillo, Matthew T. Wheeler, Joel T. Dudley, Jake K. Byrnes, Omar E. Cornejo, Joshua W. Knowles, Mark Woon, Katrin Sangkuhl, Li Gong, Madeleine P. Ball, Alexander W. Zaranek, Heidi L. Rehm, George M. Church, John S. West, Carlos D. Bustamante, Michael Snyder, Russ B. Altman, Teri E. Klein, Atul J. Butte, & Euan A. Ashley (2011). Phased whole genome genetic risk in a family quartet using a major allele reference sequence. PLoS Genetics, 7 (9)

Emerging Hallmarks of Cancer

In 2000, Hanahan and Weinberg published a landmark article in which they described the “hallmarks of cancer” – six biological capabilities acquired during the multi-step development of human tumors. It went on to become the most-cited Cell article of all time. In a follow-up article this year, the authors revisit their conceptual framework for cancer biology, incorporating the remarkable progress in cancer research that was made over the last decade.

The authors conclude that their six hallmarks – sustained proliferative signaling, evading growth suppression, resisting cell death, replicative immortality, induction of angiogenesis, and invasion/metastasis – continue to provide a useful conceptual framework for understanding the biology of cancer. Further, they present two new hallmarks – reprogramming of energy metabolism and evasion of immune destruction – that have emerged as critical capabilities of cancer cells.

I like to think that these new additions were perhaps inspired by two articles on Massgenomics earlier this year – Cancer versus the Metabolism (1/11/2011) and Cancer versus the Immune System (1/21/2011) – which I posted a couple of months before the new Cell paper. Clearly, these two concepts are drawing substantial attention from researchers and clinicians. Hanahan and Weinberg also elaborate on two “enabling characteristics” – properties of neoplastic cells that facilitate acquisition of hallmark capabilities. These include genome instability/mutation, which they discussed previously, and tumor-promoting inflammation, mediated by immune system cells that are recruited to the site of a developing tumor.

Credit: Hanahan and Weinberg, Cell, 2011

The Metabolic Switch: Aerobic Glycolysis in Cancer Cells

It was Otto Warburg who first observed that cancer cells seem to favor glycolysis as a metabolic program over mitochondrial oxidative phosphorylation. This preference is normal in oxygen-poor environments. However, Warburg observed that tumors prefer glycolysis even in the presence of oxygen (aerobic glycolysis). This “metabolic switch” seems counter-intuitive, as the efficiency of ATP production by glycolysis is about 18-fold lower than that of oxidative phosphorylation. Cancer cells compensate for this, at least in part, by up-regulating glucose transporters (notably GLUT1) to import more glucose into the cytoplasm. Indeed, increased uptake and utilization of glucose has been reported for many human tumors.
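The 18-fold figure presumably comes from classic textbook ATP yields per molecule of glucose. A quick check, assuming a net 2 ATP from glycolysis alone and about 36 ATP from complete oxidation (more recent estimates put the latter closer to 30-32):

```python
# Assumed textbook ATP yields per molecule of glucose.
ATP_GLYCOLYSIS = 2   # net ATP from glycolysis alone
ATP_OXPHOS = 36      # classic estimate for complete oxidation

fold_difference = ATP_OXPHOS / ATP_GLYCOLYSIS
print(f"Oxidative phosphorylation yields {fold_difference:.0f}x more ATP per glucose")

# Glucose a purely glycolytic cell must consume to match the ATP
# that 1,000 glucose molecules would yield through full oxidation.
glucose_needed = 1_000 * ATP_OXPHOS // ATP_GLYCOLYSIS
print(f"Glucose needed via glycolysis alone: {glucose_needed:,}")  # 18,000
```

Under those assumptions, a cell relying on glycolysis alone needs 18 times as much glucose for the same ATP, which fits with the increased glucose import.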

The functional rationale for this metabolic switch has not yet been elucidated. Preferential glycolysis has been associated with activated oncogenes and loss of tumor suppressors, both of which confer other hallmark capabilities on cancer cells. One possible explanation for the switch is that it enables diversion of glycolytic intermediates into other pathways, such as those responsible for synthesizing amino acids, nucleosides, and macromolecules. You know, the things necessary for making new cancer cells.

Intriguingly, some tumors have two subpopulations of cells with different energy metabolism programs. One subpopulation exhibits the Warburg effect, favoring glycolysis and generating lactate along with those useful intermediates. The other subpopulation preferentially imports lactate, using it as the main energy source by harnessing part of the citric acid cycle. This apparently symbiotic relationship within tumors has provocative implications for the study and treatment of human cancers. It’s particularly interesting because components of the citric acid cycle, specifically isocitrate dehydrogenases 1 and 2 (IDH1 and IDH2), have recently emerged as oncogenes that are recurrently mutated in gliomas and leukemias.

Immunoevasion: Avoiding Immune Destruction

In recent years, a substantial body of evidence from clinical epidemiology and mouse models has revealed that the immune system presents a significant barrier to tumor formation and progression. In immunodeficient mice, for example, carcinogen-induced tumors develop more often and progress more rapidly than in wild-type mice. Further, depleting either NK cells or T-cells in mice led to increased tumor incidence, suggesting that both innate and adaptive immunity contribute to immune surveillance.

Another important aspect of the relationship between immunity and cancer is the concept of “immunoediting”. In mouse models, carcinogen-induced tumors that arise in immunodeficient mice, when transplanted to wild-type mice, are usually eliminated by the intact immune system of the new host. In contrast, tumors induced in wild-type mice often grow when transplanted to other wild-type mice. Presumably, tumors induced in immunodeficient mice are highly immunogenic, and therefore easily identified and removed by a healthy immune system. However, tumors induced in wild-type mice have arisen despite an intact immune system. The selective pressure of the competent immune system “edits” the tumor by selecting for cells that can avoid immune destruction. Thus, by the time they are macroscopic, these tumors are poorly immunogenic, and more resistant to immune-mediated destruction.

Inflammation as an Enabling Characteristic

For years, pathologists have observed that most tumors are infiltrated by host immune cells, presumably ones that are attempting to destroy them. However, we now know that tumors actively recruit certain cells to aid in their growth and progression. Specifically, inflammatory cells of the innate immune system have been shown definitively to have tumor-promoting activity. In healthy tissues, inflammation serves a number of critical functions – fighting infections, wound healing, repair of damaged tissue and cells. To accomplish these duties, inflammatory cells produce an array of biochemicals that can benefit tumor growth, notably growth factors, survival factors, and matrix-modification enzymes. Furthermore, inflammatory cells can release mutagenic chemicals, such as reactive oxygen species, that increase the mutation rates in tumor cells, further accelerating their evolution towards more aggressive growth.

Additional evidence supporting the tumor-promoting role of inflammation is the observation that individuals with chronic inflammation are more susceptible to cancers at the site of that inflammation – e.g. patients with Crohn’s disease show increased incidence of colorectal cancer. Indeed, it is now evident that inflammation is present even at the very early stages of some tumors, and capable of promoting their development into full-blown cancer.

The Next Decade

The ten years since Hanahan and Weinberg’s seminal article have seen remarkable progress in cancer research. Notably, the six hallmarks of cancer were further supported and refined, as was the enabling characteristic of genome instability. Two new emerging principles (reprogramming cellular energetics and avoiding immune destruction) became evident, as did a second enabling characteristic (inflammation). There’s a lot more in the current review that I didn’t touch on, such as the tumor-promoting activities of non-tumor cells (e.g. fibroblasts, pericytes) and the importance of cancer stem cells (CSCs).

In coming years, thousands of tumors will be characterized by ever-more high-throughput technologies, such as massively parallel sequencing. Collecting the data is no longer the obstacle; instead, the true challenges lie in analysis and interpretation. Hanahan and Weinberg humbly describe their hallmarks as “organizing principles” for thinking about why cancer cells do what they do. Conceivably, fitting new catalogues of genetic alterations to this model of acquired capabilities will help us better understand the relationship between genotype (genetic susceptibility and somatic mutation) and phenotype (tumor development, growth, and metastasis).

References

Hanahan D, & Weinberg RA (2011). Hallmarks of cancer: the next generation. Cell, 144 (5), 646-74 PMID: 21376230
