Friday, September 30, 2011

1000 Genomes, Clan Genomics and Cancer Biomarkers

The publication of the human genome blueprint in 2000 was a great block party, like the summer Olympics coming to town, but the next morning saw most revelers picking up their drunken selves and going back to their old day jobs. Some kept the flames burning by sequencing more and more of whatever came their way labeled as a model organism. The human genome blueprint was like a great idea that won a patent but still needed a lot of development work to morph into a cool product.

Post-human genome project

The launch of the 1000 Genomes Project in January 2008 was a true effort to translate the human genome blueprint into something that can really impact clinical practice and health care on a national scale. The 1000 Genomes Project was designed to sequence and catalog genetic variation from different ethnic groups across the globe. The results of the pilot phase were published in Nature last year.

A related effort is genome-wide association studies (GWAS), which evolved from the precursor Haplotype Mapping (HapMap) project. The basis of this approach is the common disease/common variant (CDCV) hypothesis, which states that a combination of different alleles is over-represented in a particular disease. The goal is to find associations of numerous single-nucleotide polymorphisms (SNPs) or other chromosomal variants with diseases. The first such association was reported in 2005, between age-related macular degeneration and CFH gene variants.

Clan Genomics

Richard A. Gibbs of Baylor College of Medicine, and his colleagues from Baylor and U. Texas at Houston, published an essay in the journal Cell calling for a "clan genomics" approach to address the issue of integrating the tsunami of genetic variants coming through the hose of the 1000 Genomes Project and other projects.

The clan genomics model is built on the following ideas: individual variants are of less significance; the combination of all variants determines phenotype; recently acquired rare variants are more significant; common variants in a clan (extended family or ethnic group) are of less significance, but influence the effect of recent rare variants; and the phenotype of high-penetrance variants can only be understood in the context of an ecosystem of all genetic variants.
Genomic variants include:
  • Single nucleotide variants (SNVs): a human genome has an average of 3.5 million SNPs
  • Short insertions or deletions (indels)
  • Structural variants
  • Copy number variants (CNVs): a human genome has ~100 CNVs (>500 bp), which is an underestimate. Current technology cannot adequately detect CNVs in the 100-500 bp range.
(Clan Genomics. From Lupski et al. DOI:10.1016/j.cell.2011.09.008)

Cancer drug discovery is benefiting from a combination of whole genome sequencing, the 1000 Genomes Project and other cancer-specific projects, such as genotyping arrays for genome-wide detection of indels, and genome-wide expression studies. The Cancer Genome Atlas and the International Cancer Genome Consortium plan to study 500 tumours per cancer type.

(From Lander 2011: Cancer genome maps.  The left panel shows an image of colon cancer (Wellcome Trust). The right panel shows the genome of a colon cancer sample (Broad Institute), including interchromosomal translocations (purple), intrachromosomal translocations (green) and amplifications and deletions (red and blue, on the inner ring). Individual nucleotide mutations are not shown.)

Eric Lander's February 2011 review in Nature lists several notable advances in cancer biomarkers which were identified using genome-wide analyses:

  • Discovery of 150 genes with copy number alterations in a genomic study of 3000 tumors across 26 cancers.  Only one-quarter of these genes are known cancer genes. 
  • Amplifications of a new class of lineage-specific transcription factors required for survival: MITF in melanoma, NKX2.1 in lung cancer and SOX2 in esophageal cancer. 
  • Deletions in PAX5, IKZF1 and other regulators of lymphocyte differentiation in pediatric acute lymphoblastic leukemia. 
  • Translocations involving one of several ETS transcription factors in 50% of prostate tumors, and ALK in lung cancer. These findings overturn the dogma that translocations are only found in leukemias and not epithelial solid cancers.
  • Genomic amplification of IKBKE in breast cancer, CDK8 in colorectal cancer and the nuclear export protein XPO4 in hepatocellular cancer.
  • Exome-wide and genome-wide sequencing of acute myelogenous leukemia and clear-cell ovarian cancer samples identified recurrent mutations in DNMT3A and ARID1A, respectively. These mutations suggest epigenomic dysregulation.
  • Also by genome-wide sequencing of multiple myeloma samples: discovery of mutations in DIS3 and FAM46C (genes involved in protein translation and homeostasis; not previously implicated in cancer) as well as NF-kB activation.
  • Genomically characterized cell lines were used to identify ‘synthetic lethals’ (that is, genes essential only in the presence of a particular recurrent cancer mutation) such as PLK1, STK33 and TBK1, which are essential only in the presence of KRAS mutations.

My crystal ball shows a technology just like my car. Whenever my car thinks it's in distress, it flashes the "check engine" light. I have absolutely no clue what's under the hood; my dealer takes a long cable and hooks my car to a machine which spits out codes and a diagnosis.

Yes, before the next decade, don't be surprised if an oncologist sends out a sample which gets sequenced and queried by a black box which spits out prognostic and diagnostic output, a clinically useful score, based on the biomarkers and variant information we are putting in today.  We are heading to the era of black box medicine.  And, I'm not scared; I am excited!

  • A map of human genome variation from population-scale sequencing. The 1000 Genomes Project Consortium. Nature. 2010 Oct 28;467:1061–73
  • Initial impact of the sequencing of the human genome. Lander ES. Nature. 2011 Feb 10;470(7333):187–97. PMID: 21307931
  • Clan Genomics and the Complex Architecture of Human Disease. Lupski JR, Belmont JW, Boerwinkle E, Gibbs RA. Cell. 2011 Sep 30;147(1):32–43


  1. Very interesting article.

    Thanks for sharing it.

  2. An update on Cancer Genome mapping was published a few days ago in the journal "Nature." Read summary at Common Mutations Drive 12 Different Cancer Types on this blog.