Advanced Sequencing Technologies: methods and goals

Sunday, 8 April 2012
Posted by Crystal

Hybridization sequencing. There are several efforts to develop Sequencing By Hybridization (SBH) into a robust and genome-scale sequencing method. One approach is to immobilize the DNA to be sequenced on a membrane or glass chip, and then perform serial hybridizations with short probe oligonucleotides (e.g. 7-mers). The extent to which specific probes bind can be used to decode the sequence. The strategy has been applied to both genome resequencing and de novo sequencing [33,34]. Affymetrix and Perlegen have pioneered a different approach to SBH by hybridizing sample DNA to microfabricated arrays of immobilized oligonucleotide probes. The current maximum density of Affymetrix arrays is about one oligonucleotide "feature" per 5 micron square; each feature contains approximately 100,000 copies of a defined 25 base pair oligonucleotide. For each base pair of a reference genome to be resequenced, there are four features on the chip. The middle base pair of these four features is either an “A”, “C”, “G”, or “T”. The sequence that flanks the variable middle base is identical for all four features and matches the reference sequence. By hybridizing labeled sample DNA to the chip and determining which of the four features yields the strongest signal for each base pair in the reference sequence, a DNA sample can be rapidly resequenced. This approach to genome resequencing was first commercialized in the Affymetrix HIV chip in 1995 [35]. Miniaturization, bioinformatics, and the availability of a reference human genome sequence permitted Perlegen to greatly extend this approach and develop an oligonucleotide array for resequencing of human chromosome 21 [36].
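The four-features-per-base design described above can be sketched in a few lines of Python. This is a toy illustration rather than any vendor's actual pipeline: the function names, the short 7-base probes, and the scoring rule (signal equals the number of matching positions between a probe and the aligned sample window) are all assumptions made for the example. Probes are also written in the same sense as the reference for readability, whereas real probes pair with the complementary strand.

```python
BASES = "ACGT"

def tile_probes(reference, probe_len=25):
    """Build four probes per reference position, differing only in the middle base."""
    half = probe_len // 2
    tiles = {}
    # Only positions with a full flank on both sides can be tiled.
    for pos in range(half, len(reference) - half):
        left = reference[pos - half:pos]
        right = reference[pos + 1:pos + half + 1]
        tiles[pos] = {b: left + b + right for b in BASES}
    return tiles

def toy_intensities(tiles, sample):
    """Hypothetical signal model: count matching bases between each probe and
    the sample window at the same coordinates (sample assumed pre-aligned)."""
    signals = {}
    for pos, probes in tiles.items():
        signals[pos] = {}
        for base, probe in probes.items():
            half = len(probe) // 2
            window = sample[pos - half:pos + half + 1]
            signals[pos][base] = sum(p == s for p, s in zip(probe, window))
    return signals

def call_bases(signals):
    """Call the base whose feature gives the strongest signal at each position."""
    return {pos: max(sig, key=sig.get) for pos, sig in signals.items()}

reference = "ACGTTGCAAGCTTAGGCTAACG"
sample = reference[:10] + "T" + reference[11:]   # one substitution at index 10
tiles = tile_probes(reference, probe_len=7)
calls = call_bases(toy_intensities(tiles, sample))
print(calls[10])   # the variant base in the sample
```

Note that probes neighbouring the variant all carry one flank mismatch in this model, which lowers their signal uniformly but does not change which of the four features is strongest.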

This technology possesses a unique set of advantages and unique challenges. The experiments impressively apply sequencing-by-hybridization (SBH) to obtain a non-trivial amount of sequence from multiple distinct chromosomes (> 10^9 bases). Although specific numbers on “bases per second” are not provided, the method of data-collection imaging, via scanning fluorescence of target DNA hybridized to a wafer-array of probe sequences, seems compatible with the necessary throughput. Read-length requirements are entirely avoided, as probes designed to query specific genomic bases are synthesized at defined positions. The primary challenge that SBH will face is designing probes or strategies that avoid cross-hybridization of probes to incorrect targets due to repetitive elements or chance similarities. These factors render a substantial fraction of Chromosome 21 (30-60%) inaccessible [36], and may also contribute to the 3% false-positive SNP detection rate. It is also worth noting that sequencing-by-hybridization does not escape sample preparation steps, as the relevant fraction of the genome must be PCR-amplified prior to hybridization. In the near-term, SBH may have the greatest potential as a technology to query the genotype of a focused set of genomic positions; for example, the ~10 million "common" SNPs in the human population [37,38].
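To make the cross-hybridization problem concrete, the following minimal sketch flags reference positions whose probe sequence occurs more than once in the reference, since such probes cannot be assigned to a single genomic location. It is a deliberate simplification: only exact duplicates on one strand are counted, whereas real probe design would also have to consider the reverse strand and near-matches. The function name and the toy sequence are hypothetical.

```python
from collections import Counter

def ambiguous_positions(reference, probe_len):
    """Return centre positions whose probe-length window is not unique."""
    half = probe_len // 2
    windows = [reference[i:i + probe_len]
               for i in range(len(reference) - probe_len + 1)]
    counts = Counter(windows)
    # A window seen more than once could hybridize to several loci.
    return [start + half for start, w in enumerate(windows) if counts[w] > 1]

reference = "GATTACAGATTACA"          # contains a tandem repeat of GATTACA
flagged = ambiguous_positions(reference, probe_len=5)
tiled = len(reference) - 5 + 1
print(flagged)                         # [2, 3, 4, 9, 10, 11]
print(len(flagged) / tiled)            # 0.6
```

In this contrived example 60% of the tiled positions are flagged, which is only meant to show how repeats translate into inaccessible sequence, not to reproduce the Chromosome 21 figures.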

Cyclic array sequencing (Pyrosequencing; FISSEQ; MPSS). Key unifying features of these approaches, including multiplexing in space and time and the avoidance of bacterial clones, emerged as early as 1984 [39]. Although early methods in this class led to the first commercially sold genome [40], a dependence on electrophoresis ultimately proved limiting. Cyclic sequencing methods that have developed since have been non-electrophoretic. In both FISSEQ and Pyrosequencing, progression through the sequencing reaction is externally controlled by stepwise (i.e. cyclical), polymerase-driven addition of a single type of nucleotide triphosphate to an array of amplified, primed templates. Pyrosequencing, introduced in 1996, detects extension via the luciferase-based real-time monitoring of pyrophosphate release [41,42]. In FISSEQ (fluorescent in situ sequencing), extensions are detected off-line (i.e. not real-time) via fluorescent groups reversibly coupled to deoxynucleotides [43]. In both cases, repeated cycles of nucleotide extension are used to progressively infer the sequence of individual array features (based on patterns of extension / non-extension over the course of many cycles). We note that both FISSEQ and Pyrosequencing have previously been classified as “sequencing-by-synthesis” methods. However, as nearly all of the methods reviewed here have critical “synthesis” steps, we choose to emphasize “cycling” as the distinguishing feature of this class.
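The way a sequence is inferred from patterns of extension and non-extension can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model of any instrument: the flow order "TACG" is arbitrary, the input string stands for the strand being synthesized (the complement of the template), and every flow is assumed to extend fully through any run of identical bases with a perfectly measured count.

```python
def simulate_flows(synthesized, flow_order="TACG", max_cycles=50):
    """Add one nucleotide type at a time and record how many bases extend."""
    flows = []
    pos = 0
    for _ in range(max_cycles):
        for nt in flow_order:
            if pos >= len(synthesized):
                return flows
            n = 0
            # The polymerase extends through every consecutive matching base.
            while pos < len(synthesized) and synthesized[pos] == nt:
                n += 1
                pos += 1
            flows.append((nt, n))   # n == 0 means "no extension" in this flow
    return flows

def decode_flows(flows):
    """Rebuild the sequence from the extension / non-extension pattern."""
    return "".join(nt * n for nt, n in flows)

flows = simulate_flows("TTAGGC")
print(flows)
# [('T', 2), ('A', 1), ('C', 0), ('G', 2), ('T', 0), ('A', 0), ('C', 1)]
print(decode_flows(flows))   # TTAGGC
```

The zero-count flows carry information too: a flow with no extension tells the reader that the next base is not the nucleotide just offered.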

A third method in this class is based not on cycles of polymerase extension, but instead on cycles of restriction digestion and ligation. In Massively Parallel Signature Sequencing (MPSS), array features are sequenced at each cycle by employing a Type IIs restriction enzyme to cleave within a target sequence, leaving a four base-pair overhang. Sequence-specific ligation of a fluorescent linker is then used to query the identity of the overhang. The accuracy is quite high and the achievable 16 to 20 base-pair read-lengths (i.e. 4 to 5 cycles) are adequate for many purposes [44].
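The relationship between cycles and read-length (four bases per cycle) can be sketched as follows. The code is a heavily simplified illustration: it assumes each cycle exposes the next four bases of the target as an overhang and that exactly one linker, the one with a perfectly complementary overhang, ligates. The function names are hypothetical, and the actual MPSS encoding and decoding chemistry is more involved than this.

```python
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def ligating_linker(exposed):
    """Try every possible linker overhang and return the one that would pair."""
    for candidate in product("ACGT", repeat=len(exposed)):
        linker = "".join(candidate)
        if reverse_complement(linker) == exposed:
            return linker
    raise ValueError("no linker matches overhang %r" % exposed)

def mpss_signature(target, cycles=5, overhang_len=4):
    """Read overhang_len bases per digestion/ligation cycle."""
    read = []
    for c in range(cycles):
        exposed = target[c * overhang_len:(c + 1) * overhang_len]
        if len(exposed) < overhang_len:
            break                      # target exhausted before all cycles ran
        linker = ligating_linker(exposed)
        # The identity of the ligated linker reveals the overhang sequence.
        read.append(reverse_complement(linker))
    return "".join(read)

target = "GATCAGTCCATGGACTTACGAACT"
print(len(mpss_signature(target, cycles=4)))   # 16
print(len(mpss_signature(target, cycles=5)))   # 20
```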

An additional uniting feature of these methods, one that distinguishes them from several of the single-molecule projects discussed below, is that all rely on some method of isolated, i.e. clonal, amplification. After amplification, each feature to be sequenced contains thousands to millions of copies of an identical DNA molecule (thus clonal), but features must be spatially distinguishable. The amplification is necessary to achieve sufficient signal for detection. Although the method for clonal amplification is generally independent of the method for cyclic sequencing, all groups seem to have taken different (and creative) routes. In scaling up Pyrosequencing, 454 Corp. employed a PicoTiter plate to simultaneously perform hundreds of thousands of picoliter volume PCR reactions [45]. This was recently applied to the resequencing of the adenovirus genome, but cost and accuracy estimates for this project are not available [46]. For FISSEQ, clonal amplification was achieved via the polony technology, in which PCR is performed in situ within an acrylamide gel [47]. Because the acrylamide restricts the diffusion of the DNA, each single molecule included in the reaction produces a spatially distinct micron-scale colony of DNA (a polony) which can be independently sequenced [48]. For MPSS, each single molecule of DNA in a library is labeled with a unique oligonucleotide tag. After PCR amplification of the library mixture, a proprietary set of paramagnetic “capture beads” (with each bead bearing an oligonucleotide complementary to one of the unique oligonucleotide tags) is used to separate out identical PCR products. The Vogelstein group recently developed BEAM, a fourth method for achieving clonal amplification that has great potential [49].

It is worth emphasizing that in the above implementations of cyclic array sequencing, the methods developed for amplification and sequencing are potentially independent.  It is therefore interesting to contemplate possibilities for mixing and matching.  For example, one could imagine signature-sequencing polonies, or Pyrosequencing DNA-loaded paramagnetic beads. 

The success or failure of these methods to achieve ULCS will depend on a variety of factors. Pyrosequencing is close to the required read-lengths, while FISSEQ has only been demonstrated to 5 to 8 base-pairs. Methods that rely on real-time monitoring or manufactured arrays of wells may be difficult to multiplex and miniaturize to the required scale. Crucially, both Pyrosequencing and FISSEQ-based methods must contend with discerning the lengths of homopolymeric sequences (i.e. consecutive runs of the same base). Although Pyrosequencing has made significant progress in tackling this challenge via signal quantification, the best answer may lie in development of reversible terminators (defined as a nucleotide that terminates polymerase extension, e.g. through modification of the 3’ hydroxyl group, but is designed in such a way that the termination-properties can be chemically or enzymatically reversed). Reversible terminators would also be required for any system in which all four dNTPs (labeled with different fluorophores) could be used simultaneously. As development of reversible terminators with the necessary properties has proven to be a non-trivial problem [50,51], recent progress by several groups (see below) is quite exciting.
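A short numerical sketch shows why homopolymer lengths are hard to discern by signal quantification, and why one-base-per-cycle chemistry sidesteps the issue. The error model is an assumption made purely for illustration: the measured signal is taken to be proportional to the run length with a fixed 15% relative overestimate. Real signal behaviour is more complicated, and the function names are hypothetical.

```python
from itertools import groupby

def call_run_length(signal, unit=1.0):
    """Estimate homopolymer length by rounding the quantified signal."""
    return int(round(signal / unit))

def run_lengths_from_cycles(per_cycle_calls):
    """With reversible terminators each cycle adds exactly one base, so a run
    length is simply the number of consecutive cycles with the same call."""
    return [(base, len(list(group))) for base, group in groupby(per_cycle_calls)]

# Assumed model: every signal is overestimated by 15% of its true value.
relative_error = 0.15
calls = [call_run_length(n * (1 + relative_error)) for n in range(1, 5)]
print(calls)   # [1, 2, 3, 5]: the run of four is miscalled as five

print(run_lengths_from_cycles("TTTTAG"))
# [('T', 4), ('A', 1), ('G', 1)]
```

Under this model the absolute error grows with run length, so short runs are called correctly while longer ones are not. Counting cycles instead of quantifying a single signal removes that dependence, at the cost of needing one cycle per base.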
