
    Outsourced Sequencing and Analysis

    May 21st, 2010

    A company in Malaysia is offering to map whole-genome sequencing data and call variants in one week’s time for $4,000.

    I readily admit that I have not taken sequencing-as-a-service companies very seriously. The idea of sending precious samples off to a third party and getting back the sequence and variants doesn’t appeal to me for a number of reasons. Outsourcing just the analysis of sequence data is even more anathema. Why would anyone want to do that? Analysis is the best part! Then again, I’m fairly biased in this matter because (1) I work at a major genome center with significant in-house sequencing resources, and (2) sequence analysis and variant detection are among my job responsibilities. Obviously I don’t want those to go away.

    That said, there seems to be a growing interest in outsourcing sequencing and/or analysis in the wider research community. Complete Genomics had a strong presence at Marco Island this year, and has a growing customer list that includes (perhaps surprisingly) at least two genome centers. Beijing Genomics Institute (BGI) announced a purchase of 128 Illumina HiSeq2000 instruments in January; a month later in Science magazine I saw a full-page ad indicating that they’re open for business as a sequencing provider. No big deal, they’re half a world away, right? So I thought, until I heard whispers of a BGI facility in San Francisco.

    Second- and third-generation sequencing technologies are bringing about volatile changes in the fields of genetics and genomics. Throughput continues to skyrocket, while the costs of sequencing plummet.  It’s now possible to sequence a complete human or mammalian genome to high coverage on a single instrument run for ~$20,000. This has had three effects on the research community:

    1. Genomes abound. At least a dozen individual human genomes have been published, and NGS technologies are being applied to a wide range of studies – exomes, transcriptomes, model organisms, you name it.
    2. Everyone wants to sequence. Thanks to a lot of press and some high-profile publications, massively parallel sequencing is known to every corner of the biomedical research world. Suddenly every clinician with a patient cohort wants in, because if they don’t find the disease-causing genes, someone else will.
    3. Not everyone can buy an NGS instrument. Commercially-available sequencers currently cost a quarter to a half million dollars or more each, which is a significant purchase even for labs flush with ARRA funding. This means that a lot of small labs will not be looking to buy a machine, but rather to rent space from someone who has one. Music, no doubt, to the ears of BGI and Complete Genomics.

    One thing is clear. These new sequencers and service providers are going to put high-throughput sequencing into the hands of many investigators. Investigators, I might add, who likely have never dealt with NGS data. I think that’s potentially very exciting, and I hope that the experiences of major genome centers will help newcomers address the challenges of massively parallel sequencing.


    AGBT 2010: First Impressions

    February 25th, 2010

    Only in Florida: Jellyfish Aquarium

    I’m in the midst of my first full day at Marco Island. More than any other meeting that I’ve attended, AGBT has a remarkable corporate presence. Life Technologies seems to be the biggest sponsor; you can’t look anywhere without seeing a banner that promotes the new SOLiD4 system. Apparently I’m doing a poor job of keeping up with SOLiD, as I’d only just heard about SOLiD3. I spoke to Richard Gibbs at a coffee break, and he mentioned that SOLiD4 is an upgrade, not a new machine. Must be nice.

    Caliper Life Sciences, a maker of microfluidics equipment for next-generation sequencing, won favor with many attendees by hanging chocolate “chips” (mini bars) on the doorknobs of every AGBT attendee’s room in the hotel to promote their recently-launched LabChip XT. I learned of this company only a week or so ago, when my colleague Vince Magrini was named to their scientific advisory board.

    PacBio Instrument Unveiled

    Pacific Biosciences unveiled their coveted SMRT sequencing instrument last night in a small, invitation-only event in their suite. Sadly, I wasn’t invited, but I’m told the guest list was very exclusive. Most likely it was restricted to directors from the ten initial PacBio customers that were announced last week. Tonight, PacBio hosts a roundtable called Global Challenges, Genomic Solutions that will be moderated by Charlie Rose.

    Other Players in the Field

    This morning at breakfast, Agilent Technologies was trading SureSelect T-shirts for surveys that assessed respondents’ interest in exome capture, which (thus far) seems to be the recurrent hot topic at AGBT. Things have been quiet from some of the other large sponsors, including Illumina, Complete Genomics, and Roche. I’m sure that their hour of glory will come soon enough.


    NGS Informatics: Hail to the Chief

    September 17th, 2009

    Bio-IT World’s Kevin Davies has a nice interview with David Dooling, who heads informatics here at the Genome Center and still finds time for his PolITiGenomics blog.  Dooling joined the center in 2001, as the Human Genome Project was wrapping up.  Now, he oversees about half of our informatics group – including IT personnel as well as the developers of our LIMS and automated data pipelines.

    All three groups, now that I think about it, have had to address significant challenges during our transition to a next-generation sequencing center.  Our LIMS deals with tens of millions of transactions per month, with a back-end database whose tables sometimes have billions of records.  Our automated pipeline (or APIPE) group develops all of the data pipelines that make whole-genome sequencing feasible – primary data analysis, alignment, coverage reporting, mutation detection, etc.  And the IT group must address the exponentially growing needs of data transfer and compute time for all of it – not an easy job.

    Despite these monumental tasks, under the leadership of David and others we’re “on a good path” to handle the current generation of sequencing tools.  Of course, that may change in the next couple of years, when technologies like PacBio’s SMRT platform begin cranking out single-molecule sequences 1,000 bases long or longer.

    In-House and Open Source

    Bio-IT World is heavily read by providers of commercial informatics tools, and this is reflected somewhat in the interview.  Davies often asks whether we’re working with any specific vendors, or considering any commercial tools.  Often enough we are – certainly for storage and data transfer systems, things that can’t be built from the ground up.  Yet whenever possible, we opt for the open-source solution.  Every workstation here, for example, runs Linux.  We have but one Windows PC, and it’s not allowed to connect to the internet.  Most of our LIMS and many of our in-house tools were written in Perl.

    A Tough Nut for Commercial Vendors

    There are, of course, commercial alternatives to anything.  Yet vendors face significant hurdles in marketing products to large genome centers.  The tools that we use are often highly customized, and must continually evolve to address new technological developments.  Take aligners for example.  In the early days of Illumina sequencing, we licensed some commercial software – SLIMsearch and SXOG, for example – because there simply were no good alternatives to ELAND.  Then Maq came along, offering better functionality and performance in a free and open source program (offered, no less, by our trusted friends across the pond).  Exorbitantly priced licenses, needless to say, were quickly not renewed.

    Now there are numerous commercial solutions, and we’re often wooed by companies like CLC bio.  Yet for every commercial aligner there are half a dozen free/open-source alternatives, developed by academic groups that we respect and trust (Maq/BWA from Sanger, Bowtie from UMD, etc.), and many of these tools are pretty damn good.  A commercial option would have to be incredible, vastly superior to what’s currently available, for us to consider a paid license.  With Bowtie and BWA mapping lanes of 15 million reads in just a couple of hours, the bar is already set pretty high.

    Outsourcing Sequencing?

    David offers, I think, a polite response to the question of whether we’d ever outsource our sequencing to a third party.  Personally, I can offer two reasons why this will probably never happen.  First, we’re already pretty happy with Illumina, a platform that can deliver whole human genomes at high coverage in just a few weeks.  All available evidence suggests that throughput will only continue to grow, and before long I expect we’ll be doing a genome on a single flowcell or less.  Of course, cost is a consideration (Illumina runs aren’t cheap).  It’s very possible that a company like Complete Genomics might be able to offer similar yields at a substantially reduced cost.  We do use companies like IDT and Agilent, for example, to synthesize oligo sequences that we might otherwise make in house.  They can make them cheaper, and faster, than we can.

    There is a second, and perhaps more compelling, reason to keep sequencing in-house – because we’re in the business of research, and data is precious.  With our current capacity we can track the progress of sequencing runs in real-time, monitor error rates and alignment rates, and assess results the moment data comes off the machines.  We maintain a forensics-lab-like “chain of custody” on the data from start to finish.  Doing so offers a certain sense of security, and confidence, when we use the results to tackle some of the most fundamental questions in biology.


    Help Wanted at the Genome Center

    October 2nd, 2008

    The WashU Genome Center is hiring! Well, they’re almost always hiring, but one of the current open positions is in my group.  So I thought I’d put the word out here on Massgenomics.

    The basic requirements of the staff scientist position are outlined on the GC web site. We’re looking for someone with a degree (preferably graduate degree) and 4+ years of experience in computer science, biology, or a similar field. This person must have solid programming abilities, ideally in Perl. Most of these guidelines apply to just about any non-laboratory position at the GC, so they’re not terribly informative. Since we’re hiring someone in my group, however, I can probably offer some advice about what we’re looking for.

    Our work centers on analysis. We develop, test, and apply algorithms for sequence analysis, mutation detection, and similar tasks. We work on several projects concurrently. As one of the big three genome centers in the U.S., we play a significant role in major initiatives like the Tumor Sequencing Project (TSP), The Cancer Genome Atlas (TCGA), and the 1000 Genomes Project. Our analysis pipeline for traditional capillary-based resequencing is largely in place, so the focus is on next-gen technologies (Roche/454, Illumina/Solexa, ABI/SOLiD).

    For this position, programming abilities are not the only requirement. Simply put, we’re looking for a scientist. This means that the strong candidate will have all three of the following:

    1. Technical skills. Experience with multiple programming languages including Perl. The experts here will test you and probably ask for some code samples. Familiarity with common bioinformatics tools like BLAST, BLAT, BioPerl, etc.
    2. Scientific rigor. Your CV should list publications in peer-reviewed journals, scientific meetings attended (with talks/posters given), etc. Be prepared to talk about them, and if things go well, to give a brief talk.
    3. An interest in biology. This will come across both in the CV and in the interviews. We’re looking for someone who’s self-motivated and passionate about biological questions.

    There are some important realities about working in academia.  First, it usually won’t make you a millionaire.  Genome Technology’s annual salary survey will tell you that salaries are almost always higher in the private sector.  However, pay at the GC is very competitive for an academic setting, and the benefits are excellent.  Second, we’re proponents of open source and open access. I hope you know your way around Linux, because we only have one Windows workstation and it’s not allowed to access the internet.

    Why Bother?

    It’s just my opinion, but I find this a pretty exciting and rewarding place to work. The GC has early access to lots of new cutting-edge technologies. WashU consistently ranks in the top 5 for medical schools and the top 10 for biomedical research. I like to think that we tackle some of the biggest problems in biology and human genetics. If you’re interested, the instructions for applying are on the employment opportunities page.
