Question

Forum:Commercial All-In-One Versus Custom Pipeline For Clinical Applications


10.3 years ago

DG 7.3k

There have been a couple of posts on this topic before, but they are over two years old and lack any real comparisons. Given how much NGS has matured in the last few years, particularly for clinical applications, I am hoping some people now have more concrete experience and information to share. I am part of a working group for a regional hospital that is looking to implement NGS-based testing in its molecular diagnostics laboratory. We will start with a benchtop sequencer (most likely either a MiSeq or an Ion Torrent), which will be dual use (research/clinical). At least initially, most clinical applications will be in oncology. As the local bioinformatics expert for human NGS applications, I am evaluating what will best fit their needs. I believe there is support for bioinformatics staffing as part of the setup. As a bioinformatician, I am leery of unpublished algorithms and black boxes. While commercial packages like CLC Workbench and Nextgene are very easy to use, I don't like not knowing what is happening to my data.

Does anyone have direct experience comparing the results of packages like Nextgene with pipelines built from published open-source software? I think open-source solutions like bcbio-nextgene are particularly well suited to this kind of work, especially with long-term scalability and growth in mind.
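One minimal way to make such a comparison concrete is to reduce each pipeline's calls to (chromosome, position, reference, alternate) keys and count what they share. The Python sketch below is hypothetical: `Variant` and `concordance` are illustrative names, not part of any package, and it assumes both call sets use the same reference build and the same indel representation (e.g. left-aligned), which VCFs from different tools often do not have until they are normalized.

```python
from typing import Dict, Iterable, NamedTuple


class Variant(NamedTuple):
    """Hashable key for one small variant; assumes calls were normalized the same way."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> Dict[str, float]:
    """Count shared and pipeline-specific calls, plus their Jaccard overlap."""
    a, b = set(calls_a), set(calls_b)
    shared, union = a & b, a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared calls / all distinct calls; 1.0 means identical sets.
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


if __name__ == "__main__":
    # Toy call sets with made-up coordinates, standing in for two pipelines' output.
    commercial = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T")]
    open_source = [Variant("chr1", 100, "A", "G"), Variant("chr2", 50, "G", "GA")]
    print(concordance(commercial, open_source))
```

The calls private to one pipeline are usually the interesting part: they are the ones worth inspecting by hand to see which tool is right.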

annotation mutation alignment variant-calling next-gen • 2.5k views

updated 13 months ago by Ram 43k • written 10.3 years ago by DG 7.3k

score 0 · Answer 1 · 2013-12-28

It depends. For variant calling, I would probably go open-source all the way. This is what I do for exome-capture data:

BWA alignment --> Picard duplicate removal (plus targeted-sequencing QC stats) --> samtools reformatting (sort .bam files, create pileup) and QC stats --> VarScan variant calls --> ANNOVAR annotation.
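A chain like this is easy to script so that every sample runs through identical, logged commands. The Python sketch below only builds and prints the command list by default (`dry_run=True`); it is an illustration, not the poster's actual script. Assumptions to check against your installed versions: `bwa`, `samtools` and `table_annovar.pl` are on `PATH`, the jars are named `picard.jar` and `VarScan.jar`, Picard accepts the newer `-I/-O/-M` argument style, ANNOVAR databases live in `humandb/` for build `hg19` with the `refGene` protocol, and all file names are hypothetical. One deliberate deviation from the order above: it sorts before duplicate removal, because Picard MarkDuplicates expects coordinate-sorted (or query-grouped) input. The targeted-sequencing QC step is left out.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    name: str
    argv: List[str]
    stdout: Optional[str] = None  # file capturing stdout, for tools that only write there


def exome_plan(sample: str, ref: str, r1: str, r2: str, threads: int = 4) -> List[Step]:
    """Build the ordered command list for one paired-end exome sample (hypothetical file names)."""
    sam = f"{sample}.sam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    pileup = f"{sample}.pileup"
    steps = [
        # BWA-MEM writes SAM to stdout.
        Step("align", ["bwa", "mem", "-t", str(threads), ref, r1, r2], stdout=sam),
        # Sort first: MarkDuplicates expects coordinate-sorted (or query-grouped) input.
        Step("sort", ["samtools", "sort", "-o", sorted_bam, sam]),
        # Remove (not just flag) duplicates and keep the duplication metrics for QC.
        Step("dedup", ["java", "-jar", "picard.jar", "MarkDuplicates",
                       "-I", sorted_bam, "-O", dedup_bam,
                       "-M", f"{sample}.dup_metrics.txt",
                       "--REMOVE_DUPLICATES", "true"]),
        Step("index", ["samtools", "index", dedup_bam]),
        # A text pileup is the input VarScan consumes.
        Step("pileup", ["samtools", "mpileup", "-f", ref, "-o", pileup, dedup_bam]),
    ]
    # Call SNVs and indels separately, then annotate each VCF.
    for kind, command in (("snp", "mpileup2snp"), ("indel", "mpileup2indel")):
        vcf = f"{sample}.{kind}.vcf"
        steps.append(Step(f"call_{kind}", ["java", "-jar", "VarScan.jar", command,
                                           pileup, "--output-vcf", "1"], stdout=vcf))
        steps.append(Step(f"annotate_{kind}", ["table_annovar.pl", vcf, "humandb/",
                                               "-buildver", "hg19", "-out", f"{sample}.{kind}",
                                               "-protocol", "refGene", "-operation", "g",
                                               "-vcfinput"]))
    return steps


def run(steps: List[Step], dry_run: bool = True) -> None:
    """Print each command; execute only when dry_run is False, stopping at the first failure."""
    for step in steps:
        redirect = f" > {step.stdout}" if step.stdout else ""
        print(f"[{step.name}] {shlex.join(step.argv)}{redirect}")
        if dry_run:
            continue
        if step.stdout:
            with open(step.stdout, "w") as fh:
                subprocess.run(step.argv, stdout=fh, check=True)
        else:
            subprocess.run(step.argv, check=True)


if __name__ == "__main__":
    run(exome_plan("sample1", "ref.fa", "sample1_R1.fastq.gz", "sample1_R2.fastq.gz"))
```

Keeping the plan as data, separate from execution, makes it simple to log the exact commands used for each clinical sample, which matters more there than in research use.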

I like CoNIFER best for copy number calls (though I don't think it captures everything), and I haven't been very satisfied with any structural variant caller. So I would probably only return SNPs and small indels to patients (in addition to the raw data, if they want to explore it on their own).
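That reporting rule (return SNPs and small indels, hold back copy number and structural variant calls) can be expressed as a simple filter on REF/ALT alleles. This Python sketch is hypothetical: the 50 bp cutoff for a "small" indel is an assumed convention that the post does not specify, and records are plain (chrom, pos, ref, alt) tuples rather than a parsed VCF.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, int, str, str]  # (chrom, pos, ref, alt), as in a VCF line

# Assumed cutoff: indels longer than this are treated as structural variants and not reported.
SMALL_INDEL_MAX_BP = 50


def variant_class(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele lengths and VCF symbolic-allele conventions."""
    if alt in {"*", "."}:
        return "other"  # spanning deletion or no-call
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "symbolic"  # e.g. <DEL>, <DUP>, or breakends emitted by CNV/SV callers
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    size = abs(len(ref) - len(alt))
    if size == 0:
        return "MNV"  # same-length multi-base substitution
    return "small_indel" if size <= SMALL_INDEL_MAX_BP else "large_indel"


def reportable(records: Iterable[Record]) -> Iterator[Record]:
    """Yield only SNVs and small indels, the classes the post would return to patients."""
    for record in records:
        _, _, ref, alt = record
        if variant_class(ref, alt) in {"SNV", "small_indel"}:
            yield record
```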

The one thing I like better in CLC Bio is its de novo assembly algorithm. There is currently no published paper I can point you to, but I can tell you it is fast, and I have liked its contigs the best (even compared to algorithms specifically designed for RNA-Seq, namely Trinity and Oases/Velvet). I did have a viral assembly pipeline built on SSAKE that I liked better, but it was optimized specifically for herpesvirus assembly:
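If you want to put a number on "liked the contigs" when comparing assemblers, one common (though incomplete) contiguity measure is N50: the length L such that contigs of length L or longer cover at least half of the total assembled bases. The Python sketch below is an illustration, not how CLC or any other assembler reports it, and N50 says nothing about whether the contigs are correct.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Smallest contig length L such that contigs >= L cover at least half the assembly."""
    lengths = sorted((n for n in contig_lengths if n > 0), reverse=True)
    total = sum(lengths)
    running = 0
    for length in lengths:
        running += length
        # Integer comparison avoids float rounding when checking the 50% point.
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    print(n50([100, 80, 50, 30, 20]))  # 280 bp total; 100 + 80 = 180 >= 140, so N50 = 80
```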

http://genomics-pubs.princeton.edu/prv/

However, I assume de novo assembly won't be too important for most clinical applications.

I've also analyzed both Illumina (MiSeq and HiSeq) and Ion Proton data, and I would definitely recommend sticking with Illumina. In my experience, the Proton data had more problems.

However, note that this is for research purposes; you may want something else for a clinical application. I don't have much experience testing tools in this area, but I did notice this paper, published relatively recently:

http://www.ncbi.nlm.nih.gov/pubmed/24220144