We will be getting data from human whole-exome sequencing done on an Illumina GAIIx. I am planning to put together a pipeline with these stages: alignment (BWA? This step is optional, as we may receive the reads already aligned) | variant detection (SAMtools?) | variant classification and deleteriousness prediction (something like SIFT/PolyPhen). I know there are various options out there, but I would like to know, based on your experience, what the best combination of these tools is. If you had the chance to start from scratch, what kind of pipeline would you put together? I also know that big sequencing centers have in-house tool sets for squeezing out the last drop, but I am looking for what is available to the general public.
Thanks
You seem to have listed all the essential components of an exome sequencing pipeline. Exome sequencing is not yet sufficiently well established to have a single "best-practice" pipeline available. It's still in the roll-your-own stage. You're going to have to experiment with the options for each component (aligner, SNP caller, functional annotator, etc.) to see which give the best results. You'll probably have to write a lot of glue code to make the components fit together.
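To illustrate the kind of glue code meant here, below is a minimal sketch in Python that only builds the command lines for one sample and prints them (a dry run). It assumes paired-end FASTQ input and reasonably recent versions of BWA, SAMtools, and BCFtools; BCFtools as the caller, the function name `build_pipeline`, and the file-naming scheme are assumptions made for illustration, not something recommended in this thread. The annotation step (SIFT/PolyPhen or similar) is left out because its invocation depends entirely on the tool chosen.

```python
import shlex


def build_pipeline(sample, ref, fq1, fq2):
    """Return the shell commands for one sample as a list of strings.

    Hypothetical helper: output names are derived from the sample name.
    Nothing is executed here, so the commands can be inspected first.
    """
    q = shlex.quote  # protect paths containing spaces or shell metacharacters
    bam = f"{sample}.sorted.bam"
    vcf = f"{sample}.vcf.gz"
    return [
        # 1. Align paired-end reads and sort the alignments by coordinate.
        f"bwa mem {q(ref)} {q(fq1)} {q(fq2)} | samtools sort -o {q(bam)} -",
        # 2. Index the sorted BAM so downstream tools can access it by region.
        f"samtools index {q(bam)}",
        # 3. Call variants (assumption: BCFtools multiallelic caller,
        #    variant sites only, bgzip-compressed VCF output).
        f"bcftools mpileup -f {q(ref)} {q(bam)} | bcftools call -mv -Oz -o {q(vcf)}",
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for cmd in build_pipeline("sample1", "ref.fa", "sample1_1.fq", "sample1_2.fq"):
        print(cmd)
```

Keeping the command construction separate from execution makes it easy to swap in a different aligner or caller while experimenting, which is the point of the roll-your-own approach.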
Thanks for the answer. I do realize that there is no single best-practice pipeline, but I was hoping people might chip in with their own "this works best for me" combinations. I think I'll just have to find the answer myself at this point.
This is a really wonderful thread.
I am a little confused about how to use this pipeline if I have, say, 10 samples.
Should I first align all files to the genome independently, combine them into a single BAM file, and proceed from there? Or should I process every file independently through all these steps?
Thanks, Santhosh
This is not a discussion forum. Questions like yours usually go as a separate item on this site. Link to this question when asking your question.
Since I was very dissatisfied with confirming variants produced by CASAVA, I tried to set up the pipeline described above by Quigley and L for our HiSeq/SeqCapEZ exome data.
After a long weekend and some hassle with incompatible Java versions and a reference that was concatenated in the wrong order (GRCh37, but copied together in a non-karyotypic way), I have now made it to the 'samtools calmd...' part.
But all I get back is the error message 'Floating point exception', which I have no idea how to deal with. I am close to going berserk. Has anyone seen this before and can tell me how to deal with it?
Best regards,
Michael Gombert
Michael, this is not a discussion thread, although it may appear that way. If you have a question to ask, please post it as a new question so it can be answered and archived to help people in the future.