Analysing Multiple Exome Datasets
12.9 years ago
User 9126 ▴ 50

Dear All, We have whole-exome sequencing data for ten samples from Illumina. While looking for a pipeline to analyse this data, I found a nice thread: http://biostar.stackexchange.com/questions/1269/what-is-the-best-pipeline-for-human-whole-exome-sequencing

But I am a little confused about how to use this pipeline for all 10 samples together.

Should I first align each file to the genome independently, combine the results into a single BAM file, and then proceed?

Or should I process every file independently through all of these steps? If so, what should I do at the end to interpret the results across all 10 samples?

Thanks

Santhosh

exome sequencing • 2.8k views
12.9 years ago

There are (at least) two places in processing and analyzing exome data that benefit from borrowing information from other samples. The first is realignment around indels, since one may borrow information from reads in all samples when searching for support for an indel. The second is variant calling: several variant callers accept multiple BAMs at once, so that all available information is used when making calls. That said, in our group we process all samples pretty much independently and only combine them for variant calling, where having one large BAM file is not necessary.
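To make the "process independently, combine only at variant calling" idea concrete, here is a minimal sketch in Python that only builds the shell commands and does not run them. The tool choice (bwa mem, samtools, bcftools) is an assumption for illustration and not necessarily what our group or the linked thread uses; the sample names, FASTQ naming scheme, reference path, and output name are all hypothetical. Note that each sample gets its own read group (SM tag), which is what lets a multi-sample caller tell the BAMs apart without merging them.

```python
from typing import List


def per_sample_commands(sample: str, reference: str) -> List[str]:
    """Commands to align ONE sample independently (hypothetical file naming)."""
    r1 = f"{sample}_R1.fastq.gz"  # assumed FASTQ naming scheme
    r2 = f"{sample}_R2.fastq.gz"
    bam = f"{sample}.sorted.bam"
    # The read group carries the sample name, so samples stay
    # distinguishable when several BAMs are given to the caller.
    read_group = f"@RG\\tID:{sample}\\tSM:{sample}"
    return [
        f"bwa mem -R '{read_group}' {reference} {r1} {r2} | samtools sort -o {bam} -",
        f"samtools index {bam}",
    ]


def joint_calling_command(samples: List[str], reference: str, out_vcf: str) -> str:
    """One calling step over all per-sample BAMs; no merged BAM is needed."""
    bams = " ".join(f"{s}.sorted.bam" for s in samples)
    return f"bcftools mpileup -f {reference} {bams} | bcftools call -mv -Oz -o {out_vcf}"


def build_plan(samples: List[str], reference: str, out_vcf: str) -> List[str]:
    """Per-sample steps first, then a single multi-sample calling step."""
    plan: List[str] = []
    for sample in samples:
        plan.extend(per_sample_commands(sample, reference))
    plan.append(joint_calling_command(samples, reference, out_vcf))
    return plan


if __name__ == "__main__":
    samples = [f"S{i}" for i in range(1, 11)]  # ten hypothetical samples
    for command in build_plan(samples, "ref.fa", "cohort.vcf.gz"):
        print(command)
```

The per-sample steps are independent of each other, so they can run in parallel; only the last command sees all ten BAMs. Steps such as duplicate marking or indel realignment are left out of this sketch and would slot in between alignment and calling.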

At the end of the day, though, it pays to really think about your study and study design when deciding how to proceed with an analysis. While most people talk about a "pipeline" for exome sequencing, it is easy to define situations (related individuals, for example) where a "general pipeline" is not optimal.
