WGS: Variant Calling Per Chromosome Or Region?
11.0 years ago
mathiasf52 ▴ 20

Hello,

I have human whole-genome sequencing data from Illumina and I am developing a pipeline to analyze it. I would like to call variants with samtools mpileup. To speed up this step, I am wondering whether I should call variants per chromosome, or divide the chromosomes into smaller regions and call variants on each region. Which is faster or more convenient?

Thank you, Jean

mpileup • 3.2k views

You should consider GATK's UnifiedGenotyper; it has built-in threading. I switch back and forth between samtools and UnifiedGenotyper depending on the project, and both have pros and cons. If you are working with human data, you probably want to use the Broad's "best practices" variant calling pipeline.


I've seen an increase in samtools variant calling pipelines since GATK's new commercial license policy.


My experience: I split the output of bwa per chromosome and process each chunk on our cluster in parallel, then merge the results (BAMs/VCFs) at the end. I don't have any logs to say whether it's much faster.
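
The split, process in parallel, then merge pattern described here can be sketched roughly as follows. This is only an illustration, not the poster's actual setup: `process_chunks` and `fake_call_variants` are hypothetical names, the worker is a stand-in for a real per-chromosome job, and on a real cluster you would more likely submit jobs through your scheduler than through local threads.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunks(chunks, worker, max_workers=4):
    """Run worker(chunk) for every chunk in parallel.

    Results come back in the same order as the input chunks, which
    keeps the final merge step in chromosome order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, chunks))


def fake_call_variants(chrom):
    # Hypothetical stand-in for a real job, e.g. a subprocess call that
    # runs a variant caller on one chromosome and writes one VCF.
    return f"{chrom}.vcf"


# Human chromosome names with a "chr" prefix; this naming is an assumption
# and must match the names used in your reference and BAM header.
chromosomes = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]

per_chrom_outputs = process_chunks(chromosomes, fake_call_variants)
print(per_chrom_outputs[:3])
```

The ordered list of per-chromosome outputs is what you would then hand to whatever tool you use to merge the BAMs or VCFs.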

11.0 years ago

In theory, the smaller the input, the faster the processing. In practice I usually call SNPs per chromosome, which is less complicated and, for me, faster. I have 20 nodes in my cluster, so each node can take one chromosome at a time. With mpileup, if you give only a chromosome name rather than a coordinate range, e.g. "-r chr2", it processes the whole chromosome. The BAM index file lets mpileup retrieve the reads for any chromosome directly.
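
To make the two options concrete, here is a minimal sketch that builds `samtools mpileup` command lines for either a whole chromosome or fixed-size sub-regions. The helper names, file names, window size and toy chromosome length are all made up for illustration. It uses only the `-r` (region), `-f` (reference FASTA) and `-o` (output) options of `samtools mpileup`; how you turn the pileup into variant calls depends on your samtools/bcftools version and is left out.

```python
def split_into_regions(chrom, length, window):
    """Return samtools-style region strings (1-based, inclusive)
    that tile one chromosome in windows of at most `window` bases."""
    regions = []
    for start in range(1, length + 1, window):
        end = min(start + window - 1, length)
        regions.append(f"{chrom}:{start}-{end}")
    return regions


def mpileup_command(bam, reference, region, output):
    """Build the argument list for one mpileup job.

    `region` may be a bare chromosome name ("chr2") or a coordinate
    range ("chr2:1-1000"). The BAM file must be indexed for -r to work.
    """
    return ["samtools", "mpileup", "-r", region,
            "-f", reference, "-o", output, bam]


# Whole-chromosome job (file names are placeholders).
whole = mpileup_command("sample.bam", "ref.fa", "chr2", "chr2.pileup")
print(" ".join(whole))

# Sub-region jobs; 2500 is a toy length, not the real size of chr2.
for region in split_into_regions("chr2", 2500, 1000):
    out = region.replace(":", "_") + ".pileup"
    print(" ".join(mpileup_command("sample.bam", "ref.fa", region, out)))
```

Per-chromosome jobs mean fewer outputs to track and merge; smaller windows give more evenly sized jobs at the cost of more bookkeeping.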


Thank you so much for your reply. That is a great help.
