Question

WGS: Variant Calling per Chromosome or per Region?

11.0 years ago

mathiasf52 ▴ 20

Hello,

I have whole-genome human sequencing data from Illumina, and I am developing a pipeline to analyze it. I would like to call variants with samtools mpileup. To speed up this step, I am wondering whether I should call variants per chromosome, or divide the chromosomes into smaller regions and call variants on each region. Which approach is faster or more convenient?

Thank you, Jean
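The "divide the chromosomes into regions" option can be sketched in Python as below: each chromosome is cut into fixed-size windows, and each window is formatted as a samtools-style region string (`chrom:start-end`, 1-based, inclusive) that could be passed to `-r`. The chromosome names, lengths and the 10 Mb window size are illustrative assumptions, not values from this thread.

```python
def split_into_regions(chrom_lengths, window):
    """Split each chromosome into consecutive windows of at most `window` bp.

    `chrom_lengths` maps chromosome name -> length in bp. The result is a list
    of samtools-style region strings "chrom:start-end" with 1-based, inclusive
    coordinates, in the same chromosome order as the input mapping.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of bases")
    regions = []
    for chrom, length in chrom_lengths.items():
        start = 1
        while start <= length:
            end = min(start + window - 1, length)
            regions.append(f"{chrom}:{start}-{end}")
            start = end + 1
    return regions


if __name__ == "__main__":
    # Hypothetical chromosome lengths and window size, for illustration only.
    lengths = {"chrA": 25_000_000, "chrB": 8_000_000}
    for region in split_into_regions(lengths, 10_000_000):
        print(region)
```

With these inputs it prints `chrA:1-10000000`, `chrA:10000001-20000000`, `chrA:20000001-25000000` and `chrB:1-8000000`. Smaller windows mean more, shorter jobs (easier to balance across a cluster) at the cost of more output files to merge afterwards.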

mpileup

You should consider GATK's UnifiedGenotyper, which has built-in multithreading. I switch back and forth between samtools and UnifiedGenotyper depending on the project; both have pros and cons. If you are working with human data, you probably want to follow the Broad's "Best Practices" variant-calling pipeline.

Reply by Zev.Kronenberg, 11.0 years ago

I've seen an increase in samtools-based variant-calling pipelines since GATK's new commercial licensing policy.

Reply by Vivek, 11.0 years ago

My experience: I split the BWA output by chromosome and process each chunk in parallel on our cluster, then merge the results (BAMs/VCFs) at the end. I don't have any logs to say whether it is much faster.

Reply by Pierre Lindenbaum, 11.0 years ago
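Pierre's split-per-chromosome, process-in-parallel, merge-at-the-end approach is a scatter/gather pattern. Below is a minimal Python sketch of it, not his actual pipeline. Assumptions: the file names (`sample.bam`, `ref.fa`, `merged.vcf.gz`), the worker count, and the command lines, which use current bcftools syntax (`bcftools mpileup | bcftools call`, then `bcftools concat`) because the mpileup calling step has since moved out of samtools; adjust for the version you run. By default it runs in dry-run mode and only returns the commands it would execute.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def per_chrom_command(chrom, bam, ref):
    """Build a variant-calling pipeline restricted to one chromosome.

    Assumes current bcftools syntax; `-r` needs an indexed BAM (.bai).
    Returns (shell command, output VCF path).
    """
    q = shlex.quote
    out = f"{chrom}.vcf.gz"
    cmd = (f"bcftools mpileup -f {q(ref)} -r {q(chrom)} -Ou {q(bam)}"
           f" | bcftools call -mv -Oz -o {q(out)}")
    return cmd, out


def run(cmd, dry_run):
    """Execute `cmd` through the shell, or just return it in dry-run mode."""
    if not dry_run:
        # Note: /bin/sh reports only the last pipeline stage's exit status.
        subprocess.run(cmd, shell=True, check=True)
    return cmd


def scatter_gather(chroms, bam, ref, merged="merged.vcf.gz", workers=4, dry_run=True):
    jobs = [per_chrom_command(c, bam, ref) for c in chroms]
    # Scatter: one job per chromosome, at most `workers` running at once.
    # Threads are enough here because the real work happens in child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        executed = list(pool.map(lambda job: run(job[0], dry_run), jobs))
    # Gather: bcftools concat expects its inputs in reference order, so keep
    # the order of `chroms` (which should follow the reference).
    inputs = " ".join(shlex.quote(out) for _, out in jobs)
    executed.append(run(f"bcftools concat -Oz -o {shlex.quote(merged)} {inputs}", dry_run))
    return executed


if __name__ == "__main__":
    for cmd in scatter_gather(["chr1", "chr2"], "sample.bam", "ref.fa"):
        print(cmd)
```

On a real cluster the thread pool would typically be replaced by job submissions to the scheduler; the scatter/gather structure stays the same.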

score 1 · Answer 1 · 2013-06-13

11.0 years ago

Ashutosh Pandey 12k

Well, theoretically, the smaller the input, the faster the processing. But I usually call SNPs per chromosome, which is less complicated and faster. I have 20 nodes in my cluster, so each node can take one chromosome. For mpileup, if you don't specify a sub-region but only a chromosome, e.g. "-r chr2", it processes the whole chromosome. The BAM index file lets mpileup retrieve the reads for any chromosome directly, without scanning the whole BAM.
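One practical wrinkle with "one chromosome per node": a human reference has more primary sequences than 20 (chr1 to chr22, X, Y and the mitochondrial genome), and their lengths differ a lot (chr1 is roughly five times as long as chr21). The minimal Python sketch below, which is an assumption and not the answerer's method, reads chromosome names and lengths from the reference's samtools faidx index (`.fai`, tab-separated: name, length, offset, linebases, linewidth) and spreads them over N nodes with a greedy longest-first heuristic, so no node receives far more sequence than the others. The script name, paths and node count in the usage comment are hypothetical.

```python
import heapq
import sys


def parse_fai(lines):
    """Return [(name, length), ...] from the lines of a samtools faidx .fai file."""
    chroms = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        chroms.append((fields[0], int(fields[1])))
    return chroms


def assign_to_nodes(chroms, n_nodes):
    """Greedy longest-first: give each chromosome to the currently least-loaded node.

    Returns {node_id: [chromosome names]}; load is the total bp assigned.
    """
    if n_nodes <= 0:
        raise ValueError("n_nodes must be positive")
    heap = [(0, node) for node in range(n_nodes)]  # (assigned bp, node id)
    heapq.heapify(heap)
    plan = {node: [] for node in range(n_nodes)}
    for name, length in sorted(chroms, key=lambda c: c[1], reverse=True):
        load, node = heapq.heappop(heap)
        plan[node].append(name)
        heapq.heappush(heap, (load + length, node))
    return plan


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python plan_nodes.py ref.fa.fai 20
    with open(sys.argv[1]) as fh:
        chroms = parse_fai(fh)
    for node, names in assign_to_nodes(chroms, int(sys.argv[2])).items():
        # Each node would run mpileup once per listed chromosome, via "-r <name>".
        print(node, " ".join(names))
```

Longest-first greedy assignment is a simple, well-known load-balancing heuristic; it does not guarantee an optimal split, but it avoids the worst case of one node getting several large chromosomes.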