Question: Variant calling using samtools
2.9 years ago by
banerjeeshayantan160 wrote:

I have a cancer dataset containing 10 tumor and 10 control pairs. Each tumor or control dataset is 100 GB in size. I also have a reference sequence (mm9.fa). I need to do some variant calling using these data. This is what I do:

samtools mpileup -g -f mm9.fa *.bam | bcftools view -bvcg - > var.raw.bcf
(mpileup takes all the tumor and control pairs)
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100 > var.flt.vcf

But the entire process is agonizingly slow (more than 10 hours and still no output). What should I do? Can I find the variants individually and then merge them into one large BCF file? P.S.: I am very new to this field, so please pardon any ignorance in what I have written above. Thanks in advance.


Hey, what about MutScan ("detect and visualize target mutations by just scanning FastQ, 50X faster than normal pipelines")? They say it is 50x faster than a classic pipeline :)

Best

Reply written 2.9 years ago by Titus
2.9 years ago by
vmicrobio250 wrote:

You could try submitting the job to a computer cluster. It will save you some time and tie up your own computer less.

2.9 years ago by
Walnut Creek, USA
Brian Bushnell17k wrote:

BBMap's variant-caller is roughly 80x faster than the pipeline you are using. Samtools has to be in your path (and I recommend samtools 1.4, which is much faster than older versions). The command for multiple samples would be:

callvariants.sh in=sample1.bam,sample2.bam,sample3.bam ref=mm9.fa ploidy=2 out=variants.vcf multisample

Thanks for the suggestion. I have 20 BAM files. Can I write *.bam here? It's giving an error. Or do I need to write out every file name?

Reply written 2.9 years ago by banerjeeshayantan

Sorry :) I'll modify it to allow *.bam but right now you have to list all of them. Also, please note I modified the command to add the term "multisample".

Reply modified 2.9 years ago • written 2.9 years ago by Brian Bushnell
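
Since the file names have to be listed explicitly for now, one way to avoid typing all 20 by hand is to let the shell build the comma-separated list. The following is only a minimal sketch: bam_list is a hypothetical helper name (not part of BBMap or samtools), and it assumes the BAM file names contain no commas. The callvariants.sh line in the comment reuses only the parameters given in the answer above.

```shell
# Hypothetical helper: print the *.bam files in a directory as one
# comma-separated string, e.g. ./s1.bam,./s2.bam,./s3.bam
bam_list() {
    dir="${1:-.}"
    list=""
    for f in "$dir"/*.bam; do
        # If nothing matches, the pattern stays literal; skip it.
        [ -e "$f" ] || continue
        # Append with a comma unless this is the first entry.
        list="${list:+$list,}$f"
    done
    if [ -z "$list" ]; then
        echo "no BAM files found in $dir" >&2
        return 1
    fi
    printf '%s\n' "$list"
}

# Intended use (not executed here; assumes callvariants.sh is on your PATH):
#   callvariants.sh in="$(bam_list .)" ref=mm9.fa ploidy=2 out=variants.vcf multisample
```

The shell expands the glob in sorted order, so the samples are passed in a stable, alphabetical order.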

Thanks! Will do. This really helped.

Reply written 2.9 years ago by banerjeeshayantan