Question: VarScan2: run in parallel / merge mpileup output from different runs into a single mpileup
Chirag Nepal (Copenhagen) wrote, 4.8 years ago:

Hi all,

I am using VarScan2 to identify SNPs in our data (an exome study comparing normal/tumor samples from 5 patients).

For testing I used only two pairs:

samtools mpileup -f assembly.fa ST1_normal.bam ST1_tumor.bam ST2_normal.bam ST2_tumor.bam > myTestData.mpileup

 

This takes a long time. Is there a way to run it in parallel? Or, if I run it on a single pair (tumor/normal) at a time, can I simply concatenate all the mpileup files into a single output file to use as input to VarScan2 for SNP calling?

 

Thanks for your help !

cheers

Chirag

 

 

Tags: snp, varscan, exome
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 4.8 years ago:

Run one mpileup per chromosome or region, in parallel, using the option -l or -r:

-l FILE 	BED or position list file containing a list of regions or sites where pileup or BCF should be generated
-r STR 	Only generate pileup in region STR

samtools mpileup -f assembly.fa  -r chr1 (...) >  myTestData1.mpileup
samtools mpileup -f assembly.fa  -r chr2 (...) >  myTestData2.mpileup
samtools mpileup -f assembly.fa  -r chr3 (...) >  myTestData3.mpileup
(...)
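
The per-chromosome commands above can also be launched together from a single script by putting each one in the background. The following is only a minimal sketch: the function name `run_mpileup_per_chrom`, the chromosome list and the output naming are assumptions made for illustration, and it starts one job per chromosome with no limit on how many run at once.

```shell
#!/usr/bin/env bash
# Sketch: one "samtools mpileup" job per chromosome, all started as
# background jobs from the same script. Names below are illustrative only.

run_mpileup_per_chrom() {
    local ref=$1      # reference FASTA, e.g. assembly.fa
    local chroms=$2   # space-separated chromosome names, e.g. "chr1 chr2 chr3"
    shift 2           # everything left in "$@" is the list of BAM files
    local chrom
    for chrom in $chroms; do
        # "&" sends the job to the background so the loop moves on immediately
        samtools mpileup -f "$ref" -r "$chrom" "$@" > "myTestData.${chrom}.mpileup" &
    done
    wait   # return only once every background job has finished
}

# Example call (commented out so that sourcing this file runs nothing):
# run_mpileup_per_chrom assembly.fa "chr1 chr2 chr3" \
#     ST1_normal.bam ST1_tumor.bam ST2_normal.bam ST2_tumor.bam
```

Because every chromosome starts at once, this only suits machines with enough cores and I/O bandwidth for all the jobs; a job scheduler or a tool that caps the number of concurrent jobs is the safer choice for a full genome.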

Dear Pierre,

Thanks for the answer. So, just to be clear, I need to run mpileup separately for each chromosome:

 

samtools mpileup -f assembly.fa  -r chr1 Nor_1.bam Tum_1.bam Nor_2.bam Tum_2.bam Nor_3.bam Tum_3.bam Nor_N.bam Tum_N.bam > mplie_N1.mpileup
samtools mpileup -f assembly.fa  -r chr2 Nor_1.bam Tum_1.bam Nor_2.bam Tum_2.bam Nor_3.bam Tum_3.bam Nor_N.bam Tum_N.bam > mplie_N2.mpileup

So, to make these run in parallel, do I need to submit the jobs separately in different bash scripts? I guess that if I put all these commands in a single bash script they will run sequentially, right? Though it might still be faster when the chromosomes are separated.

 

Next, once we have these multiple mpileup results, can we simply concatenate them into a single file, like

cat mplie_N1.mpileup mplie_N2.mpileup >  mplie_total.mpileup

Or does samtools have a function to concatenate them?

 

thanks !

cheers

Chirag

 

 

 

written 4.8 years ago by Chirag Nepal

"So to make these run in parallel, i need to submit the jobs separately in different bash scripts ?" Use GNU parallel or GNU make with the option -j; see the post "How To Run Muscle In Batch?".

written 4.8 years ago by Pierre Lindenbaum
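
As a rough illustration of the GNU parallel suggestion, the sketch below prints one mpileup command per chromosome so that the list can be piped to `parallel -j N`, and then concatenates the per-chromosome files in a fixed order. The helper names `print_mpileup_jobs` and `merge_mpileups` and the `mpileup_<chrom>.mpileup` file naming are assumptions made here, not part of samtools or VarScan2. Plain concatenation is assumed to be adequate because the text pileup output has no header lines; the files should be joined in the same chromosome order a single run would have produced.

```shell
#!/usr/bin/env bash
# Sketch: build a job list for GNU parallel, then merge the results.
# Helper names and file naming are illustrative assumptions.

# Print one "samtools mpileup" command line per chromosome.
#   $1 = reference FASTA, $2 = BAM files as one space-separated string,
#   remaining arguments = chromosome names
print_mpileup_jobs() {
    local ref=$1 bams=$2
    shift 2
    local chrom
    for chrom in "$@"; do
        printf 'samtools mpileup -f %s -r %s %s > mpileup_%s.mpileup\n' \
            "$ref" "$chrom" "$bams" "$chrom"
    done
}

# Concatenate per-chromosome pileups into one file, in the order given.
#   $1 = output file, remaining arguments = chromosome names
merge_mpileups() {
    local out=$1
    shift
    local chrom
    : > "$out"   # create or truncate the output file
    for chrom in "$@"; do
        cat "mpileup_${chrom}.mpileup" >> "$out"
    done
}

# Example usage (commented out; requires samtools and GNU parallel):
# print_mpileup_jobs assembly.fa "Nor_1.bam Tum_1.bam" chr1 chr2 chr3 | parallel -j 4
# merge_mpileups mpileup_total.mpileup chr1 chr2 chr3
```

When no command is given on its command line, GNU parallel treats each input line as a complete command, so `-j 4` here caps the run at four mpileup jobs at a time.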