Is it possible to run variant calling software in parallel, are there any Shell and Python/R code examples available?
3
0
Entering edit mode
4.3 years ago
bioinform ▴ 20

Is it possible to run variant calling software in parallel for a 1000 of bacterial BAMs, are there any Shell and Python/R code examples available?

parallel vcf variant calling • 1.8k views
ADD COMMENT
1
Entering edit mode

Variant calling tools, like almost all tools, can be parallized with GNU parallel. A little more information about which tool you are using would be helpful though.

ADD REPLY
7
Entering edit mode
4.3 years ago

One VCF per BAM ? create the following Makefile

SHELL=/bin/bash
BAMS=$(shell ls -1 *.bam)
define run

$$(addsuffix .vcf,$$(basename $(1))): $(1)
    samtools mpileup  -u -f ref.fa $$< | bcftools call -vcg - > $$@
endef

all: $(addsuffix .vcf,$(basename ${BAMS}))

$(eval $(foreach B,${BAMS},$(call run,${B})))

run it in parallel using the option -j <jobs> of make

make -j 16
ADD COMMENT
0
Entering edit mode

Dear Pierre, I get an error with separators :

makefile:13: * missing separator. Stop.

For this input makefile:

SHELL=/bin/bash
BAMS=$(shell ls -1 *.bam)
define run

$$(addsuffix .vcf,$$(basename $(1))): $(1)

samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar
/home/fil/VarScan.v2.3.9.jar  mpileup2indel --min-coverage 10
--min-avg-qual 15 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.1 --output-vcf 1 > $$@

endef

all: $(addsuffix .vcf,$(basename ${BAMS}))

$(eval $(foreach B,${BAMS},$(call run,${B})))
  

Could you help me where could be a problem? I know about make file problem with tab del. But if I used

cat -e -t -v makefile

I did not see any problem. Thank you.

ADD REPLY
1
Entering edit mode

there must be a tabulation before samtools mpileup

ADD REPLY
0
Entering edit mode

It works, thank you.

ADD REPLY
0
Entering edit mode

Peirre , can I have another question, please? Is it possible to run two makefiles in once time, because varscan calls separately snps and indels. If I tried to add another row to makefile with "samtools...... | varscan mpileup2snp..." second command was not parallelized.

I tired this, but doesn't work:

SHELL=/bin/bash
BAMS=$(shell ls -1 *.bam)
define run

$$(addsuffix .vcf,$$(basename $(1))): $(1)

  samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar
/home/fil/VarScan.v2.3.9.jar mpileup2snp --min-coverage 10
--min-avg-qual 25 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.05 --output-vcf 1 > $$@

  samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar
/home/fil/VarScan.v2.3.9.jar mpileup2indel --min-coverage 10
--min-avg-qual 25 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.05 --output-vcf 1 > $$@

endef

all: $(addsuffix .vcf,$(basename ${BAMS}))

$(eval $(foreach B,${BAMS},$(call run,${B})))
  
ADD REPLY
3
Entering edit mode
4.3 years ago
h.mon 33k

Yes, it is possible. One method would be using parallel, see F: Got Multiple Cpus? Activate Them With Gnu Parallel and GNU Parallel for mutlithreaded tasks for a primer.

bcbio is another option, it is a scalable pipeline for variant analysis.

ADD COMMENT
1
Entering edit mode
4.3 years ago
ivivek_ngs ★ 5.1k

This is an exome variant calling pipeline per sample to generate the bam and the SNP and indels VCF with GATK per sample, you can stop at the step of last bam file generation and use something like mutect2 or varscan for somatic calls and this can be easily parallelised with shell/bash. The workflow is here.

ADD COMMENT
0
Entering edit mode

need a code example using GNU parallel

ADD REPLY
0
Entering edit mode

Have you tried reading the manual?

ADD REPLY
0
Entering edit mode

sorry, this is something you have to read from manuals as @Wouter said. Also, you should try to learn. for GATK now you have WDL which works on Cromwell engine. You can take a look at the GATK 3.7 and WDL to come up with your code for parallelization. If you have problems their forum will help you.

ADD REPLY

Login before adding your answer.

Traffic: 2383 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6