Question: Is it possible to run variant calling software in parallel, are there any Shell and Python/R code examples available?
0
gravatar for bioinform
20 months ago by
bioinform20
bioinform20 wrote:

Is it possible to run variant calling software in parallel for a 1000 of bacterial BAMs, are there any Shell and Python/R code examples available?

parallel variant calling vcf • 913 views
ADD COMMENTlink modified 20 months ago by ivivek_ngs4.7k • written 20 months ago by bioinform20
1

Variant calling tools, like almost all tools, can be parallized with GNU parallel. A little more information about which tool you are using would be helpful though.

ADD REPLYlink written 20 months ago by ATpoint14k
7
gravatar for Pierre Lindenbaum
20 months ago by
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum118k wrote:

One VCF per BAM ? create the following Makefile

SHELL=/bin/bash
BAMS=$(shell ls -1 *.bam)
define run

$$(addsuffix .vcf,$$(basename $(1))): $(1)
    samtools mpileup  -u -f ref.fa $$< | bcftools call -vcg - > $$@
endef

all: $(addsuffix .vcf,$(basename ${BAMS}))

$(eval $(foreach B,${BAMS},$(call run,${B})))

run it in parallel using the option -j <jobs> of make

make -j 16
ADD COMMENTlink written 20 months ago by Pierre Lindenbaum118k

Dear Pierre, I get an error with separators :

makefile:13: * missing separator. Stop.

For this input makefile:

SHELL=/bin/bash
BAMS=$(shell ls -1 *.bam)
define run

$$(addsuffix .vcf,$$(basename $(1))): $(1)

samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar
/home/fil/VarScan.v2.3.9.jar  mpileup2indel --min-coverage 10
--min-avg-qual 15 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.1 --output-vcf 1 > $$@

endef

all: $(addsuffix .vcf,$(basename ${BAMS}))

$(eval $(foreach B,${BAMS},$(call run,${B})))
  

Could you help me where could be a problem? I know about make file problem with tab del. But if I used

cat -e -t -v makefile

I did not see any problem. Thank you.

ADD REPLYlink modified 13 months ago • written 13 months ago by Korsocius100
1

there must be a tabulation before samtools mpileup

ADD REPLYlink written 13 months ago by Pierre Lindenbaum118k

It works, thank you.

ADD REPLYlink written 13 months ago by Korsocius100

Peirre , can I have another question, please? Is it possible to run two makefiles in once time, because varscan calls separately snps and indels. If I tried to add another row to makefile with "samtools...... | varscan mpileup2snp..." second command was not parallelized.

I tired this, but doesn't work:

SHELL=/bin/bash
BAMS=$(shell ls -1 *.bam)
define run

$$(addsuffix .vcf,$$(basename $(1))): $(1)

  samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar
/home/fil/VarScan.v2.3.9.jar mpileup2snp --min-coverage 10
--min-avg-qual 25 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.05 --output-vcf 1 > $$@

  samtools mpileup -d 18000 -L 18000 -f /home/fil/Desktop/new_reference_hg38/hg38.fa $$< | java -jar
/home/fil/VarScan.v2.3.9.jar mpileup2indel --min-coverage 10
--min-avg-qual 25 --min-reads2 2 --min-var-freq 0.1 --min-freq-for-hom 0.85 --strand-filter 1 --p-value 0.05 --output-vcf 1 > $$@

endef

all: $(addsuffix .vcf,$(basename ${BAMS}))

$(eval $(foreach B,${BAMS},$(call run,${B})))
  
ADD REPLYlink modified 13 months ago • written 13 months ago by Korsocius100
3
gravatar for h.mon
20 months ago by
h.mon24k
Brazil
h.mon24k wrote:

Yes, it is possible. One method would be using parallel, see F: Got Multiple Cpus? Activate Them With Gnu Parallel and GNU Parallel for mutlithreaded tasks for a primer.

bcbio is another option, it is a scalable pipeline for variant analysis.

ADD COMMENTlink modified 20 months ago • written 20 months ago by h.mon24k
1
gravatar for ivivek_ngs
20 months ago by
ivivek_ngs4.7k
Seattle,WA, USA
ivivek_ngs4.7k wrote:

This is an exome variant calling pipeline per sample to generate the bam and the SNP and indels VCF with GATK per sample, you can stop at the step of last bam file generation and use something like mutect2 or varscan for somatic calls and this can be easily parallelised with shell/bash. The workflow is here.

ADD COMMENTlink written 20 months ago by ivivek_ngs4.7k

need a code example using GNU parallel

ADD REPLYlink written 19 months ago by bioinform20

Have you tried reading the manual?

ADD REPLYlink written 19 months ago by WouterDeCoster37k

sorry, this is something you have to read from manuals as @Wouter said. Also, you should try to learn. for GATK now you have WDL which works on Cromwell engine. You can take a look at the GATK 3.7 and WDL to come up with your code for parallelization. If you have problems their forum will help you.

ADD REPLYlink written 19 months ago by ivivek_ngs4.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1996 users visited in the last hour