Question: Using GNU Parallel For Bedtools
geek_y (Barcelona) wrote, 7.0 years ago:

I am trying to run GNU parallel with the bedtools multicov function. The original command is:

bedtools multicov -bams bam1 bam2 bam3.. -bed anon.bed  > Q1_Counst.bed

I would like to run the above command using GNU parallel. But when I run the command below:

parallel -j 25 "bedtools multicov -bams {1} -bed {2} > Q1_Counst.bed" ::: minus_1_common_sorted_q1.bam minus_2_common_sorted_q1.bam minus_3_common_sorted_q1.bam plus_1_common_sorted_q1.bam plus_2_common_sorted_q1.bam plus_3_common_sorted_q1.bam ::: '/genome/genes_exon_2.bed'

each BAM file is taken as a separate argument, so the processes that start look like:

bedtools multicov -bams  bam1 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam2 -bed anon.bed  > Q1_Counst.bed
bedtools multicov -bams  bam3 -bed anon.bed  > Q1_Counst.bed

instead of all the BAM files being passed to a single command. As a result, Q1_Counst.bed is overwritten in an unpredictable order. Could anyone help me get the right command? My server has around 30 cores.

bedtools linux parallel bash • 3.7k views
Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote, 7.0 years ago:

Split your BED file using split:

split -l100 anon.bed TMPBED

and then call multiBamCov with each BED chunk:

ls TMPBED* | parallel   multiBamCov -bams f1.bam  f2.bam -bed '{}'  '>' out.{}.bed
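If a single result file is wanted afterwards, the per-chunk outputs can be concatenated again. The sketch below is only an illustration under assumptions: the toy anon.bed is generated with seq, and a plain `cat` stands in for multiBamCov so that it runs without bedtools or BAM files. It relies on split naming its chunks aa, ab, ac, ... so that the shell glob returns them in the original order.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Toy 250-line input standing in for anon.bed.
seq 1 250 > anon.bed

# Same split call as above: chunks TMPBEDaa, TMPBEDab, TMPBEDac.
split -l100 anon.bed TMPBED

for chunk in TMPBED*; do
    # Stand-in for multiBamCov: copy the chunk unchanged.
    cat "$chunk" > "out.$chunk.bed"
done

# The glob expands in sorted order, which matches the split order.
cat out.TMPBED*.bed > merged.bed

cmp anon.bed merged.bed && echo "merged output is in input order"
```

With the default two-letter suffixes, split can name at most 676 chunks, so a large BED file may need a bigger -l value or a longer suffix (-a 3).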

But isn't that more or less the same as

split -l100 anon.bed TMPBED

for bed in TMPBED*; do multiBamCov -bams f1.bam  f2.bam -bed "$bed" > "${bed}_out.bed" & done

which creates as many subprocesses in the shell as there are TMPBED* files? Is there any other advantage to using GNU parallel here?
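One detail to check in a loop like this is how the shell reads a variable name that is followed directly by `_out`. Underscores are valid in variable names, so `$bed_out` refers to a variable called bed_out rather than to `$bed` plus a suffix. A small sketch, with TMPBEDaa as an example chunk name:

```shell
bed=TMPBEDaa
unset bed_out

# Without braces the shell looks up the (unset) variable "bed_out",
# so only ".bed" is left.
without_braces="$bed_out.bed"

# Braces end the variable name after "bed".
with_braces="${bed}_out.bed"

printf 'without braces: %s\n' "$without_braces"
printf 'with braces:    %s\n' "$with_braces"
```

Without the braces, every iteration of the loop would redirect to the same file named .bed.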

written 7.0 years ago by geek_y

You can limit the number of parallel jobs, you can run the jobs on a remote server and fetch the results back, you can re-run only the jobs that failed, ...

written 7.0 years ago by Pierre Lindenbaum

Thanks.. It is working. :)

written 7.0 years ago by geek_y
ole.tange (Denmark) wrote, 7.0 years ago:

If you can get multiBamCov to read from stdin, you can avoid the tmp files:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  '>' out.{#}.bed

Or if you just want all output merged into a single file:

cat anon.bed | parallel -l100 --pipe multiBamCov -bams f1.bam  f2.bam -bed stdin  >out.bed

I have never used multiBamCov, so if -bed stdin does not work, you might try:

-bed /dev/stdin
-bed '<( cat )'
-bed -