Question

Using xargs to tabix each line of a bed to a vcf

4.4 years ago

jvijai ★ 1.2k

I want to write the records for each line (region) of a BED file to its own VCF using tabix.

Here is what I was attempting, but it's not working:

awk '{print $1":"($2+1)"-"$3}' CHR21_RegionsforBeagle.bed | xargs -n1 tabix -fh {} 21.ACANAFCR_sorted.vcf.gz >Chr{}.sorted.vcf

I am not sure I am using the {} in xargs properly.

The error I get:

[E::hts_open_format] Failed to open file {}

Could not read {}

tabix xargs awk vcf • 1.1k views

ADD COMMENT • link updated 4.4 years ago by ATpoint 82k • written 4.4 years ago by jvijai ★ 1.2k

score 0 · Answer 1 · 2019-12-06

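The argument order for tabix is wrong, and xargs only substitutes {} when it is given -I {}; without it, {} is passed to tabix literally, which is what the error shows. The order needs to be tabix (options) file.vcf.gz {regions}:

awk '{print $1":"($2+1)"-"$3}' CHR21_RegionsforBeagle.bed | xargs -I {} tabix -fh 21.ACANAFCR_sorted.vcf.gz {}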

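but the redirection to a file will not work this way: a > written after the xargs command is handled once by the calling shell, so everything ends up in a single file literally named Chr{}.sorted.vcf instead of one file per region. What will work is the same with parallel: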

awk '{print $1":"($2+1)"-"$3}' CHR21_RegionsforBeagle.bed | parallel "tabix -fh 21.ACANAFCR_sorted.vcf.gz {} > Chr{}.sorted.vcf"
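If parallel is not installed, the per-region redirection can also be done with xargs by wrapping the tabix call in sh -c, so that each region gets its own shell and its own >. The following is only a sketch under that assumption: the function names bed_to_regions and extract_regions are made up for illustration, and -f is dropped because only -h (print the header) matters for a region query.

```shell
# Hypothetical helper: turn 0-based, half-open BED intervals into
# 1-based, inclusive tabix regions (chr:start-end), one per line.
bed_to_regions() {
  awk '{print $1":"($2+1)"-"$3}' "$1"
}

# Hypothetical helper: run one tabix query per region and write each
# result to its own file. xargs -I {} replaces {} with the region;
# sh -c receives the VCF as $1 and the region as $2, so the
# redirection happens once per region instead of once overall.
extract_regions() {
  bed=$1
  vcf=$2
  bed_to_regions "$bed" |
    xargs -I {} sh -c 'tabix -h "$1" "$2" > "Chr$2.sorted.vcf"' _ "$vcf" {}
}
```

It would be called as extract_regions CHR21_RegionsforBeagle.bed 21.ACANAFCR_sorted.vcf.gz, and the output files would be named after the full region string, for example Chr21:101-200.sorted.vcf.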

You can use the -j option of parallel to limit the number of simultaneous jobs to something reasonable, such as 10, to avoid excessive I/O on the same file.
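As a rough sketch of the -j suggestion, the parallel command can be wrapped so the job limit is a parameter. The function name extract_regions_parallel and the variable names are invented for illustration, and the default of 10 is just the figure mentioned above, not a tuned value.

```shell
# Hypothetical wrapper: one tabix query per BED line, with at most
# $3 jobs running at the same time (defaults to 10 when omitted).
extract_regions_parallel() {
  bed=$1
  vcf=$2
  njobs=${3:-10}
  # parallel substitutes {} with each region and runs the quoted
  # command in its own shell, so the redirection is per region.
  awk '{print $1":"($2+1)"-"$3}' "$bed" |
    parallel -j "$njobs" "tabix -h '$vcf' {} > Chr{}.sorted.vcf"
}
```

For example, extract_regions_parallel CHR21_RegionsforBeagle.bed 21.ACANAFCR_sorted.vcf.gz 4 would run at most four tabix queries at once.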