Using xargs to tabix each line of a bed to a vcf
4.4 years ago
jvijai ★ 1.2k

I want to extract each region (line) of a BED file into its own VCF using tabix.

Here is what I was attempting, but it's not working:

awk '{print $1":"($2+1)"-"$3}' CHR21_RegionsforBeagle.bed | xargs -n1 tabix -fh {} 21.ACANAFCR_sorted.vcf.gz >Chr{}.sorted.vcf

I am not sure I am using the {} in xargs properly.

The error I get:

[E::hts_open_format] Failed to open file {}

Could not read {}

tabix xargs awk vcf • 1.1k views
4.4 years ago
ATpoint 82k

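The tabix syntax is wrong. It needs to be tabix (options) file.vcf.gz {regions}. Also, xargs only substitutes {} when it is given -I{}; without it, {} is passed to tabix literally, which is why tabix tried to open a file named {}:

awk '{print $1":"($2+1)"-"$3}' CHR21_RegionsforBeagle.bed | xargs -I{} tabix -fh 21.ACANAFCR_sorted.vcf.gz {}

The redirection from your command will not work with this, though: the shell applies > Chr{}.sorted.vcf once to the whole pipeline, before xargs runs, so all output ends up in a single file literally named Chr{}.sorted.vcf. What will work is the same with parallel, where the quoted command, redirection included, is run once per region: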

awk '{print $1":"($2+1)"-"$3}' CHR21_RegionsforBeagle.bed | parallel "tabix -fh 21.ACANAFCR_sorted.vcf.gz {} > Chr{}.sorted.vcf"
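
If you would rather stay with xargs, something along these lines should also work. This is only a sketch: xargs substitutes {} only when given -I{}, and the redirection has to happen inside a child shell (sh -c) so that it runs once per region. The toy BED file, the scratch directory and the in.vcf.gz name are made up for illustration, and echo stands in for the real tabix call so the snippet runs without any data.

```shell
# Toy BED file (hypothetical regions) so the sketch runs without real data.
workdir=$(mktemp -d)
printf '21\t100\t200\n21\t300\t400\n' > "$workdir/regions.bed"

# BED starts are 0-based; tabix regions are 1-based, hence $2+1.
# -I{} makes xargs substitute each input line for {}.
# sh -c performs the redirection once per region. The region and the output
# directory are passed as $1 and $2 instead of being pasted into the command
# string, which avoids quoting problems.
# `echo` is a stand-in for: tabix -fh 21.ACANAFCR_sorted.vcf.gz "$1"
awk '{print $1":"($2+1)"-"$3}' "$workdir/regions.bed" |
    xargs -I{} sh -c 'echo "tabix -h in.vcf.gz $1" > "$2/Chr$1.sorted.vcf"' _ {} "$workdir"

# One output file per region should now exist.
ls "$workdir"
```

For real data, replace the echo with tabix -fh 21.ACANAFCR_sorted.vcf.gz "$1" and point awk at CHR21_RegionsforBeagle.bed.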

You can use parallel's -j option to limit the number of simultaneous jobs to something reasonable, maybe 10, to avoid excessive I/O on the same file.
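
For comparison, xargs has a similar knob: -P caps the number of simultaneous processes, much like -j does for parallel (with parallel itself you would just add -j 10 to the command above). Note that -P is not in POSIX, although GNU and BSD xargs both have it. A small sketch, again with made-up regions and echo in place of tabix so it can be run anywhere:

```shell
# Toy region list (hypothetical) written to a scratch directory.
outdir=$(mktemp -d)
printf '21:101-200\n21:301-400\n21:501-600\n' > "$outdir/regions.txt"

# -P 2 lets xargs run at most two jobs at a time (compare: parallel -j 2).
# `echo` is a stand-in for: tabix -fh 21.ACANAFCR_sorted.vcf.gz "$1"
xargs -P 2 -I{} sh -c 'echo "region $1" > "$2/Chr$1.sorted.vcf"' _ {} "$outdir" \
    < "$outdir/regions.txt"

# xargs returns only after all jobs have finished, so the count is complete.
n_files=$(ls "$outdir" | grep -c '\.sorted\.vcf$')
echo "$n_files"
```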
