More Efficient: Whole Genome VCF splitting with and without tbi file
13 months ago
S • 0

Hello,

I am currently writing code to split a whole-genome VCF file by chromosome. Right now I use bcftools to write 22 .vcf.gz files with the --targets flag, which lets me avoid --regions and the .tbi index file it requires. However, this process is slow and expensive.

Looking for alternatives, I am considering adding an upstream step to my pipeline that creates a .tbi index from the initial whole-genome VCF, which could then be used in the splitting stage.

Does this addition make sense? Would this reduce time and costs? And if not, are there any alternatives that I should consider?

Thank you.

This is my current code:

for i in {1..22}; do bcftools view "${input}" --targets "chr$i" --output "${output}-chr-$i.vcf.gz" --output-type z; done
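
For comparison, the indexed variant described in the question could look roughly like the sketch below. It builds the .tbi index once and then uses --regions, which jumps to each chromosome through the index, whereas --targets streams the whole file on every iteration. The function name split_by_region and the BCFTOOLS override variable are hypothetical, and the sketch assumes the input is bgzip-compressed, coordinate-sorted, and uses chr-prefixed contig names.

```shell
# Hypothetical helper: index once, then extract each autosome via the index.
# Usage: split_by_region INPUT.vcf.gz OUT_PREFIX
split_by_region() {
    local input="$1" prefix="$2"
    # Assumption: BCFTOOLS may point at another executable; defaults to bcftools on PATH.
    local bcftools="${BCFTOOLS:-bcftools}"
    local i

    # Build (or overwrite) the tabix index next to the input file.
    "$bcftools" index --tbi --force "$input" || return 1

    for i in $(seq 1 22); do
        # --regions uses the index, so only the requested contig is read.
        "$bcftools" view "$input" --regions "chr$i" \
            --output-type z --output "$prefix-chr-$i.vcf.gz" || return 1
    done
}
```

Whether this lowers the overall cost depends on how long the one-time indexing pass takes relative to the 22 full scans it replaces.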
bcftools vcf tbi • 676 views
13 months ago

Alternatives? Split by chromosome in parallel, or write a one-pass program that scans the variants and dispatches each one to the writer for its chromosome.
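
A minimal sketch of the one-pass idea, using awk rather than a dedicated program: the input is decompressed once, the header is kept in memory, and each record is piped to a compressor for its contig. The function name split_vcf_one_pass and its optional third argument are hypothetical. The sketch assumes records for each contig are contiguous (as in a coordinate-sorted VCF), that bgzip from htslib is on PATH, and that the header fits comfortably in memory. Output files are named PREFIX-CONTIG.vcf.gz, for example out-chr1.vcf.gz.

```shell
# Hypothetical helper: split a VCF by contig in a single pass over the input.
# Usage: split_vcf_one_pass INPUT.vcf.gz OUT_PREFIX [COMPRESS_CMD]
split_vcf_one_pass() {
    local input="$1" prefix="$2"
    # Assumption: bgzip is available; pass e.g. "gzip -c" as the third argument otherwise.
    local compress="${3:-bgzip -c}"

    gzip -dc "$input" | awk -v prefix="$prefix" -v compress="$compress" '
        BEGIN { FS = "\t" }
        # Collect all header lines so they can be repeated in every output file.
        /^#/ { header = header $0 "\n"; next }
        # New contig: close the previous writer, open a new one, emit the header.
        $1 != current {
            if (cmd != "") close(cmd)
            current = $1
            cmd = compress " > \"" prefix "-" current ".vcf.gz\""
            printf "%s", header | cmd
        }
        # Send the record to the writer for the current contig.
        { print | cmd }
        END { if (cmd != "") close(cmd) }
    '
}
```

Only one output pipe is open at a time, so the number of contigs does not run into open-file limits. If a contig reappeared later in the file, its output would be overwritten, which is why the sorted-input assumption matters.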


Something like this? parallel -j 10 bcftools view "{1}" --targets chr{2} --output "{1}-chr-{2}.vcf.gz" -Oz ::: myInput.vcf.gz ::: {1..22}
