8 months ago
optimistsso4co3 ▴ 100
I would like to extract specific samples from a VCF at the speed of bcftools query. Is it possible? Here is an example that obviously does not work:
bcftools query -f '%CHROM %POS %Sample1 %Sample2'
bcftools view appears to be very slow for extracting individual samples, e.g. for my 70 gigabit VCF it takes 1.5 h to extract one sample.
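For what it's worth, bcftools query can restrict its output to chosen samples with -s and print per-sample fields inside square brackets in the format string. A minimal sketch follows; the function name query_samples and the file and sample names are placeholders, and whether this is actually faster than bcftools view on a file of this size is not something this sketch establishes.

```shell
# Sketch: print CHROM, POS and the genotypes of the selected samples only.
# query_samples is a hypothetical helper name, not part of bcftools.
query_samples() {
    vcf="$1"        # e.g. input.vcf.gz (placeholder)
    samples="$2"    # comma-separated list, e.g. Sample1,Sample2
    # [ %GT] is repeated once for every sample kept by -s
    bcftools query -s "$samples" -f '%CHROM %POS[ %GT]\n' "$vcf"
}
```

It would be called as: query_samples input.vcf.gz Sample1,Sample2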
Split by region, run in parallel, then run bcftools concat at the end.
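A rough sketch of that approach, assuming the input is bgzip-compressed and indexed, and using one region per contig. The function name extract_by_region and all file names are made up for illustration; the region size and the number of parallel jobs would need tuning for a real dataset.

```shell
# Sketch: extract samples one contig at a time, in parallel, then concatenate.
# Assumes the input is bgzip-compressed and indexed (e.g. with `bcftools index`).
extract_by_region() {
    vcf="$1"        # e.g. input.vcf.gz (placeholder)
    samples="$2"    # comma-separated, e.g. Sample1,Sample2
    out="$3"        # e.g. subset.vcf.gz
    tmp=$(mktemp -d)

    # Contig names are the first column of the index statistics.
    bcftools index -s "$vcf" | cut -f1 > "$tmp/regions.txt"

    # Start one background job per contig. This launches them all at once,
    # so throttle the loop if there are more contigs than available cores.
    while read -r chr; do
        part="$tmp/$chr.vcf.gz"
        echo "$part" >> "$tmp/files.txt"
        bcftools view -s "$samples" -r "$chr" -Oz -o "$part" "$vcf" &
    done < "$tmp/regions.txt"
    wait

    # Concatenate the pieces in the original contig order.
    bcftools concat -f "$tmp/files.txt" -Oz -o "$out"
    rm -rf "$tmp"
}
```

It would be called as: extract_by_region input.vcf.gz Sample1,Sample2 subset.vcf.gz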
In this case, 70 segments of 600 samples would mean 42,000 jobs, which seems risky. But I will try.
Uh? These are only 70 jobs (70x extracting two samples), unless I didn't understand your question.
I didn't frame the problem correctly. The goal is to make an individual VCF for each sample, of which there are 600, and this is extremely slow with bcftools view.
I figured I could accomplish my aim with bcftools query and then subtract genotypes that are present in my target sample. However, I don't know how to do that.
I will reframe the question and make another post.
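For the one-VCF-per-sample goal, recent bcftools releases ship a split plugin that writes a separate file for each sample rather than needing one bcftools view run per sample. A minimal sketch, assuming the plugin is available in the installed version; split_per_sample and the paths are hypothetical names, and the speed on a 600-sample file has not been measured here.

```shell
# Sketch: write one compressed VCF per sample into an output directory.
# Assumes a bcftools build that includes the +split plugin.
split_per_sample() {
    vcf="$1"       # e.g. input.vcf.gz (placeholder)
    outdir="$2"    # e.g. per_sample/ (placeholder)
    mkdir -p "$outdir"
    # -o names the output directory; -Oz requests bgzip-compressed VCF output
    bcftools +split "$vcf" -Oz -o "$outdir"
}
```

It would be called as: split_per_sample input.vcf.gz per_sample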