How to extract sample IDs from VCF into new VCF?
1
0
Entering edit mode
3.8 years ago
dec986 ▴ 370

I've written a script that extracts VCF IDs from a large VCF, in order to get a subset of the sample IDs (people).

This script copies the header from the original VCF and simply pastes it into the new one. I then extract only the sample IDs that I need for each line in the original VCF.

However, I want to validate that this new VCF is valid, using vcf-validator.

vcf-validator ALL.chr18.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf 2>&1 >/dev/null | head
18:10083 .. AN is 5008, should be 124
18:10083 .. AC is 1, should be 0
...

These errors appear to be caused by the INFO field:

grep -A1 -m1 ^#CHROM ALL.chr18.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf | cut -f 1-19
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  HG00096 NA19625 NA19700 NA19701NA19703  NA19704 NA19707 NA19711 NA19712 NA19713
18  10083   esv3641458  A   <CN0>   100 PASS    AC=1;AF=0.000199681;AN=5008;CIEND=-500,1000;CIPOS=-1000,500;CS=DEL_union;END=63656;NS=2504;SVTYPE=DEL;DP=69995;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;VT=SV;EX_TARGET GT  0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0 0|0

so then I try to just replace the INFO field AC=1;AF=0.000199681;AN=5008;CIEND=-500,1000;CIPOS=-1000,500;CS=DEL_union;END=63656;NS=2504;SVTYPE=DEL;DP=69995;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;VT=SV;EX_TARGET GT with just . but then I get bizarre errors like

column NA19923 at 18:10083 .. FORMAT tag [.] not listed in the header

which I don't know how to fix.

Is there a way that I can easily extract given sample IDs from a VCF?

vcf • 1.7k views
ADD COMMENT
2
Entering edit mode
3.8 years ago

I've written a script that extracts VCF IDs from a large VCF, in order to get a subset of the sample IDs (people).

isn't it a job for bcftools view --samples ?

ADD COMMENT
0
Entering edit mode

Indeed, this is the solution

ADD REPLY

Login before adding your answer.

Traffic: 2964 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6