VCF FORMAT column
1
0
Entering edit mode
7.8 years ago
fire_water ▴ 80

I have a question about the FORMAT column. So, let's say the first row of data looks like this:

------------------------------
FORMAT       |    SAMPLE_001  |
------------------------------
GT:GQ:DP:HQ    0|0:99:23:23,34

Does that mean all of the following rows will have the exact same value for the FORMAT column? For example, every row that follows will have GT:GQ:DP:HQ in its FORMAT column. Or, will it vary? For example, the next row could have GT:HQ:DP:GQ (different order) or GT:GQ:DP (no HQ).

Thanks!

sequencing • 2.1k views
7.8 years ago

Or, will it vary?

Yes, the FORMAT column can differ from row to row.

$ gunzip -c my.vcf.gz | grep -v "^#" | head -n 1000 | cut -f 9 | sort -u
GT:AD:DP:GQ:PGT:PID:PL
GT:AD:DP:GQ:PL

The VCF spec doesn't say that the FORMAT keys must be sorted, or that they must be the same on every row. The only ordering rule is that GT, if present, must be the first field:

If genotype information is present, then the same types of data must be present for all samples. First a FORMAT field is given specifying the data types and order (colon-separated alphanumeric String). This is followed by one field per sample, with the colon-separated data in this field corresponding to the types specified in the format. The first sub-field must always be the genotype (GT) if it is present.
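In practice this means a parser should read the FORMAT column again on every row instead of caching it from the first one. Here is a minimal Python sketch of that idea. The function names `parse_sample` and `parse_row` are made up for illustration, and padding short sample fields with "." assumes the spec's allowance for dropping trailing sub-fields; a real pipeline would more likely use an existing VCF library.

```python
def parse_sample(format_field, sample_field):
    """Pair one row's FORMAT keys with one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    # Assumption: trailing sub-fields may be omitted from a sample,
    # so pad with "." (the VCF missing-value marker).
    values += ["."] * (len(keys) - len(values))
    return dict(zip(keys, values))


def parse_row(line):
    """Return one dict per sample for a tab-separated VCF data line."""
    cols = line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    format_field, samples = cols[8], cols[9:]
    return [parse_sample(format_field, s) for s in samples]


# The row from the question, then a hypothetical row with the keys reordered.
first = parse_sample("GT:GQ:DP:HQ", "0|0:99:23:23,34")
second = parse_sample("GT:HQ:DP:GQ", "0|1:23,34:23:99")
```

Because the keys are looked up per row, `first["GQ"]` and `second["GQ"]` both give "99" even though GQ sits in a different position in each row.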


Thank you, kind Sir :)
