I have a question about the FORMAT column. So, let's say the first row of data looks like this:
------------------------------
FORMAT | SAMPLE_001 |
------------------------------
GT:GQ:DP:HQ 0|0:99:23:23,34
Does that mean all of the following rows will have the exact same value for the FORMAT column? For example, every row that follows will have GT:GQ:DP:HQ in its FORMAT column. Or, will it vary? For example, the next row could have GT:HQ:DP:GQ (different order) or GT:GQ:DP (no HQ).
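It can vary from row to row. The VCF spec doesn't say that the FORMAT keys must be sorted or stay the same across rows; the only ordering rule is that GT must be the first field when it is present. From the spec: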
If genotype information is present, then the same types of data must be present for all samples. First a FORMAT
field is given specifying the data types and order (colon-separated alphanumeric String). This is followed by one field
per sample, with the colon-separated data in this field corresponding to the types specified in the format. The first
sub-field must always be the genotype (GT) if it is present.
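In practice this means a parser should read the FORMAT column again on every row rather than caching it from the first one. Here is a minimal Python sketch of that idea. The function name `parse_sample` and the second example row are made up for illustration, and padding dropped trailing values with "." is my assumption about how you would want to handle them:

```python
def parse_sample(format_field, sample_field):
    """Pair each FORMAT key with the matching sample value for ONE row."""
    keys = format_field.split(":")
    values = sample_field.split(":")

    # Per the quoted spec text, GT must come first whenever it is present.
    if "GT" in keys and keys[0] != "GT":
        raise ValueError("GT must be the first FORMAT key when present")

    # Assumption: if the sample has fewer values than FORMAT has keys,
    # treat the remaining ones as missing (".").
    values += ["."] * (len(keys) - len(values))

    return dict(zip(keys, values))


# The first row is from the question; the second row is hypothetical and
# shows a different FORMAT (no HQ) on the next record.
rows = [
    ("GT:GQ:DP:HQ", "0|0:99:23:23,34"),
    ("GT:GQ:DP", "0|1:48:8"),
]

for format_field, sample_field in rows:
    print(parse_sample(format_field, sample_field))
```

Because the keys are looked up per row, a change in order or a missing key such as HQ on a later row does not break anything.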
Thank you, kind Sir :)