Interpreting allele values in VCF file
16 months ago

Hi All,

Could someone please help me interpret the allele values in the VCF file below?

# [1]CHROM  [2]POS  [3]REF  [4]ALT  [5]ALT  [6]QUAL [7]DP   [8]RO   [9]AO   [10]Par-1_DHT02696-8_L6:GT  [11]Par-1_DHT02696-8_L6:DP  [12]Par-1_DHT02696-8_L6:RO  [13]Par-1_DHT02696-8_L6:AO  [14]Par-1_DHT02696-8_L6:AO  [15]Par-2_DHT02696-9_L6:GT  [16]Par-2_DHT02696-9_L6:DP  [17]Par-2_DHT02696-9_L6:RO  [18]Par-2_DHT02696-9_L6:AO  [19]Par-2_DHT02696-9_L6:AO

1.Chr1A 61556   G   A   .   26.7544 4   2   2   0/1 4   2   2   .   .   .   .   .   .
2.Chr1A 95880   C   T   .   57.0319 2   0   2   1/1 2   0   2   .   .   .   .   .   .
3.Chr1A 1156169 G   T   .   1.59189e-14 90  88  2   0/0 35  33  2   .   0/0 55  55  0   .
4.Chr1A 1159646 G   A   .   0.0185916   162 149 13  0/0 67  67  0   .   0/1 95  82  13  .
5.Chr1A 1940398 TG  CG  CA  306.879 27  12  8   1/2 8   0   1   7   0/1 19  12  7   0


Here, I ran freebayes on two BAM files (one from Par-1 and one from Par-2) against the reference. The initial aim was to have a VCF file with three columns for REF, Par-1 and Par-2, but as you can see the third column is mostly empty (why?). Anyway, I tried to understand it, and I wonder if someone could help me with the allele values 0/0, 0/1, 1/1 and 1/2.

Does line-1 say that REF is "G" and Par-1 (ALT-1) is "A"? If so, what about Par-2, which is represented by many dots? Why is the allele value 1/1 in line-2? Does line-3 say that REF is "G" and Par-1 (ALT-1) is "T", and what about Par-2? The same question applies to line-4 and line-5.

Thanks a lot, Kanat

16 months ago
b.bearmi ▴ 10

0/0, 0/1 and 1/1 refer to the genotype. A position where only the reference base is called would be 0/0, an alternative in one copy (presuming a diploid) would be 0/1, and a substitution in both alleles would be 1/1. Sometimes an alternative is called but the ratio between alternative and reference is less than 0.5 (either sequencing errors or copy number variation); I assume in this situation you would get 0/0 as the genotype in freebayes. A dot means the reference did not change. Do Par-1 and Par-2 refer to the same sequenced sample or to different samples? Anyway, looking at the reads at the exact position of a called SNV might help: Tablet if you are a Mac user, IGV if not. samtools can also do it, but in a less user-friendly manner (I think it is something like samtools view aln.sorted.bam chr2:20,100,000-20,200,000, but you would have to check).
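One way to check the read-support idea against the table in the question is to compute, per sample, the share of reads that carry the alternate allele. This is only a minimal sketch: the helper name `alt_fraction` is made up for illustration, and it assumes RO and AO are the reference and alternate read counts, as the column names suggest. The values are copied by hand from the posted rows.

```python
def alt_fraction(ro, ao):
    """Share of reads supporting the alternate allele: AO / (RO + AO).

    Returns None when there are no reads at all (nothing to divide by).
    """
    total = ro + ao
    if total == 0:
        return None
    return ao / total


# (label, called genotype, RO, AO), copied from the table in the question
rows = [
    ("line-1 Par-1", "0/1", 2, 2),
    ("line-2 Par-1", "1/1", 0, 2),
    ("line-3 Par-1", "0/0", 33, 2),
    ("line-4 Par-2", "0/1", 82, 13),
]

for label, gt, ro, ao in rows:
    print(label, gt, round(alt_fraction(ro, ao), 3))
```

In this table, line-4 Par-2 is called 0/1 with 13 alternate reads against 82 reference reads, so the genotype call does not appear to follow a simple 0.5 cutoff on read counts.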

So, does it mean line-1 should look like "Chr1A 61556 G A G" (with G instead of the dot)? What about 1/2 in line-5? Sorry for the stupid question, but can you give an example of a substitution in both alleles (1/1)? In my understanding, 0/1 is GG vs AA (line-1). Par-1 and Par-2 are different samples and refer to Parent-1 and Parent-2.

Unless you are dealing with bacteria, you would have two copies of the same gene (plants could have more, see "ploidy"):

0/0 == G | G (G in copy one and G in copy two; thinking of humans: "one from mum and one from dad", unless you are studying something like the X chromosome, where only females have two copies...)
0/1 == A | G
1/1 == A | A

1/2 confuses me a bit. If your variant caller was running two samples simultaneously, it might be a notation for "alt in both samples"; alternatively, it might refer to the depth of coverage (how many reads support the ref / how many support the alt), but I think the former rather than the latter.
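For reference, the VCF specification defines the GT numbers as indexes into the allele list of that row: 0 is REF, 1 is the first ALT, 2 is the second ALT, and "." is a missing value. Read that way, 1/2 in line-5 would mean one copy of each of the two ALT alleles. The sketch below decodes the posted rows on that basis. The function name `decode_gt` is hypothetical, and the REF/ALT values are typed in from the table rather than parsed from a real file.

```python
import re


def decode_gt(gt, ref, alts):
    """Translate a GT string such as '0/1' into the allele sequences it names.

    Index 0 is REF, 1 is the first ALT, 2 is the second ALT.
    A '.' (whole field or single allele) is treated as missing.
    """
    # Drop '.' placeholders from the ALT columns (no second ALT allele)
    alleles = [ref] + [a for a in alts if a != "."]
    if gt == ".":
        return None  # no genotype call for this sample
    decoded = []
    for idx in re.split(r"[/|]", gt):  # '/' is unphased, '|' is phased
        decoded.append(None if idx == "." else alleles[int(idx)])
    return decoded


print(decode_gt("0/1", "G", ["A", "."]))      # line-1, Par-1
print(decode_gt("1/1", "C", ["T", "."]))      # line-2, Par-1
print(decode_gt("1/2", "TG", ["CG", "CA"]))   # line-5, Par-1
print(decode_gt(".", "G", ["A", "."]))        # line-1, Par-2
```

Under this reading, the dots in the Par-2 columns of line-1 and line-2 would indicate that no genotype was called for Par-2 at those positions, and the dot in the second ALT column that there is no second alternate allele.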