Interpreting allele values in VCF file
4.4 years ago

Hi All,

Could someone please help me interpret the allele values in the VCF file below?

# [1]CHROM  [2]POS  [3]REF  [4]ALT  [5]ALT  [6]QUAL [7]DP   [8]RO   [9]AO   [10]Par-1_DHT02696-8_L6:GT  [11]Par-1_DHT02696-8_L6:DP  [12]Par-1_DHT02696-8_L6:RO  [13]Par-1_DHT02696-8_L6:AO  [14]Par-1_DHT02696-8_L6:AO  [15]Par-2_DHT02696-9_L6:GT  [16]Par-2_DHT02696-9_L6:DP  [17]Par-2_DHT02696-9_L6:RO  [18]Par-2_DHT02696-9_L6:AO  [19]Par-2_DHT02696-9_L6:AO

1.Chr1A 61556   G   A   .   26.7544 4   2   2   0/1 4   2   2   .   .   .   .   .   .
2.Chr1A 95880   C   T   .   57.0319 2   0   2   1/1 2   0   2   .   .   .   .   .   .
3.Chr1A 1156169 G   T   .   1.59189e-14 90  88  2   0/0 35  33  2   .   0/0 55  55  0   .
4.Chr1A 1159646 G   A   .   0.0185916   162 149 13  0/0 67  67  0   .   0/1 95  82  13  .
5.Chr1A 1940398 TG  CG  CA  306.879 27  12  8   1/2 8   0   1   7   0/1 19  12  7   0

Here, I ran freebayes on two BAM files (one from Par-1 and one from Par-2) aligned to the reference. The initial aim was to have a VCF file with three columns for REF, Par-1 and Par-2, but as you can see the third column is mostly empty (why?). Anyway, I tried to understand it. So, I wonder if someone could help me with the allele values 0/0, 0/1, 1/1 and 1/2.

Does line-1 say that REF is "G" and Par-1 (ALT-1) is "A"? If so, what about Par-2, which is represented by many dots? Why is the allele value 1/1 in line-2? Does line-3 say that REF is "G" and Par-1 (ALT-1) is "T"? And what about Par-2? The same questions apply to line-4 and line-5.

Thanks a lot,
Kanat

alignment snp • 868 views
4.4 years ago
b.bearmi ▴ 10

0/0, 0/1 and 1/1 refer to the genotype: a reference base called at a position would be 0/0, an alternative in one copy (presuming diploid) would be 0/1, and a substitution in both alleles would be 1/1. Sometimes an alternative is called but the ratio between alternative and reference is less than 0.5 (either sequencing errors or copy number variation); I assume in this situation you would get a 0/0 genotype in freebayes. A dot means a missing value: no genotype was called for that sample at that position (or, in the second ALT column, there is no second alternative allele). Do Par-1 and Par-2 refer to the same sequenced sample or to different samples? Anyway, looking at the reads at the exact position of a called SNV might help (Tablet if you are a Mac user, IGV if not; samtools can also do it, but in a less user-friendly manner. I think it is something like samtools view aln.sorted.bam chr2:20,100,000-20,200,000, but you would have to check).
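
To make the "ratio between alternative and reference" concrete, here is a minimal Python sketch that computes the fraction of ALT-supporting reads from the RO and AO counts in the question's table. The function name alt_fraction is made up for illustration, and this only shows the arithmetic; it is not how freebayes itself decides on a genotype.

```python
def alt_fraction(ro, ao_counts):
    """Fraction of reads supporting any ALT allele, out of RO + sum(AO).

    ro        -- reference observation count (RO)
    ao_counts -- list of alternate observation counts (AO), one per ALT allele
    Returns None when no reads were counted.
    """
    total = ro + sum(ao_counts)
    if total == 0:
        return None
    return sum(ao_counts) / total


# (label, RO, AO list) copied from the table in the question
examples = [
    ("line-1, Par-1", 2, [2]),
    ("line-3, Par-1", 33, [2]),
    ("line-4, Par-2", 82, [13]),
]

for label, ro, ao in examples:
    print(label, round(alt_fraction(ro, ao), 3))
```

Comparing these fractions with the GT column shows which samples have only a handful of ALT reads against a deep REF background.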


Thanks for your quick response.

So, does that mean line-1 should look like "Chr1A 61556 G A G" (with G instead of the dot)? What about 1/2 in line-5? Sorry for the stupid question, but can you give an example of a substitution in both alleles (1/1)? In my understanding, 0/1 is GG vs AA (line-1). Par-1 and Par-2 are different samples and refer to Parent-1 and Parent-2.


Unless you are dealing with bacteria, you would have two copies of the same gene (in plants it could be more, see "ploidy"):

0/0 == G | G (G in copy one and G in copy two; thinking of humans: "one from mum and one from dad", unless you are studying something like the X chromosome, where only females have two copies...)
0/1 == A | G
1/1 == A | A

1/2 confuses me a bit. If your variant caller was running two samples simultaneously, it might be a notation for "alt in both samples"; alternatively, it might refer to the depth of coverage (how many reads support the ref / how many support the alt), but I think the former rather than the latter.
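
For what it's worth, the VCF specification defines the numbers in GT as indices into the allele list: 0 is REF, 1 is the first ALT, 2 is the second ALT, and "." is a missing call. Under that reading, 1/2 in line-5 would mean one copy of the first ALT (CG) and one copy of the second ALT (CA). A small Python sketch of that lookup, using rows from the question (decode_gt is a hypothetical helper name, not part of any library):

```python
def decode_gt(gt, ref, alts):
    """Translate a VCF GT string such as '0/1' into allele sequences.

    Index 0 is REF, 1 is the first ALT, 2 the second ALT, and so on.
    A '.' (missing call) is returned as None.
    """
    alleles = [ref] + list(alts)
    sep = "|" if "|" in gt else "/"  # '|' is phased, '/' is unphased
    return [None if idx == "." else alleles[int(idx)] for idx in gt.split(sep)]


# Rows taken from the table in the question
print(decode_gt("0/1", "G", ["A"]))          # line-1, Par-1
print(decode_gt("1/1", "C", ["T"]))          # line-2, Par-1
print(decode_gt("1/2", "TG", ["CG", "CA"]))  # line-5, Par-1
print(decode_gt("0/1", "TG", ["CG", "CA"]))  # line-5, Par-2
print(decode_gt(".", "G", ["A"]))            # line-1, Par-2 (no call)
```

So a sample column never holds a single letter: each genotype names one allele per chromosome copy, and a dot simply means no genotype was reported for that sample at that site.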
