I have generated a gVCF (genomic VCF) with GATK using this command line:
gatk HaplotypeCaller -R GRCh38.p13.genome.fa.gz -I bam.bam -D dbsnp_146.hg38.vcf.gz -O genome.g.vcf.gz -ERC GVCF --output-mode EMIT_ALL_ACTIVE_SITES
I'm using GRCh38 patch 13, my BAM file is also based on it.
I want to analyze my genotype for rs4633: https://www.snpedia.com/index.php/Rs4633
As I understand from SNPedia, the reference allele is "C". As I have two variants from my mother and my father, I can have either (C;C), (C;T) or (T;T)
In my gVCF I have the corresponding line:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT bar
chr22 19962712 rs4633 C T,<NON_REF> 1117.03 . DB;DP=29;ExcessHet=3.0103;MLEAC=2,0;MLEAF=1.00,0.00;RAW_MQandDP=95148,29
The reference allele in chromosome 22 at position 19962712 is "C". But what exactly does "T,<non_ref>" mean in the ALT column? How should I interpret this? How can I extract the two alleles (father/mother) from this line and map them to the ones from SNPedia: (C;C), (C;T) or (T;T)?
To ask the question in a different way: how can I obtain both my alleles from the ALT column?
To add to this confusion. Another example. I want to know my genotype for rs1801131. SNPedia says: https://www.snpedia.com/index.php/rs1801131 Possible alleles: (A;A), (A;C), (C;C)
Why the "letters" are different than rs4633? Is it because of A=T and G=C?
The line in the gVCF is:
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT bar
chr1 11794419 rs1801131 T G,<NON_REF> 1193.60 . BaseQRankSum=0.306;DB;DP=54;ExcessHet=3.0103;MLEAC=1,0;MLEAF=0.500,0.00;MQRankSum=-4.833;RAW_MQandDP=151742,54;ReadPosRankSum=1.170 GT:AD:DP:GQ:PGT:PID:PL:PS:SB 0|1:19,35,0:54:99:0|1:11794400_G_A:1201,0,583,1258,688,1945:11794400:12,7,20,15
So reference allele "T" and "alternative" is "G,<non_ref>". So what are my two alleles which I can map to the possible alleles from SNPedia: (A;A), (A;C), (C;C)?
My assumption is that REF column is the allele from GRCh38 and ALT is somehow my own, but there should be two - one from mom and one from dad. What I'm doing wrong?
Thank you very much for this answer! I ran the GenotypeGVCFs tool and the result for rs4633 for example is:
So the reference allele is C and "mine" is T. The problem is that I still need two "mine" alleles - one from my mother and one from my father. Isn't this right? Otherwise, I can't map this "T" that I got from the conversion to the two alleles tuple described in SNPedia: (C;C), (C;T) or (T;T). What am I missing? Thanks in advance!
The last column shows the genotype
the
1/1
means the genotype isT/T
a
0/0
would beC/C
a
0/1
would beC/T
the
0,29
means there were 0 reads with a C and 29 with a THave a look at the VCF specifications to understand the other columns
Thank you so much! This is exactly what I was missing! rbagnall and RamRS, I'm really grateful. This was the last confusion I was having. Now everything is clear. I've accepted the answer. Thank you, thank you so much!
Hi, thanks but this answer was all rbagnall. I merely edited their (and your) formatting.