I am looking at a VCF file from whole-exome sequencing of a male.
On chrX I find several variants with AC=1, AF=0.5, and AN=2.
How should I consider these variants?
Do you have copy number information for this sample? In my cancer samples, it is common to have X duplications in XY-originating tumors that can cause something like this, which is why I do additional downstream analysis to consider copy number context in my variant calling.
I have a conflict of interest because this is my lab's software, but TitanCNA is a tool I use on a weekly basis for identifying variants using SNPs (from GATK or other variant callers) and copy number context (from ichorCNA) together: https://github.com/gavinha/TitanCNA
AN is the number of alleles at that position, and AF is the allele frequency, so I would interpret your data as meaning that your subject is heterozygous at that position. See the VCF Specification linked below:
It might be useful to plot some quality metrics for these SNPs in comparison with other non-heterozygous SNPs to see if they are aberrant in some way.
Do they by any chance fall within the pseudoautosomal regions of the chromosome?
How was the VCF called? With GATK? By default, GATK has no notion of sex chromosomes.
I have not generated the VCF myself, but yes, it has been called with GATK 3.7. What does it mean? Why do I find some variants on ChrX that have AN=2 and AC=1 (an example below), while the remaining variants on ChrX have AN=2 and AC=2 (which would make more sense)?
chrX 13618410 . T C 52.78 PASS AC=1;AF=0.5;AN=2
A BAM is not perfect. If a region has a lot of mismatches (bad reads, duplicated regions, low-complexity regions...), GATK will do its best to call the variant as diploid. See the FORMAT/AD field to get more information about the calls.
Again, GATK treats all chromosomes as autosomal and diploid.
I have 248 calls on chrX with AF = 0.5 (out of 1595 total calls on chrX). If we look at the FORMAT/AD field of the chrX calls with AF = 0.5, we see that reads support both alleles. Here are three calls as an example, including the FORMAT field.
chrX 118605047 . G A 597.77 PASS AC=1 AF=0.5 AN=2 GT:AD:DP:GQ:PGT:PID:PL 0/1:66,17:83:99:0|1:118605001_G_T:626,0,3932
chrX 118605051 . A G 486.77 PASS AC=1 AF=0.5 AN=2 GT:AD:DP:GQ:PGT:PID:PL 0/1:65,15:80:99:0|1:118605001_G_T:515,0,3735
chrX 118605056 . C T 375.77 PASS AC=1 AF=0.5 AN=2 MLEAC=1 GT:AD:DP:GQ:PGT:PID:PL 0/1:60,12:72:99:0|1:118605001_G_T:404,0,3534
AD: 66,17 65,15 60,12
We expect about 50%/50% for a clean het, but here there is only about one ALT read for every four or five REF reads.
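To make the allele-balance point concrete, here is a minimal sketch that computes the ALT read fraction from the FORMAT/AD values of the three calls quoted in this thread. The helper name `alt_fraction` is made up for illustration, and it assumes biallelic sites where AD is "REF,ALT".

```python
def alt_fraction(ad: str) -> float:
    """Return the ALT read fraction from a biallelic FORMAT/AD string such as '66,17'."""
    ref_reads, alt_reads = (int(x) for x in ad.split(","))
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("AD reports no reads")
    return alt_reads / total


# AD values copied from the three chrX calls quoted in this thread
ad_values = ["66,17", "65,15", "60,12"]
alt_fractions = [alt_fraction(ad) for ad in ad_values]

for ad, frac in zip(ad_values, alt_fractions):
    print(f"AD={ad}  ALT fraction={frac:.2f}")
```

All three calls come out at roughly 0.17 to 0.20 ALT fraction, well below the ~0.5 expected for a true heterozygous site.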
So, can I consider calls like these of good quality and assume the REF allele?
Are these variants in the pseudoautosomal region?
My calls for chrX are all in the region between PAR1 and PAR2, so they are not in the pseudoautosomal region.
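For anyone who wants to repeat the PAR check, a rough sketch follows. The thread does not say which genome build was used, so the build is an assumption here; the boundaries are the commonly cited chrX PAR1/PAR2 coordinates for GRCh37 and GRCh38 and should be checked against your own reference. The names `PAR_X` and `in_par` are illustrative only.

```python
# Commonly cited chrX pseudoautosomal boundaries (1-based, inclusive).
# Assumption: verify these against the reference build your VCF was called on.
PAR_X = {
    "GRCh37": [(60001, 2699520), (154931044, 155260560)],
    "GRCh38": [(10001, 2781479), (155701383, 156030895)],
}


def in_par(pos: int, build: str) -> bool:
    """Return True if a chrX position falls inside PAR1 or PAR2 for the given build."""
    return any(start <= pos <= end for start, end in PAR_X[build])


# chrX positions of the calls quoted in this thread
positions = [13618410, 118605047, 118605051, 118605056]

for build in PAR_X:
    for pos in positions:
        print(build, pos, "PAR" if in_par(pos, build) else "non-PAR")
```

With either set of boundaries, the quoted positions fall outside the PARs, which agrees with the statement above.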