A Newbie's (Non-Bioinformatician) Question on the "Interpretation" of Exome Seq Data
11.2 years ago

I am new to Biostars, so please accept my apology if the following question is considered "inappropriate" for this forum.

I am not sure whether this question is more genetics or bioinformatics (or neither). I did a lot of googling but still could not figure it out. The possibilities I can think of are: (a) I misunderstand the definition of AD (allelic depths for the ref and alt alleles, in the order listed); (b) I misunderstand the definition of GT (genotype, where 0 = ref and 1 = first alt allele); (c) something else.

I have exome sequencing data for a trio. One of the SNPs is listed below:

REF  ALT  QUAL    FORMAT          Child                          Father                      Mother
A    G    583.14  GT:AD:DP:GQ:PL  1/1:208,34:244:14.96:118,15,0  0/1:186,51:241:8.72:80,0,9  0/0:226,1:236:3.01:0,3,30

The child's GT is 1/1, which means the child's alleles are G/G. However, the child's AD is 208,34. Based on the "definition", that tells me the child has a higher count of the ref allele (A) than of the alt allele (G). So does AD 208,34 mean the child's GT should be A/G? In contrast, the father's GT is 0/1, so he has A/G, which is consistent with his AD of 186,51. The mother's GT is 0/0, which means she has A/A. Her AD is 226,1. To me, her AD and GT are consistent, and both mean she is A/A (the single count for G might be an artifact).
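To make the comparison concrete, here is a minimal Python sketch that splits each sample column by the FORMAT string and computes the fraction of reads supporting the alt allele from AD. The function name `parse_sample` and the dictionary layout are illustrative choices, not part of any VCF library; the values are copied from the record above.

```python
# FORMAT string and per-sample columns copied from the record above.
FORMAT = "GT:AD:DP:GQ:PL"
SAMPLES = {
    "Child": "1/1:208,34:244:14.96:118,15,0",
    "Father": "0/1:186,51:241:8.72:80,0,9",
    "Mother": "0/0:226,1:236:3.01:0,3,30",
}


def parse_sample(format_str, sample_str):
    """Split one sample column into named fields (hypothetical helper)."""
    fields = dict(zip(format_str.split(":"), sample_str.split(":")))
    # AD lists read depths in allele order: ref first, then alt.
    ref_depth, alt_depth = (int(x) for x in fields["AD"].split(","))
    return {
        "GT": fields["GT"],
        "ref_depth": ref_depth,
        "alt_depth": alt_depth,
        "alt_fraction": alt_depth / (ref_depth + alt_depth),
        "GQ": float(fields["GQ"]),
        "PL": [int(x) for x in fields["PL"].split(",")],
    }


parsed = {name: parse_sample(FORMAT, col) for name, col in SAMPLES.items()}

for name, rec in parsed.items():
    print(
        f"{name}: GT={rec['GT']} "
        f"AD={rec['ref_depth']},{rec['alt_depth']} "
        f"alt_fraction={rec['alt_fraction']:.2f}"
    )
```

For the child, only about 14% of the AD reads carry G, which is the mismatch with a 1/1 genotype described above.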

Can anyone please help clarify this apparent discrepancy between the child's GT and the parents' GTs? Any insight will be greatly appreciated (I hope the question is relevant enough to this forum that it will not be closed).
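The trio aspect of the discrepancy can be checked mechanically: a child genotype is Mendelian-consistent only if it can be formed by taking one allele from each parent. The sketch below assumes a diploid, autosomal site and ignores de novo mutation; `is_mendelian_consistent` is a hypothetical helper written for this example.

```python
from itertools import product


def split_gt(gt):
    """Return the allele indices of a diploid GT such as '0/1' or '0|1'."""
    return gt.replace("|", "/").split("/")


def is_mendelian_consistent(child_gt, father_gt, mother_gt):
    """True if the child's genotype can be built from one paternal and
    one maternal allele (assumes a diploid autosomal site)."""
    child = sorted(split_gt(child_gt))
    return any(
        sorted(pair) == child
        for pair in product(split_gt(father_gt), split_gt(mother_gt))
    )


# Genotypes from the record above: child 1/1, father 0/1, mother 0/0.
trio_ok = is_mendelian_consistent("1/1", "0/1", "0/0")
print("Mendelian consistent:", trio_ok)
```

With a 0/0 mother there is no maternal G to inherit, so a 1/1 child is flagged as inconsistent, whereas a 0/1 child would pass.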

Great many thanks

exome • 2.4k views
11.2 years ago

My bet is you've got some lousy reads there. First of all, you've got allelic imbalance for all three members of the trio: Child: 208,34; Father: 186,51; Mother: 226,1. And your GQ scores are pretty low across all members of the trio: Child: 14.96; Father: 8.72; Mother: 3.01. In my experience, calls with scores like these often don't confirm by Sanger sequencing, and you've likely got some artifact going on.
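As a rough illustration of that kind of sanity check, the sketch below flags calls with low GQ or with an alt-read fraction that does not fit the called genotype. The GQ cutoff of 20 and the expected alt-fraction ranges are assumptions chosen for the example, not values from this thread or from any particular caller, and should be tuned to your own data.

```python
# Illustrative thresholds (assumptions, not recommendations from the thread).
MIN_GQ = 20.0
EXPECTED_ALT_FRACTION = {
    "0/0": (0.0, 0.1),  # hom-ref: almost no alt reads
    "0/1": (0.3, 0.7),  # het: roughly balanced
    "1/1": (0.9, 1.0),  # hom-alt: almost all alt reads
}

# (GT, (ref depth, alt depth), GQ) taken from the record in the question.
CALLS = {
    "Child": ("1/1", (208, 34), 14.96),
    "Father": ("0/1", (186, 51), 8.72),
    "Mother": ("0/0", (226, 1), 3.01),
}


def flag_call(gt, ad, gq):
    """Return a list of reasons to distrust a genotype call."""
    flags = []
    if gq < MIN_GQ:
        flags.append("low_GQ")
    ref_depth, alt_depth = ad
    alt_fraction = alt_depth / (ref_depth + alt_depth)
    low, high = EXPECTED_ALT_FRACTION[gt]
    if not (low <= alt_fraction <= high):
        flags.append("allele_balance")
    return flags


flags = {name: flag_call(*call) for name, call in CALLS.items()}
for name, reasons in flags.items():
    print(name, reasons)
```

Under these assumed thresholds all three calls are flagged for low GQ, and the child and father are additionally flagged for allele balance.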


I've also seen something similar in locations that match a region of a pseudogene.


Agreed, the GQ and PL scores give me no confidence that those genotypes will stand up to closer scrutiny. You might want to revisit your filtering processes for screening variants.


Many thanks. Yes, I will need to "clean up" the data a bit and remove the calls that were not made with high confidence, based on GQ and PL.
