6.4 years ago
TP
Hi, can someone explain Genotype Quality (GQ), as well as the other quality scores required for proper variant analysis of exome sequencing data? I have values like 4, 13, or even "-", all the way up to 99. I've read the posts about it being Phred-scaled and on a log scale, but the numbers I see are not coherent with that explanation.
Can you elaborate on the exact file type that you have, presumably VCF?
Many of these scoring metrics are hard to interpret across tools because different programs define and compute them in different ways. A trusted source on the VCF format states the following:
[source: http://www.internationalgenome.org/wiki/Analysis/vcf4.0/]
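To make the Phred scale concrete, here is a minimal sketch of the conversion, assuming GQ follows the usual definition GQ = -10 * log10(P(genotype call is wrong)). The function names are mine, and the cap of 99 is an assumption about how your caller reports GQ (GATK, for example, caps it there). On this scale a GQ of 4 means roughly a 40% chance that the genotype is wrong, and a GQ of 13 roughly 5%.

```python
import math


def gq_to_error_prob(gq):
    """Convert a Phred-scaled quality into the probability that the call is wrong."""
    return 10 ** (-gq / 10)


def error_prob_to_gq(p, cap=99):
    """Convert an error probability back to a Phred-scaled quality.

    The cap is an assumption: some callers never report GQ above 99.
    """
    return min(cap, round(-10 * math.log10(p)))


if __name__ == "__main__":
    # Values of the kind mentioned in the question
    for gq in (4, 13, 33, 42, 99):
        print(f"GQ {gq:>2} -> P(wrong genotype) = {gq_to_error_prob(gq):.4g}")
```

A "-" (or "." in a raw VCF) is not a score at all; it usually means that no GQ was reported for that sample.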
Hi. Yes, I am using VCF files. I am trying to decide which variants to study further, but I get bogged down in how to prioritize them and how to understand the quality of each variant. I have some variants that are very rare in the general population or absent from any database. The gene is in the family of genes related to my phenotype of interest. The GQs are 13, 33, 4, "-", 4, 15, 42, 27 for the 8 samples I have: the parents, the proband, and 5 unaffected siblings. The zygosity is also consistent with our hypothesis of a recessive homozygous variant.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.
From where did you obtain the VCFs? Where specifically are you looking at these GQs - in the FORMAT column, right?
I would not just look at GQ for deciding whether or not to include a variant in downstream analysis. The variant position should have good read-depth too, good allelic balance between ref and variant alleles, good mapping quality, etc. A lot of information goes into the decision of quality.
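As a rough illustration of pulling those per-sample values out of a VCF, here is a small sketch that pairs the FORMAT column with one sample column and computes the fraction of reads supporting non-reference alleles. The function names and the example strings are hypothetical, and it assumes the caller writes AD as comma-separated depths with the reference allele first.

```python
def parse_sample(format_field, sample_field):
    """Map FORMAT keys (e.g. 'GT:AD:DP:GQ') onto one sample's values."""
    keys = format_field.split(":")
    values = sample_field.split(":")
    return dict(zip(keys, values))


def alt_allele_fraction(ad):
    """Fraction of reads supporting non-reference alleles.

    `ad` is an AD string such as '12,10' (reference depth first).
    Returns None when AD is missing or no reads were counted.
    """
    if "." in ad:
        return None
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    if total == 0:
        return None
    return (total - depths[0]) / total


if __name__ == "__main__":
    # Hypothetical FORMAT and sample columns, not taken from the poster's data
    sample = parse_sample("GT:AD:DP:GQ", "0/1:12,10:22:99")
    print(sample["GT"], sample["DP"], sample["GQ"])
    print(alt_allele_fraction(sample["AD"]))
```

For a heterozygous call you would hope to see a fraction somewhere near 0.5, and near 1.0 for a homozygous variant call; strong departures from that are worth a closer look.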
If you want to literally paste the line from your VCF here, then I can make my own interpretation.
I'm not sure that pasted correctly. Let me know if you are able to see it when you open it with Excel.
Hi! That does not look like output from a VCF (?). It looks like annotation from Variant Effect Predictor (VEP), perhaps? This particular call is also an insertion of a T, it seems.
Yes I pasted the info from my variant analysis software. It's what I am using to analyse the variants. Yes it is an insertion of a T that causes a frameshift mutation.
It's difficult to look over the information because there's so much and I cannot align the columns. Take a look at Allelic Depths (AD), Read Depth (DP), and Filter. Firstly, if 'Filter' is PASS, the call passed every filter that the variant caller applied, which is a good first sign. The Read Depth (DP) should then ideally be >18.
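Those two checks could be sketched like this. It is only a first-pass triage under the thresholds mentioned above (Filter is PASS, DP above 18); the function name and the example values are hypothetical, and a call that raises no flags still deserves the other checks discussed in this thread.

```python
def first_pass_flags(filter_value, dp, min_depth=18):
    """Return a list of reasons to be cautious about a call (empty if none)."""
    flags = []
    if filter_value != "PASS":
        flags.append(f"FILTER is {filter_value!r}, not PASS")
    if dp in (".", "-", "", None):
        # Missing depth, however the software chooses to display it
        flags.append("DP is missing")
    elif int(dp) <= min_depth:
        flags.append(f"DP {dp} is not above {min_depth}")
    return flags


if __name__ == "__main__":
    # Hypothetical values for illustration only
    print(first_pass_flags("PASS", "22"))
    print(first_pass_flags("PASS", "9"))
    print(first_pass_flags("LowQual", "."))
```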
Thanks for your help. What would your opinion be regarding a variant found through de novo filtering that has <10 reads and bad quality but is in a known disease gene for the phenotype? Would you validate it through resequencing, or ignore it? Also, can you suggest any good courses, lectures, or other material that would provide the information I need to analyse exome data proficiently? I don't have any computer science background, and I am more interested in the gene hunting and biological end of it. However, I want to understand well what I am doing when working with variant analysis software.
Hello, I would definitely recommend confirming such a variant through, for example, Sanger sequencing. With NGS, sometimes even very high quality variants can be false positives for different reasons.
We would function very well together because my main interest is in the analysis of NGS data to a very high standard and also in better understanding variant calls (their likely effects).
With regard to exome analysis, people generally have their own opinions on what is and what is not important for filtering variants. I have actually written a review manuscript with various colleagues that, in one section, specifically addresses these issues. I wish that I could share the material here but, as it's currently under peer review, I cannot.
That said, good places to start include:
These are all citations in my review. I know that this is already a lot but these are all very interesting reads, as is my review (according to those who've read it)! Aside from all of this, there are also upward of 60 in silico tools in existence that attempt to predict the likely damaging effects of variants. I'm currently working on that particular area in one of my projects with a clinically-oriented colleague (like yourself).
Thank you, this is great. I will definitely look these over, and please let me know when your paper is out.