Understanding Exome Analysis
6.5 years ago
TP ▴ 10

Hi, can someone explain Genotype Quality (GQ), as well as the other quality scores required for proper variant analysis of exome sequencing data? I have values like 4, 13, or even "-", all the way up to 99. I've read the posts saying that it is Phred-scaled and on a log scale, but the numbers I see do not seem consistent with that explanation.

next-gen genome gene sequencing • 1.8k views

Can you elaborate on the exact file type that you have, presumably VCF?

Not many of these scoring metrics make sense in bioinformatics because different programs interpret the meaning of the scores in different ways. A trusted source that describes the VCF format states the following:

GQ: genotype quality, encoded as a phred quality, -10 log10 p(genotype call is wrong) (Numeric)

[source: http://www.internationalgenome.org/wiki/Analysis/vcf4.0/]
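To make the Phred scale concrete, here is a minimal sketch that inverts the formula quoted above, so that p(genotype call is wrong) = 10^(-GQ/10). The function name `gq_to_error_probability` is just an illustrative choice, and the GQ values are the ones mentioned in the question.

```python
def gq_to_error_probability(gq: float) -> float:
    """Convert a Phred-scaled genotype quality into P(genotype call is wrong)."""
    return 10 ** (-gq / 10)


# GQ values taken from the question; 99 is the highest value reported there.
for gq in (4, 13, 33, 99):
    p_wrong = gq_to_error_probability(gq)
    print(f"GQ {gq:>2}: P(wrong) = {p_wrong:.3g}, P(correct) = {1 - p_wrong:.3g}")
```

Read this way, a GQ of 4 means roughly a 40% chance that the genotype is wrong, a GQ of 13 roughly 5%, and a GQ of 33 about 0.05%. Some callers, GATK among them, cap GQ at 99, which would explain why the values stop there.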


Hi. Yes, I am using VCF files. I am trying to decide which variants to study further, but I get bogged down in how to prioritize them and how to judge the quality of each variant. I have some variants that are very rare in the general population or absent from any database. The gene is in the family of genes related to my phenotype of interest. The GQs are 13, 33, 4, "-", 4, 15, 42, 27 for the 8 samples I have: the parents, the proband, and 5 unaffected siblings. The zygosity is also consistent with our hypothesis of homozygous recessive inheritance.


Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.


From where did you obtain the VCFs? Where specifically are you looking at these GQs - in the FORMAT column, right?

I would not just look at GQ for deciding whether or not to include a variant in downstream analysis. The variant position should have good read-depth too, good allelic balance between ref and variant alleles, good mapping quality, etc. A lot of information goes into the decision of quality.

If you want to literally paste the line from your VCF here, then I can make my own interpretation.
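As a rough illustration of the allelic balance check mentioned above, the sketch below computes the fraction of reads supporting the alternate allele from an Allelic Depths (AD) value. It assumes a biallelic site with AD written as `ref,alt`; the function name and the example AD strings are hypothetical, not taken from the poster's data.

```python
from typing import Optional


def alt_allele_fraction(ad: str) -> Optional[float]:
    """Return alt reads / (ref + alt reads) for a 'ref,alt' AD string.

    Returns None when AD is missing, malformed, or has no reads.
    """
    try:
        ref_reads, alt_reads = (int(x) for x in ad.split(","))
    except ValueError:
        # Covers placeholders such as "?,?" or "." and non-biallelic AD values.
        return None
    total = ref_reads + alt_reads
    return alt_reads / total if total else None


# Hypothetical AD values for illustration only.
for ad in ("8,8", "0,5", "14,2", "?,?"):
    print(ad, "->", alt_allele_fraction(ad))
```

A heterozygous call would be expected to sit near 0.5 and a homozygous variant call near 1.0. A value far from either is a reason to look at the reads more closely, not proof that the call is wrong.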

Chr:Pos Ref/Alt Identifier  Primary Findings    Incidental Findings Gene Names  Sequence Ontology (Combined)    Gene Region (Combined)  Effect (Combined)   Transcript Name (Clinically Relevant)   Exon Number (Clinically Relevant)   HGVS c. (Clinically Relevant)   HGVS p. (Clinically Relevant)   Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Filter  Variant Allele Freq Allelic Depths (AD) Read Depths (DP)    Genotype Qualities (GQ) 0/1 Genotypes (GT)  Zygosity    Genotypes   Mendel Error    Zygosity    Ref/Alt Identifier  All AAF European American AAF   African American AAF    All MAF European American MAF   African American MAF    All Ref Allele Counts   All Alt Allele Counts   European American Ref Allele Counts European American Alt Allele Counts African American Ref Allele Counts  African American Alt Allele Counts  All HomoVar GTC All Het GTC European American HomoVar GTC   
European American Het GTC   African American HomoVar GTC    African American Het GTC    Transcript Name Exon Number % Dist of Tx    Gene Name   Transcript Name Type    Transcript Type Strand  Product ID  Protein ID  GeneID  HGNC    MIM CCDS    HPRD    Summary of Product  LRG ID  Gene Names  Compound Het?   Inherited From  Has Compound Het?   Inherited from Father   Inherited from Mother   In Genes?   Ref/Alt Accession   RSID    Gene Names  Gene IDs    1000Genomes Allele frequencies  Common? HGVS g. Name    Clinical Allele Clinical Channels   Allele Origin   Clinical Significance   MedGen  SNOMED_CT   OMIM    Orphanet    Disease Name    ClinVar Review Status   HGVS g. Name (GRCh38)   HGVS c. Name    HGVS p. Name    UniProtKB   Citations   Allele Counts   Allele Frequencies  # Alleles   # Het   # HomoVar   Allele Counts   Allele Frequencies  # Alleles   # Het   # HomoVar   Unaffected - Allele Counts  Unaffected - Allele Frequencies Unaffected - # Alleles  Unaffected - # Het  Unaffected - # HomoVar  Affected - Allele Counts    Affected - Allele Frequencies   Affected - # Alleles    Affected - # Het    Affected - # HomoVar    Compound Het?   Inherited From  Has Compound Het?   Inherited from Father   Inherited from Mother   In dbSNP?   Ref/Alt Identifier  Flags   dbSNPBuildID    Variation Class (VC)    1kG Variant Frequencies (CAF)   Common?
13:52718058 -/T ?           NEK3    frameshift_variant  exon    LoF NM_001146099.1  10  NM_001146099.1:c.869_870insA    NP_001139571.1:p.Gln293fs   PASS    ?   ?,? 5   13  1/1 Homozygous Variant  T/T Transmitted Homozygous Variant  PASS    ?   ?,? 5   4   0/1 Heterozygous    -/T ?   Heterozygous    PASS    ?   ?,? 16  33  0/1 Heterozygous    -/T ?   Heterozygous    PASS    ?   ?,? 13  27  0/1 Heterozygous    -/T ?   Heterozygous    PASS    ?   ?,? 16  42  0/1 Heterozygous    -/T ?   Heterozygous    PASS    ?   ?,? 17  15  0/1 Heterozygous    -/T ?   Heterozygous    PASS    ?   ?,? 5   4   0/1 Heterozygous    -/T ?   Heterozygous    ?   ?   ?,? ?   ?   ./. ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   NM_001146099.1  10  59.1156 NEK3    NM_001146099.1  mRNA    mRNA    -   NP_001139571.1  NP_001139571.1  4752    7746    604044  CCDS53871.1 ?   serine/threonine-protein kinase Nek3 isoform b  ?   NEK3    False   NA  False   1   0   False   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   ?   8   0.5 16  6   1   8   0.5 16  6   1   6   0.428571    14  6   0   2   1   2   0   1   False   NA  False   0   0   False   ?   ?   ?   ?   ?   ?   ?

I'm not sure that pasted correctly. Let me know if you are able to see it properly when you open it with Excel.


Hi! That does not look like output from a VCF (?). It looks like annotation from Variant Effect Predictor (VEP), perhaps? This particular call is also an insertion of a T, it seems.


Yes, I pasted the info from my variant analysis software, which is what I am using to analyse the variants. And yes, it is an insertion of a T that causes a frameshift mutation.


It's difficult to look over the information because there's so much of it and I cannot align the columns. Take a look at Allelic Depths (AD), Read Depths (DP), and Filter. Firstly, if 'Filter' is PASS, the call has passed the variant caller's own filters, which is a good starting point for confidence in it. The Read Depth (DP) should then ideally be >18.
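The checks above can be sketched as a small triage function applied to the Filter, DP and GQ values of the seven called samples in the pasted row. The DP threshold comes from this reply; the GQ threshold of 20 (about a 1% chance that the genotype is wrong) is an assumption added for illustration, and `review_call` is a hypothetical helper, not part of any variant analysis package.

```python
from typing import List, Optional

MIN_DP = 18  # from the reply above: DP should ideally be >18
MIN_GQ = 20  # assumption: illustrative cut-off, i.e. P(genotype wrong) <= 0.01


def review_call(filter_status: str, dp: Optional[int], gq: Optional[int]) -> List[str]:
    """Return the reasons a genotype call deserves a closer look (empty list = none)."""
    reasons = []
    if filter_status != "PASS":
        reasons.append(f"Filter is {filter_status}")
    if dp is None or gq is None:
        reasons.append("DP or GQ is missing")
        return reasons
    if dp <= MIN_DP:
        reasons.append(f"DP {dp} is not above {MIN_DP}")
    if gq < MIN_GQ:
        reasons.append(f"GQ {gq} is below {MIN_GQ}")
    return reasons


# (Filter, DP, GQ) for the seven called samples, in the order they appear in the pasted row.
calls = [
    ("PASS", 5, 13), ("PASS", 5, 4), ("PASS", 16, 33), ("PASS", 13, 27),
    ("PASS", 16, 42), ("PASS", 17, 15), ("PASS", 5, 4),
]

for index, (filter_status, dp, gq) in enumerate(calls, start=1):
    reasons = review_call(filter_status, dp, gq)
    print(f"sample {index}: {'; '.join(reasons) if reasons else 'no flags'}")
```

With these thresholds, every one of the seven calls is flagged for read depth even though all of them are PASS, so the Filter column alone does not settle the question for this variant.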


Thanks for your help. What would your opinion be regarding a variant found through de novo filtering that has <10 reads and bad quality, but lies in a known disease gene for the phenotype? Would you validate it through resequencing, or ignore it? Also, can you suggest any good courses, lectures, or other material that would provide the information I need to analyse exome data proficiently? I don't have any computer science background, and I am more interested in the gene hunting and biological end of it. However, I want to understand well what I am doing when working with variant analysis software.


Hello, I would definitely recommend confirming such a variant through, for example, Sanger sequencing. With NGS, even very high quality variants can sometimes be false positives, for different reasons.

We would function very well together because my main interest is in the analysis of NGS data to a very high standard and also in better understanding variant calls (their likely effects).

With regards to exome analysis, generally, people have their own opinions on what is and what is not important for filtering variants. I have actually written a review manuscript with various colleagues that, in one section, specifically addresses these issues. I wish that I could share the material here but, as it's currently under peer review, I cannot.

That said, good places to start include:

These are all citations in my review. I know that this is already a lot but these are all very interesting reads, as is my review (according to those who've read it)! Aside from all of this, there are also upward of 60 in silico tools in existence that attempt to predict the likely damaging effects of variants. I'm currently working on that particular area in one of my projects with a clinically-oriented colleague (like yourself).


Thank you, this is great. I will definitely look these over, and please let me know when your paper is out.
