Question: What is the range or ceiling of metrics like DP4, MQB, etc. used to filter variants?
Dayna wrote (17 months ago):

Hi

Do you know the ranges of the following metrics? When a description says "bigger is better", I don't know the ceiling, so I can't decide: if the maximum possible value is 1, then 0.9 is big, but if the maximum is 10 or 50, then 0.9 is low. Sorry, I am a complete beginner.

##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Mann-Whitney U test of Read Position Bias (bigger is better)">
##INFO=<ID=MQB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality Bias (bigger is better)">
##INFO=<ID=BQB,Number=1,Type=Float,Description="Mann-Whitney U test of Base Quality Bias (bigger is better)">
##INFO=<ID=MQSB,Number=1,Type=Float,Description="Mann-Whitney U test of Mapping Quality vs Strand Bias (bigger is better)">
##INFO=<ID=SGB,Number=1,Type=Float,Description="Segregation based metric.">
##INFO=<ID=MQ0F,Number=1,Type=Float,Description="Fraction of MQ0 reads (smaller is better)">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##INFO=<ID=ICB,Number=1,Type=Float,Description="Inbreeding Coefficient Binomial test (bigger is better)">
##INFO=<ID=HOB,Number=1,Type=Float,Description="Bias in the number of HOMs number (smaller is better)">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Average mapping quality">

Thanks

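A quick way to see the actual values behind these header lines is to split a record's INFO column into key/value pairs. The sketch below assumes a bcftools-style INFO string; `parse_info` and `_convert` are hypothetical helper names (not part of any library), and the INFO string is made up purely for illustration.

```python
def _convert(text):
    """Turn a VCF value into int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


def parse_info(info):
    """Split a VCF INFO column (KEY=VALUE;KEY=V1,V2;FLAG) into a dict."""
    fields = {}
    for item in info.split(";"):
        if "=" not in item:
            fields[item] = True  # flag with no value, e.g. INDEL
            continue
        key, value = item.split("=", 1)
        values = [_convert(v) for v in value.split(",")]
        # Single values become a scalar; multi-valued fields such as DP4 stay a list.
        fields[key] = values[0] if len(values) == 1 else values
    return fields


# Made-up INFO column, only to show the shape of the output.
info = "DP=30;VDB=0.52;SGB=-0.69;RPB=0.9;MQB=1;MQSB=1;BQB=0.8;MQ0F=0;AC=1;AN=2;DP4=10,12,5,3;MQ=60"
parsed = parse_info(info)
print(parsed["DP4"], parsed["VDB"], parsed["MQ"])
```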

See the GATK recommendation for applying hard filters to a dataset.

(reply by WouterDeCoster, 17 months ago)

Some values, like DP4, don't exist there because this is not a GATK pipeline; it is samtools and bcftools. Yes, I understand GATK is better, but as a start and for benchmarking, I need to begin with samtools.

(reply by Dayna, 17 months ago)
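The hard-filtering idea behind the GATK recommendation carries over to the bcftools annotations: compare each metric against a cut-off in the direction its header says is "better". The thresholds below are placeholders chosen only to show the mechanics, not recommended values, and `failed_filters` is a hypothetical helper.

```python
import operator

OPS = {">=": operator.ge, "<=": operator.le}

# Hypothetical cut-offs, for illustration only; tune them on your own data.
EXAMPLE_THRESHOLDS = {
    "MQ": (">=", 20),     # average mapping quality
    "MQ0F": ("<=", 0.1),  # fraction of MQ0 reads ("smaller is better")
    "VDB": (">=", 0.01),  # variant distance bias ("bigger is better")
}


def failed_filters(info, thresholds=EXAMPLE_THRESHOLDS):
    """Return the names of metrics in `info` that fall on the wrong side of their cut-off."""
    failed = []
    for key, (op, cutoff) in thresholds.items():
        value = info.get(key)
        if value is None:
            continue  # metric absent from this record: nothing to test
        if not OPS[op](value, cutoff):
            failed.append(key)
    return failed


print(failed_filters({"MQ": 60, "MQ0F": 0.0, "VDB": 0.5}))
print(failed_filters({"MQ": 10, "MQ0F": 0.3}))
```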
Kevin Blighe wrote (17 months ago):

This is a somewhat open-ended question that could turn into a philosophical debate about infinities, etc.

Generally, you could say the following:

Metrics that are based on depth of coverage or read depth:

  • min = 0
  • max = roughly the target depth of coverage of the sequencing run
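DP4 belongs to this first class: its four numbers are counts, so each lies between 0 and the depth at that site. Dividing by the total turns them into fractions bounded by 0 and 1, which are easier to judge. A minimal sketch; `dp4_summary` is a hypothetical helper name and the input counts are made up.

```python
def dp4_summary(dp4):
    """Summarise DP4 = (ref-forward, ref-reverse, alt-forward, alt-reverse)."""
    ref_fwd, ref_rev, alt_fwd, alt_rev = dp4
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    alt = alt_fwd + alt_rev
    return {
        "depth": depth,  # the ceiling for every DP4 entry at this site
        # Share of high-quality bases supporting ALT (0 to 1); None if no coverage.
        "alt_fraction": alt / depth if depth else None,
        # Share of ALT bases on the forward strand; all ALT on one strand can hint at strand bias.
        "alt_forward_fraction": alt_fwd / alt if alt else None,
    }


print(dp4_summary([10, 12, 8, 0]))
```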

Metrics that are based on probabilities (P values):

  • min = 0
  • max = 1
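Several "bigger is better" fields (RPB, MQB, BQB, MQSB) are described as Mann-Whitney U tests. Assuming they are reported as P values, as described above, a value near 1 means ref-supporting and alt-supporting reads look alike, and a value near 0 means they differ (i.e. a bias). The sketch below is the generic textbook two-sided test with a normal approximation and no tie correction; it is not bcftools's exact implementation, and the read positions are made-up numbers.

```python
import math


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U test P value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0  # nothing to compare
    # U = number of (x, y) pairs where x is larger; ties count as half a pair.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided P = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), always in (0, 1].
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical positions of the variant base within each read, ref vs alt reads.
ref_positions = [5, 20, 33, 47, 61, 75, 88, 92]
alt_even = [8, 25, 40, 55, 70, 85]        # alt spread along the read: large P
alt_at_end = [96, 97, 98, 99, 99, 100]    # alt only near the read end: small P
print(mann_whitney_p(ref_positions, alt_even))
print(mann_whitney_p(ref_positions, alt_at_end))
```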

Regarding the first class of metric (i.e. depth of coverage or read depth), to keep the analysis streamlined, a variant caller will generally only look at the first 500-1000 reads that it finds at a position (which is biased, as I'm sure you're imagining right now).
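A toy illustration of why this cap matters: if the caller keeps only the first N reads it encounters, depth can never exceed N, and anything computed from those reads reflects file order rather than the whole pile-up. The cap of 1000 is taken from the 500-1000 range above (the real default depends on the tool and version), `capped_counts` is a hypothetical helper, and the ref-then-alt ordering is deliberately extreme to make the effect visible.

```python
from itertools import islice


def capped_counts(read_alleles, max_depth):
    """Return (depth, alt_count) using only the first max_depth reads, as a capped caller would."""
    kept = list(islice(read_alleles, max_depth))
    return len(kept), kept.count("alt")


# 1200 reads at one site, true ALT fraction 0.5, but the ALT reads come last.
reads = ["ref"] * 600 + ["alt"] * 600
depth, alt = capped_counts(reads, max_depth=1000)
print(depth, alt, alt / depth)  # depth is stuck at the cap; ALT fraction drops to 0.4
```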

Regarding the metrics based on probabilities, these may be represented as Phred scores, i.e. -10 × log10 of the P value, in which case larger numbers signify stronger grounds for rejecting the null hypothesis. The QUAL scores in a VCF, for example, are Phred-scaled.
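The Phred conversion in both directions, as a small sketch (the function names are illustrative). Because Q = -10 × log10(P), a P value of 0.01 becomes 20 and 0.001 becomes 30, so mathematically this scale has no fixed ceiling: the smaller the P value, the larger the score.

```python
import math


def p_to_phred(p):
    """Phred-scale a probability: Q = -10 * log10(p). p = 0 maps to infinity."""
    if p <= 0:
        return math.inf
    return -10 * math.log10(p)


def phred_to_p(q):
    """Invert the Phred scale: p = 10 ** (-q / 10)."""
    return 10 ** (-q / 10)


for p in (0.05, 0.01, 0.001):
    print(p, round(p_to_phred(p), 1))
```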

Kevin


Thanks a lot, Kevin. But as a beginner this seems like fuzzy logic to me: when I look at a number, I can't judge whether to discard or keep the variant, because there is no rule.

(reply by Dayna, 17 months ago)

If you are a beginner at this but you have used a 'trusted' analysis pipeline to process the data, then, most likely, any variant that has failed a particular metric will have a value other than PASS in the FILTER column of the VCF.

(reply by Kevin Blighe, 17 months ago)
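Keeping only records whose FILTER column is PASS takes a few lines; from the command line, `bcftools view -f PASS` does the same job. The sketch assumes an uncompressed, tab-separated VCF (FILTER is the 7th column), `pass_records` is a hypothetical helper, and the records are made up. Note that a `.` in FILTER means no filters were applied, not that the record passed.

```python
def pass_records(vcf_lines):
    """Yield data lines whose FILTER column (7th field) is exactly PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 6 and fields[6] == "PASS":
            yield line


# Made-up records: CHROM POS ID REF ALT QUAL FILTER INFO
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=30\n",
    "chr1\t200\t.\tC\tT\t12\tLowQual\tDP=4\n",
    "chr1\t300\t.\tG\tA\t40\t.\tDP=25\n",
]
for record in pass_records(example):
    print(record, end="")
```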

That's really helpful, Kevin. Thanks a lot.

(reply by Dayna, 17 months ago)
