required annotations for VQSR
10 days ago
Mahdi&Christ ▴ 20

Hi, I have a sample that doesn’t have the required annotations for Variant Quality Score Recalibration (VQSR). Since I don’t have any other files to regenerate the VCF with these annotations, I’m wondering how I should proceed.

Is it possible to continue without applying VQSR and its recalibration? If yes, how should I filter this sample?

Also, besides hard-filtering, are there any other filtering methods I could use in this case?

Thanks a lot for your help

VQSR vcf annotation • 378 views
10 days ago
BEST ▴ 40

Hi, this is a common situation: the VCF lacks the annotations that Variant Quality Score Recalibration (VQSR) needs. Here's a breakdown of your options:

Can you proceed without VQSR?

Yes, you can absolutely proceed without VQSR. VQSR is ideal, especially for large datasets and high-quality cohorts, but it requires:

  • A large number of variants to train on (GATK's guidance is typically at least 30 exomes or one whole genome)
  • Known resources (e.g., HapMap, Omni, dbSNP)
  • Specific annotations (e.g., QD, MQ, FS, ReadPosRankSum, etc.)

If your sample doesn't have those annotations and you can't regenerate the VCF, then VQSR is off the table.
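Before ruling anything out, it may be worth confirming which annotations the VCF actually declares. Here is a minimal sketch, assuming an uncompressed VCF; the function names `list_info_ids` and `missing_annotations` are made up for illustration. Note that a header declaration does not guarantee every record carries the value.

```shell
# List the INFO annotation IDs declared in the header of an uncompressed VCF.
# (For a .vcf.gz, decompress first, e.g. with `bcftools view -h`.)
list_info_ids() {
    grep '^##INFO=<ID=' "$1" | sed -E 's/^##INFO=<ID=([^,>]+).*/\1/' | sort -u
}

# Print the annotations commonly used for VQSR / hard filtering
# that are NOT declared in the header.
missing_annotations() {
    vcf="$1"
    for ann in QD FS SOR MQ MQRankSum ReadPosRankSum; do
        if ! list_info_ids "$vcf" | grep -qx "$ann"; then
            echo "$ann"
        fi
    done
}
```

Calling `missing_annotations input.vcf` prints one missing annotation per line; no output means all six are declared.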


How should you filter your sample instead?

1. Hard Filtering (GATK Recommended)

For single samples or datasets lacking the necessary annotations/resources, GATK recommends hard filtering — applying fixed thresholds to annotations to separate true variants from artifacts.

Here are typical hard-filtering thresholds for SNPs and indels:

For SNPs:

FILTER: QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0

For Indels:

FILTER: QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0

Use GATK VariantFiltration to apply these filters. Note: If your VCF lacks these annotations entirely (e.g., QD, FS, MQ), then you'll need to filter based on what's available (e.g., QUAL score and depth).
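As a rough sketch of how the thresholds above could be applied with GATK4: split the calls by type, then tag each type with its own filters. The function names, the filter names (`QD2`, `FS60`, ...) and the output file names are my own choices, and the `GATK` variable is only a convenience so you can set `GATK=echo` to print the commands without running them. VariantFiltration marks failing records in the FILTER column; it does not remove them.

```shell
GATK="${GATK:-gatk}"   # assumption: GATK4 launcher on PATH; GATK=echo gives a dry run

# Split a VCF into SNPs and indels so each type gets its own thresholds.
split_by_type() {
    in_vcf="$1"; prefix="$2"
    "$GATK" SelectVariants -V "$in_vcf" --select-type-to-include SNP   -O "${prefix}.snps.vcf.gz"
    "$GATK" SelectVariants -V "$in_vcf" --select-type-to-include INDEL -O "${prefix}.indels.vcf.gz"
}

# SNP thresholds, one named filter per annotation so the FILTER column
# records which threshold a variant failed.
hard_filter_snps() {
    "$GATK" VariantFiltration -V "$1" \
        --filter-expression "QD < 2.0"              --filter-name "QD2" \
        --filter-expression "FS > 60.0"             --filter-name "FS60" \
        --filter-expression "MQ < 40.0"             --filter-name "MQ40" \
        --filter-expression "MQRankSum < -12.5"     --filter-name "MQRankSum-12.5" \
        --filter-expression "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
        -O "$2"
}

# Indel thresholds, same pattern.
hard_filter_indels() {
    "$GATK" VariantFiltration -V "$1" \
        --filter-expression "QD < 2.0"               --filter-name "QD2" \
        --filter-expression "FS > 200.0"             --filter-name "FS200" \
        --filter-expression "ReadPosRankSum < -20.0" --filter-name "ReadPosRankSum-20" \
        -O "$2"
}
```

Typical use would be `split_by_type input.vcf.gz sample`, followed by `hard_filter_snps sample.snps.vcf.gz sample.snps.filtered.vcf.gz` and the indel equivalent.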


Alternative Filtering Methods

Besides hard filtering, here are some other options depending on your setup:

2. bcftools filter

You can apply custom expressions on fields like QUAL, DP, and AF:

bcftools filter -e 'QUAL < 30 || DP < 10' input.vcf -o filtered.vcf
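If you would rather flag low-confidence calls than drop them, a soft-filter variant of the same idea might look like the sketch below. The filter label `LowQualDepth` and the function name are arbitrary, the thresholds are just the ones from the command above, and `BCFTOOLS=echo` is only a dry-run convenience.

```shell
BCFTOOLS="${BCFTOOLS:-bcftools}"   # assumption: bcftools on PATH; BCFTOOLS=echo gives a dry run

# Tag records instead of removing them:
#   -e    expression that marks a record as failing
#   -s    label written to the FILTER column for failing records
#   -m +  append the label to any existing FILTER values
#   -O z  write bgzip-compressed VCF
soft_filter() {
    in_vcf="$1"; out_vcf="$2"
    "$BCFTOOLS" filter -e 'QUAL < 30 || INFO/DP < 10' \
        -s LowQualDepth -m + -O z -o "$out_vcf" "$in_vcf"
}
```

Afterwards, `bcftools view -f PASS,. tagged.vcf.gz` would keep only the records that passed or were never filtered.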

3. Machine learning / ensemble filtering

If you're working with tools like DeepVariant, Octopus, or Strelka, they often include internal filtering or confidence scores which can be used.

4. Custom scoring

Some pipelines develop sample-specific filtering heuristics (e.g., based on known variant concordance, sample depth distribution, or variant allele frequency). This can be done in Python, R, or with tools like vcftools.
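As one illustration of a depth-based heuristic, the sketch below reads INFO/DP from an uncompressed VCF with plain awk, reports the mean, and keeps records inside a depth window. The function names and any window you pick are assumptions for illustration, not recommended cutoffs; look at your own depth distribution first.

```shell
# Print the mean INFO/DP across all records of an uncompressed VCF.
mean_depth() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, kv, ";")
            for (i = 1; i <= n; i++) {
                if (kv[i] ~ /^DP=/) { sum += substr(kv[i], 4); count++ }
            }
        }
        END { if (count > 0) printf "%.1f\n", sum / count }
    ' "$1"
}

# Keep header lines plus records whose INFO/DP lies within [min, max].
# Records without a DP value are dropped.
filter_by_depth() {
    awk -F'\t' -v min="$2" -v max="$3" '
        /^#/ { print; next }
        {
            dp = ""
            n = split($8, kv, ";")
            for (i = 1; i <= n; i++) if (kv[i] ~ /^DP=/) dp = substr(kv[i], 4)
            if (dp != "" && dp + 0 >= min + 0 && dp + 0 <= max + 0) print
        }
    ' "$1"
}
```

For example, `mean_depth input.vcf` gives a number to anchor the window, and `filter_by_depth input.vcf 10 80 > depth_filtered.vcf` applies it.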

5. Use of population-level knowledge (if applicable)

If your sample comes from a population that is well covered by reference resources (e.g., 1000 Genomes, gnomAD), you can cross-reference your calls against known variants to boost confidence.
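One way to do that cross-referencing is to copy IDs from a known-variant VCF into your calls, sketched below with `bcftools annotate`. The function name and file names are placeholders, both files are assumed to be bgzip-compressed and indexed on the same reference build, and `BCFTOOLS=echo` is only a dry-run convenience.

```shell
BCFTOOLS="${BCFTOOLS:-bcftools}"   # assumption: bcftools on PATH; BCFTOOLS=echo gives a dry run

# Copy the ID column (e.g. rsIDs) from a known-variant resource into the sample VCF.
#   -a  annotation source (bgzipped + indexed VCF of known variants)
#   -c  columns to transfer from the source
annotate_known_ids() {
    sample_vcf="$1"; known_vcf="$2"; out_vcf="$3"
    "$BCFTOOLS" annotate -a "$known_vcf" -c ID -O z -o "$out_vcf" "$sample_vcf"
}
```

After running, for instance, `annotate_known_ids sample.vcf.gz dbsnp.vcf.gz sample.annotated.vcf.gz`, records whose ID is still `.` have no match in the resource and may deserve a closer look.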


Hope this helps, friend.


I truly appreciate your clear and comprehensive explanation. I learned a lot of important and practical points.
