How can I tell a good variant call from a bad one?
2.4 years ago
vaay

Hi, this is my first post; apologies if this kind of question is not allowed.

I am generating variant calls with bcftools. I am inspecting the VCF in Excel and the BAM in IGV.

I would like to know which variant calls are the most reliable and which are the least reliable. In other words, what information in my results can tell me this?

Should I go by information in my VCF file (the numbers/fields it reports), by something I can see in my alignment (BAM file) in IGV, or both?

I am quite confused about this; I hope my question makes sense and that you can help.

Thank you.

variantcalls variantcaller bcftools
2.4 years ago
lethalfang

First of all, bcftools is probably not considered a modern variant caller; you may want to try something more widely used nowadays, such as GATK or DeepVariant.

When you look at a call in IGV, there are a number of indicators that it may be a false positive. Off the top of my head:

1) there are a bunch of indels, structural variants (marked by soft-clipped reads), or other mismatches right next to the variant position,

2) all the reads supporting the variant are biased in some way compared to the reads supporting the reference allele, e.g., a) they are all forward or all reverse reads, b) they all have low mapping quality, or c) they all start/end at exactly the same position, etc.,

3) the reference sequence has very low complexity, e.g. ACACACACACACACACAC,

4) the sequencing depth at the site is several times the average depth of your data.

Some of this information (e.g. depth, per-strand read counts, mapping quality) is also recorded in the VCF file.
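If you want to screen calls in bulk rather than one by one, a few of the checks above can be approximated from the VCF itself. Below is a minimal Python sketch, assuming the INFO column contains DP and DP4 (bcftools call normally writes these, but check your VCF header). The function names, the thresholds, and the `ref_context` input (the reference bases around the site, which you would have to pull from your FASTA yourself) are hypothetical and purely illustrative, not recommended cutoffs. It flags depth far above your mean (item 4), ALT reads coming almost entirely from one strand (item 2a), and a repetitive reference context (item 3).

```python
def parse_info(info_field):
    """Split a VCF INFO column such as 'DP=30;DP4=8,7,9,6' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        info[key] = value  # flag-type entries (no '=') map to ''
    return info


def is_low_complexity(seq, threshold=0.8, max_period=3):
    """Crude repeat check (hypothetical heuristic): True if, for some period
    k <= max_period, at least `threshold` of the bases equal the base k
    positions earlier, e.g. homopolymers (k=1) or ACACAC... (k=2)."""
    seq = seq.upper()
    for period in range(1, max_period + 1):
        comparisons = len(seq) - period
        if comparisons <= 0:
            continue
        matches = sum(seq[i] == seq[i - period] for i in range(period, len(seq)))
        if matches / comparisons >= threshold:
            return True
    return False


def red_flags(info_field, mean_depth, ref_context,
              max_depth_ratio=3.0, min_strand_frac=0.1):
    """Return a list of warning labels for one call.
    All thresholds are illustrative placeholders, not validated cutoffs."""
    info = parse_info(info_field)
    flags = []

    # Item 4: depth several times the average depth of the data set.
    if int(info["DP"]) > max_depth_ratio * mean_depth:
        flags.append("high_depth")

    # Item 2a: DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse
    # high-quality base counts; flag ALT support that is (almost) one-stranded.
    _ref_fwd, _ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    alt_total = alt_fwd + alt_rev
    if alt_total > 0 and min(alt_fwd, alt_rev) / alt_total < min_strand_frac:
        flags.append("alt_strand_bias")

    # Item 3: low-complexity reference sequence around the site.
    if is_low_complexity(ref_context):
        flags.append("low_complexity")

    return flags
```

For example, with a mean depth of 30, `red_flags("DP=95;DP4=20,25,30,0", 30.0, "ACGTTGCAAGTCCTGA")` returns `['high_depth', 'alt_strand_bias']`, and a call whose `ref_context` is `ACACACACACACACACAC` gets `low_complexity`. A flag only means "go look at this one in IGV"; it does not prove the call is false, and a call with no flags is not guaranteed to be real.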

2.4 years ago

The variant caller assigns a phred-scaled QUAL score using whatever internal model it implements. QUAL = 20 means there is a 1% probability that the variant call is wrong, i.e. roughly 99% confidence that there is a variant at the site.
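To make the QUAL arithmetic concrete: the VCF specification defines QUAL as -10*log10(P), where P is the probability that the ALT call is wrong. Here is a small Python sketch (the function names are my own, hypothetical) that converts between QUAL and that error probability and sorts calls so the least confident come first, which is one simple way to rank "most" vs "least" reliable. It assumes you have already parsed QUAL into floats; in a real VCF, QUAL can also be `.` (missing), which you would need to handle separately.

```python
import math


def qual_to_error_prob(qual):
    """Phred-scaled QUAL -> probability that the ALT call is wrong."""
    return 10 ** (-qual / 10)


def error_prob_to_qual(p):
    """Inverse: probability of a wrong call -> phred-scaled QUAL."""
    return -10 * math.log10(p)


def rank_by_qual(calls):
    """Sort (site, QUAL) pairs from least to most confident."""
    return sorted(calls, key=lambda call: call[1])
```

For example, `qual_to_error_prob(20)` returns `0.01` (the 99% figure above), and each additional 10 QUAL points cuts the error probability by another factor of 10.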

There is still considerable insight to be gained from looking at the reads in IGV. DeepVariant, for example, is trained to recognize the kind of jumbled pileups that would not be reflected in a QUAL score.
