I am analyzing the results of a GWAS in Drosophila melanogaster, and I've been finding discrepancies between the variants reported by Ensembl's Variant Effect Predictor (VEP) and what I see at the same locations in a genome browser (specifically the Integrative Genomics Viewer, IGV). Some of the variants do not show up at all, whereas others do not have the effect predicted by VEP.

For example, at one location VEP reports a frameshift variant caused by the insertion of 2 bases (CAA/CAGCA). However, when I look at this location in IGV, I find not a frameshift but a point mutation that changes CAA to CAG (the next two bases in the genome are CA). At other locations, VEP seems to be using the wrong sequence entirely. For example, the VEP output claims that a point mutation changes ATT to GTT, but IGV shows a change from AAT to AAC; neither the starting sequence nor the mutation matches.

My reads are aligned to the most recent Drosophila melanogaster BDGP6 genome build, and I have made sure that both VEP and IGV are using the same genome. If anyone has any ideas as to why I may be encountering these issues, I'd be grateful for the help.
Hello and welcome ijk8qd,
Maybe I'm missing something, but the input for VEP is a variant list or file such as a VCF, isn't it? So you must have done variant calling on your BAM file beforehand. The discrepancy then has nothing to do with VEP but with your variant caller.
The interesting questions now are:
- How did you do the variant calling?
- What do the VCF entries look like for these positions?
- A screenshot of IGV would be useful as well.
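
While collecting that information, a quick sanity check on the allele strings themselves might help. The following is only a minimal sketch, assuming the strings quoted in the question can be treated as REF/ALT-style alleles (the thread does not confirm this); the helper names `trim_alleles`, `classify`, and `reverse_complement` are hypothetical and not part of VEP, IGV, or any variant caller. It shows that the same change can look like a 2-base insertion or a single-base substitution depending on how much flanking sequence is included in the record, and that ATT/GTT are the reverse complements of AAT/AAC, so strand orientation may be worth ruling out too.

```python
# Hypothetical helpers for eyeballing VCF-style alleles; not a VEP or IGV API.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def trim_alleles(ref, alt):
    """Strip the shared suffix, then the shared prefix, of two alleles."""
    while ref and alt and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    while ref and alt and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
    return ref, alt


def classify(ref, alt):
    """Name the kind of change left after trimming shared flanking bases."""
    ref, alt = trim_alleles(ref, alt)
    if not ref and not alt:
        return "no change"
    if not ref:
        return f"insertion of {len(alt)} base(s)"
    if not alt:
        return f"deletion of {len(ref)} base(s)"
    if len(ref) == len(alt) == 1:
        return "SNV"
    if len(ref) == len(alt):
        return "MNV"
    return "complex indel"


# Alleles written with only the 3-base reference context:
print(classify("CAA", "CAGCA"))
# Same alternate allele against the 5 reference bases seen in IGV (CAA + CA):
print(classify("CAACA", "CAGCA"))
# Strand check for the second example from the question:
print(reverse_complement("ATT"), reverse_complement("GTT"))
```

If the actual VCF records for these positions look different from the strings above, the classification will of course differ, which is why the VCF entries themselves are still needed.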