I've used a variant-calling pipeline to produce a VCF file from sequencing data of a non-model organism. The VCF file contains about the number of variants I expected; however, annotating it with SnpEff seems to give an inaccurate number of effects in the ann.vcf file. Following the instructions in the manual, I built the SnpEff database in two different ways:
sequences.fa + genes.gff file (with no intron or intergenic regions).
sequences.fa + a genes.gtf file converted from the previous GFF file using the gffread tool.
Both approaches produced an inaccurate number of effects in ann.vcf, but the second one gave fewer warnings and much better results. I've also read a previous post about producing a GTF file containing only the longest transcript, but that didn't solve my problem.
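For anyone who wants to compare the two builds, here is a minimal Python sketch of one way to tally the effects and warnings in an ann.vcf file. It assumes the standard 16-field, pipe-separated ANN layout, with the effect in the second sub-field and the errors/warnings in the last one. The function name `tally_ann` and the example records are hypothetical and for illustration only; they are not taken from my data.

```python
from collections import Counter

def tally_ann(vcf_lines):
    """Count effect types and warning/error codes found in ANN fields."""
    effects = Counter()
    warnings = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 8:
            continue  # not a complete VCF record
        for item in cols[7].split(";"):  # column 8 is INFO
            if not item.startswith("ANN="):
                continue
            for ann in item[4:].split(","):  # one entry per annotation
                fields = ann.split("|")
                if len(fields) > 1:
                    # combined effects are joined with '&'
                    for eff in fields[1].split("&"):
                        effects[eff] += 1
                if len(fields) > 15 and fields[15]:
                    # assumed position of the errors/warnings sub-field
                    for code in fields[15].split("&"):
                        warnings[code] += 1
    return effects, warnings

# Hypothetical records, only to show the expected input shape.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50\tPASS\tANN=G|missense_variant|MODERATE|g1|g1|"
    "transcript|t1|protein_coding|1/2|c.10A>G|p.Thr4Ala|10/300|10/300|4/99||\n",
    "chr1\t200\t.\tC\tT\t50\tPASS\tDP=12;ANN=T|synonymous_variant|LOW|g1|g1|"
    "transcript|t1|protein_coding|1/2|c.30C>T|p.Gly10Gly|30/300|30/300|10/99||"
    "WARNING_TRANSCRIPT_NO_START_CODON,T|intergenic_region|MODIFIER|g1-g2|g1-g2|"
    "intergenic_region|g1-g2|||n.200C>T||||||\n",
]

effects, warnings = tally_ann(example)
for name, count in sorted(effects.items()):
    print(name, count)
for name, count in sorted(warnings.items()):
    print(name, count)
```

To run it on a real file, pass an open file handle instead of the example list, for instance `tally_ann(open("ann.vcf"))`. Comparing the two counters between the GFF-based and GTF-based builds should show which effect types and warnings differ.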
Can anyone help me?
Thank you all