4.4 years ago by
United States
Ref.fa is my own assembled genome and align.bam is reads to map file of my own reads to reference genome (downloaded from NCBI).
This doesn't make sense, and I'm assuming the error lies here. It's unclear what you did. You made a reference genome, but then you took those same reads and mapped them back to the reference itself? You would expect an extremely low number of variants (even less if you've masked or included hets) because your sample is literally your reference and variants are only calculated relative to a reference. The -M argument (output sites where the reference is masked) argument would help alleviate this.
Alternatively, I don't understand what you downloaded from NCBI. If you or others used an additional reference to generate the BAM and then are trying to use your own reference at other points, there will be, without appropriate handling, a mismatch between the contig names in the header and the reference itself. I have a vague memory of trying something years ago using bcftools variant calling; I was provided a BAM and reference with different contig names and the outputs were empty. I can't imagine this would be the case now using htslib, but it's something to check.
As far as other approaches, the GATK or freebayes works well for non-model systems. You might need to tweak some filtering parameters depending on your dataset.
Have you looked at the data in IGV to see where there should be a couple variants? Have you played around with thresholds to see if one of them needs to be tweaked for your data?