I am working with RNA-seq data from an organism (Plasmodium) that does not have a reference genome. What is readily available is the genome of a closely related species, so I will use the reads and that reference genome to determine the SNPs and other differences between the two species. The problem I am considering is that, since I am working with RNA-seq, not all genes will be expressed, and in many cases some genes (like the one in the middle of the picture) will show zero SNPs simply because they are not expressed or have too little coverage. If many such genes accumulate, it might look as if there are no SNPs in those genes when in fact I just don't know.
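One way to make the "no SNPs" vs. "don't know" distinction explicit is to record, for every gene, how many bases were actually callable, and to label a gene as having no SNPs only when most of it was covered. The sketch below assumes per-base depth is available as a dict keyed by (chrom, pos) (for example parsed from `samtools depth -a` output, which also reports zero-depth positions) and that SNP positions come from whatever variant caller you use. `Gene`, `classify_genes`, and the cutoffs `min_depth=10` / `min_callable_frac=0.8` are hypothetical choices for illustration, not established defaults.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def classify_genes(genes, depth, snps, min_depth=10, min_callable_frac=0.8):
    """Label each gene as 'has_snps', 'no_snps' or 'unknown'.

    depth: dict mapping (chrom, pos) -> read depth; missing positions count as 0.
    snps:  set of (chrom, pos) where the variant caller reported a SNP.
    Returns {gene name: (label, n_snps, n_callable, snps_per_callable_base)}.
    Hypothetical helper; thresholds are illustrative assumptions.
    """
    result = {}
    for g in genes:
        positions = [(g.chrom, p) for p in range(g.start, g.end + 1)]
        # A base is "callable" only if it has enough reads to reveal a SNP.
        callable_pos = [pos for pos in positions if depth.get(pos, 0) >= min_depth]
        n_callable = len(callable_pos)
        n_snps = sum(1 for pos in callable_pos if pos in snps)
        callable_frac = n_callable / len(positions)
        # SNP density over callable bases, not over the whole gene length.
        density = n_snps / n_callable if n_callable else None
        if n_snps > 0:
            label = "has_snps"
        elif callable_frac >= min_callable_frac:
            label = "no_snps"   # well covered and still no SNPs
        else:
            label = "unknown"   # too little coverage to say anything
        result[g.name] = (label, n_snps, n_callable, density)
    return result


# Toy example: geneB is not expressed, so it has no reads at all.
genes = [
    Gene("geneA", "chr1", 1, 10),
    Gene("geneB", "chr1", 11, 20),
    Gene("geneC", "chr1", 21, 30),
]
depth = {("chr1", p): 20 for p in range(1, 11)}
depth.update({("chr1", p): 15 for p in range(21, 31)})
snps = {("chr1", 5)}

for name, info in classify_genes(genes, depth, snps).items():
    print(name, info)
```

In the toy data, geneB ends up as "unknown" rather than "no_snps", which is exactly the case of the unexpressed gene in the middle. Dividing the SNP count by callable bases instead of gene length also keeps partially covered genes from looking artificially conserved.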
What is the proper way to deal with this issue? Should I choose a coverage threshold? And if so, how do I decide which one?
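For the threshold itself, one option is to derive it from the detection sensitivity you want instead of picking a round number: ask how deep a site must be before a real SNP would be missed only rarely. The sketch below assumes a simple binomial read-sampling model and a caller that requires at least `min_alt_reads` reads supporting the alternative base. `prob_at_least`, `min_depth`, and all default values are illustrative assumptions; the model ignores mapping bias toward the related reference and RNA-specific effects such as allele-specific expression.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_depth(allele_frac=1.0, error_rate=0.01, min_alt_reads=3,
              power=0.95, max_depth=500):
    """Smallest depth at which a true SNP is detected with probability >= power.

    Assumes the caller needs at least `min_alt_reads` reads with the alternative
    base and that each read shows it with probability
    allele_frac * (1 - error_rate). Returns None if max_depth is not enough.
    """
    p_alt = allele_frac * (1 - error_rate)
    for d in range(min_alt_reads, max_depth + 1):
        if prob_at_least(min_alt_reads, d, p_alt) >= power:
            return d
    return None


# Clonal sample of haploid blood stages: a fixed difference is in (almost) every read.
print(min_depth(allele_frac=1.0))
# Mixed infection where only half the parasites carry the variant.
print(min_depth(allele_frac=0.5, error_rate=0.0))  # -> 11
```

Below the depth this returns, a position without a SNP call is better counted as "unknown" than as "identical to the reference"; that depth can then serve as the `min_depth` cutoff when deciding which parts of a gene were callable.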
Thank you so much in advance