Hi, I have NGS DNA sequence reads for two different plant species (A and B) in the same genus. I aligned the species A reads to its own reference genome (G1) and found, for example, 10000 SNPs. However, species B doesn't have a reference genome. Therefore, I selected the genome of a species closely related to B (G2), aligned the species B reads to it, and found 9500 SNPs. As a cross-validation, I also aligned the species A reads to the G2 genome. Note that the G1 genome is about twice as large as the G2 genome. After aligning the species A reads to G2, I found 30000 SNPs.

I would appreciate it if someone could help me with the following questions.

Q1) Why do I get 30000 SNPs when aligning species A reads to G2?
Q2) Is this due to reads aligning randomly to a smaller genome?
Q3) Or is it because G1 and G2 are evolutionarily more distant?
Q4) Or is it due to some other reason?

As a general criticism, the variants you are getting against G2 are wrong. Even when the species are close, most of the calls are not useful. Just compare A. thaliana and A. lyrata if you want to see how much impact it has.

Hi JC, thanks for the reply. This may be a silly thing to ask: you mentioned that variant calling against G2 is not useful. Is that because G2 is a small genome compared to G1?

It's because it is not the same species. Even in plants with "similar" genomes, many variants are created by evolutionary distance.
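
To put rough numbers on the evolutionary-distance point, here is a back-of-the-envelope sketch in Python. It assumes the calls against a foreign reference are roughly the within-species polymorphisms plus the fixed differences between the two species. The function name, the number of callable bases, and the divergence rate are all hypothetical values chosen for illustration; none of them come from the data in this thread.

```python
def expected_snp_calls(within_species_snps, callable_bases, divergence_per_base):
    """Rough expected SNP count when reads are aligned to another species' genome.

    Assumption: calls = polymorphisms already present in the sample
    + fixed differences between the species, where the fixed differences
    scale with the bases the reads cover and the per-base divergence.
    """
    fixed_differences = callable_bases * divergence_per_base
    return within_species_snps + fixed_differences


# Hypothetical inputs for illustration only:
# 10000 within-species SNPs, 1,000,000 callable bases, 0.5% divergence.
estimate = expected_snp_calls(10_000, 1_000_000, 0.005)
print(round(estimate))
```

Under this simple model the between-species term grows with how much of the foreign reference the reads cover and with how diverged the species are, which is why it can add many calls on top of the true within-species SNPs even when the reference is the smaller genome.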

