I have a consensus genome created by incorporating only biallelic SNPs into the reference genome. I want to get the protein sequence of a particular gene from my consensus genome.
To do this, I used blastn to align the reference CDS of the gene (taken from NCBI) against the consensus genome. I got a single hit spanning multiple ranges. I concatenated the aligned consensus nucleotides from all of the ranges and translated them, but the reference protein is not recovered in its entirety in any single frame. Instead, it is split across several frames, which I did not expect because the consensus differs from the reference only by SNPs.
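For concreteness, the concatenate-and-translate step amounts to something like the minimal Python sketch below. The toy genome, the range coordinates, and the helper names (`concat_ranges`, `translate`, `three_frames`) are made up for illustration and are not my real data. It assumes plus-strand hits, 1-based inclusive subject coordinates listed in CDS order, and the standard genetic code.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def concat_ranges(genome, ranges):
    """Join genome slices for 1-based inclusive (start, end) ranges, in the order given."""
    return "".join(genome[start - 1:end] for start, end in ranges)


def translate(seq, frame=0):
    """Translate seq from the given frame offset (0, 1 or 2); unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(frame, len(seq) - 2, 3)
    )


def three_frames(seq):
    """Return the translations of the three forward frames."""
    return [translate(seq, frame) for frame in range(3)]


# Hypothetical toy data: two aligned ranges separated by unaligned sequence.
toy_genome = "GGATGGCCGTAAGTAAATGACC"
toy_ranges = [(3, 8), (15, 20)]  # stand-ins for the ranges reported by blastn

toy_cds = concat_ranges(toy_genome, toy_ranges)
print(len(toy_cds), len(toy_cds) % 3)  # total aligned length and its remainder mod 3
for frame, protein in enumerate(three_frames(toy_cds)):
    print(frame, protein)
```

In this toy case frame 0 gives the full protein, which is what I expected to see for my real gene as well.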
Any idea why this is happening, and how I can get the protein sequence?