Question

My translated amino acid sequence is longer than my original nucleotide sequence. Could someone explain the results?

7.0 years ago

zaineb • 0

Hello everyone!

I am in the process of identifying an uncharacterized protein. When I convert my nucleotide sequence to amino acids using blastx, I get an amino acid sequence that is longer than my input. I am not sure whether I am doing something wrong, missing a step, or whether there is a biological explanation that I don't know of.

Nucleotide sequence / Accession Number X79334 region 333-548

GTTATTGTGTTCGCCGTTTTGCTGACGGCTTCTTGTCTGATGGTCTCCTTTGCCAACAGCTTTACGCTGC
TATTGCTGGACCGCGCCTGTCTTGGGTTGGCGCTGGACGGATTCTGGGCGATGTCGGCGTCGCTGACCAT
GCGACTGGTTCCCGCGCGTACCGTGCCGAAAGCGCTGTCGGTGATTTTTGGCGCGGTCTCCATCGCGTTA
GTGATC

Amino Acid Sequence / Protein accession number ERH37120.1

MNENIAEKFRADGVARPNWSAVFAVAFCVACLITVEFLPVSLLTPMAQDLGISEGIAGQSVTVTAFVAMF
SSLFITQIIQATDRRYIVILFAVLLTASCLMVSFANSFTLLLLGRACLGLALGGFWAMSASLTMRLVPAR
TVPKALSVIFGAVSIALVIAAPLGSFFGCXGISWSGRXALRPSAVMGVLC

I am new to Bioinformatics, but I am fascinated by this developing field. I was researching my question and came across this amazing website. I hope someone can help.

blast • 1.7k views

ADD COMMENT • link updated 7.0 years ago by Michael 54k • written 7.0 years ago by zaineb • 0

score 3 · Accepted Answer · 2017-04-28

7.0 years ago

Chris Miller 22k

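The search is returning the entire sequence of the protein it hit, only part of which matches your nucleotide sequence. Your query is 216 nt long, so it can encode at most 72 amino acids; the rest of the reported protein comes from the database entry, not from your input.

If you translate your sequence yourself using something like http://web.expasy.org/translate/, you'll see that part of the translation (starting with VSFANSFT...) is present in the aa sequence you pasted.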
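To make the length argument concrete, here is a minimal Python sketch that translates the posted nucleotide region in reading frame 1 and compares lengths. It assumes the standard genetic code and that frame 1 is the relevant frame; the names `query_nt`, `hit_protein` and `translate` are made up for this illustration, and this is not how blastx itself works internally.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nucleotide query from the question (X79334, region 333-548).
query_nt = (
    "GTTATTGTGTTCGCCGTTTTGCTGACGGCTTCTTGTCTGATGGTCTCCTTTGCCAACAGCTTTACGCTGC"
    "TATTGCTGGACCGCGCCTGTCTTGGGTTGGCGCTGGACGGATTCTGGGCGATGTCGGCGTCGCTGACCAT"
    "GCGACTGGTTCCCGCGCGTACCGTGCCGAAAGCGCTGTCGGTGATTTTTGGCGCGGTCTCCATCGCGTTA"
    "GTGATC"
)

# Protein reported by blastx (ERH37120.1), as pasted in the question.
hit_protein = (
    "MNENIAEKFRADGVARPNWSAVFAVAFCVACLITVEFLPVSLLTPMAQDLGISEGIAGQSVTVTAFVAMF"
    "SSLFITQIIQATDRRYIVILFAVLLTASCLMVSFANSFTLLLLGRACLGLALGGFWAMSASLTMRLVPAR"
    "TVPKALSVIFGAVSIALVIAAPLGSFFGCXGISWSGRXALRPSAVMGVLC"
)


def translate(nt, frame=0):
    """Translate complete codons of nt starting at the given 0-based frame offset."""
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


query_aa = translate(query_nt)

# Lengths: nucleotides in, amino acids out, and the full database protein.
print(len(query_nt), len(query_aa), len(hit_protein))
print(query_aa)
# The stretch mentioned above occurs in both the translation and the hit.
print("VSFANSFT" in query_aa, "VSFANSFT" in hit_protein)
```

The translation is 72 residues, while the pasted database protein is 190 residues, so most of the protein shown in the blastx result cannot have come from the query.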

ADD COMMENT • link 7.0 years ago by Chris Miller 22k

Thank you for your prompt response! I used the ExPASy website. Interestingly, when I run PSI-BLAST on the sequence obtained from ExPASy, which is "MVSFANSFTLLLLDRACLGLALDGFWAMSASLTMRLVPARTVPKALSVIFGAVSIALVI", I get the same hypothetical protein as the one from blastx, which is "MNENIAEKFRADGVARPNWSAVFAVAFCVACLITVEFLPVSLLTPMAQDLGISEGIAGQSVTVTAFVAMFSSLFITQIIQATDRRYIVILFAVLLTASCLMVSFANSFTLLLLGRACLGLALGGFWAMSASLTMRLVPARTVPKALSVIFGAVSIALVIAAPLGSFFGCXGISWSGRXALRPSAVMGVLC".

I don't understand how I get the same result from two different amino acid sequences!

Thank you again

ADD REPLY • link 7.0 years ago by zaineb • 0

You are not using two different AA sequences: the shorter one aligns to part of the longer one.

Run a multiple sequence alignment on the AA input and output sequences using Clustal Omega.

You will see the comparison between all your sequences, which might give you a better understanding.
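If you just want a quick look before running a proper alignment, the Python sketch below slides the ExPASy peptide along the database protein without gaps and reports the best-matching position. This is a deliberately simplified stand-in, not Clustal Omega and not a real alignment algorithm (no gaps, no substitution matrix); the names `expasy_orf`, `hit_protein` and `identities` are made up for the example.

```python
# Peptide obtained from the ExPASy translation (as posted in the thread).
expasy_orf = "MVSFANSFTLLLLDRACLGLALDGFWAMSASLTMRLVPARTVPKALSVIFGAVSIALVI"

# Protein reported by blastx / PSI-BLAST (ERH37120.1), as posted in the thread.
hit_protein = (
    "MNENIAEKFRADGVARPNWSAVFAVAFCVACLITVEFLPVSLLTPMAQDLGISEGIAGQSVTVTAFVAMF"
    "SSLFITQIIQATDRRYIVILFAVLLTASCLMVSFANSFTLLLLGRACLGLALGGFWAMSASLTMRLVPAR"
    "TVPKALSVIFGAVSIALVIAAPLGSFFGCXGISWSGRXALRPSAVMGVLC"
)


def identities(a, b):
    """Count positions at which two equal-length strings carry the same residue."""
    return sum(x == y for x, y in zip(a, b))


n = len(expasy_orf)

# Try every ungapped placement of the peptide on the protein; keep the best one.
best_offset = max(
    range(len(hit_protein) - n + 1),
    key=lambda o: identities(expasy_orf, hit_protein[o:o + n]),
)
window = hit_protein[best_offset:best_offset + n]
n_identical = identities(expasy_orf, window)

# Mismatches as (1-based protein position, peptide residue, protein residue).
mismatches = [
    (best_offset + i + 1, q, h)
    for i, (q, h) in enumerate(zip(expasy_orf, window))
    if q != h
]

print(f"peptide covers protein residues {best_offset + 1}-{best_offset + n}")
print(f"{n_identical}/{n} identical")
print(mismatches)
```

With the sequences as posted, the peptide sits inside the longer protein with only a couple of D/G differences, which is why both searches land on the same database entry.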

More importantly, you may not want to focus on hypothetical proteins. These are not validated experimentally and may or may not exist in reality, unless you have just produced one in the lab, of course!

ADD REPLY • link 7.0 years ago by BioinfGuru ★ 1.7k

Thank you for the suggestion. The multiple alignment was a great tool for clarification.

ADD REPLY • link 7.0 years ago by zaineb • 0