Question: How Can 3 Different Reading Frames Have Similarity To The Same Sequence Using Blastx?
Worldalive20 wrote:

Hi, I have a DNA sequence (about 388 bp) that I am comparing with GenBank sequences using Blastx. I understand that Blastx translates a DNA sequence in all 6 possible reading frames, but the outcome puzzles me: 3 different reading frames show similarity to the same protein (in a conserved region of the Peptidase M1 superfamily). Also, when I look closely at the alignments, the similarities in the 3 frames occur within the same region. The maximum identity is approx. 76%, with an E-value of 2e-11.

Is this "similarity" most likely due to chance?

There are 2 things that make me think this:

1) I am aware that my sequence is very short compared to the > 1000 bp M1 peptidase sequence in GenBank.

2) When I look at the reading frames of my translated sequence, there are stop codons spread throughout. Or could this be due to sequencing errors?

Thanks for any help!

alignment blast • 3.2k views
modified 8.8 years ago by Larry_Parnell • written 8.9 years ago by Worldalive

Repeating this comment regarding the use of BlastX with a frame shift penalty (-w option): I've found an interesting discussion here. I wonder what frame shift penalty values are typically used with BlastX.

modified 4 months ago by RamRS • written 8.9 years ago by Woa

I bet the 3 reading frames are in the same direction, right?

written 8.8 years ago by Chris Evelo
Ketil4.0k wrote:

This is probably too obvious, but this could happen if it is a low-complexity or repeat region. Normally low-complexity regions (LCRs) are masked by BLAST, but perhaps you were using -F F?

written 8.9 years ago by Ketil
Marina Manrique1.3k wrote:

Sequencing errors can cause indels that change the reading frame. It is common for the same nucleotide sequence to have several BLAST high-scoring segment pairs (HSPs) in different reading frames against the same reference protein. I'd like to know whether your sequence comes from a 454 experiment. Typical 454 errors cause frameshifts that could explain your situation. It would also be useful to see the BLAST result you get.
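To make the frameshift idea concrete, here is a minimal Python sketch. The 30 bp sequence and the inserted base are invented for illustration (they are not the poster's data), and `translate` is a hypothetical helper built on the standard genetic code, not part of BLAST. It shows how a single extra base in a homopolymer run splits one protein across two forward frames.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = dna[frame:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Made-up coding sequence: ATG GCT GAA AAA CTG GGT CAT CCG TGG TTT
original = "ATGGCTGAAAAACTGGGTCATCCGTGGTTT"

# Simulate a homopolymer overcall: one extra A after the run of A's.
shifted = original[:12] + "A" + original[12:]

print("original, frame +1:", translate(original, 0))
for frame in range(3):
    print(f"shifted,  frame +{frame + 1}:", translate(shifted, frame))
```

In this toy case the start of the peptide (MAEK) is still read in frame +1 of the shifted sequence, while the rest (LGHPWF) only appears in frame +2. A Blastx search would then report two HSPs against the same protein in different frames, and frame +1 would look "broken" downstream of the insertion.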

written 8.9 years ago by Marina Manrique
Larry_Parnell16k wrote:

It is not just a low-complexity region that will give the result you describe, but any repetitive sequence. This becomes a problem when the repeat sequence is falsely incorporated into a gene model, thereby taking what should be annotated as a genomic repeat/low-complexity region and putting it into the protein database.

Try it yourself - take a human Alu sequence and run it against a protein db. I'm sure many of those hits are from bad gene models.
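A rough Python sketch of why a repeat can hit the same protein region in several frames. The 4-nt repeat unit `ACGT` is an arbitrary toy example (not an Alu sequence), and `translate` is a hypothetical helper using the standard genetic code. When the repeat unit length is not a multiple of 3, every forward frame reads the same cyclic peptide.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame):
    """Translate dna starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = dna[frame:]
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


# Toy tandem repeat: unit length 4, so the codon pattern repeats every 12 nt.
repeat_dna = "ACGT" * 12

frames = [translate(repeat_dna, f) for f in range(3)]
for number, peptide in enumerate(frames, start=1):
    print(f"frame +{number}: {peptide}")

# The same peptide motif is present in all three forward frames.
motif = "TYVRTYVR"
print(all(motif in peptide for peptide in frames))
```

The three translations are just rotations of one another, so any protein in the database that matches one frame will match the others over the same region. Real repeats are less regular than this, but the effect is similar in spirit.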

written 8.9 years ago by Larry_Parnell