Two similarly annotated sequences have no alignment similarity. Why?
5.2 years ago
Farbod ★ 3.3k

Dear Biostars, Hi (I am not a native English speaker, so be ready for some language flaws)

I have two sequences (from a de novo RNA-seq assembly). After blastN (and also blastX), they show similar results and annotations (vasotocin related),

but when I use the NCBI online "Align two or more sequences" tool, the answer is: "No significant similarity found".

Why is that? My assumption is that, as they show the same annotation and the same protein products, there should be some similarity (I even hoped for an exact 100% match!). Am I wrong?

Thanks

NOTE: my 2 SEQs:

>seq1

TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGCAGATACTGGGGCCAAAGCAG
AGGCCCCGGTTTCCGGGGCCACATGACATGCACGGTCTTTGCAGATCAGGAAAAGAGCGC
TTTCCGCCTCGCGGACAGTTCTGGATGTAGCATGCAGATGAGAGCGCGAGGAGCCCCAGG
ACGCACAGAAGTGGAATAGTAGAATCTGGCATCTTGTTCCGTCTAAAGTGCTTCGTCCAA
TTTGTACTGTAGGCTACGGCTACAACTGAACTCCTTCAAATGGCTCGT

>seq2

GTGGTTGGTTACTGAGGTCTCCCTCTGCTGGTGGCATGTAGGATCCGCAGCAGGTCTCCT
GCCAAACCACCCATTAAGGCAGCGTTCTGTTCGCTGGGTGACTGACGTTTACTGTCCTCT
AGGCAGTCTGGGTCTAGCACACAACTCTCTGAGTCACAGCAGACTCCGGATGCAGCACAG
CTTCCCTCAGAGCCACACACTCTTCCTCCAGCCTCGCAGGGGGAGGGCAGGTAGTTCTCC

blast alignment gene • 1.7k views
1

Are these full-length sequences? If not, you may be looking at two different parts of two sequences that code for the protein you refer to.
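To illustrate the point above, here is a minimal sketch using a made-up 40 bp "transcript" (not real data) and a hypothetical helper `shared_kmers`. Two fragments taken from different ends of the same reference each share many k-mers with the reference, yet none with each other, which is roughly the situation in which a pairwise aligner would report no similarity.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmers(a, b, k=6):
    """Return the k-mers that occur in both sequences (same strand only)."""
    return kmers(a, k) & kmers(b, k)


# Made-up reference, used only for illustration.
reference = "ATGCCGTACGATTGCAAGCT" + "GGTTCACCTAGGAACTCTGA"
fragment_head = reference[:20]   # covers the first half only
fragment_tail = reference[20:]   # covers the second half only

print(len(shared_kmers(fragment_head, reference)))      # head vs reference
print(len(shared_kmers(fragment_tail, reference)))      # tail vs reference
print(len(shared_kmers(fragment_head, fragment_tail)))  # head vs tail
```

With real contigs one would also compare against the reverse complement, since an assembled transcript can be reported on either strand.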

0

Hi @genomax, maybe I did not understand your answer clearly. These are the Trinity transcripts that had a BLAST hit with vasotocin. They are complete as Trinity assembly output, and they have completely different IDs (meaning that they are different genes or loci).

2

I am not sure that having two different IDs in Trinity can be considered evidence that they are complete and are different genes. Here is a Clustal Omega alignment of the two. Ideally we should do an alignment of the translations.

CLUSTAL O(1.2.4) multiple sequence alignment

seq1      TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGCAGATACTGGGGCCAAAGCAG
seq2      -GTG-GTTGGTTACTGAGGTCTCC----------CTC---TGCTGGTGGCATGTAGGATC
            ** *  *   **  **    **          * *    * *  ***     * *

seq1      AGGCCCCGGTTTCCGGGGCCACATGACATGCACGGTCTTTGCAGAT--CAGGAAAAGAGC
seq2      CGCAGCAGGTCTCCTGCCAAACCACCCATTAAGGCAGCGTTCTGTTCGCTGGGTGACTGA
           *   * *** *** *    **    ***  * *     * * * *  * **   *  *

seq1      GCTTTCCGCCTCGCGGACAGTTCTGGATGTAGCATGCAGATGAG--AGCGCGAGGAGCCC
seq2      CGTTTACTGTCCTCTAGGCAGTCTGGGTCTAGCACACAACTCTCTGAGTCACAGCAGACT
            *** *    * *       ***** * *****  **  *     **    ** ** *

seq1      CAGGACGCACAGAAGTGGAATAGTAGAATCTGGCATCTTGTTCCGTCTAAAGTGCTTCGT
seq2      CCGGATGCAGCACA------------------GCTTCCCTCAGAGCCACACACTCTTCCT
          * *** ***    *                  ** **       * *  *    **** *

seq1      CCAATTTGTACTGTAGGCTACGGCTACAACTGAACTCCTTCAAATGGCTCGT
seq2      CCAGCCTCGCAGGGGGAGGGCAGGTAGTTCTCC-------------------
          ***   *     *  *    * * **   **
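A global aligner such as Clustal Omega always outputs an alignment, even for unrelated sequences, so it can help to put a number on it. The following is a minimal sketch (the function name `column_identity` is made up here) that counts identical columns between two aligned rows, skipping any column where either row has a gap. The two example rows are the first block of the alignment above.

```python
def column_identity(row_a, row_b, gap="-"):
    """Count identical and compared columns of two aligned rows.

    Columns where either row has a gap are skipped.
    Returns (identical_columns, compared_columns).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    identical = compared = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return identical, compared


# First block of the seq1/seq2 alignment shown above.
seq1_row = "TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGCAGATACTGGGGCCAAAGCAG"
seq2_row = "-GTG-GTTGGTTACTGAGGTCTCC----------CTC---TGCTGGTGGCATGTAGGATC"

identical, compared = column_identity(seq1_row, seq2_row)
print(f"{identical}/{compared} identical columns "
      f"({100 * identical / compared:.1f}%)")
```

For comparison, two unrelated DNA sequences with uniform base composition are expected to match at about 25% of ungapped columns by chance, and gap placement by the aligner pushes that figure higher, so a moderate percentage on its own is weak evidence of homology.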

0

Oh, thanks for your efforts and fast support!

Does this mean that NCBI is correct in showing "no similarity", or does it show that there is some similarity?

1

I took one of the common hits (from the individual blast searches with those two sequences) and aligned it with them (Oncorhynchus kisutch vasotocin-neurophysin VT 1). One would need to spend some time on this. Ideally, you should do a similar exercise with translations and a common protein blast hit.

CLUSTAL O(1.2.4) multiple sequence alignment

seq1                ------------------------------------------------------------
seq2                ------------------------------------------------------------
XM_020465836.1      AATACCGGAAAGTTCCTAGCAGACATTCGAAAAGAAAAACCGAGCCCTTTGAAAGAGTTC

seq1                ------------------------------------------------------------
seq2                ------------------------------------------------------------
XM_020465836.1      AGTTGTAGCCGACAGTATCAATTGGACGAAGCACTTCAGACTGAACAAGATGCCATATTC

seq1                --------------------TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGC
seq2                ------------------------------------------------------------
XM_020465836.1      TACGTTTCCACTGCTGTGGGTCCTGGGGCTCCTCGCGCTATCCT--CCGCGTGCTACATC

seq1                AGATACTGGGGCCAAAGCAG-AGGCCCCGGTTTCCGGGG-CCACATGACATGCACGGTCT
seq2                ------------------------------------------------------------
XM_020465836.1      CAGAACTGTCCGCGAGGCGGGAAGCGCTCTTTTCCTGATCTTCCACGACAGTGCATGTCG

seq1                TT-----------------------------------------G--CAGATCAGGAAAAG
seq2                ---------------------------------------------------------GT-
XM_020465836.1      TGTGGCCCCGGGGACAGGGGCCGCTGCTTTGGCCCCAATATCTGCTGTGGGGAGGGAATG

seq1                AGCGCTTTCCGCCTCGCGGACAGTTCTGGATGTAGCATGCAGATGAGAGCG-CGAGGAGC
seq2                GGTTGGTTACTGAGGTCTCCCTCTGCTGGTGGCATGTAGGATCCGCAGCAGGTCTCCTGC
XM_020465836.1      GGCTGTTACATGGGCTCCCCAGAGGCAGCTGGTTGTGTGGAGGAGAACTACCTGCCCTCC
                     *    *         *        * *   *      * *   *              *

seq1                CCCAGGACGCACAGAAGTGGAATAGTAGAATCTGGCATCTTGTTCCGTCTAAAGTGCTTC
seq2                CAAACCACCCATTAAGGCAGCGTTCTGTTCGCTG----------GGTGA-----------
XM_020465836.1      CCCTGCGAGGCTGGAGGAAGAGTGTGTGGCTCTG----------AGGGAAGCTGTGCTGC
                    *             * *  *  *        ***

seq1                GTCCAATTTGTACTGTAGGCTAC---------GGCTACAACTGAACTCCT----------
seq2                --CTGACGTTTACTGTCCTCTAGGCAGTCTGGGTCTAGCACACAACTCTCTGAGTCACAG
XM_020465836.1      ATCCGGAGTCTGCTGTGACTCAGAGAGTTGTGCGCTAGACCCAGACTGCCTAGAGGACAG
                      *     * * ****     *            ***   *   ***

seq1                ------------------------------------------------------------
seq2                ------------------------------------------------------------
XM_020465836.1      TAAACGTCAGTCACCCAGCGAACAGAACGCTGCCTTAATGGGTGGTTTGGCAGGAGACCT

seq1                ---TCAAATGGCTCGT--------------------------------------------
seq2                ----------------------------------------------CAGACTCCGGAT--
XM_020465836.1      GCTGCAGATCCTACATGCCACCAGCAGAGGGAGACCTCAGTAACCAACCACTGCCCATCC

seq1                ------------------------------------------------------------
seq2                ----------------------GCAGCACAGCTTCCCTCA--------------GAGCCA
XM_020465836.1      CTCACCTGAACACACCCAGAATAGAGCTTAAATTCACCATTTCACATGCACTACTACAAA

seq1                ------------------------------------------------------------
seq2                CACACTCTTCCTCCAGCCTCGCAGGGGGAGG-------------GCAGGT----------
XM_020465836.1      AACAAACCTCACACAGATTCACAGACACACAGCAGAAGTAGAGAGCAGGCTTGCTACATA

seq1                ------------------------------------------
seq2                ---------------AGTTCTCC-------------------
XM_020465836.1      AGGGGGAAATTTATCAGCTCTACATGAATGTTTACTGTGTGC

0

Oh cool, we did the same >.<

0

You are telling me that the "tail" of one transcript codes for vasotocin and the "head" of another transcript codes for vasotocin, too. And I am looking at that "tail" and "head", and these "tail" and "head" that code for the same thing have no sequence similarity?

Am I getting your point correctly?

1

See my (and @Wouter's) new answers. It could be as simple as that, but you would need to look at this carefully.

1

I just did a standard blast for both sequences and found this:

So both sequences indeed have a hit on the same gene.

Running clustal omega for the identified gene and your two sequences looks like this:

Looks like they both belong to the same gene, but to different parts (partially overlapping?).
I'm not sure what's the best conclusion for this.
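One way to check the "different parts (partially overlapping?)" idea is to compare where each query lands on the shared subject. Below is a minimal sketch, assuming you have the subject start/end coordinates of each query's best hit (for example from tabular BLAST output). The coordinates in the example are hypothetical, not taken from the actual hits, and the function names are made up for illustration.

```python
def covered_interval(start, end):
    """Normalise subject coordinates; minus-strand hits have start > end."""
    return (min(start, end), max(start, end))


def overlap_length(hit_a, hit_b):
    """Number of subject positions covered by both hits (1-based, inclusive)."""
    a_lo, a_hi = covered_interval(*hit_a)
    b_lo, b_hi = covered_interval(*hit_b)
    return max(0, min(a_hi, b_hi) - max(a_lo, b_lo) + 1)


# Hypothetical subject coordinates (sstart, send) for the two queries.
seq1_hit = (120, 400)
seq2_hit = (650, 410)   # start > end: hit on the minus strand

print(overlap_length(seq1_hit, seq2_hit))  # 0 means no shared region
```

An overlap of zero would be consistent with the two contigs covering different stretches of the same gene. A non-zero overlap would mean the shared region deserves a closer look, including on the opposite strand.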

1

We did a similar exercise but with two different hits :)

There is some kind of shared domain/site, but @Farbod would need to spend time looking at it more closely.

0

Thank you @WouterDeCoster, but how? By translating the nucleotide sequences in ExPASy and aligning the proteins, for example?

1

That would be a start. Translate into all 6 frames. You may need to try all of them to see which works best in alignments to common protein hits. Q07662.1 and P16041.1 look like good candidates. They are from Swiss-Prot.
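If you prefer to do the six-frame translation locally rather than in a web tool, here is a minimal sketch in plain Python. It assumes the standard genetic code and writes any codon containing a non-ACGT character as "X". The frame labels only mimic the style used in this thread; the function names are my own.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from the first base; a trailing partial codon is dropped."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return a dict mapping a frame label to its translation."""
    frames = {}
    for strand, s in (("5'3'", seq), ("3'5'", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand} Frame {offset + 1}"] = translate(s[offset:])
    return frames


for label, protein in six_frames("ATGGCCTAA").items():
    print(label, protein)
```

Each of the six translations can then be aligned against the candidate proteins to see which frame, if any, gives a convincing match.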

1

The final _1/_2 refer to seq1/seq2 (I had to split this into two posts).

CLUSTAL O(1.2.4) multiple sequence alignment

5'3'_Frame_3_1      XWGAHVAAHPFPTADTGAKAEAPVSGAT-HARS---------------------------
3'5'_Frame_3_1      --------EPFEGVQL-P-PTVQIGRSTLDGTRC---------------QILLFHFCASW
3'5'_Frame_3_2      ------------------------------RTTCPPPARL--EEEC--------------
3'5'_Frame_1_1      -------------------------TSHLKEFSCSRSLQYKLDEAL-TEQDARFYYSTSV
5'3'_Frame_2_2      ----------------------------------------------------------XW
5'3'_Frame_1_1      ------------------------------------------------------------
5'3'_Frame_3_2      ------------------------------------------------------------
5'3'_Frame_2_1      ------------------------------------------------------------
5'3'_Frame_1_2      ------------------------------------------------------------
3'5'_Frame_1_2      -------------------------------------------------GELPALPLRGW
P16041.1            ------------------------------------------------------------
3'5'_Frame_2_2      ------------------------------------------------------------
*

5'3'_Frame_3_1      ------LQ--IRKRALSASRTVLD------------------------------VACR--
3'5'_Frame_3_1      GSSRSHLH--ATSRTVREAESALF------------------------------LICKD-
3'5'_Frame_3_2      ----VALREAVLHPESAVTQRVVC-TQTA-RTVNVSHPANRTLP-WV----VWQETCCGS
3'5'_Frame_1_1      RPGAPRALICMLHPELSARRKALFS---SAKTVHVMWPRKPGPLLWPQYLL-WGRDGLLR
5'3'_Frame_2_2      ------------------------L--VT-----------EVSLCWWHVG---SAAGLLP
5'3'_Frame_1_1      ------------------------------------------------------------
5'3'_Frame_3_2      ------------------------------------------------------------
5'3'_Frame_2_1      ------------------------------------------------------------
5'3'_Frame_1_2      ----------------------------------XVVGY-GLPL---------LVACRIR
3'5'_Frame_1_2      ------------------------R--KSV--------WL-GKLCCIRSLL-LRELCARP
3'5'_Frame_2_1      ------------------------T--KHF--RRNKMPDSTIPLLCVLGLLALSSACYIQ
P16041.1            ------------------------------------MPYSTFPLLWVLGLLALSSACYIQ
3'5'_Frame_2_2      ------------------------------------------------------------

5'3'_Frame_3_1      ----------------------------------------------EREEPQDAQKWNS-
3'5'_Frame_3_1      ------------------------------------------------------------
3'5'_Frame_3_2      YMPPAEGDLSNQPXX---------------------------------------------
3'5'_Frame_1_1      GLPRX-------------------------------------------------------
5'3'_Frame_2_2      NHPL--------RQRSVRWVTDVYCPLGSLGLAHNSLSHSRL--RMQHSFPQSHTL----
5'3'_Frame_1_1      --XSGEPT-QPIPSPQQI------LGPKQRPRFPGPHDMHGLCRSGKERFPPRGQFWM-H
5'3'_Frame_3_2      -------------------------------------------XGWLLRSPSAGGM-D--
5'3'_Frame_1_2      SR------SPAKPPIKAA-----FCSLGD-RLLSSRQSGSSTQLSESQQTPDAAQLPS--
3'5'_Frame_1_2      RLPRGQ-TSVTQRTERCL-----NGWFGRRP---AA------------------------
3'5'_Frame_2_1      NCPRGGKRSFPDLQRPCM-----SCGPGNRGLCFGPSICCGEGMGCYVGSPXX-------
P16041.1            NCPRGGKRSFPDLPRQCM-----SCGPGDRGRCFGPNICCGEGMGCYMGSPEAAGCV---
3'5'_Frame_2_2      ------------------------------------------------------------

5'3'_Frame_3_1      -RIWHLVP----------------------------------SKVLRPICTVGYGYN-TP
3'5'_Frame_3_1      -RACHVAPETG-------------------------------ASALAPVSAVGKGWAATW
3'5'_Frame_3_2      ------------------------------------------------------------
3'5'_Frame_1_1      ------------------------------------------------------------
5'3'_Frame_2_2      ----FLQPRRGRAGSS--------------------------------------------
5'3'_Frame_2_1      -QMRAR----GAPGRTEVE--NLASCSV-SASSNLYCR--LRLQL---------------
5'3'_Frame_1_2      -EPHTLPPASQGEGR-F---S---------------------------------------
3'5'_Frame_1_2      ----------------------------------------------DPTCHQQRETS-VT
3'5'_Frame_2_1      ------------------------------------------------------------
P16041.1            -EENYLPSPCEAGGRVC---GSEGSCA----ASGVCCD--SESCVLDPDCLEDSKRQ-SP
3'5'_Frame_2_2      --ENYLPSPCEAGGRVC---GSEGSCA----ASGVCCD--SESCVLDPDCLEDSKRQ-SP
*

0
5'3'_Frame_3_1      SNGS-----------------------------
3'5'_Frame_3_1      APQXX----------------------------
3'5'_Frame_3_2      ---------------------------------
3'5'_Frame_1_1      ---------------------------------
5'3'_Frame_2_2      ---------------------------------
5'3'_Frame_1_1      --------ATATTELLQMAR-------------
5'3'_Frame_3_2      S--SSSLAGGGQ----VVL--------------
5'3'_Frame_2_1      -------------NSFKWL--------------
5'3'_Frame_1_2      ---------------------------------
3'5'_Frame_1_2      NHX------------------------------
3'5'_Frame_2_1      ---------------------------------
P16041.1            SEQNAALMGGLAGDLLRILHATSRGRPQ-----
3'5'_Frame_2_2      SEQNAALMGGLAGDLLRILHATSRGRPQ-PTXX
*

1

One other thing,

Teleosts have experienced a whole-genome duplication (WGD) event (in salmonids, even more than one WGD). Does this have any effect on the situation? Are we encountering paralogous or duplicated genes, or some other evolutionary phenomenon, instead of looking at different parts of a LONG gene?

0

Oh, I guess that's certainly possible. It is also possible that one of the copies starts acquiring mutations and functionally no longer does the same as the other copy.

0

This is part 2, right? After upvoting, your order is gone :)

0

I really appreciate that.

0

Thank you, it is not the same thing, and the help from each of you has its own unique taste for me.

So, I would ask the same thing that I asked @genomax:

You are telling me that the "tail" of one transcript codes for vasotocin and the "head" of another transcript codes for vasotocin, too. And I am looking at that "tail" and "head", and these "tail" and "head" that code for the same thing have no sequence similarity?

Am I getting your point correctly?

0

To be honest, I have no idea. I just did a blast and an alignment to see what came up.

0

[OFFTOPIC]

Hi Farbod,

Long time no see!
By the way, I think your English is good enough to remove that warning ;-)

Cheers,
Wouter

0

[OFFTOPIC RESPONSE] Dear @WouterDeCoster, Hi. Thank you for your sweet "miss you" phrase. My English is not so good, but as I had an unhappy discussion about using "dear" in one of my very first posts on Biostars, I was advised to include that warning in order to reduce misunderstandings.

0

Oh yes, I certainly remember that discussion. Well, fair enough, doesn't hurt to mention it.