Why do two similarly annotated sequences show no alignment similarity?
4.9 years ago
Farbod ★ 3.3k

Dear Biostars, hi (I am not a native English speaker, so please excuse any language flaws).

I have two sequences (from a de novo RNA-seq assembly). With BLASTN (and also BLASTX), they give similar results and annotations (vasotocin-related),

but when I use the NCBI online "Align two or more sequences" tool, the answer is: "No significant similarity found".

Why is that? My assumption is that since they have the same annotation and the same protein products, there should be some similarity (I even hoped for an exact 100% match!). Am I wrong?

Thanks

NOTE: my two sequences:

>seq1

TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGCAGATACTGGGGCCAAAGCAG
AGGCCCCGGTTTCCGGGGCCACATGACATGCACGGTCTTTGCAGATCAGGAAAAGAGCGC
TTTCCGCCTCGCGGACAGTTCTGGATGTAGCATGCAGATGAGAGCGCGAGGAGCCCCAGG
ACGCACAGAAGTGGAATAGTAGAATCTGGCATCTTGTTCCGTCTAAAGTGCTTCGTCCAA
TTTGTACTGTAGGCTACGGCTACAACTGAACTCCTTCAAATGGCTCGT

>seq2

GTGGTTGGTTACTGAGGTCTCCCTCTGCTGGTGGCATGTAGGATCCGCAGCAGGTCTCCT
GCCAAACCACCCATTAAGGCAGCGTTCTGTTCGCTGGGTGACTGACGTTTACTGTCCTCT
AGGCAGTCTGGGTCTAGCACACAACTCTCTGAGTCACAGCAGACTCCGGATGCAGCACAG
CTTCCCTCAGAGCCACACACTCTTCCTCCAGCCTCGCAGGGGGAGGGCAGGTAGTTCTCC
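Why can two transcripts share a BLAST annotation yet show no similarity to each other? BLAST reports local alignments, and a pairwise search only reports hits that score better than chance. As a minimal sketch (not the BLAST algorithm itself, which uses seed-and-extend heuristics, affine gap costs and E-values), the Smith-Waterman local alignment score below can be compared against scores for shuffled copies of one sequence to get a rough, empirical sense of whether any local similarity rises above background. The scoring values (match +2, mismatch -3, linear gap -5), the number of shuffles and the use of only the first line of each sequence are assumptions chosen for illustration.

```python
import random


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -3, gap: int = -5) -> int:
    """Best local alignment score of a vs b, using a linear gap penalty."""
    a, b = a.upper(), b.upper()
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap opened in b
            left = curr[j - 1] + gap  # gap opened in a
            curr[j] = max(0, diag, up, left)  # 0 lets a new local alignment start here
            best = max(best, curr[j])
        prev = curr
    return best


def shuffled_scores(a: str, b: str, n: int = 100, seed: int = 0) -> list[int]:
    """Local alignment scores of a against n shuffled copies of b (same base composition)."""
    rng = random.Random(seed)
    letters = list(b)
    scores = []
    for _ in range(n):
        rng.shuffle(letters)
        scores.append(smith_waterman(a, "".join(letters)))
    return scores


if __name__ == "__main__":
    seq1 = "TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGCAGATACTGGGGCCAAAGCAG"  # first line of seq1
    seq2 = "GTGGTTGGTTACTGAGGTCTCCCTCTGCTGGTGGCATGTAGGATCCGCAGCAGGTCTCCT"  # first line of seq2
    observed = smith_waterman(seq1, seq2)
    background = shuffled_scores(seq1, seq2, n=50)
    frac = sum(s >= observed for s in background) / len(background)
    print(f"observed={observed}, fraction of shuffles scoring >= observed: {frac:.2f}")
```

If a large fraction of shuffled sequences score as well as the real pair, the observed local similarity is no better than chance, which would be consistent with "No significant similarity found". The full sequences can be pasted in instead; the pure-Python loop is slow but workable at these lengths.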
Tags: blast, alignment, gene

Are these full-length sequences? If not, you may be looking at two different parts of the transcripts that code for the protein you refer to.
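One rough way to check whether a transcript is likely to be full-length is to look for a complete open reading frame (ATG through a stop codon) on either strand. The sketch below scans all six frames; the 30-codon minimum is an arbitrary assumption, and partial ORFs that run off either end of the transcript are deliberately ignored. For Trinity output, TransDecoder is the usual dedicated tool for this job.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def complete_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, int]]:
    """Return (strand, frame, length_nt) for every ATG..stop ORF, longest first.

    Frames are numbered 1-3 from the 5' end of each strand; the length includes the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            codons = [s[i:i + 3] for i in range(frame, len(s) - 2, 3)]
            start = None  # codon index of the first ATG of the ORF being extended
            for idx, codon in enumerate(codons):
                if start is None and codon == "ATG":
                    start = idx
                elif start is not None and codon in STOP_CODONS:
                    n_codons = idx - start + 1
                    if n_codons >= min_codons:
                        found.append((strand, frame + 1, n_codons * 3))
                    start = None  # look for the next ATG after this stop
    return sorted(found, key=lambda orf: orf[2], reverse=True)
```

An empty result, or only short ORFs, may indicate that a transcript is a fragment or mostly untranslated region rather than a complete coding sequence.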

Hi @genomax, maybe I did not understand your answer clearly. These are the Trinity transcripts that had a BLAST hit to vasotocin. They are complete as output by the Trinity assembly, and they have completely different IDs (meaning that they are different genes or loci).

I am not sure that having two different IDs in Trinity can be considered evidence that they are complete and are different genes. Here is a Clustal Omega alignment of the two. Ideally, we should align the translations.

CLUSTAL O(1.2.4) multiple sequence alignment


seq1      TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGCAGATACTGGGGCCAAAGCAG
seq2      -GTG-GTTGGTTACTGAGGTCTCC----------CTC---TGCTGGTGGCATGTAGGATC
            ** *  *   **  **    **          * *    * *  ***     * *   

seq1      AGGCCCCGGTTTCCGGGGCCACATGACATGCACGGTCTTTGCAGAT--CAGGAAAAGAGC
seq2      CGCAGCAGGTCTCCTGCCAAACCACCCATTAAGGCAGCGTTCTGTTCGCTGGGTGACTGA
           *   * *** *** *    **    ***  * *     * * * *  * **   *  * 

seq1      GCTTTCCGCCTCGCGGACAGTTCTGGATGTAGCATGCAGATGAG--AGCGCGAGGAGCCC
seq2      CGTTTACTGTCCTCTAGGCAGTCTGGGTCTAGCACACAACTCTCTGAGTCACAGCAGACT
            *** *    * *       ***** * *****  **  *     **    ** ** * 

seq1      CAGGACGCACAGAAGTGGAATAGTAGAATCTGGCATCTTGTTCCGTCTAAAGTGCTTCGT
seq2      CCGGATGCAGCACA------------------GCTTCCCTCAGAGCCACACACTCTTCCT
          * *** ***    *                  ** **       * *  *    **** *

seq1      CCAATTTGTACTGTAGGCTACGGCTACAACTGAACTCCTTCAAATGGCTCGT
seq2      CCAGCCTCGCAGGGGGAGGGCAGGTAGTTCTCC-------------------
          ***   *     *  *    * * **   **

Oh, thanks for your effort and fast support!

So would you please help me interpret this alignment?

Does it show that NCBI is correct in reporting "no similarity", or does it show that there is some similarity?
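Clustal Omega always returns a global alignment, even for unrelated sequences, so the asterisks by themselves are not evidence of homology: with equal base frequencies, about 25% of ungapped columns are expected to match by chance, and freely placed gaps push that figure higher. A minimal sketch for quantifying an alignment like the one above, assuming you concatenate the seq1 rows and the seq2 rows (gaps included) into two equal-length strings:

```python
def alignment_identity(row1: str, row2: str) -> dict[str, float]:
    """Summarize a gapped pairwise alignment given as two equal-length rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    identical = aligned = gap_columns = 0
    for x, y in zip(row1.upper(), row2.upper()):
        if x == "-" and y == "-":
            continue  # gapped in both rows (happens when rows come from a multiple alignment)
        if x == "-" or y == "-":
            gap_columns += 1
            continue
        aligned += 1
        if x == y:
            identical += 1
    return {
        "aligned_columns": aligned,
        "identical": identical,
        "gap_columns": gap_columns,
        "identity_pct": 100.0 * identical / aligned if aligned else 0.0,
    }
```

Comparing the identity with that obtained for shuffled sequences aligned the same way gives a baseline; an identity not far above that baseline would be consistent with NCBI reporting no significant similarity.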

I took one of the hits common to the individual BLAST searches of those two sequences (Oncorhynchus kisutch vasotocin-neurophysin VT 1) and aligned all three. One would need to spend some time on this. Ideally, you should do a similar exercise with the translations and a common protein BLAST hit.

CLUSTAL O(1.2.4) multiple sequence alignment


seq1                ------------------------------------------------------------
seq2                ------------------------------------------------------------
XM_020465836.1      AATACCGGAAAGTTCCTAGCAGACATTCGAAAAGAAAAACCGAGCCCTTTGAAAGAGTTC


seq1                ------------------------------------------------------------
seq2                ------------------------------------------------------------
XM_020465836.1      AGTTGTAGCCGACAGTATCAATTGGACGAAGCACTTCAGACTGAACAAGATGCCATATTC


seq1                --------------------TCTGGGGAGCCCACGTAGCAGCCCATCCCTTCCCCACAGC
seq2                ------------------------------------------------------------
XM_020465836.1      TACGTTTCCACTGCTGTGGGTCCTGGGGCTCCTCGCGCTATCCT--CCGCGTGCTACATC


seq1                AGATACTGGGGCCAAAGCAG-AGGCCCCGGTTTCCGGGG-CCACATGACATGCACGGTCT
seq2                ------------------------------------------------------------
XM_020465836.1      CAGAACTGTCCGCGAGGCGGGAAGCGCTCTTTTCCTGATCTTCCACGACAGTGCATGTCG


seq1                TT-----------------------------------------G--CAGATCAGGAAAAG
seq2                ---------------------------------------------------------GT-
XM_020465836.1      TGTGGCCCCGGGGACAGGGGCCGCTGCTTTGGCCCCAATATCTGCTGTGGGGAGGGAATG


seq1                AGCGCTTTCCGCCTCGCGGACAGTTCTGGATGTAGCATGCAGATGAGAGCG-CGAGGAGC
seq2                GGTTGGTTACTGAGGTCTCCCTCTGCTGGTGGCATGTAGGATCCGCAGCAGGTCTCCTGC
XM_020465836.1      GGCTGTTACATGGGCTCCCCAGAGGCAGCTGGTTGTGTGGAGGAGAACTACCTGCCCTCC
                     *    *         *        * *   *      * *   *              *

seq1                CCCAGGACGCACAGAAGTGGAATAGTAGAATCTGGCATCTTGTTCCGTCTAAAGTGCTTC
seq2                CAAACCACCCATTAAGGCAGCGTTCTGTTCGCTG----------GGTGA-----------
XM_020465836.1      CCCTGCGAGGCTGGAGGAAGAGTGTGTGGCTCTG----------AGGGAAGCTGTGCTGC
                    *             * *  *  *        ***                          

seq1                GTCCAATTTGTACTGTAGGCTAC---------GGCTACAACTGAACTCCT----------
seq2                --CTGACGTTTACTGTCCTCTAGGCAGTCTGGGTCTAGCACACAACTCTCTGAGTCACAG
XM_020465836.1      ATCCGGAGTCTGCTGTGACTCAGAGAGTTGTGCGCTAGACCCAGACTGCCTAGAGGACAG
                      *     * * ****     *            ***   *   ***             

seq1                ------------------------------------------------------------
seq2                ------------------------------------------------------------
XM_020465836.1      TAAACGTCAGTCACCCAGCGAACAGAACGCTGCCTTAATGGGTGGTTTGGCAGGAGACCT


seq1                ---TCAAATGGCTCGT--------------------------------------------
seq2                ----------------------------------------------CAGACTCCGGAT--
XM_020465836.1      GCTGCAGATCCTACATGCCACCAGCAGAGGGAGACCTCAGTAACCAACCACTGCCCATCC


seq1                ------------------------------------------------------------
seq2                ----------------------GCAGCACAGCTTCCCTCA--------------GAGCCA
XM_020465836.1      CTCACCTGAACACACCCAGAATAGAGCTTAAATTCACCATTTCACATGCACTACTACAAA


seq1                ------------------------------------------------------------
seq2                CACACTCTTCCTCCAGCCTCGCAGGGGGAGG-------------GCAGGT----------
XM_020465836.1      AACAAACCTCACACAGATTCACAGACACACAGCAGAAGTAGAGAGCAGGCTTGCTACATA


seq1                ------------------------------------------
seq2                ---------------AGTTCTCC-------------------
XM_020465836.1      AGGGGGAAATTTATCAGCTCTACATGAATGTTTACTGTGTGC

Oh cool, we did the same >.<

You are telling me that the "tail" of one transcript codes for vasotocin and the "head" of another transcript codes for vasotocin, too. So I am looking at that "tail" and that "head", and even though they code for the same thing, they have no sequence similarity?

Am I getting your point correctly?

See my (and @Wouter's) new answers. It could be as simple as that, but you would need to look at this carefully.

I just ran a standard BLAST for both sequences and found this: blast

So both sequences indeed have a hit on the same gene.

Running Clustal Omega on the identified gene and your two sequences gives this: clustal

Looks like they both belong to the same gene, but to different parts (partially overlapping?).
I'm not sure what the best conclusion is.
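Whether the two transcripts hit different or partially overlapping parts of the same gene can be read off the subject coordinates in BLAST+ tabular output (-outfmt 6, where columns 9 and 10 are sstart and send, and send is smaller than sstart for minus-strand hits). A minimal sketch; the Hit type and function names are hypothetical helpers, not part of any BLAST library:

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    s_start: int
    s_end: int


def parse_outfmt6(line: str) -> Hit:
    """Parse one line of default BLAST+ tabular output (-outfmt 6)."""
    fields = line.rstrip("\n").split("\t")
    # default columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    return Hit(fields[0], fields[1], int(fields[8]), int(fields[9]))


def subject_overlap(h1: Hit, h2: Hit) -> int:
    """Subject positions covered by both hits (0 if disjoint or on different subjects)."""
    if h1.subject != h2.subject:
        return 0
    lo1, hi1 = sorted((h1.s_start, h1.s_end))  # normalise minus-strand coordinates
    lo2, hi2 = sorted((h2.s_start, h2.s_end))
    return max(0, min(hi1, hi2) - max(lo1, lo2) + 1)
```

For example, with hypothetical hits spanning subject positions 100-300 and 250-500, the overlap would be 51 bases; an overlap of 0 would mean the transcripts map to entirely different parts of the gene, which alone would explain the lack of similarity between them.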

We did a similar exercise but with two different hits :)

There is some kind of shared domain/site, but @Farbod would need to spend time looking at it more closely.

Thank you @WouterDeCoster, but how? By translating the nucleotide sequences with ExPASy and aligning the proteins, for example?

Please help me and name some related tools for "looking more closely".

That would be a start. Translate into all 6 frames. You may need to try all to see which works best with alignments to common protein hits. Q07662.1 and P16041.1 look like good candidates. They are from swissprot.
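A minimal sketch of six-frame translation using the standard genetic code (NCBI translation table 1). The frame labels only mimic the 5'3'/3'5' naming used in the alignment below; this is an illustration, not ExPASy Translate's exact output. Stops are written as * and codons containing ambiguous bases as X. The resulting proteins could then be aligned with Clustal Omega against Q07662.1 and P16041.1.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(seq: str) -> str:
    """Translate codon by codon from position 0; a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translate the forward strand and its reverse complement in all three offsets."""
    seq = seq.upper().replace("U", "T")
    rev_comp = seq.translate(COMPLEMENT)[::-1]
    frames = {}
    for label, strand in (("5'3'", seq), ("3'5'", rev_comp)):
        for offset in range(3):
            frames[f"{label}_Frame_{offset + 1}"] = translate(strand[offset:])
    return frames
```

Long stretches without * in a frame are the candidates worth aligning against the SwissProt hits; frames full of stops are unlikely to be the coding frame.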

The final _1/_2 suffixes refer to seq1/seq2 (I had to split this into two posts).

CLUSTAL O(1.2.4) multiple sequence alignment

5'3'_Frame_3_1      XWGAHVAAHPFPTADTGAKAEAPVSGAT-HARS---------------------------
3'5'_Frame_3_1      --------EPFEGVQL-P-PTVQIGRSTLDGTRC---------------QILLFHFCASW
3'5'_Frame_3_2      ------------------------------RTTCPPPARL--EEEC--------------
3'5'_Frame_1_1      -------------------------TSHLKEFSCSRSLQYKLDEAL-TEQDARFYYSTSV
5'3'_Frame_2_2      ----------------------------------------------------------XW
5'3'_Frame_1_1      ------------------------------------------------------------
5'3'_Frame_3_2      ------------------------------------------------------------
5'3'_Frame_2_1      ------------------------------------------------------------
5'3'_Frame_1_2      ------------------------------------------------------------
3'5'_Frame_1_2      -------------------------------------------------GELPALPLRGW
3'5'_Frame_2_1      -------------------------------------------RAI-RSSVVAVAYSTNW
P16041.1            ------------------------------------------------------------
3'5'_Frame_2_2      ------------------------------------------------------------
                                                                  *             

5'3'_Frame_3_1      ------LQ--IRKRALSASRTVLD------------------------------VACR--
3'5'_Frame_3_1      GSSRSHLH--ATSRTVREAESALF------------------------------LICKD-
3'5'_Frame_3_2      ----VALREAVLHPESAVTQRVVC-TQTA-RTVNVSHPANRTLP-WV----VWQETCCGS
3'5'_Frame_1_1      RPGAPRALICMLHPELSARRKALFS---SAKTVHVMWPRKPGPLLWPQYLL-WGRDGLLR
5'3'_Frame_2_2      ------------------------L--VT-----------EVSLCWWHVG---SAAGLLP
5'3'_Frame_1_1      ------------------------------------------------------------
5'3'_Frame_3_2      ------------------------------------------------------------
5'3'_Frame_2_1      ------------------------------------------------------------
5'3'_Frame_1_2      ----------------------------------XVVGY-GLPL---------LVACRIR
3'5'_Frame_1_2      ------------------------R--KSV--------WL-GKLCCIRSLL-LRELCARP
3'5'_Frame_2_1      ------------------------T--KHF--RRNKMPDSTIPLLCVLGLLALSSACYIQ
P16041.1            ------------------------------------MPYSTFPLLWVLGLLALSSACYIQ
3'5'_Frame_2_2      ------------------------------------------------------------


5'3'_Frame_3_1      ----------------------------------------------EREEPQDAQKWNS-
3'5'_Frame_3_1      ------------------------------------------------------------
3'5'_Frame_3_2      YMPPAEGDLSNQPXX---------------------------------------------
3'5'_Frame_1_1      GLPRX-------------------------------------------------------
5'3'_Frame_2_2      NHPL--------RQRSVRWVTDVYCPLGSLGLAHNSLSHSRL--RMQHSFPQSHTL----
5'3'_Frame_1_1      --XSGEPT-QPIPSPQQI------LGPKQRPRFPGPHDMHGLCRSGKERFPPRGQFWM-H
5'3'_Frame_3_2      -------------------------------------------XGWLLRSPSAGGM-D--
5'3'_Frame_2_1      --XLGSPRSSPSLPHSRYWG--QSRGPGFRGHMTCTVFADQE--KSAFRLADSSGCSM--
5'3'_Frame_1_2      SR------SPAKPPIKAA-----FCSLGD-RLLSSRQSGSSTQLSESQQTPDAAQLPS--
3'5'_Frame_1_2      RLPRGQ-TSVTQRTERCL-----NGWFGRRP---AA------------------------
3'5'_Frame_2_1      NCPRGGKRSFPDLQRPCM-----SCGPGNRGLCFGPSICCGEGMGCYVGSPXX-------
P16041.1            NCPRGGKRSFPDLPRQCM-----SCGPGDRGRCFGPNICCGEGMGCYMGSPEAAGCV---
3'5'_Frame_2_2      ------------------------------------------------------------


5'3'_Frame_3_1      -RIWHLVP----------------------------------SKVLRPICTVGYGYN-TP
3'5'_Frame_3_1      -RACHVAPETG-------------------------------ASALAPVSAVGKGWAATW
3'5'_Frame_3_2      ------------------------------------------------------------
3'5'_Frame_1_1      ------------------------------------------------------------
5'3'_Frame_2_2      ----FLQPRRGRAGSS--------------------------------------------
5'3'_Frame_1_1      ADESARSPRTHRSGIVE--SG-------------ILF---------RLKCFVQFVL----
5'3'_Frame_3_2      --PQQVSCQTTH-GSVLFA-G-L-TFTV---L-AVWV-HTTL-VTADSGCSTASLRA-TH
5'3'_Frame_2_1      -QMRAR----GAPGRTEVE--NLASCSV-SASSNLYCR--LRLQL---------------
5'3'_Frame_1_2      -EPHTLPPASQGEGR-F---S---------------------------------------
3'5'_Frame_1_2      ----------------------------------------------DPTCHQQRETS-VT
3'5'_Frame_2_1      ------------------------------------------------------------
P16041.1            -EENYLPSPCEAGGRVC---GSEGSCA----ASGVCCD--SESCVLDPDCLEDSKRQ-SP
3'5'_Frame_2_2      --ENYLPSPCEAGGRVC---GSEGSCA----ASGVCCD--SESCVLDPDCLEDSKRQ-SP
                                                *
5'3'_Frame_3_1      SNGS-----------------------------
3'5'_Frame_3_1      APQXX----------------------------
3'5'_Frame_3_2      ---------------------------------
3'5'_Frame_1_1      ---------------------------------
5'3'_Frame_2_2      ---------------------------------
5'3'_Frame_1_1      --------ATATTELLQMAR-------------
5'3'_Frame_3_2      S--SSSLAGGGQ----VVL--------------
5'3'_Frame_2_1      -------------NSFKWL--------------
5'3'_Frame_1_2      ---------------------------------
3'5'_Frame_1_2      NHX------------------------------
3'5'_Frame_2_1      ---------------------------------
P16041.1            SEQNAALMGGLAGDLLRILHATSRGRPQ-----
3'5'_Frame_2_2      SEQNAALMGGLAGDLLRILHATSRGRPQ-PTXX
                                                *

One other thing:

Teleosts have experienced a whole-genome duplication (WGD) event (salmonids even more than one). Does that have any effect on this situation? Are we encountering paralogous or duplicated genes, or some other evolutionary phenomenon, instead of looking at different parts of one LONG gene?

Oh, I guess that's certainly possible. It is also possible that one of the copies has started acquiring mutations and no longer functions the same way as the other copy.

This is part 2, right? After upvoting, your posts are no longer in order :)

I really appreciate that.

Thank you. It is not the same thing, and each of your contributions is uniquely helpful to me.

So I will ask you the same thing I asked @genomax:

You are telling me that the "tail" of one transcript codes for vasotocin and the "head" of another transcript codes for vasotocin, too. So I am looking at that "tail" and that "head", and even though they code for the same thing, they have no sequence similarity?

Am I getting your point correctly?

To be honest, I have no idea. I just did a blast and an alignment to see what came up.

[OFFTOPIC]

Hi Farbod,

Long time no see!
By the way, I think your English is good enough to remove that warning ;-)

Cheers,
Wouter

[OFFTOPIC RESPONSE] Dear @WouterDeCoster, hi. Thank you for your sweet "miss you" phrase. My English is not so good, but because I had an unhappy discussion about using "dear" in one of my very first posts on Biostars, it was suggested that I add that warning to reduce misunderstandings.

Oh yes, I certainly remember that discussion. Well, fair enough, doesn't hurt to mention it.
