Low Complexity Sequence Alignment
13.4 years ago
Noyk ▴ 80

Hi, I am trying to generate a pairwise alignment between two proteins (below) that contain long low-complexity regions. Using blast2seq, which presumably masks these regions, I get a short alignment of about 70 aa with an E-value of 1e-11. However, with software that does not account for complexity, I can align up to 357 aa with an E-value of 2e-19. I am not sure whether an alignment of low-complexity regions can be considered evidence for homology. I would be grateful to hear your opinion on this. Thanks!

>15_350     
MPPIRTGNPSAPPGAFPQPPPPPAATTKAKKKKKNKGKKGGDGGGNGANEGRHDDEDEDS
DDLPPLEDIDSDPPAIPLPARPSSRTSHAPPAPPNFVNPPHPHAHSTPSDLLSTASDLYR
QIEAAAASALSTHPSFAASFPPPPPGTGVATDEAYWTSLPQHLRQFIRSALPLAAGLTAP
PAPGGVGVGVGVTGVNGQPLPPLTHDQLSSAAAQLAQVVQSNWGQLGLGPIPAAATAGQN
RTGSATISLGSFPVTMPTREQMEAAIGSMGGLSELGVGGGGEEEQFELTEDEDSKEGGEQ
TKKKNKKKKKKAAAAAAAAAPPPPPPAPAPAPRPPVARPNPPKTPPPPPRAPAAQTNGKQ
PAPPHQQQQQQQKDYPKAPPPGAYPHTPSAPPPAPTPSIAPKAGPSSERERIRDFWLGLE
EGERRALVKVEKEAVLRKMKEQQRSGCSCAVCGRKRTAIEEELEVLYDAYYDELESYANH
QVRYASSGYTIAPPPGPGPFPGSVDAASLPSPAPPPKKPSGIVKTTRDHKNVRPAAKKKP
PQSEGPAHGEPGHTHSQSCPHHPHNHQGHATPATTAAKAKGAAVEEQYEEDEEGDEEEEE
EYDEDEEEYDEEDEEEEYDEEEQEQPAQDEVPKKAAAAKEKDGGADFFGFGKSLTVKGGI
LTVADDLLKNDGQKFLEMMEQLADKRIQREREAADSVNDADDDEDGSEDEYDEDEDEDED
DDDDEGDDEDDEEEVMTEAERMEEGRRMFQIFAARMFEQRVLTAYREKVASERQKQLLRE
LEEEERLAEEKELKKAKENQKKKDKKKDQKQKKDDFRLRKEAEREAEEQAKRQAEDARLE
EERRRNEEQRLKREAERKLKEEERQKKDEKLRIAKEEERLKKEEKLRVAKEARLALEKEQ
REKKAKDDAERKARDAADKKDRQGAPVLAAARLKQQLPKSPPILRPAVNKPQNIPGFPKG
PVAKPMAIPNQQQVPGGSRQPLPPGVLARQLSGPPPPQHSHQQQQMAPHQQQRPVGPPPH
QQLPSFVQRPPASQQGMNGVVPSSRMFPLPSSQSFGNIGMQQQQQQQSFAPHQQQQQQQQ
HQQQQQQLHPQQQQQHLALSPTLNRNAPIGPPPSSGPGPTSIAGTSVPRTNGIPISPIAP
PPASSAQGMPSPSPRPMNVGPIGGTIGRPSSSISMGHSNGRSSSPPPRIFGSSALLEDDE
IVEPSRSSSQSWSASPFSSGIWGAPAIPAQIVTPDRNSVIRDRARVSYVKLDELSNGTNA
PIAIGDIHRALITLWPDSVSVDLKELVESMLQEGSMANGGGSFAFTQRGEGLYAQYNS

>YNL091W  Chr 14  
MPPNSKSKRRKNKSKQHNKKNGNSDPEQSINPTQLVPRMEPELYHTESDYPTSRVIKRAP
NGDVIVEPINTDDDKKERTANLTHNKDSMDSASSLAFTLDSHWESLSPEEKKTILRIEKE
EVFNVIRNYQDDHSCSCSVCGRRHLAMDQEMERIYNTLYAMDKDKDPETNPIKFHLGIIK
ELQISKNQQQNDLSSTKGEVVKNFLSSSTVGSLKEEVLHFKQKQLSKQEQAHNETADNTS
LLEENLNNIHINKTSSEISANFNSVSDEELQQKYSNFTKTFISSHPKIAEEYVQKMMMYP
NIRALTDDLMNSNGQGFLNAIEDFVRDGQIQASKKDDSITEDEASSTDLTDPKEFTTMLH
SGKPLTEDEYADLQRNIAERMTNAYDTASKKFKDVSQLEKELFTRFMSGRDKKSFRELII
QSFKNKFDGELGPSVLAATLSSCFSSQSKDTSLDTDSIYEDEDEEDYDDYSEYAEDSEEV
SEYEGIEAVEKPEHDEKSNGIRETLHLSYDHDHKRQNHPHHHYHSTSTHSEDELSEEEYI
SDIELPHDPHKHFHRDDDILDGDEDEPEEEDENEGDDEEDTYDSGLDETDRLEEGRKLIQ
IAITKLLQSRIMASYHEKQADNNRLKLLQELEEEKRKKREKEEKKQKKREKEKEKKRLQQ
LAKEEEKRKREEEKERLKKELEEREMRRREAQRKKVEEAKRKKDEERKRRLEEQQRREEM
QEKQRKQKEELKRKREEEKKRIREQKRLEQEKLQKEKEEEERQRLIAEDALRKQKLNEEQ
TSANILSAKPFTENGVGNPVSSQSHPNMTNYQEDNSCSINDEILKMVNSVAASKPVSPTG
FNVHDLLLPSTNNQMPAMEQSHLPQPGNQNNHFGTTTIPNALDLATKSSLQTENNYLMNS
QTLENTSLLMHNNSSPTKLLPNDFGLSSWGGLTNTMSINPTCKPPVIQTSEMESQAHKSS
PQATMPSFGLPNGGTHRKSFTDELNTLTSMLSSSGFADTSLSSSGFPPSQRSVWNDQKSS
FSGPSTAGNFNNSSIQSGMLLAPTLGSVESFPNRTSIWDSSTTPMMNKSELSGRNITSTA
QDSPAFMASNIWSSNSQYNSPYLTSNVLQSPQISSGVDESHILDSIYNTYLAISPQDSLN
PYIAIGTLFQNLVGLNLDYSTFINKLISMQGAYNCEFFTDNNGSITHVRFARQTPAGHSK
GLLNQLFSGLNDPTATPFTSRPHTSTRASFPIASSTTQTS*
sequence alignment • 6.1k views
13.4 years ago

A dot plot is an excellent method to use here: it is great for two sequences like these, where you just want to get a feel for whether they are related.

Dotter is what I use by choice. In your case it shows several diagonal lines, indicating an alignment outside the low-complexity regions (which show up as multiple diagonals forming a box-like structure), and hence some evidence for homology.
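
To make the idea concrete, here is a minimal dot-plot sketch in Python. It is not Dotter's algorithm (Dotter scores windows with a substitution matrix); it only counts identical residues in a sliding window, and the function name `dot_plot` and the `window` and `min_matches` values are assumptions chosen for illustration. The two fragments are taken from the sequences in the question.

```python
def dot_plot(seq_a, seq_b, window=10, min_matches=6):
    """Return (i, j) pairs where the windows starting at seq_a[i] and
    seq_b[j] share at least `min_matches` identical positions.

    Simplified identity-based sketch; real tools use a scoring matrix.
    """
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Fragments around the conserved Cys-rich motif in the two proteins.
frag_a = "SGCSCAVCGRKRTAIEEELEVLY"  # from 15_350
frag_b = "HSCSCSVCGRRHLAMDQEMERIY"  # from YNL091W
motif_hits = dot_plot(frag_a, frag_b)

# A homopolymer run matches itself at every offset, which is what
# produces the box-like pattern in low-complexity regions.
box_hits = dot_plot("Q" * 12, "Q" * 12, window=10, min_matches=10)
```

Plotting the returned pairs as points (i on one axis, j on the other) gives the diagonals for related segments and filled boxes for repetitive ones.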

13.4 years ago

Low-complexity regions are masked out because they don't contain the full complement of amino acids. Therefore, an apparently conserved residue in a low-complexity region should not receive the same weight as a residue in a "normal" region. BLAST's solution to this is to mask the low-complexity regions. In the end, the lower E-value you get when you don't mask does not give you more evidence that the sequences are homologous, because the underlying statistical assumptions are violated.
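
As a rough illustration of what "low complexity" means, the sketch below masks windows with low Shannon entropy. This is not the SEG algorithm that BLAST applies to protein queries; the function names, the window length of 12, and the threshold of 2.2 bits are all assumptions picked for the example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of `window`."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2):
    """Replace residues with 'X' in every window whose entropy is below
    `threshold`. Illustrative only; window and threshold are assumed values.
    """
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                masked[k] = "X"
    return "".join(masked)


poly_q = mask_low_complexity("Q" * 12)          # fully masked
diverse = mask_low_complexity("ACDEFGHIKLMN")   # left untouched
```

A run such as the poly-Q or poly-E stretches in the question has entropy near zero, so any match inside it says little about common ancestry.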

See also the BLAST glossary.

13.4 years ago
None ▴ 40

Filtering: BLAST filters regions of low complexity (for a description of low complexity see "What is low-complexity sequence?" below). If your sequence contains large regions of "low complexity", it may not produce significant hits to the database. You can turn off filtering by setting the "Filter" option to "None" using the pull-down tab.

The same applies to the command-line version of bl2seq (assuming you mean bl2seq when you write blast2seq). You can run that program with filtering disabled by appending -F F.

13.4 years ago

Put both sequences in a single FASTA file, each one starting on a new line with its own > header, and try out Cobalt. Here is the link.
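
If you want to build that file programmatically, a small sketch like the following would do. The helper name `format_fasta` and the 60-character line width are assumptions, and the sequences here are shortened placeholders for the two proteins in the question.

```python
def format_fasta(records, width=60):
    """Return FASTA text for (name, sequence) pairs, one '>' header per
    record and the sequence wrapped at `width` characters per line.
    """
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"


# Placeholder fragments; substitute the full sequences.
fasta_text = format_fasta([
    ("15_350", "SGCSCAVCGRKRTAIEEELEVLY"),
    ("YNL091W", "HSCSCSVCGRRHLAMDQEMERIY"),
])
```

Writing `fasta_text` to a file gives a two-record FASTA file of the kind multiple-alignment tools generally accept as input.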
