Problem about orthomclBlastParser
10.8 years ago
yogi.bioinfo ▴ 10

When I run orthomclBlastParser:

orthomclBlastParser good_out my_orthomcl/compliantFasta >>my_orthomcl/similarSequences.txt

it gives

couldn't find taxon for gene '1_goodblastdb' at ./orthomclBlastParser line 103, <F> line 1.

What is causing this error? Could anyone help me?

orthomcl • 3.3k views

Yogi: Did you at least check line 103?


I tried multiple times but get the same error every time. I can't tell whether my mistake is in orthomclAdjustFasta or in BLAST. Here is a sample of the FASTA file I get after running orthomclAdjustFasta, followed by the BLAST result:

bryo|892-1359 MSRKSIAEKQVAKPDPIYRNRLVNMLVNRILKNGKKSLAYRILYKAMKNIKQKTKKNPLFVLRQAVRKVTPNVTVKARRIDGSTYQVPLEIKSTQGKALAIRWLLGASRKRSGQNMAFKLSYELIDAARDNGIAIRKKEETHKMAEANRAFAHFR bryo|1514-22392776-3555 MKLELDMFFLYGSTILPECILIFSLLIILIIDLTFPKKDTIWLYFISLTSLLISIIILLFQYKTDPIISFLGSFQTDSFNRIFQSFIVFCSILCIPLSIEYIKCAKMAIPEFLIFILTATVGGMFLCGANDLVTIFVSLECLSLCSYLLCGYTKRDIRSNEAAIKYLLIGGTSSSILAYGFSWLYGLSGGETNIQKITNGLLNAETYNSSGTFIAFICILVGLAFKLSLVPFHQWTPDIYEGSPTPVVAFLSVTSKIAGLALATRILNILFSFSPNEWKIFLEILAILSMILGNLVAITQTSMKRMLAYSSISQIGYILIGLITGDLKGYTSMTIYVFFYIFMNLGTFACIILYSLRTGTDNIRDYAGLYIKDPLLSFSLTLCLLSLGGLPPLTGFFGKLYLFWCGWQSGFYLLVFIALITSVISLYYYLKIIKLILTKKNNEINPYIQAYIITSPTFFSKNPIEFVMIFCVLGSTFLGIIINPIFSFFQDSLSLSVFFIK bryo|4001-4105 MEVNILAFIATALFILIPTAFLLILYVQTASQNS bryo|4236-43414827-5128 MNHMELGPSTILGVGLIIIGLFLYALKLREPYVSRDYDFFFSCIGLLCGGILFFQGWRLDPILLLSQILLSGTTIFFIAESLYLRKNLNFVKSKKKYINLAKKNIYKYIYENFKLKKKWNELNYTRHIFYKKKKH bryo|5859-9056 MEIFILPEFGKIQFEGFNRFINQGLSEELSNFPIIEDIDQEFEFQIFGEQYKLAEPLLKERDAVYQSITYSSDVYVPAQLTQKKKGKIQKQIVFLGSIPLMNSQGTFVVNGVARVIINQILRSPGIYYNSELDHNGIPIYTGTLISNWGGRLKLEIDGKTRIWARISKKRKVSILVLLLAMGLNLQNILDSVCYPKIFLEFIKKNTKKEYPNSTEDAIVELYKHLYCIGGDLFFSESIRKELQKKFFQQRCELGKIGRLNLNKKLNLNVPENEIFVLPQDILAAVDYLIKLKFGIGTIDDIDHLKNRRVCSVADLLQDQLKLALNRLENSVLFFFRGATKRKRLPTPKSLVTSTPLIMTFKEFFGSHPLSQFLDQTNPLTEIVHKRRLSSLGPGGLTRRTASFQVRDIHASHYGRICPIETSEGMNAGLIASLAIHAKISILGCLESPFYKISKLSNLEEIINLSAAEDEYYRIATGNCLALDQNSQEEQITPARYRQDFVAIAWEQVHLRSIFPLQYFSVGASLIPFLEHNDANRALMGSNMQRQAVPLLKPEKCIVGTGIESQTALDSGSVTVSSHGGKIEYLDGNQIILSLKKKKIDKNLIIYQRSNNSTCMHQKPKVEKQKYIKKGQILADGAATANGELALGKNILVAYMPWEGYNFEDAILINERLIYEDIYTSIHIERYEIEARVTSQGPEKFTNEIPHLDDYLLRHLDQNGIVLTGSWVETGDVLVGKLTPQETEENLRAPEGKLLQAIFGIQVATSKETCLKVPPGGRGRVIDIRLISQEDNSANTAQIIHIYILQKRKIQIGDKVAGRHGNKGIISKILPRQDMPFLQDGTPIDMILSPLGVPSRMNVGQIFECLLGLAGSFLHKNYRIIPFDERYEREASRKLVFSELYKASKKTTNPWLFEPDNPGKNRLIDGRTGEIFEQPITIGKAYMLKLIHQVDDKIHARSSGPYALVTQQPLRGRSRRGGQRVGEMEVWALEGFGVAYILQEMLTIKSDHIRARYEVLGAIVTGEPIPKPNTAPESFKLLVRELRSLALEINHVIICEKNLKLKLKEI 
bryo|9087-951810115-11737 MTYQKKHQHLRIELASPEQIRNWAERVLPNGEIVGQVTKPYTLHYKTHKPEKDGLFCEKIFGPIKSGICACGKYQGIEKKKENIKFCEQCGVEFIESRIRRYRMGYIKLACSVTHVWYLKRLPSYIANLLAKPLKELESLVYCDLFLARPITKKPTLLKLQGLFKYEDQSWKDIFPRFFSPRGFEVFQNREIATGGDAIQKQLTNLNLQNVINLAHLEWKEFAEQKSTGNEWEDRKIQRRKDLLVRRIKLAKHFIQTNIKPEWMVLSLLPVLPPELRPMIELGEGELITSDLNELYRRVIYRNNTLLDFLARSGSTPGGLVVCQKRLVQEAVDALIDNGIRGQPMKDSHNRPYKSFSDLIEGKEGRFRENLLGKRVDYSGRSVIVVGPFLPLHQCGLPREMAIELFQAFVIRGLIGRNFAPNLRAAKTMIQNKEPIIWKVLQEVMQGHPILLNRAPTLHRLGIQAFQPILVNGRAIHLHPLVCGGFNADFDGDQMAVHIPLSLEAQAEARLLMLSHKNLLSPATGEPISVPSQDMLLGLYILTIENNQGIYGNKYNPSKKYDSKKKFSQIPYFSSYDNVFRALQQKQIYLHSSLWLRWQINLRIITLLNQEGPIEIQYKSFGNSFQIYEHYQLRKNKNQEIISTYICTTAGRILFNQQIEEAIQGTYKASLKQKTFVQKIEKNG bryo|11811-15971 MAEPVNLIFYNKVMDRTAIKQLISRLIAHFGITYTTHILDQLKTLGFQQATFGAISLGIDDLLTAPSKSWLIEDAEQYGNLSEKHHNYGSLHAVEKLRQLIETWYATSEYLKQEMNPNFRITDPLNPVHMMSFSGARGSTSQVHQLVGMRGLMSDPQGQIIDLPIQSNFREGLSLTEYIISCYGARKGVVDTAVRTSDAGYLTRRLVEVVQHIVVRKVDCGTLYGINVNNLSEKKNNFQQKLIGRVIAENIYIDHRCIAPRNQDIGALLANRLITLKTKQIFLRSPLTCKSMNWICQLCYGWSLSHGNLIEMGEAVGIIAGQSIGEPGTQLTLRTFHTGGVFTGDIAEHVRTPFNGIIEFNENFVYPTRTRHGHPAWMCHTNLFLVIKSKNKVHNLTIPPKSLLLVQNNQYVESKQVIAEIRAKTSPFKEKVQKYIYSNLEGEMHWSTKVRHASEYIHSNIHLILKTCHIWILSGNFHKKNNDLSVLFYKNQDKIDFPISLTKEKNEFSFVKNKTQLNLFLFHFYLYKKNKIFIKSQLTNNILNKINNSKNYNFILQEYNIKKKKNFYFLKNKNLTCPLFLKIKKNGVLKNNEIFAILDDPSYKVKNSGILKYGNIKVDLINQNTNFEDPQTKLFRPRYSIIKEGNFFFIPEEVYVLTQSLSSVFIKNNKFIQAGTLITSNIRSNTNGLVKIQKKGNNNYELKILPGTIYYPNETYKISKQISILIPPGKKLFNEFECKNWTYLQWIMPSKEKPFVLIRPAVEYKISKKLNKSTLFDLLKKNKKVEIKTINYLLYEDDEQIQIINEKNIQLIQTCLLVHWKKKYFFKEANVSFLKIKTKNNFKTFLQISLIEYSNLEKKKEKTISKNVLKKNYYDHFFSISKNELKNKKQGVIRIISNQNNGMQSFIILSSSDLVKTFKFKKLTKNISIKTNTNTSTAKFFEFNKNFKILNKKKKLNLTKKNFSIGLLLFKKLGFLGNLHNIVTNSFSSFYLINYTKLISNKYSIITKFQHTCQNPKWYLIDESKKINKLILGKHINYNLFNWCFPLFSLLKKKIDFQTIKLGQLLFENFVISKYKTSYPSGQIISININYFIIRLAKPYLATGGATIHNNYGEFIKEGDTLITLIYERLKSGDIIQGLPKVEQLLEARPINSVSINLENGFEDWNNDMIKFIGNLWGFFLSTKISMEQGQINLVDQIQKVYQSQGVQISNKHIEIIVRQMTSKVITLEDGMTNVFLPGELIEFSRTQKMNRALEEAVPYKPILLGITKASL
NTQSFISEASFQETTRVLAKAALKGRIDWLKGLKENVILGGLVPAGTGSQEVIWQITLEKKKEIYLKKKKEFFTKKINNVFLYQDTFSIFPTTEIIHNVLKESISQNNKNNFSI bryo|16055-16762 MKQKSWNIHLEEMMEAGVHFGHQARKWNPKMAPYIFTERKGIHIINLTQTARFLSEACDLVANASSKGKQFLIVGTKYQAADLIESSALKARCHYVNQKWLGGMLTNWSTIETRLQKFKDLENKKKTGTINRLPKKEAANLKRQLDHLQKYLGGIKYMTSLPDIVIIIDQQKEFTAIQECITLGIPTICLVDTDCDPDMTDIPIPANDDARASIRWILNKLTLAICEGRYNSIKN bryo|16890-17636 MSHTAKMASTFNNFYEISNVEVGQHFYWQLGSFQVHAQVLITSWIVIAILLSLAVLATRNLQTIPMGGQNFVEYVLEFIRDLTRTQIGEEEYRPWVPFIGTMFLFIFVSNWSGALFPWRVFELPNGELAAPTNDINTTVALALLTSVAYFYAGLHKKGLSYFGKYIQPTPVLLPINILEDFTKPLSLSFRLFGNILADELVVAVLISLVPLVVPIPMMFLGLFTSAIQALIFATLAAAYIGESMEGHH bryo|18014-18259 MNPLISAASVIAAGLAVGLASIGPGIGQGTAAGQAVEGIARQPEAEGKIRGTLLLSLAFMEALTIYGLVVALALLFANPFV bryo|18468-1861219200-19609 MENGTYFIISSNFWTIAGSFGLNTNLLETNLINLGVVLGLLVYFGKGVLSNLLNNRKLTILNTIQDAEERYKEATDKLNQARTRLQQAKQKADDIRINGLSQMEKEKQDLINAADEDSKRLEDSKNATIRFEKQRAIEQVRQQVSRLALERALETLKSRLNSELHLRMIDYHIGLLRAMESTIE bryo|19654-21177 MVNIRPDEISSIIRKQIEQYNQEVKIVNIGTVLQVGDGIARIYGLDKVMAGELVEFEDGTVGIALNLESDNVGAVLMGDGLTIQEGSSVKATGKIAQIPVSDAYLGRVVNALAQPIDGKGQIPASEFRLIESPAPGIISRRSVYEPMQTGLIAIDSMIPIGRGQRELIIGDRQTGKTAVAIDTILNQKGQNVVCVYVAIGQKASSVAQVVNTFEDRGALEYTIVVAETANSPATLQYLAPYTGAALAEYFMYRKQHTLIIYDDLSKQAQAYRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKLSSNLGEGSMTALPIVETQAGDVSAYIPTNVISITDGQIFLSADLFNAGIRPAINVGISVSRVGSAAQIKAMKQVAGKLKLELAQFAELEAFAQFASDLDKATQNQLARGQRLRELLKQSQSAPLSVEEQIATIYTGVNGYLDVLETGQVKKFLIQLREYLVTNKPQFAEIIRSTKVFTEQAENLLKEAITEHIELFLFQEEK bryo|c22263-22162 MNLEVIAQLTVLALIVASGPLVIALLAARKGNL bryo|c22425-22333 MELILNKEYRLVIIVLISVYYRYRFFLLLF bryo|22516-22614 MTSISDSQIIVILLSVFITSILALRLGKELYQ bryo|c23107-22997 MLTLKLFVYTVVIFFVSLFVFGFLSNDPGRNPGRKE bryo|c23605-23438 MFNIYLENAFYLNGITFAKLPEAYSIFDPIVDVMPIIPLFFFLLAFVWQASVSFR bryo|24053-25594 
MKLAYWMYAGPAHIGTLRVASSFKNVHAIMHAPLGDDYFNVMRSMLERERDFTPVTASIVDRHVLARGSQEKVVDNITKKDKQEHPDLIVLTPTCTSSILQEDLQNFVNRASMSSDSDVILADVNHYRVNELQAADRTLEQVVRYYLEKAHRQEKLNLSLTDKPSANIIGIFTLGFHNQHDCRELKRLLQDLGIMINQIIPEGGFVENLHELPKAWFNLVPYREVGLMTALYLEKEFGMPYISTTPMGIVDIANCIRQIQKQVNIWSPILLGKKFDFEPYIDEQTRFISQAAWFSRSIDCQNLTGKKAVVFGDATHAASITKILACEMGIRVSCTGTYCKHDEEWFREQVQNFCDEILITDDHTEVGDMIARIEPSAIFGTQMERHIGKRLDIPCGVISSPVHIQNFPLGYRPFLGYEGTNQIADLVYNSFTLGMEDHLLEIFGGHDTKEVITKSLSTDTDLTWNSESQLELNKIPGFVRGKIKRNTEKFARQNNITKITVEVMYAAKEDLSA

And this is the BLAST output:

bryo |892-1359 1_final_gooddb 100.00 155 0 0 1 155 1 155 1e-116 314 bryo |892-1359 318_final_gooddb 87.74 155 19 0 1 155 1 155 1e-103 282 bryo |892-1359 332_final_gooddb 86.45 155 21 0 1 155 1 155 2e-103 281 bryo |892-1359 409_final_gooddb 85.16 155 23 0 1 155 1 155 3e-101 276 bryo |892-1359 114_final_gooddb 83.23 155 26 0 1 155 1 155 2e-97 266 bryo |892-1359 252_final_gooddb 72.67 150 41 0 6 155 1 150 1e-82 228 bryo |1514-22392776-3555 2_final_gooddb 100.00 501 0 0 1 501 1 501 0.0 979 bryo |1514-22392776-3555 333_final_gooddb 75.95 499 120 0 1 499 1 499 0.0 717 bryo |1514-22392776-3555 115_final_gooddb 70.68 498 146 0 1 498 1 498 0.0 700 bryo |1514-22392776-3555 410_final_gooddb 76.95 499 114 1 1 499 1 498 0.0 699 bryo |1514-22392776-3555 251_final_gooddb 68.27 498 158 0 1 498 1 498 0.0 679 bryo |1514-22392776-3555 478_final_gooddb 27.51 349 228 12 116 449 130 468 1e-24 94.0 bryo |1514-22392776-3555 72_final_gooddb 26.67 330 219 11 116 432 130 449 5e-23 89.0 bryo |1514-22392776-3555 330_final_gooddb 27.46 386 260 12 59 432 72 449 4e-22 86.3 bryo |1514-22392776-3555 260_final_gooddb 26.06 353 231 11 111 446 123 462 5e-22 85.9 bryo |1514-22392776-3555 180_final_gooddb 26.68 386 263 12 59 432 72 449 3e-21 83.6 bryo |1514-22392776-3555 77_final_gooddb 24.61 191 142 2 214 403 212 401 9e-14 59.3 bryo |1514-22392776-3555 399_final_gooddb 24.70 247 181 4 158 403 159 401 7e-13 56.6 bryo |1514-22392776-3555 183_final_gooddb 25.89 224 141 10 110 322 111 320 1e-12 56.2 bryo |1514-22392776-3555 327_final_gooddb 25.00 224 143 9 110 322 111 320 2e-12 55.1 bryo |4001-4105 3_final_gooddb 100.00 34 0 0 1 34 1 34 1e-21 65.5 bryo |4001-4105 116_final_gooddb 97.06 34 1 0 1 34 1 34 8e-21 63.5 bryo |4001-4105 334_final_gooddb 94.12 34 2 0 1 34 1 34 5e-20 61.6 bryo |4001-4105 250_final_gooddb 94.12 34 2 0 1 34 1 34 5e-20 61.6 bryo |4001-4105 317_final_gooddb 91.18 34 3 0 1 34 1 34 1e-19 60.5 bryo |4001-4105 88_final_gooddb 91.18 34 3 0 1 34 1 34 4e-19 59.3 bryo |4001-4105 
411_final_gooddb 88.24 34 4 0 1 34 1 34 5e-18 56.6 bryo |4236-43414827-5128 4_final_gooddb 100.00 135 0 0 1 135 1 135 1e-99 270 bryo |4236-43414827-5128 412_final_gooddb 70.71 140 34 3 1 134 1 139 1e-54 157 bryo |4236-43414827-5128 316_final_gooddb 63.57 140 40 4 1 131 1 138 9e-47 136 bryo |4236-43414827-5128 117_final_gooddb 77.32 97 21 1 1 96 1 97 8e-45 131 bryo |4236-43414827-5128 249_final_gooddb 65.71 35 12 0 1 35 1 35 3e-13 48.9 bryo |5859-9056 5_final_gooddb 100.00 1065 0 0 1 1065 1 1065 0.0 2182 bryo |5859-9056 414_final_gooddb 86.02 1066 147 2 1 1065 1 1065 0.0 1875 bryo |5859-9056 336_final_gooddb 85.82 1065 149 2 1 1065 1 1063 0.0 1843

Please tell me where I am making a mistake (in orthomclAdjustFasta or in BLAST).


There are a couple of things wrong. First, each ID in your FASTA file should have ">" in front of it, because that is what the FASTA format expects. Also, I can tell from your BLAST report that some of the sequences are not formatted correctly. For example, look at the subject ID of the first hit: "1_final_gooddb" is not in the correct format. You will need to go back and run orthomclAdjustFasta on every sequence file and then create a database of all your sequences. There are also some issues with both files having records that don't start on a new line, but that is probably just a result of not formatting your post.
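As a rough illustration of the two format checks just described, here is a small Python sketch. It is not part of OrthoMCL; the function name `check_fasta_lines`, the sample lines, and the exact header pattern (a word-character taxon code, a "|", then an ID) are assumptions made for the example.

```python
import re

# Assumed header shape: ">taxon|sequence_id" (taxon made of word characters).
HEADER_RE = re.compile(r"^>(\w+)\|(\S+)")


def check_fasta_lines(lines):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        if text.startswith(">"):
            # A header line: it must contain a taxon, a "|" and an ID.
            if not HEADER_RE.match(text):
                problems.append((number, "header is not in '>taxon|id' form"))
        elif "|" in text:
            # Sequence lines should not contain "|", so this is probably
            # an ID that lost its leading ">".
            problems.append((number, "ID-like line is missing the leading '>'"))
    return problems


# Hypothetical sample, loosely based on the IDs posted in this thread.
sample = [
    ">bryo|892-1359",
    "MSRKSIAEKQVAKPDPIYRNRLVNMLVNRILKNGKKSLAYRILYKAM",
    "bryo|4001-4105 MEVNILAFIATALFILIPTAFLLILYVQTASQNS",
    ">1_final_gooddb",
]

problems = check_fasta_lines(sample)
for number, message in problems:
    print(f"line {number}: {message}")
```

Running something like this over each file in the compliant FASTA directory should point at the first record that needs fixing.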

A couple more things: this is not an answer to your question, so in the future please edit your original post instead, and you usually only need to post a small sample (a couple of lines). Those things make it easier for people to follow the conversation and find an answer.


Yogi: Please refer to the post by SES above. It refers to line 103 in orthomclBlastParser.

10.8 years ago
SES 8.6k

This error indicates that the format of the IDs in the blast file does not match that of the IDs in the fasta files. Try to find the line in the blast file that is causing the problem and compare it with your IDs in the fasta files. What likely happened is that you did not run orthomclAdjustFasta in a way that produces IDs exactly the way the programs expect. Take a look at your fasta IDs and check the orthomcl documentation to find out whether your file formats are correct. If none of that solves your problem, I would encourage you to edit your post and provide a small sample of your files so that people here can try to figure out the issue. Note that this happens quite often, so you may have just overlooked a step in the process, but it's also possible your data was mangled somehow.
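One way to find the offending line is to compare the IDs mechanically. The following Python sketch is only an illustration and not part of OrthoMCL: it assumes tab-separated tabular BLAST output with the query ID in column 1 and the subject ID in column 2, and the helper names `fasta_ids` and `unknown_blast_ids` are made up for this example.

```python
def fasta_ids(fasta_lines):
    """Collect the first word after '>' from every FASTA header line."""
    ids = set()
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:
                ids.add(words[0])
    return ids


def unknown_blast_ids(blast_lines, known_ids):
    """Return (line_number, column, id) for BLAST IDs absent from known_ids."""
    missing = []
    for number, line in enumerate(blast_lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        for column, name in (("query", fields[0]), ("subject", fields[1])):
            if name not in known_ids:
                missing.append((number, column, name))
    return missing


# Hypothetical miniature inputs, modelled on the IDs in the question.
fasta = [">bryo|892-1359", "MSRKSIAEKQ", ">bryo|4001-4105", "MEVNILAFIA"]
blast = [
    "bryo|892-1359\tbryo|4001-4105\t100.00",
    "bryo|892-1359\t1_final_gooddb\t100.00",
]

missing = unknown_blast_ids(blast, fasta_ids(fasta))
for number, column, name in missing:
    print(f"blast line {number}: {column} ID '{name}' not found in fasta")
```

With real data you would read the lines from your blast file and from every file in the compliant FASTA directory instead of using in-memory lists.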

EDIT: I'll provide a full description because this actually comes up a lot and I've never seen it fully explained.

That message you see refers to line 103 in the program orthomclBlastParser, not in your data. This is line 103 in orthomclBlastParser:

die "couldn't find taxon for gene '$subject->{subjectId}'" unless $subject->{subjectTaxon};

In a not very nice way, this line checks whether a taxon was found for the subject ID from the blast report in a hash containing the sequence IDs. You have to look at line 101 to find out how $subject->{subjectTaxon} is defined. Here is line 101 in orthomclBlastParser:

$subject->{subjectTaxon} = $genes->{$subject->{subjectId}}->{taxon};

In the above, the taxon is the name you set with orthomclAdjustFasta (e.g., if your taxon name is "grap" you should have a file called "grap.fasta" in the "compliant_fasta" directory and your sequence IDs should be ">grap|some_read_id"). You can see this code on lines 73-74:

$fastaFile =~ /(\w+).fasta/ || die "'$fastaFile' is not in 'taxon.fasta' format\n";
my $taxon = $1;

Also, $genes on line 101 is a reference to a hash of the sequence IDs. So, what is happening is that the program checks whether the subject ID from the blast report is in the sequence files and, if it is not, tells Perl to just die, which throws an exception and generates a message containing the line where the exception occurred.
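To make the logic easier to follow, here is a loose Python paraphrase of the behaviour described above. It is a sketch and not a translation of the real program: how orthomclBlastParser actually fills its hash is an assumption here, and `taxon_from_filename`, `build_gene_table` and `subject_taxon` are hypothetical names.

```python
import re


def taxon_from_filename(fasta_file):
    # Same pattern as the Perl snippet on lines 73-74.
    match = re.search(r"(\w+).fasta", fasta_file)
    if not match:
        raise ValueError(f"'{fasta_file}' is not in 'taxon.fasta' format")
    return match.group(1)


def build_gene_table(files):
    """files maps a file name to the sequence IDs found in it (assumed input)."""
    genes = {}
    for fasta_file, ids in files.items():
        taxon = taxon_from_filename(fasta_file)
        for gene_id in ids:
            genes[gene_id] = {"taxon": taxon}
    return genes


def subject_taxon(genes, subject_id):
    # Mirrors lines 101 and 103: look the ID up, give up if nothing is found.
    taxon = genes.get(subject_id, {}).get("taxon")
    if not taxon:
        raise LookupError(f"couldn't find taxon for gene '{subject_id}'")
    return taxon


genes = build_gene_table({"bryo.fasta": ["bryo|892-1359", "bryo|4001-4105"]})
print(subject_taxon(genes, "bryo|892-1359"))

try:
    subject_taxon(genes, "1_final_gooddb")
except LookupError as error:
    print(error)
```

The second lookup fails for the same reason as in the question: the subject ID in the blast report is not one of the IDs read from the fasta files.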

Note that 1 and 0 have special meaning to Perl (and most programming languages), so I would suggest you re-run orthomclAdjustFasta, choosing a 3-4 letter word (as the manual explains) as the taxon name, and then repeat the BLAST and orthomclBlastParser steps.

6.0 years ago
vjain • 0

I am facing a similar error when I run orthomclBlastParser:

bin/orthomclBlastParser ortholog/out.tab my_orthomcl_dir/compliantFasta/Blast/ >> similarSequences.txt
acquiring genes from arab.fasta
couldn't find taxon for gene 'TRINITY_DN10001_c0_g1_i2.p1' at /home/mobashirm/Documents/orthomclSoftware-v2.0.9/bin/orthomclBlastParser line 106, <f> line 1.

I run Blast between the following two files:

1) The database file, arab.fasta, looks like:

arab|NP_001030613.1 MLLSALLTSVGINLGLCFLFFTLYSILRKQPSNVTVYGPRLVKKDGKSQQSNEFNLERLLPTAGWVKRALEPTNDEILSN arab|NP_001030614.1 MEMEEGASGVGEKIKIGVCVMEKKVFSAPMGEILDRLQSFGEFEILHFGDKVILEDPIESWPICDCLIAFHSSGYPLEKA

2) My raw file against which blast is performed:

TRINITY_DN10001_c0_g1_i1.p1 TRINITY_DN10001_c0_g1~~TRINITY_DN10001_c0_g1_i1.p1 ORF type:3prime_partial len:377 (-),score=66.02 TRINITY_DN10001_c0_g1_i1:1-1128(-) MGIRSCQLIACLSALSIADAKRPTVDVAMSQAALEPPETIGGSASTQFRRSLLQAGAKSG TRINITY_DN10001_c0_g1_i2.p1 TRINITY_DN10001_c0_g1~~TRINITY_DN10001_c0_g1_i2.p1 ORF type:complete len:154 (-),score=0.19 TRINITY_DN10001_c0_g1_i2:112-573(-) MGIRSCQLIACLSALSIADAKRPTVDVAMSQAALEPPETIGGSASTQFRRSLLQAGAKSG TSGCKWAGAAAGCIADGSFFQSKGGFEPMDEFLACLNATTSGADLSCSPGETCCTPYLHY SSLHKQYIHSTIVKKCTFPRHIMSAVVLVYSTW*

The output file after Blast is:

TRINITY_DN10001_c0_g1_i2.p1 arab|NP_180470.2 27.08 96 63 2 22 110 29 124 3e-06 29.6
TRINITY_DN10001_c0_g1_i2.p1 arab|NP_191320.1 31.58 57 31 1 20 76 38 86 7e-06 28.9
TRINITY_DN10002_c0_g2_i1.p1 arab|NP_198034.2 31.43 70 45 1 47 116 328 394 3e-08 35.8

Please help me sort out this error.


Please create a new post if you have an error. This should not be posted as an answer to an old existing question. When you do that come back here and delete this post.
