NCBI BLAST results not compatible with OrthoMCL program
4.5 years ago
xiachongjing ▴ 10

I am using BLAST 2.4.0+ (blastp) with the tabular output format, -outfmt 6. Some of my results are:

PSTr|PSTG_00001T0 PSTr|PSTG_00001T0 100.000 138 0 0 1 138 1 138 3.41e-101 286
MLi|212373 gnl|MLi|212373 100.000 110 0 0 1 110 1 110 8.97e-78 226
PSTr|PSTG_00001T0 gnl|PST|PstP_06241T0 98.182 110 2 0 29 138 1 110 1.75e-77 226
PSTr|PSTG_00001T0 PSTr|PSTG_14461T0 72.993 137 36 1 1 137 1 136 4.63e-67 200
PSTr|PSTG_00001T0 gnl|PST|PstP_16337T0 72.593 135 36 1 3 137 1 134 2.45e-65 196
PSTr|PSTG_00001T0 gnl|PST|PstP_17038T0 67.669 133 41 2 6 137 46 177 3.64e-55 172

My first question is: why do some of my subject IDs have gnl| in the second column, while others do not?

In fact, I am running OrthoMCL. When I use the BLAST results above for the next OrthoMCL step, for example by running

$ orthomclBlastParser my.blast myadjust.directory > similarSequences.txt

I get this error:

couldn't find taxon for gene 'gnl|MLi|212373' at /path/to/orthomclBlastParser line 105, <F> line 1.

So my second question is: can I just delete the string gnl| from my BLAST results and then continue with OrthoMCL?

I don't know the answer to your first question, but for the second, yes, I believe you can remove the "gnl|" string. The command below edits the file in place and keeps a backup of the original as my.blast.bak:

sed -i.bak 's/gnl|//g' my.blast
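
If you would rather not edit the file in place, a more conservative variant is to strip the prefix only when it starts the query or subject ID, and write the result to a new file. This is just a sketch, assuming the file is tab-separated as -outfmt 6 normally produces. The file names sample.blast and sample.nognl.blast are made up for illustration, and the sample rows are copied from the question:

```shell
# Hypothetical sample file: three rows from the question, tab-separated.
printf 'MLi|212373\tgnl|MLi|212373\t100.000\t110\t0\t0\t1\t110\t1\t110\t8.97e-78\t226\n' > sample.blast
printf 'PSTr|PSTG_00001T0\tgnl|PST|PstP_06241T0\t98.182\t110\t2\t0\t29\t138\t1\t110\t1.75e-77\t226\n' >> sample.blast
printf 'PSTr|PSTG_00001T0\tPSTr|PSTG_14461T0\t72.993\t137\t36\t1\t1\t137\t1\t136\t4.63e-67\t200\n' >> sample.blast

# Remove "gnl|" only where it begins column 1 (query ID) or column 2 (subject ID).
# The original file is left untouched; the cleaned copy goes to a new file.
awk 'BEGIN { FS = OFS = "\t" }
     { sub(/^gnl\|/, "", $1); sub(/^gnl\|/, "", $2); print }' sample.blast > sample.nognl.blast

# Sanity check: print any rows whose IDs still start with "gnl|" (ideally none).
awk -F'\t' '$1 ~ /^gnl\|/ || $2 ~ /^gnl\|/' sample.nognl.blast
```

On your real data you would replace sample.blast with my.blast and pass the cleaned file to orthomclBlastParser.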


edit: for the first question, maybe it is related to how you created the BLAST databases? Did you concatenate the sequences and create the database all at once? What commands did you use?
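
To help narrow down the first question, it may be worth checking which taxon tags actually pick up the gnl| prefix in the subject column. Here is a rough sketch, again assuming tab-separated -outfmt 6 output. The helper name tally_subject_prefixes and the sample file are made up for illustration:

```shell
# Hypothetical sample: subject IDs (column 2) taken from the rows in the question;
# the query column is a placeholder.
printf 'q\tPSTr|PSTG_00001T0\nq\tgnl|MLi|212373\nq\tgnl|PST|PstP_06241T0\nq\tPSTr|PSTG_14461T0\n' > subjects.sample.blast

# Count subject IDs per leading tag, keeping "gnl|<tag>" separate from a bare "<tag>".
tally_subject_prefixes() {
    awk -F'\t' '{
        split($2, part, "|")
        if (part[1] == "gnl") print "gnl|" part[2]
        else print part[1]
    }' "$1" | sort | uniq -c
}

tally_subject_prefixes subjects.sample.blast
```

If only some taxa show up under gnl|, comparing how their FASTA headers and databases were prepared against the others would be the next thing to look at.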