Question: InterProScan standalone - error connected to non-unique fasta identifiers
3.5 years ago by
al-ash140 wrote:

Problem: When the nucleotide FASTA input contains several sequences with non-unique names, InterProScan consistently returns no output.

Description: I'm running InterProScan (InterProScan-5.21-60.0) on Linux in standalone mode. In test searches I request only GO terms and search only the Pfam database. When I use the test multi-FASTA file provided in the InterProScan package (test_nt_redundant.fasta), which also includes some entries that differ in sequence but have non-unique names (see below), the analysis runs without any problems. The options I use are:

-i test_nt_redundant.fasta -b output -goterms -appl Pfam -t n

The fasta headers in the test file are:

>ENA|AACH01000026|AACH01000026.1 Saccharomyces mikatae IFO 1815 YM4906-Contig2858, whole genome shotgun sequence.
>ENA|AACH01000027|AACH01000027.2 Saccharomyces mikatae IFO 1815 YM4906-Contig2858, whole genome shotgun sequence.
>reverse translation of P22298
>reverse translation of P22298

However, when I run the same analysis on a set of 15 FASTA sequences that I'd like to annotate, which also contains some sequences with non-unique identifiers, I consistently receive the following message and InterProScan ends without any output:

Found 3 non unique identifier(s). These identifiers do have different sequences, within the FASTA nucleotide sequence input file.
    Please find below a list of detected identifiers:
    InterProScan will shutdown, because there is no way to map nucleic sequences and predicted proteins.

Remarkably, even the returned list of non-unique identifiers is not complete (see below for the list of FASTA headers in the 15-sequence set):


Additionally, when I remove the sequences with non-unique identifiers from the 15-sequence set, the analysis runs without any problem, so I guess the problem is somehow connected to the number of non-unique FASTA identifiers in the input.

I'm wondering what might be the source of this error and how to solve it? Thanks in advance for any hints.


Just add a unique identifier to non-unique fasta headers. It makes sense to stop on non-unique identifiers since if IDs are not unique, you wouldn't be able to unambiguously associate the results with a sequence.
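The suggestion above, adding a unique identifier to repeated headers, can be sketched in a few lines of Python. This is only a minimal illustration, not part of InterProScan: the function name `make_headers_unique` is hypothetical, and it assumes that the whole header line (everything after `>`) acts as the identifier. If the identifier is taken from only part of the header, the key would need adjusting.

```python
from collections import defaultdict


def make_headers_unique(lines):
    """Return FASTA lines with repeated headers suffixed _2, _3, ...

    Assumption: the full header text after '>' is the identifier.
    The first occurrence of a header is left unchanged.
    """
    counts = defaultdict(int)  # header text -> times seen so far
    out = []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            header = line[1:].strip()
            counts[header] += 1
            if counts[header] > 1:
                # Second and later occurrences get a numeric suffix.
                header = f"{header}_{counts[header]}"
            out.append(">" + header)
        else:
            # Sequence lines pass through untouched.
            out.append(line)
    return out
```

Passing the lines of the input file through this function and writing the result to a new file gives an input in which every header is distinct. Note that the sketch does not check whether a suffixed header collides with a header that already exists in the file.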

written 3.5 years ago by Jean-Karim Heriche

Thanks for the reply! I thought that InterProScan would be able to take care of non-unique identifiers (e.g. by adding a number suffix), but I went through the manual once again and indeed it does not. The reason it did not return an error with the sample set is that the two sequences with identical identifiers in that multi-FASTA file also have identical sequences, in which case InterProScan just merges them into one sequence.
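To see in advance which repeated identifiers fall into the problematic case (same identifier, different sequences) rather than the harmless one (same identifier, same sequence), a small check along these lines may help. It is a sketch under assumptions: the names `parse_fasta` and `conflicting_ids` are hypothetical, the whole header line is treated as the identifier, and sequences are compared case-insensitively, which may not match exactly how InterProScan compares them.

```python
from collections import defaultdict


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def conflicting_ids(lines):
    """Return sorted headers that occur with more than one distinct sequence.

    Assumption: the full header is the identifier, and case is ignored
    when comparing sequences.
    """
    seqs_by_id = defaultdict(set)
    for header, seq in parse_fasta(lines):
        seqs_by_id[header].add(seq.upper())
    return sorted(h for h, seqs in seqs_by_id.items() if len(seqs) > 1)
```

Headers that repeat with identical sequences are not reported, which corresponds to the behaviour seen with the test file; any header the function does return needs renaming before the run.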

written 3.5 years ago by al-ash