Question: Error when running PfamScan: Fasta file contains two sequences with same ID
14 months ago, philmster90 wrote:

Hello everyone,

I'm new to the Biostars community and to bioinformatics, but I already have a question. I'm running into a problem with pfam_scan.pl. After translating all CDS from my input GFF3 genome file with gffread, I want to identify all domains in my proteome with PfamScan, but the script stops immediately with this error:

'FATAL: Sequence identifiers must be unique. Your fasta file contains two sequences with the same id'

Sure, the error message is self-explanatory, but I don't know how to solve the issue. Should I change the gffread options, or is the GFF3 file I obtained from ensembl.org not suited for this purpose? Or could sequences with the same ID arise from trans-splicing? I don't think I can just delete every problematic transcript entry from my FASTA file, as this would surely introduce some bias into my data.

Any help is much appreciated!


No, you should not simply delete the redundant ones (at least not right away). First have a look at why they are redundant. Can you track down which IDs they are and then post the relevant GFF lines for those entries?

written 14 months ago by lieven.sterck
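
One way to track down the duplicated IDs is to count the FASTA header lines. The following is only a minimal sketch, assuming the identifier is the header text up to the first whitespace (the usual FASTA convention); the function name and the example records are made up for illustration and are not part of PfamScan or gffread.

```python
from collections import Counter


def duplicate_fasta_ids(lines):
    """Return {identifier: count} for identifiers that occur more than once."""
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            # Assumption: the ID is everything after ">" up to the first whitespace.
            fields = line[1:].split()
            counts[fields[0] if fields else ""] += 1
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


# Made-up records, only to show the behaviour.
example = [">t1 gene=A", "MKV", ">t2", "MAA", ">t1 gene=B", "MKL"]
print(duplicate_fasta_ids(example))
```

For a real file, the same function can be given an open file handle, since it only iterates over lines.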

Thank you for the fast reply. A total of 67 sequence IDs occur more than once in the FASTA file, so I will post just two examples here: GFF3_example

modified 14 months ago • written 14 months ago by philmster90

I don't see any redundancy in the GFF file at first sight. Can you post the redundant IDs from the FASTA file (or at least the ones that correspond to the GFF lines you provided)?

written 14 months ago by lieven.sterck
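
To relate a set of duplicated FASTA IDs back to the GFF3 file, one option is to collect every feature line whose ID or Parent attribute mentions one of them. This is a rough sketch, not a full GFF3 parser: it assumes nine tab-separated columns with `key=value` attributes in the ninth, and it also matches the value with any `type:` prefix (such as `transcript:`) stripped, on the assumption that the FASTA headers may lack that prefix. The function names and example lines are hypothetical.

```python
def referenced_ids(attributes):
    """Collect ID/Parent values from a GFF3 attribute column, with and without a prefix."""
    ids = set()
    for attr in attributes.split(";"):
        key, _, value = attr.strip().partition("=")
        if key in ("ID", "Parent"):
            for v in value.split(","):  # Parent may list several values
                ids.add(v)
                ids.add(v.split(":", 1)[-1])  # e.g. "transcript:t1" -> "t1"
    return ids


def gff_lines_for_ids(gff_lines, wanted_ids):
    """Return the feature lines that mention any of the wanted IDs."""
    wanted = set(wanted_ids)
    hits = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) == 9 and referenced_ids(cols[8]) & wanted:
            hits.append(line)
    return hits


# Made-up GFF3 lines, only to show the behaviour.
gff = [
    "##gff-version 3",
    "1\tensembl\tmRNA\t100\t900\t.\t+\t.\tID=transcript:t1;Parent=gene:g1",
    "1\tensembl\tCDS\t100\t300\t.\t+\t0\tID=CDS:p1;Parent=transcript:t1",
    "1\tensembl\tmRNA\t2000\t2900\t.\t+\t.\tID=transcript:t2;Parent=gene:g2",
]
for hit in gff_lines_for_ids(gff, {"t1"}):
    print(hit)
```

Seeing the matching mRNA and CDS lines side by side makes it easier to judge whether two records with the same ID really describe the same transcript.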