I'm new to the Biostars community and to bioinformatics in general, but I already have a question. I'm running into a problem when I try to run pfam_scan.pl. After translating all CDS from my input GFF3 genome file with gffread, I want to identify all domains in my proteome with PfamScan. However, the script stops immediately and prints this error:
'FATAL: Sequence identifiers must be unique. Your fasta file contains two sequences with the same id'
The error message is self-explanatory, but I don't know how to resolve it. Should I change the options I pass to gffread, or is the GFF3 file that I obtained from ensembl.org not suited for this purpose? Or could sequences with the same ID arise from trans-splicing? I don't think I can simply delete every problematic transcript entry from my FASTA file, as this would surely introduce some bias into my data.
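To narrow this down, I could first list which IDs are duplicated. Below is a minimal sketch in Python, not a tested solution. It assumes the sequence ID is the first whitespace-delimited token on each `>` header line, which may not be exactly how pfam_scan.pl parses IDs. The function name `duplicated_fasta_ids` and the example records are made up for illustration.

```python
from collections import Counter
import io


def duplicated_fasta_ids(handle):
    """Return {id: count} for every FASTA ID that occurs more than once."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumption: the ID is the first whitespace-delimited token
            # of the header; the rest is treated as free-text description.
            seq_id = header.split()[0] if header else ""
            counts[seq_id] += 1
    return {seq_id: n for seq_id, n in counts.items() if n > 1}


# Made-up records standing in for the gffread protein output.
example = io.StringIO(
    ">t1 gene=A\nMKV\n"
    ">t2 gene=B\nMLL\n"
    ">t1 gene=C\nMKK\n"
)
duplicates = duplicated_fasta_ids(example)
print(duplicates)
```

For the real data, the `io.StringIO` example would be replaced by an opened file handle (for instance `open("proteome.fa")`, where the file name is only a placeholder). Looking at the reported IDs in the GFF3 file should show whether they come from the annotation itself or from how gffread names the translated sequences.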
Any help is much appreciated!