I am doing an mRNA analysis (mRNA from mice) and want to use GENE-counter to get raw counts. The manual says that I need a reference genome and an annotation file in GFF3 format, which I will use to create a reference database in MySQL. I have tried the reference and GFF files from both RefSeq (NCBI) and Ensembl, and neither works. I get an error telling me that there are duplicates in the GFF file and hence that it is not in GFF3 format (though it is, according to the files themselves). In the README that comes with the GFF files from Ensembl I read: "Some validators may warn about duplicated identifiers for CDS features. This is to allow split features to be grouped." OK, so this is a known phenomenon, but what can I do about it? Any advice? Sorting the file and running the uniq command did not help.
Update: I am pretty sure that the problem is that some of the CDS feature entries share the same attribute value. Does anyone know a way to make them unique, or can I just remove the attributes from that feature? Will that impact the counts I get?
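
To make the idea concrete, here is a minimal Python sketch of what "making them unique" could look like. It assumes the duplicated value is the `ID=` attribute in column 9 of the CDS rows (as the Ensembl README suggests) and appends a running suffix to every repeat. The function name `uniquify_cds_ids` and the `.2`, `.3` suffix scheme are made up for illustration; I have not confirmed that GENE-counter accepts the result, and renaming the IDs discards the grouping of split CDS features that the shared ID is meant to express.

```python
from collections import Counter

def uniquify_cds_ids(lines):
    """Yield GFF3 lines, adding a numeric suffix to repeated CDS IDs."""
    seen = Counter()
    for line in lines:
        line = line.rstrip("\n")
        cols = line.split("\t")
        # Pass comments, directives, non-CDS rows and malformed rows through untouched.
        if line.startswith("#") or len(cols) != 9 or cols[2] != "CDS":
            yield line
            continue
        attrs = cols[8].split(";")
        for i, attr in enumerate(attrs):
            if attr.startswith("ID="):
                base = attr[3:]
                seen[base] += 1
                # The first occurrence keeps its ID; later ones become base.2, base.3, ...
                if seen[base] > 1:
                    attrs[i] = f"ID={base}.{seen[base]}"
        cols[8] = ";".join(attrs)
        yield "\t".join(cols)

# Small in-memory example (invented rows, not taken from a real annotation).
demo = [
    "##gff-version 3",
    "1\tensembl\tCDS\t100\t200\t.\t+\t0\tID=CDS:P1;Parent=transcript:T1",
    "1\tensembl\tCDS\t300\t400\t.\t+\t1\tID=CDS:P1;Parent=transcript:T1",
]
result = list(uniquify_cds_ids(demo))
```

On a real file the same function could be fed an open file handle and the yielded lines written to a new GFF3 file. The `Parent=` attribute is left alone, so each CDS row still points at its transcript.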