Make a GFF-file compatible for reference database
4.2 years ago

I am doing an mRNA analysis (mRNA from mice) and want to use GENE-counter to get raw counts. The manual says that I need a reference genome and an annotation file in GFF3 format, which I will use to create a reference database in MySQL. I have tried the reference and GFF from both RefSeq (NCBI) and Ensembl, and neither works. I get an error telling me that there are duplicates in the GFF file and that it is therefore not in GFF3 format (though according to the files it is). The README that comes with the Ensembl GFF files says: "Some validators may warn about duplicated identifiers for CDS features. This is to allow split features to be grouped." OK, so this is a known phenomenon, but what can I do about it? Any advice? Sorting the file and running the uniq command did not help.

Update: I am pretty sure that the problem is that some of the CDS features share the same attribute value. Does anyone know a way to make them unique, or can I just remove the attributes from that feature? Will that affect the counts I get?
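
One way to confirm this diagnosis is to count how often each ID attribute occurs per feature type. The sketch below is a minimal, hedged example: the function names and the example records (including the ID CDS:PROT1) are made up for illustration and are not taken from the Ensembl file. It also shows why sort and uniq cannot help: the CDS lines share an ID but differ in their coordinates, so no two lines are identical.

```python
from collections import Counter


def parse_attributes(column9):
    """Parse the GFF3 attribute column (key=value pairs separated by ';')."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def duplicated_ids(lines):
    """Return {(feature_type, ID): count} for IDs used on more than one line."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a regular nine-column feature line
        feature_id = parse_attributes(cols[8]).get("ID")
        if feature_id is not None:
            counts[(cols[2], feature_id)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical records: one CDS split over two segments that share an ID.
example = [
    "##gff-version 3\n",
    "1\tensembl\tCDS\t100\t200\t.\t+\t0\tID=CDS:PROT1;Parent=transcript:T1\n",
    "1\tensembl\tCDS\t300\t400\t.\t+\t1\tID=CDS:PROT1;Parent=transcript:T1\n",
    "1\tensembl\texon\t100\t200\t.\t+\t.\tParent=transcript:T1\n",
]
print(duplicated_ids(example))
```

To run it on a real file, pass an open file handle instead of the example list, e.g. `duplicated_ids(open("Mus_musculus.GRCm38.99.gff3"))`.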

RNA-Seq MySQL GENE-counter GFF Ensembl • 1.6k views

Which GFF file are you using from Ensembl? If you use the chr_patch_hapl_scaff file this will contain duplicated gene names due to the genes on the patches and haplotypes. Could this be the problem?

I am using the one called just Mus_musculus.GRCm38.99.gff3 and the primary assembly as reference genome. So I guess that is not the problem...

Try Mus_musculus.GRCm38.99.chr.gff3.gz

You mean just the same file but zipped? Does not work... Thanks anyway!

Sorry, I put in the wrong thing. Edited my comment now. I meant the one with chr in the name, that should be just the primary assembly.

No worries! No difference, I'm afraid. It gives me the same error. The weird thing is that they warn about this in the README file, but no solution is given...

4.2 years ago
Juke34 8.5k

You could try agat_convert_sp_gxf2gxf.pl from AGAT; it should be able to fix your GFF file.

I really thought this would solve it, but I have now tried both the script you recommended and the one called remove redundants, and MySQL still does not accept the file. When I run the redundancy script, it says that there are no redundancies. I do not get this at all...

Still the same warning from GENE-counter?

Yep. I think the problem is that some of the CDS features have the same attribute value. For this script, that does not count as redundancy. I cannot find a way to make them unique, though.
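
If the shared ID on split CDS features is indeed what the loader rejects, one possible workaround is to append a running number to each CDS ID so that every line gets its own identifier. The following is only a sketch under that assumption: the function name, the ".N" suffix scheme and the example records are hypothetical, it has not been tested against GENE-counter, and it assumes no other feature refers to a CDS ID through Parent. Whether GENE-counter relies on the shared ID to group CDS segments is not known here, so the effect on the counts would need to be checked.

```python
from collections import defaultdict


def make_cds_ids_unique(lines, feature_type="CDS"):
    """Append '.1', '.2', ... to the ID of every line of the given feature type."""
    seen = defaultdict(int)  # how many times each original ID has been seen
    out = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9 or cols[2] != feature_type:
            out.append(line)  # pass everything else through unchanged
            continue
        fields = cols[8].split(";")
        for i, field in enumerate(fields):
            if field.startswith("ID="):
                base = field[3:]
                seen[base] += 1
                fields[i] = f"ID={base}.{seen[base]}"
                break
        cols[8] = ";".join(fields)
        out.append("\t".join(cols) + "\n")
    return out


# Hypothetical records: one CDS split over two segments that share an ID.
example = [
    "##gff-version 3\n",
    "1\tensembl\tCDS\t100\t200\t.\t+\t0\tID=CDS:PROT1;Parent=transcript:T1\n",
    "1\tensembl\tCDS\t300\t400\t.\t+\t1\tID=CDS:PROT1;Parent=transcript:T1\n",
    "1\tensembl\texon\t100\t200\t.\t+\t.\tParent=transcript:T1\n",
]
for fixed_line in make_cds_ids_unique(example):
    print(fixed_line, end="")
```

The Parent attribute is left untouched, so each renamed CDS segment still points to its transcript.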

Do you know if I can just delete the attributes for the CDS feature, or will that impact the counts I get?

Difficult to say; I don't know how GENE-counter works.

OK, thank you for your response :)
