GTF file for Bombyx mori
5.9 years ago
Sandeep ▴ 260

I am trying to analyse small RNA sequencing data for Bombyx mori, using the ASM15162v1 genome assembly as the reference. I want to count the mapped reads per feature using the corresponding GFF file available at Ensembl, but I am unable to count with either htseq-count or featureCounts. Both report an error saying it is a non-standard GFF file. I have pasted three example lines from the GFF file below. It is indeed not like the standard GTF available for human.

NW_004581272.1  BestRefSeq  exon    5598    5608    .   -   .   ID=id5;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;end_range=5608,.;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference=similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1
NW_004581272.1  BestRefSeq  exon    5568    5595    .   -   .   ID=id6;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference=similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1
NW_004581272.1  BestRefSeq  exon    4884    5565    .   -   .   ID=id7;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference= similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1

Are there any tools to convert the above GFF format into a GTF like the human one?


The htseq-count command I use is:

htseq-count -m intersection-nonempty -q -t exon -s no PsC.bam ASM15162v1.gff

which throws this error message:

"Error occured when processing GFF file (line 24 of file ASM15162v1.gff):
  Feature id5 does not contain a 'gene_id' attribute"

featureCounts also gives a warning message:

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
5.9 years ago
Carambakaracho ★ 3.2k

Hi Sandeep,

The file you have is in GFF3 format, and htseq-count seems to expect GFF2 (GTF is pretty much GFF2). I'm surprised it doesn't work with GFF3; either it requires some annotation field or it fails on the header?
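To see what the counting tools are complaining about, here is a minimal Python sketch that parses the 9th column of a GFF3 line and checks for a gene_id key. The function name `parse_gff3_attributes` is made up for illustration, and the example string is a shortened copy of the attribute column from the first line in the question.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse a GFF3 attribute column (key=value;key=value) into a dict.

    Hypothetical helper for illustration; not part of HTSeq or Subread.
    """
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters, e.g. ',' as %2C and '%' as %25.
        attributes[key.strip()] = unquote(value.strip())
    return attributes


# Shortened attribute column taken from the first example line in the question.
example = (
    "ID=id5;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;"
    "gbkey=mRNA;gene=Titin1;transcript_id=NM_001114933.1"
)

attrs = parse_gff3_attributes(example)
has_gene_id = "gene_id" in attrs

# The exon carries 'gene' and 'Parent', but no 'gene_id' key.
print(has_gene_id, attrs["gene"], attrs["Parent"])
```

The exon lines carry `gene` and `Parent` but no `gene_id`, which matches the error message. htseq-count has an `-i`/`--idattr` option for naming the attribute to use instead of gene_id, so pointing it at `gene` may be worth a try on the original file; whether that is enough for this particular GFF is not something I have tested.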

Anyway, you might try gffread to convert between the formats, but be aware that gff2/gtf can be dangerous to your health ;-)


Haha... gff2 injurious to your health :) Thanks for your input. gffread seems to work well for converting the GFF to GTF, but the gene_id attribute is missing on the exon lines. It is present only on lines with CDS in the third column.


You might be able to transfer the gene_id from the CDS lines to the exon lines (or convert the Parent attribute to gene_id). This is one of the ugly conversion limitations. Btw, the problem with GFF files is almost always the 9th column; this is where the specs get really "flexible".
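A rough Python sketch of the first option, copying gene_id from CDS lines to exon lines that share a transcript_id. It assumes the converted GTF has `transcript_id "..."` on every line and `gene_id "..."` only on some of them; the function name `add_gene_id_to_exons` and the identifiers `rna0` and `gene0` in the example are hypothetical, so check them against your actual gffread output.

```python
import re

GENE_ID = re.compile(r'gene_id "([^"]+)"')
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def add_gene_id_to_exons(gtf_lines):
    """Return GTF lines where lines lacking gene_id inherit it via transcript_id."""
    lines = [line.rstrip("\n") for line in gtf_lines]

    # First pass: learn transcript_id -> gene_id from lines that carry both.
    gene_of = {}
    for line in lines:
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # comments, blank lines, anything that is not a feature row
        transcript = TRANSCRIPT_ID.search(cols[8])
        gene = GENE_ID.search(cols[8])
        if transcript and gene:
            gene_of.setdefault(transcript.group(1), gene.group(1))

    # Second pass: append gene_id to feature rows that lack it.
    fixed = []
    for line in lines:
        cols = line.split("\t")
        if len(cols) == 9 and not GENE_ID.search(cols[8]):
            transcript = TRANSCRIPT_ID.search(cols[8])
            if transcript and transcript.group(1) in gene_of:
                gene = gene_of[transcript.group(1)]
                cols[8] = cols[8].rstrip().rstrip(";") + '; gene_id "' + gene + '";'
                line = "\t".join(cols)
        fixed.append(line)
    return fixed


# Hypothetical two-line input: the exon lacks gene_id, the CDS has it.
exon = 'NW_004581272.1\tBestRefSeq\texon\t5598\t5608\t.\t-\t.\ttranscript_id "rna0";'
cds = 'NW_004581272.1\tBestRefSeq\tCDS\t5598\t5608\t.\t-\t0\ttranscript_id "rna0"; gene_id "gene0";'

fixed = add_gene_id_to_exons([exon, cds])
for row in fixed:
    print(row)
```

Transcripts with no CDS at all (non-coding RNAs, which matter for small RNA work) would stay without a gene_id under this approach, so falling back to the Parent or gene attribute from the original GFF may still be needed for those.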


That precisely was the next plan of action. Thanks for the input.

5.9 years ago

You can download it from the Ensembl FTP site, the link is here


Additionally, please share the exact command for counting features and the exact error message!


I have added the exact error message in the comments section of my question.


Thanks, I didn't know they still maintain a GTF section...


I have downloaded the gff file, but the problem seems to be in the 9th column of the gff file.
