Question: GTF file for Bombyx mori
Sandeep (Manipal, India) wrote, 17 days ago:

I am trying to analyse small RNA sequencing data for Bombyx mori, using the ASM15162v1 genome assembly as the reference. I also want to count features for all mapped reads using the GFF file provided for the same assembly, available at Ensembl, but I cannot get counts with either htseq-count or featureCounts: both complain that it is a non-standard GFF file. I have pasted three example lines of the GFF file below. It is indeed not like the standard GTF available for human.

NW_004581272.1  BestRefSeq  exon    5598    5608    .   -   .   ID=id5;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;end_range=5608,.;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference=similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1
NW_004581272.1  BestRefSeq  exon    5568    5595    .   -   .   ID=id6;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference=similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1
NW_004581272.1  BestRefSeq  exon    4884    5565    .   -   .   ID=id7;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference= similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1

Are there any tools to convert the GFF format above into a GTF like the one available for human?

(question last modified 16 days ago by Carambakaracho)
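Besides dedicated converters, the exon lines above already carry everything a minimal GTF record needs, so the conversion can be sketched in a few lines of Python. This is a minimal sketch, not a general converter. It assumes the real file is tab-separated, as GFF3 requires (the pasted lines show spaces), and that every exon line has gene= and transcript_id= attributes like the three shown. Using the gene symbol as gene_id is also an assumption; the GeneID in Dbxref may be a safer unique key. The function names are hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse a GFF3 attribute column (key=value pairs separated by ';')."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters, e.g. %2C -> ',' and %25 -> '%'
        attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff3_exon_to_gtf(line, gene_key="gene", transcript_key="transcript_id"):
    """Convert one GFF3 exon line to a GTF line; return None for anything else."""
    if line.startswith("#") or not line.strip():
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9 or cols[2] != "exon":
        return None
    attrs = parse_gff3_attributes(cols[8])
    gene_id = attrs.get(gene_key)
    transcript_id = attrs.get(transcript_key)
    if gene_id is None or transcript_id is None:
        return None  # a GTF record needs both IDs, so skip this exon
    cols[8] = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return "\t".join(cols)


def convert_file(src_path, dst_path):
    """Write GTF records for all convertible exons; return how many exons were skipped."""
    skipped = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            gtf = gff3_exon_to_gtf(line)
            if gtf is not None:
                dst.write(gtf + "\n")
            elif "\texon\t" in line:
                skipped += 1
    return skipped
```

A call such as convert_file("ASM15162v1.gff", "ASM15162v1.exons.gtf") (output name hypothetical) would produce a file that htseq-count can read with its default gene_id attribute; a non-zero return value means some exons lacked one of the two keys.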

The htseq-count command I use is:

htseq-count -m intersection-nonempty -q -t exon -s no PsC.bam ASM15162v1.gff

It throws this error message:

"Error occured when processing GFF file (line 24 of file ASM15162v1.gff):
  Feature id5 does not contain a 'gene_id' attribute"

featureCounts also throws a warning message:

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
(reply by Sandeep, 16 days ago)
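Both messages point at the same thing: each tool looks for a gene_id key in column 9, and the exon lines above have no such key. A quick way to see which keys the exons do carry is to tally them per feature type. A minimal Python sketch, assuming a tab-separated file; the function names are hypothetical, and splitting on ';' ignores the rare case of semicolons inside quoted GTF values.

```python
import re
from collections import Counter

# The key is the leading run of characters that are neither '=' nor whitespace,
# which works for GFF3 ("ID=id5") and GTF ('gene_id "x"') fields alike.
KEY_RE = re.compile(r"[^=\s]+")


def attribute_keys_by_type(lines, feature_type="exon"):
    """Return (number of features, Counter of column-9 keys) for one feature type."""
    counts = Counter()
    n_features = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        n_features += 1
        for field in cols[8].split(";"):
            match = KEY_RE.match(field.strip())
            if match:
                counts[match.group(0)] += 1
    return n_features, counts


def report(path, feature_type="exon"):
    """Print which attribute keys the features of one type carry."""
    with open(path) as handle:
        n, counts = attribute_keys_by_type(handle, feature_type)
    print(f"{n} {feature_type} features")
    for key, count in counts.most_common():
        print(f"{key}\t{count}")
    print(f"features without gene_id: {n - counts['gene_id']}")
```

For the exon lines shown in the question, this would list keys such as ID, Parent, gene and transcript_id but no gene_id. If a suitable key is present on every exon, it may be possible to skip conversion and point htseq-count's -i/--idattr option (default gene_id) at it, e.g. -i gene; whether that grouping is appropriate depends on the annotation.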
Carambakaracho (Switzerland) wrote, 16 days ago:

Hi Sandeep,

The file you have is in GFF3 format, and htseq-count seems to expect GFF2 format (GTF is pretty much GFF2). I'm surprised it doesn't work with GFF3; either it requires some annotation field or it fails on the header?

Anyway, you might try gffread to convert between the formats, but be aware that gff2/gtf can be dangerous to your health ;-)

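The practical difference between the two formats sits in column 9: GTF (GFF2-style) writes key "value"; pairs, while GFF3 writes key=value pairs with percent-encoding. A rough Python heuristic that tells the two attribute styles apart; the function name is hypothetical and it only inspects the syntax, not whether the required keys are present.

```python
import re

# GTF/GFF2 field: key, whitespace, then a quoted or bare value
GTF_FIELD = re.compile(r'^\s*[^\s=]+\s+("[^"]*"|\S+)\s*$')
# GFF3 field: key, '=', then anything (values are percent-encoded)
GFF3_FIELD = re.compile(r"^\s*[^\s=]+=.*$")


def attribute_dialect(column9):
    """Guess whether a column-9 string uses GTF/GFF2 or GFF3 attribute syntax."""
    fields = [f for f in column9.split(";") if f.strip()]
    if fields and all(GTF_FIELD.match(f) for f in fields):
        return "gtf"
    if fields and all(GFF3_FIELD.match(f) for f in fields):
        return "gff3"
    return "unknown"
```

Applied to the exon lines in the question, this would report "gff3", which fits the error about a missing gene_id rather than a problem with the header.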

Haha... GFF2 injurious to your health :) Thanks for your input. gffread seems to work well for converting the GFF to GTF, but the gene_id attribute is missing for exons. It is only present on the lines that have CDS in the third column.

(reply by Sandeep, 16 days ago)

You might be able to transfer the gene_id from the CDS lines to the exons (or convert the Parent attribute to gene_id). This is one of the ugly limitations of converting between the formats. Btw, the problem with GFF files is almost always the 9th column; this is where the specs get really "flexible".

(reply by Carambakaracho, 16 days ago)
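The first idea, copying gene_id from CDS lines onto exons, can be sketched on the gffread GTF output by matching records on transcript_id. A minimal Python sketch, assuming tab-separated GTF in which at least one line per transcript (here a CDS) carries gene_id and the exons share that transcript_id. Transcripts without any CDS, such as non-coding RNAs, would still lack a gene_id. Function names are hypothetical.

```python
import re

# One GTF attribute: a key (no spaces or ';') followed by a quoted value
ATTR = re.compile(r'([^\s;]+)\s+"([^"]*)"')


def gtf_attrs(column9):
    """Return a dict of the key "value" pairs in a GTF attribute column."""
    return dict(ATTR.findall(column9))


def fill_gene_ids(lines):
    """Copy gene_id onto records that lack it, matched by transcript_id."""
    records = [l.rstrip("\n").split("\t")
               for l in lines if l.strip() and not l.startswith("#")]
    # First pass: learn transcript_id -> gene_id from lines that have both
    tx_to_gene = {}
    for cols in records:
        attrs = gtf_attrs(cols[8])
        if "gene_id" in attrs and "transcript_id" in attrs:
            tx_to_gene[attrs["transcript_id"]] = attrs["gene_id"]
    # Second pass: prepend gene_id (conventionally the first GTF attribute)
    out = []
    for cols in records:
        attrs = gtf_attrs(cols[8])
        tx = attrs.get("transcript_id")
        if "gene_id" not in attrs and tx in tx_to_gene:
            cols[8] = f'gene_id "{tx_to_gene[tx]}"; ' + cols[8].strip()
        out.append("\t".join(cols))
    return out
```

Lines that still lack gene_id after this pass are the ones whose transcripts had no CDS line to learn from.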

That was precisely my next plan of action. Thanks for the input.

(reply by Sandeep, 16 days ago)
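The other route Carambakaracho mentioned, turning the Parent attribute into gene_id, can be sketched on the original GFF3 file by following the ID/Parent links from each exon up to its gene. A minimal Python sketch, assuming the file also contains the gene and mRNA lines the exons point to (for example ID=gene0 for the gene and ID=rna0;Parent=gene0 for the transcript; the excerpt only shows exons, so these IDs are illustrative). The gene_id written is the top-level feature's ID, not its symbol, and IDs are compared as raw, still percent-encoded strings. Function names are hypothetical.

```python
def parse_attrs(column9):
    """Parse GFF3 key=value pairs, keeping values raw (not percent-decoded)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key.strip()] = value.strip()
    return attrs


def exons_with_gene_ids(lines):
    """Yield GTF exon lines whose gene_id/transcript_id come from the Parent chain."""
    features = []
    by_id = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attrs(cols[8])
        features.append((cols, attrs))
        if "ID" in attrs:
            by_id[attrs["ID"]] = attrs
    for cols, attrs in features:
        if cols[2] != "exon" or "Parent" not in attrs:
            continue
        # GFF3 allows comma-separated multiple parents; this sketch takes the first
        transcript = attrs["Parent"].split(",")[0]
        # Walk up (transcript -> gene) until a feature has no Parent; if the
        # transcript line is missing, gene_id falls back to the transcript ID
        top, seen = transcript, {transcript}
        while top in by_id and "Parent" in by_id[top]:
            nxt = by_id[top]["Parent"].split(",")[0]
            if nxt in seen:
                break  # guard against malformed cyclic Parent links
            seen.add(nxt)
            top = nxt
        gtf_cols = cols[:8] + [f'gene_id "{top}"; transcript_id "{transcript}";']
        yield "\t".join(gtf_cols)
```

Unlike the gffread route, this does not depend on CDS lines, so exons of non-coding transcripts also receive a gene_id as long as their transcript has a Parent.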
Vijay Lakhujani (India) wrote, 16 days ago:

You can download it from the Ensembl FTP site; the link is here.


Additionally, please share the exact command for counting features and the exact error message!

(reply by Vijay Lakhujani, 16 days ago)

I have added the exact error message in the comments section of my question.

(reply by Sandeep, 16 days ago)

Thanks, I didn't know they still maintain a GTF section...

(reply by Carambakaracho, 16 days ago)

I have downloaded the GFF file, but the problem seems to be in its 9th column.

(reply by Sandeep, 16 days ago)