Question: GTF file for Bombyx mori
6 months ago by
Sandeep250
Manipal, India
Sandeep250 wrote:

I am trying to analyse small RNA sequencing data for Bombyx mori, using the ASM15162v1 genome assembly as the reference. I want to count the features of all mapped reads using the GFF file provided for this assembly at Ensembl, but neither htseq-count nor featureCounts can count with it. I get an error saying it is a non-standard GFF file. I have pasted three example lines from the GFF file below. It is indeed not like the standard GTF available for human.

NW_004581272.1  BestRefSeq  exon    5598    5608    .   -   .   ID=id5;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;end_range=5608,.;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference=similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1
NW_004581272.1  BestRefSeq  exon    5568    5595    .   -   .   ID=id6;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference=similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1
NW_004581272.1  BestRefSeq  exon    4884    5565    .   -   .   ID=id7;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;Note=The RefSeq transcript has 3 frameshifts and aligns at 7%25 coverage compared to this genomic sequence;exception=annotated by transcript or proteomic data;gbkey=mRNA;gene=Titin1;inference= similar to RNA sequence%2C mRNA (same species):RefSeq:NM_001114933.1;partial=true;product=Titin-like protein;transcript_id=NM_001114933.1

Are there any tools to convert the above GFF format into a GTF like the human one?

modified 6 months ago by Carambakaracho620 • written 6 months ago by Sandeep250

The htseq-count command I use is htseq-count -m intersection-nonempty -q -t exon -s no PsC.bam ASM15162v1.gff, which throws the error message:

"Error occured when processing GFF file (line 24 of file ASM15162v1.gff):
  Feature id5 does not contain a 'gene_id' attribute"

featureCounts also throws a warning message:

Warning: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id'
written 6 months ago by Sandeep250
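
Both messages point at the 9th column, so a quick way to see what the counters are missing is to split that column into its key=value pairs. The following is a minimal sketch, not part of the original thread: the function name parse_gff3_attributes is made up for illustration, and the record is a shortened copy of the first exon line from the question.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Split a GFF3 attribute column into a dict of decoded key/value pairs."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        # GFF3 percent-encodes reserved characters (e.g. %2C for a comma).
        attrs[key.strip()] = unquote(value.strip())
    return attrs


# Shortened version of the first exon record shown in the question.
line = (
    "NW_004581272.1\tBestRefSeq\texon\t5598\t5608\t.\t-\t.\t"
    "ID=id5;Parent=rna0;Dbxref=GeneID:100141452,Genbank:NM_001114933.1;"
    "gbkey=mRNA;gene=Titin1;transcript_id=NM_001114933.1"
)

attrs = parse_gff3_attributes(line.split("\t")[8])
print("gene_id" in attrs)  # is the attribute the counters ask for present?
print(sorted(attrs))       # attribute keys that are actually available
```

Both counters let you name a different attribute (htseq-count via --idattr, featureCounts via -g), so a key such as gene may be usable directly, provided every exon record in the file carries it.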
6 months ago by
Switzerland
Carambakaracho620 wrote:

Hi Sandeep,

The file you have is in GFF3 format, and htseq-count seems to expect GFF2 (GTF is pretty much GFF2). I'm surprised it doesn't work with GFF3; perhaps it requires a particular attribute field or fails on the header?

Anyway, you might try gffread to convert between the formats, but be aware that gff2/gtf can be dangerous to your health ;-)

written 6 months ago by Carambakaracho620

Haha... gff2 injurious to your health :) Thanks for your input. gffread seems to work well for converting the GFF to GTF, but the gene_id attribute is missing for exons. It is present on records with CDS in the third column.

written 6 months ago by Sandeep250
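
To confirm which feature types in a converted GTF carry gene_id, one option is to tally the records per feature type. This is a rough sketch under stated assumptions: the function name is hypothetical, the two demo records are invented for illustration (they are not real gffread output), and attribute values containing a semicolon would confuse the simple split.

```python
from collections import Counter


def count_gene_id_by_feature(gtf_lines):
    """Return (with_gene_id, without_gene_id) Counters keyed by feature type."""
    with_id, without_id = Counter(), Counter()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        feature, attributes = cols[2], cols[8]
        # GTF attributes look like: key "value"; key "value";
        keys = {f.split()[0] for f in attributes.split(";") if f.strip()}
        (with_id if "gene_id" in keys else without_id)[feature] += 1
    return with_id, without_id


# Invented records, only to exercise the function.
demo = [
    'NW_004581272.1\tBestRefSeq\texon\t5598\t5608\t.\t-\t.\ttranscript_id "rna0";\n',
    'NW_004581272.1\tBestRefSeq\tCDS\t5598\t5608\t.\t-\t0\ttranscript_id "rna0"; gene_id "Titin1";\n',
]

with_id, without_id = count_gene_id_by_feature(demo)
print("with gene_id:", dict(with_id))
print("without gene_id:", dict(without_id))
```

On a real file, the same function can be given an open file handle in place of the demo list.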

You might be able to transfer the gene_id from the CDS records to the exons (or convert the Parent attribute to gene_id). This is one of the ugly limitations of the conversion. Btw, the problem with GFF files is almost always the 9th column: this is where the specs get really "flexible".

written 6 months ago by Carambakaracho620
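
The suggestion above, filling in gene_id from another attribute, could be sketched as follows. This is an illustration rather than anyone's actual script: the function name and the gene_key parameter are hypothetical, and the choice of the gene attribute as gene_id (with Parent as fallback) is an assumption based on the three records in the question.

```python
def gff3_exon_to_gtf(line, gene_key="gene"):
    """Rewrite one GFF3 exon record as a GTF record, or return None to skip it."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9 or cols[2] != "exon":
        return None
    attrs = dict(
        field.split("=", 1) for field in cols[8].split(";") if "=" in field
    )
    # Assumption: the `gene` attribute serves as gene_id; fall back to Parent.
    gene_id = attrs.get(gene_key) or attrs.get("Parent")
    transcript_id = attrs.get("transcript_id") or attrs.get("Parent")
    if gene_id is None or transcript_id is None:
        return None
    cols[8] = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    return "\t".join(cols)


# Shortened version of the first exon record shown in the question.
record = (
    "NW_004581272.1\tBestRefSeq\texon\t5598\t5608\t.\t-\t.\t"
    "ID=id5;Parent=rna0;gbkey=mRNA;gene=Titin1;transcript_id=NM_001114933.1"
)
converted = gff3_exon_to_gtf(record)
print(converted)
```

Note that the Parent of an exon is normally its transcript (rna0 here), so whenever the fallback is used the counts are grouped per transcript rather than per gene.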

That precisely was the next plan of action. Thanks for the input.

written 6 months ago by Sandeep250
6 months ago by
Vijay Lakhujani3.4k
India
Vijay Lakhujani3.4k wrote:

You can download it from the Ensembl FTP site; the link is here

modified 6 months ago • written 6 months ago by Vijay Lakhujani3.4k

Additionally, please share the exact command for counting features and the exact error message!

written 6 months ago by Vijay Lakhujani3.4k

I have added the exact error message in the comments section of my question.

written 6 months ago by Sandeep250

Thanks, I didn't know they still maintain a GTF section...

written 6 months ago by Carambakaracho620

I have downloaded the gff file, but the problem seems to be in the 9th column of the gff file.

written 6 months ago by Sandeep250