Question: Annotation problem while using Rsubread
1
gravatar for gokberk
17 months ago by
gokberk60
gokberk60 wrote:

Hi all,

I've been trying to read some bam files using Rsubread. As for the annotation file, I converted a gff file to gtf using gffread. Here is how my gtf file looks like:

NC_012670.1 RefSeq  exon    543 614 .   +   .   transcript_id "rna76575";
NC_012670.1 RefSeq  exon    615 1562    .   +   .   transcript_id "rna76576";
NC_012670.1 RefSeq  exon    1563    1631    .   +   .   transcript_id "rna76577";
NC_012670.1 RefSeq  exon    1632    3191    .   +   .   transcript_id "rna76578";
NC_012670.1 RefSeq  exon    3192    3266    .   +   .   transcript_id "rna76579";
NC_012670.1 RefSeq  CDS 3269    4223    .   +   0   transcript_id "gene33355"; gene_id "gene33355"; gene_name "ND1";
NC_012670.1 RefSeq  exon    4224    4291    .   +   .   transcript_id "rna76580";
NC_012670.1 RefSeq  exon    4289    4360    .   -   .   transcript_id "rna76581";
NC_012670.1 RefSeq  exon    4362    4429    .   +   .   transcript_id "rna76582";
NC_012670.1 RefSeq  CDS 4430    5471    .   +   0   transcript_id "gene33356"; gene_id "gene33356"; gene_name "ND2";
NC_012670.1 RefSeq  exon    5472    5538    .   +   .   transcript_id "rna76583";
NC_012670.1 RefSeq  exon    5546    5614    .   -   .   transcript_id "rna76584";
NC_012670.1 RefSeq  exon    5616    5688    .   -   .   transcript_id "rna76585";
NC_012670.1 RefSeq  exon    5721    5782    .   -   .   transcript_id "rna76586";
NC_012670.1 RefSeq  exon    5783    5853    .   -   .   transcript_id "rna76587";
NC_012670.1 RefSeq  CDS 5858    7396    .   +   0   transcript_id "gene33357"; gene_id "gene33357"; gene_name "COX1";
NC_012670.1 RefSeq  exon    7398    7469    .   -   .   transcript_id "rna76588";
NC_012670.1 RefSeq  exon    7471    7538    .   +   .   transcript_id "rna76589";
NC_012670.1 RefSeq  CDS 7540    8223    .   +   0   transcript_id "gene33358"; gene_id "gene33358"; gene_name "COX2";
NC_012670.1 RefSeq  exon    8297    8362    .   +   .   transcript_id "rna76590";
NC_012670.1 RefSeq  CDS 8364    8570    .   +   0   transcript_id "gene33359"; gene_id "gene33359"; gene_name "ATP8";
NC_012670.1 RefSeq  CDS 8525    9205    .   +   0   transcript_id "gene33360"; gene_id "gene33360"; gene_name "ATP6";
NC_012670.1 RefSeq  CDS 9205    9988    .   +   0   transcript_id "gene33361"; gene_id "gene33361"; gene_name "COX3";

So, I run the following command in R:

rsubread.counts<-featureCounts(files=E08_bam,
                               annot.ext="/media/gokberk/DATA/Thesis/Data/Annotations/Macaca_fascicularis_5.0_genomic.gtf",
                               isGTFAnnotationFile = TRUE,
                               GTF.featureType="exon",
                               GTF.attrType="gene_id",
                               useMetaFeatures = TRUE,
                               allowMultiOverlap = FALSE)

And receive the following error:

//================================= Running ==================================\\
||                                                                            ||
|| Load annotation file Macaca_fascicularis_5.0_genomic.gtf ...               ||

ERROR: failed to find the gene identifier attribute in the 9th column of the provided GTF file.
The specified gene identifier attribute is 'gene_id' 
An example of attributes included in your GTF annotation is 'transcript_id "id177006"; gene_name "LOC102131469";' 
The program has to terminate.

Error in featureCounts(files = E08_bam, annot.ext = "/media/gokberk/DATA/Thesis/Data/Annotations/Macaca_fascicularis_5.0_genomic.gtf",  : 
  No counts were generated.

I was wondering what exactly is the problem with my gtf file (apparently gene_id is missing for some rows, but is that it) and how could I fix it. I'd be more than happy if you could help me.

Cheers, Gökberk

gff annotation gtf • 803 views
ADD COMMENTlink written 17 months ago by gokberk60

Dear Goekberg, your GTF doesn't seem to have gene annotation at all - only CDS and exons (as seen in col 3). Besides the exons not having gene_ids, the exons don't match the CDS ranges at all. This seems very odd. In addition, exons and CDS are second or third level annotation - a gene should be at the lowest level. I'd suggest you review your gff file, either the conversion was wrong or the GFF is invalid in the first place.

ADD REPLYlink written 17 months ago by Carambakaracho2.2k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1581 users visited in the last hour