Question: Converting gff file to gtf for htseq-count
natsterbug wrote (3.0 years ago):

After running TopHat2/2.1.0 on single-end (SE) 50 bp RNA-seq reads from S. tuberosum, I am now trying to count the reads mapping to each feature using htseq-count with the following command:

htseq-count -m intersection-nonempty --format=bam \
tophat_Kalkaska_control/tophat_K10C/accepted_hits.bam \
PGSC_DM_V403_genes_strand_filtered.gff

I receive the following error message:

Error occured when processing GFF file (line 3 of file PGSC_DM_V403_genes_strand_filtered.gff):
  Feature PGSC0003DME400103709 does not contain a 'gene_id' attribute
  [Exception type: ValueError, raised in count.py:53]
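For context on why this happens: htseq-count only reads features whose type matches -t/--type (default exon), and it takes each feature's gene identifier from the attribute named by -i/--idattr (default gene_id). The exon lines in this GFF3 carry only ID and Parent, so the first exon in the file (line 3, counting the ##gff-version header) fails. The sketch below mimics that check in plain Python. The function names are hypothetical, this is not HTSeq's own code, and it assumes the nine GFF columns are tab-separated as the format requires.

```python
def parse_gff3_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def first_exon_missing_attr(lines, feature_type="exon", id_attr="gene_id"):
    """Return (line_number, ID) of the first `feature_type` feature lacking
    `id_attr`, or None. Line numbers are 1-based and include header lines,
    matching how the error message above counts them."""
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        attrs = parse_gff3_attributes(cols[8])
        if id_attr not in attrs:
            return lineno, attrs.get("ID")
    return None


# First three lines of the sample GFF3 shown below (tab-separated).
sample = [
    "##gff-version 3",
    "ST4.03ch01\tCufflinks\tmRNA\t152322\t153489\t.\t-\t.\t"
    "ID=PGSC0003DMT400039136;Parent=PGSC0003DMG400015133",
    "ST4.03ch01\tCufflinks\texon\t153389\t153489\t.\t-\t.\t"
    "ID=PGSC0003DME400103709;Parent=PGSC0003DMT400039136",
]
print(first_exon_missing_attr(sample))  # (3, 'PGSC0003DME400103709')
```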

My understanding is that htseq-count expects a GTF file rather than the GFF3 file I supplied. I would like to either convert my GFF to GTF or modify the 9th (attribute) column of the GFF. A sample of my GFF file is below:

##gff-version   3
ST4.03ch01      Cufflinks       mRNA    152322  153489  .       -       .       ID=PGSC0003DMT400039136;Parent=PGSC0003DMG400015133;Source_id=RNASEQ26.809.0;Mapping_depth=16.192011;Class=4;name="Defensin"
ST4.03ch01      Cufflinks       exon    153389  153489  .       -       .       ID=PGSC0003DME400103709;Parent=PGSC0003DMT400039136
ST4.03ch01      Cufflinks       exon    152322  152593  .       -       .       ID=PGSC0003DME400103710;Parent=PGSC0003DMT400039136
ST4.03ch01      Cufflinks       intron  152594  153388  .       -       .       ID=PGSC0003DMI400065839;Parent=PGSC0003DMT400039136
ST4.03ch01      BestORF CDS     152418  152576  .       -       0       ID=PGSC0003DMC400026563;Parent=PGSC0003DMT400039136;name="Defensin"
ST4.03ch01      GLEAN   mRNA    160499  160663  .       -       .       ID=PGSC0003DMT400039133;Parent=PGSC0003DMG400015132;Source_id=PGSC0003DMG000019750;Class=2;name="Defensin"
ST4.03ch01      Cufflinks       mRNA    160379  161885  .       -       .       ID=PGSC0003DMT400039134;Parent=PGSC0003DMG400015132;Source_id=RNASEQ26.803.0;Mapping_depth=35.840147;Class=2;name="Defensin"
ST4.03ch01      Cufflinks       exon    161722  161885  .       -       .       ID=PGSC0003DME400103705;Parent=PGSC0003DMT400039134
ST4.03ch01      GLEAN   exon    160499  160663  .       -       .       ID=PGSC0003DME400103707;Parent=PGSC0003DMT400039133
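If you would rather rewrite the 9th column yourself than use gffread, the hierarchy in this sample (gene, then mRNA whose Parent is the gene, then exon/CDS whose Parent is the mRNA) is enough to build GTF attributes. The sketch below is a minimal, hypothetical converter under that assumption: it maps each mRNA ID to its gene in a first pass, then emits exon and CDS lines with transcript_id and gene_id. Introns and features whose Parent is not a known mRNA are skipped, and columns are assumed to be tab-separated.

```python
def parse_gff3_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=x;Parent=y' into a dict."""
    attrs = {}
    for item in field.strip().strip(";").split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff3_to_gtf(lines, keep_types=("exon", "CDS")):
    """Rewrite exon/CDS lines of a gene -> mRNA -> exon/CDS GFF3 as GTF
    lines carrying transcript_id and gene_id attributes."""
    records = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            records.append(cols)

    # Pass 1: transcript ID -> gene ID, taken from each mRNA's Parent.
    tx_to_gene = {}
    for cols in records:
        if cols[2] == "mRNA":
            attrs = parse_gff3_attributes(cols[8])
            if "ID" in attrs and "Parent" in attrs:
                tx_to_gene[attrs["ID"]] = attrs["Parent"]

    # Pass 2: one GTF line per (feature, parent transcript); GFF3 allows
    # a comma-separated list of parents.
    out = []
    for cols in records:
        if cols[2] not in keep_types:
            continue
        attrs = parse_gff3_attributes(cols[8])
        for tx in attrs.get("Parent", "").split(","):
            gene = tx_to_gene.get(tx)
            if gene is None:
                continue  # parent is not a known mRNA; skipped in this sketch
            gtf_attrs = f'transcript_id "{tx}"; gene_id "{gene}";'
            out.append("\t".join(cols[:8] + [gtf_attrs]))
    return out
```

The function accepts any iterable of lines, including an open file handle, so the returned lines can be joined with newlines, written to a .gtf file and passed to the same htseq-count command in place of the .gff. Two passes are used because GFF3 does not guarantee that an mRNA line appears before its exons.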

Is running gffread as follows the appropriate course of action?

gffread PGSC_DM_V403_genes_strand_filtered.gff -T -o PGSC_DM_V403_genes_strand_filtered.gtf

Thanks,
Natalie

(modified 3.0 years ago by RamRS • written 3.0 years ago by natsterbug)

I don't recall all the features of gffread, but that sounds about right. What do you get as a result?

(modified 3.0 years ago • written 3.0 years ago by Istvan Albert)

I apologize for the extremely tardy response. Below is the output:

ST4.03ch00 GLEAN exon 63411 63498 . + . transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN exon 66359 66816 . + . transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN CDS 63411 63498 . + 0 transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN CDS 66359 66816 . + 2 transcript_id "PGSC0003DMT400089830"; gene_id "PGSC0003DMG400039401";
ST4.03ch00 GLEAN exon 70051 70281 . + . transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN exon 72021 73032 . + . transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN exon 73103 73227 . + . transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN CDS 70051 70281 . + 0 transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN CDS 72021 73032 . + 0 transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
ST4.03ch00 GLEAN CDS 73103 73227 . + 2 transcript_id "PGSC0003DMT400036367"; gene_id "PGSC0003DMG400013996";
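Every exon line in this gffread output now carries a gene_id attribute, which is what htseq-count looks for with its default -t exon and -i gene_id settings. Before rerunning the count on the full file, a quick check like the hypothetical helper below can confirm that no exon is missing gene_id and show how many exons end up grouped under each gene. It is a sketch that assumes tab-separated columns and key "value"; attributes, as in the output above.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "PGSC0003DMG400039401";
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def exons_per_gene(gtf_lines):
    """Count exon lines per gene_id; raise ValueError on an exon without one."""
    counts = defaultdict(int)
    for lineno, line in enumerate(gtf_lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # htseq-count ignores CDS lines under its default -t exon
        attrs = dict(ATTR_RE.findall(cols[8]))
        if "gene_id" not in attrs:
            raise ValueError(f"line {lineno}: exon without gene_id")
        counts[attrs["gene_id"]] += 1
    return dict(counts)
```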

(modified 3.0 years ago • written 3.0 years ago by natsterbug)