GFF/GTF file error / featureCounts
0
0
Entering edit mode
12 months ago
Pegasus ▴ 100

Hi all,

I am trying to generate a count.matrix for sorted bam files, using featureCounts on linux.

I have a non-modal organism (bacteria), so I generated the annotation.file using both PROKKA and RAST.

I used all the following files in featurecounts;

PROKKA.gff,

RAST.gff

RAST.gtf

gffread converted-PROKKA.gtf file

But still facing the same issue/error:

**ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..**

I've tried using different flags like -F

I've tried replacing gene_id by ID or locus_tag or gene

I've tried replacing IDs with gene_ids using sed command

I read all posts related to this problem, but I am still facing the same error.

Here is part of prokka.gff file;

caffold_1   Aragorn:001002  tRNA    1914    1989    .   +   .   ID=KKGOPEHD_00011;inference=COORDINATES:profile:Aragorn:001002;locus_tag=KKGOPEHD_00011;product=tRNA-Pro(tgg)
Scaffold_1  Prodigal:002006 CDS 2164    2808    .   +   0   ID=KKGOPEHD_00012;inference=ab initio prediction:Prodigal:002006;locus_tag=KKGOPEHD_00012;product=hypothetical protein
Scaffold_1  Prodigal:002006 CDS 3037    3540    .   +   0   ID=KKGOPEHD_00013;eC_number=3.1.22.4;Name=ruvC;db_xref=COG:COG0817;gene=ruvC;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P0A814;locus_tag=KKGOPEHD_00013;product=Crossover junction endodeoxyribonuclease RuvC
Scaffold_1  Prodigal:002006 CDS 3537    4151    .   +   0   ID=KKGOPEHD_00014;eC_number=3.6.4.12;Name=ruvA;db_xref=COG:COG0632;gene=ruvA;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P0A809;locus_tag=KKGOPEHD_00014;product=Holliday junction ATP-dependent DNA helicase RuvA
Scaffold_1  Prodigal:002006 CDS 4202    5209    .   +   0   ID=KKGOPEHD_00015;eC_number=3.6.4.12;Name=ruvB_1;db_xref=COG:COG2255;gene=ruvB_1;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:O32055;locus_tag=KKGOPEHD_00015;product=Holliday junction ATP-dependent DNA helicase RuvB

Here is part of RAST.gff file;

Scaffold_1  FIG CDS 762 875 .   -   0   ID=fig|6666666.1005592.peg.1;Name=hypothetical protein
Scaffold_1  FIG CDS 2131    2808    .   +   1   ID=fig|6666666.1005592.peg.2;Name=BofC protein
Scaffold_1  FIG CDS 3037    3540    .   +   1   ID=fig|6666666.1005592.peg.3;Name=Crossover junction endodeoxyribonuclease RuvC (EC 3.1.22.4);Ontology_term=KEGG_ENZYME:3.1.22.4
Scaffold_1  FIG CDS 3537    4151    .   +   0   ID=fig|6666666.1005592.peg.4;Name=Holliday junction ATP-dependent DNA helicase RuvA (EC 3.6.4.12);Ontology_term=KEGG_ENZYME:3.6.4.12
Scaffold_1  FIG CDS 4202    5209    .   +   2   ID=fig|6666666.1005592.peg.5;Name=Holliday junction ATP-dependent DNA helicase RuvB (EC 3.6.4.12);Ontology_term=KEGG_ENZYME:3.6.4.12
Scaffold_1  FIG CDS 5438    7531    .   +   2   ID=fig|6666666.1005592.peg.6;Name=hypothetical protein
Scaffold_1  FIG CDS 7539    8567    .   +   0   ID=fig|6666666.1005592.peg.7;Name=S-adenosylmethionine:tRNA ribosyltransferase-isomerase (EC 2.4.99.17);Ontology_term=KEGG_ENZYME:2.4.99.17
Scaffold_1  FIG CDS 8588    9727    .   +   2   ID=fig|6666666.1005592.peg.8;Name=Queuine tRNA-ribosyltransferase (EC 2.4.2.29);Ontology_term=KEGG_ENZYME:2.4.2.29
Scaffold_1  FIG CDS 9762    10097   .   +   0   ID=fig|6666666.1005

Many thanks

RNA-SEQ • 1.1k views
ADD COMMENT
1
Entering edit mode

Could you also post the command you're using? Are you specifying the feature and attribute you want with e.g. -t CDS -g ID?

ADD REPLY
0
Entering edit mode

Hi barslmn,

here is the command line; should I include the gene_length in the command line?

featureCounts -p -t gene -g gene_id -a Rast.gff -o FCcount.default.txt md1.sorted.bam md2.sorted.bam md3.sorted.bam md4.sorted.bam md5.sorted.bam mc1.sorted.bam mc2.sorted.bam mc3.sorted.bam mc4.sorted.bam mc5.sorted.bam
ADD REPLY
0
Entering edit mode

Your gff snippets doesn't have the feature gene nor the attribute gene_id. Are these in your files?

ADD REPLY
0
Entering edit mode

Unfortunately No, I have only the product and the ID numbers. I am still confused why both annotation tools did not include such data in the generated gff file.

ADD REPLY
0
Entering edit mode

Can you test with -t CDS and -g ID the ones you have in your annotation file?

ADD REPLY

Login before adding your answer.

Traffic: 1434 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6