PROKKA.gff file is not compatible with featureCounts
14 months ago
Pegasus ▴ 100

Hi all,

I am trying to count the number of reads that map to each gene using featureCounts (paired-end RNA-Seq, Linux).

My input:

  • GFF file generated using Prokka
  • GTF file generated by the NCBI annotation
  • Sorted BAM files generated by bowtie2 and samtools

When I used the GTF file generated by NCBI, featureCounts ran without any issue. However, I am interested in the Prokka GFF because it showed more comprehensive features and a higher mapping rate (compared to the NCBI GTF). When I used the Prokka GFF file, I received this error:

ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option..

Part of the content of the Prokka GFF file is shown below:

##sequence-region JAFJXZ010000052.1 1 250
##sequence-region JAFJXZ010000053.1 1 48755
##sequence-region JAFJXZ010000054.1 1 255
##sequence-region JAFJXZ010000055.1 1 465
##sequence-region JAFJXZ010000056.1 1 355
##sequence-region JAFJXZ010000008.1 1 618
##sequence-region JAFJXZ010000057.1 1 255
##sequence-region JAFJXZ010000058.1 1 271
##sequence-region JAFJXZ010000059.1 1 223
##sequence-region JAFJXZ010000009.1 1 354
JAFJXZ010000010.1 Prodigal:002006 CDS 3477 3851 . - 0 ID=GOHBADNI_00001;inference=ab initio prediction:Prodigal:002006;locus_tag=GOHBADNI_00001;product=hypothetical protein
JAFJXZ010000012.1 Prodigal:002006 CDS 712 1704 . + 0 ID=GOHBADNI_00002;eC_number=2.3.1.180;Name=fabHB;db_xref=COG:COG0332;gene=fabHB;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:007600;locus_tag=GOHBADNI_00002;product=3-oxoacyl-[acyl-carrier-protein] synthase 3 protein 2
JAFJXZ010000012.1 Prodigal:002006 CDS 1834 3969 . + 0 ID=GOHBADNI_00003;eC_number=5.6.2.1;Name=topB 1;db_xref=COG:COG0550;gene=topB 1;inference=ab initio prediction:Prodigal:002006,similar to AA 

I checked several threads discussing this issue and couldn't find an appropriate answer so far.

I tried these steps to solve the error:

  • Used StringTie to generate a merged GTF file (with the Prokka GFF as input). However, I encountered the same error when I used the merged GTF file in featureCounts instead of the NCBI GTF or the Prokka GFF.

  • Converted the GFF to GTF or GFF3 with gffread, but this produced 0-byte files, so I couldn't try them.

Kindly provide guidance on resolving this error. Appreciate your help greatly.

featureCounts RNA-seq • 1.9k views
14 months ago
Mensur Dlakic ★ 27k

I dealt with this problem, and I think the solution was two-fold. First, convert the file using AGAT, something like this:

agat_convert_sp_gff2gtf.pl --gff original_functional_annotation.gff -o new_functional_annotation.gtf --relax

I think the rest was replacing CDS with gene:

perl -pi -e 's/CDS/gene/g' new_functional_annotation.gtf
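Note that the one-liner above replaces every occurrence of CDS on a line, including any that happen to appear inside attribute text such as product names. A minimal sketch of a narrower version, assuming a tab-separated file with the feature type in column 3 (the function name `relabel_feature_type` is only illustrative and is not part of AGAT or featureCounts):

```python
def relabel_feature_type(line: str, old: str = "CDS", new: str = "gene") -> str:
    """Replace the feature type only in column 3 of a tab-separated GTF/GFF line."""
    if line.startswith("#"):
        return line  # comments and directives are left untouched
    fields = line.rstrip("\n").split("\t")
    # Only touch well-formed annotation rows (9 columns) whose type matches.
    if len(fields) >= 9 and fields[2] == old:
        fields[2] = new
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(fields) + newline


# Toy line (made up for illustration): "CDS" also occurs inside the attributes.
example = (
    "contig1\tProdigal:002006\tCDS\t712\t1704\t.\t+\t0\t"
    'gene_id "X"; product "CDS-like protein";\n'
)
print(relabel_feature_type(example), end="")
```

Only the third column changes here; the product text in the last column is left as it was.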

There may be something else left to do, but for the moment it escapes me. Best way to find out is to compare the GTF file created above to the NCBI GTF file that worked.
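One way to make that comparison concrete is to tally which feature types (column 3) and attribute keys (column 9) each file contains. By default featureCounts looks for `exon` features grouped by a `gene_id` attribute, so a file with neither will load no features unless `-t` and `-g` are set to match. The sketch below is an assumption-laden illustration: `summarize_annotation` is a hypothetical helper, and the two toy lines are invented for the demo, not taken from the real NCBI file.

```python
from collections import Counter
import re


def summarize_annotation(lines):
    """Tally column-3 feature types and attribute keys for GTF or GFF3 lines."""
    feature_types = Counter()
    attribute_keys = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # Prokka appends the genome sequence after this marker
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        feature_types[fields[2]] += 1
        for item in fields[8].split(";"):
            item = item.strip()
            if not item:
                continue
            # GFF3 attributes look like key=value; GTF attributes like key "value".
            key = re.split(r"[=\s]", item, maxsplit=1)[0]
            attribute_keys[key] += 1
    return feature_types, attribute_keys


# Toy inputs, invented for illustration only.
gff_lines = [
    "##gff-version 3\n",
    "c1\tProdigal:002006\tCDS\t3477\t3851\t.\t-\t0\t"
    "ID=GOHBADNI_00001;product=hypothetical protein\n",
]
gtf_lines = [
    'c1\tRefSeq\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
]
for name, lines in (("GFF", gff_lines), ("GTF", gtf_lines)):
    types, keys = summarize_annotation(lines)
    print(name, dict(types), sorted(keys))
```

If the Prokka file turns out to contain only `CDS` rows with an `ID` attribute, running featureCounts on the original GFF with `-t CDS -g ID` may be enough, though that has not been tested on this data.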


Thank you Mensur,

Unfortunately, I could not install AGAT on my Linux system: it is a remote machine that supports neither conda nor Docker. The other installation methods (Singularity and the "old school" manual install) were not successful either.

Is there any alternative approach? For example, can I use R for this task?

Thanks


I can't remember how I installed it - it might have been old school - but there are two ways listed that support conda:

conda install -c bioconda agat

The other is part of the old school (manual) installation.

There are many other GFF -> GTF converters, but I don't know if any of them work for this particular purpose.
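If no converter can be installed, one alternative is to skip GTF altogether and write the CDS rows in SAF, the simple five-column tab-delimited format (GeneID, Chr, Start, End, Strand) that featureCounts accepts with `-F SAF`. The following is a minimal sketch, assuming a standard tab-separated Prokka GFF3 whose CDS rows carry an `ID` attribute; `prokka_gff_to_saf` is a hypothetical helper name, not an existing tool.

```python
def prokka_gff_to_saf(lines, feature_type="CDS", id_key="ID"):
    """Yield SAF rows (GeneID, Chr, Start, End, Strand) from GFF3 lines."""
    yield "GeneID\tChr\tStart\tEnd\tStrand"
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop before the sequence block Prokka appends
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != feature_type:
            continue
        # Parse key=value pairs from column 9.
        attrs = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        gene_id = attrs.get(id_key)
        if gene_id is None:
            continue  # rows without the chosen identifier are skipped
        yield "\t".join([gene_id, fields[0], fields[3], fields[4], fields[6]])


# Demo on the first CDS row from the question, with tabs restored.
sample = [
    "##sequence-region JAFJXZ010000010.1 1 618\n",
    "JAFJXZ010000010.1\tProdigal:002006\tCDS\t3477\t3851\t.\t-\t0\t"
    "ID=GOHBADNI_00001;inference=ab initio prediction:Prodigal:002006;"
    "locus_tag=GOHBADNI_00001;product=hypothetical protein\n",
]
for row in prokka_gff_to_saf(sample):
    print(row)
```

To convert a real file, the same generator can be fed an open file handle and its rows written out one per line; the result would then be passed to featureCounts with `-F SAF -a`. This only needs a stock Python 3 interpreter, so it avoids the conda question entirely.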


Actually, I cannot use conda on the remote Linux system, and the GFF -> GTF conversion did not work either.

Thank you,


Could you explain why you can't use conda? It's possible to install conda as a non-root user, and I've done so successfully on different HPC systems in the past.


As mentioned on the supercomputer's website:

conda installs binaries which are not optimized for the processor architecture on our clusters.


As these are Perl scripts, there should be no issue with optimized binaries. Besides, even installing non-optimal binaries may get the job done faster than searching for alternative solutions. Finally, running featureCounts is not a demanding task, and should be doable on any personal Linux computer.


Can this task be done using R installed on my local machine?
