Problem with featureCounts using a GTF generated by me
2.9 years ago
arturo.marin ▴ 20

Hi, featureCounts gives me the following error when I try to count reads with a GTF file that I generated myself:

ERROR: no features were loaded in format GTF. The annotation format can be specified by the '-F' option, and the required feature type can be specified by the '-t' option.. The porgram has to terminate.

I tried the suggested options:

featureCounts -F GTF -p -T 10 -t gene_id \
 -a ref/${GTF_FILE} \
 -o counts_Promastigote_vs_Haptomonas.txt \
 bams/P1.bam \
 bams/P2.bam \
 bams/P3.bam \
 bams/H1.bam \
 bams/H2.bam \
 bams/H3.bam

but it still gives the same error. Here is an excerpt of my GTF:

#!genome-build LPASSIMC3V1
#!genome-version LPASSIMC3V1
#!genome-date 2020-09
#!genome-build-accession NaN
#!genebuild-last-updated 2020-09
jcf7180000024611    AUGUSTUS    gene    2158    2691    1   -   .   gene_id "LPASSIMC3V1_1";
jcf7180000024611    AUGUSTUS    mRNA    2158    2691    1   -   .   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    stop_codon  2158    2160    .   -   0   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    CDS 2161    2691    1   -   0   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    start_codon 2689    2691    .   -   0   gene_id "LPASSIMC3V1_1"; transcript_id "LPASSIMC3V1_1.t1";
jcf7180000024611    AUGUSTUS    gene    3930    4637    1   -   .   gene_id "LPASSIMC3V1_2"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    mRNA    3930    4637    1   -   .   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    stop_codon  3930    3932    .   -   0   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    CDS 3933    4637    1   -   0   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    start_codon 4635    4637    .   -   0   gene_id "LPASSIMC3V1_2"; transcript_id "LPASSIMC3V1_2.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    gene    5850    6671    1   -   .   gene_id "LPASSIMC3V1_3"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    mRNA    5850    6671    1   -   .   gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    stop_codon  5850    5852    .   -   0   gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    CDS 5853    6671    1   -   0   gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";
jcf7180000024611    AUGUSTUS    start_codon 6669    6671    .   -   0   gene_id "LPASSIMC3V1_3"; transcript_id "LPASSIMC3V1_3.t1"; product "hypothetical protein, unknown function [Leishmania infantum JPCM5]"; protein_id "XP_001467570";

What can this error be due to? What could I do to fix it?

Thanks,

gtf annotation rnaseq • 4.8k views
2.9 years ago
ATpoint 81k

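The value of -t must be a feature type, i.e. one of the values in column 3 of the GTF, so use "gene". gene_id is an attribute name (column 9), not a feature type. Attributes are selected with -g, which already defaults to gene_id.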
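To see which values are valid for -t in a given file, it can help to list what column 3 actually contains. Below is a minimal sketch, assuming the real GTF is tab-delimited (the excerpt in the question shows spaces, which is probably just how the page rendered it). The function names `gtf_feature_types` and `run_featurecounts` are made up for illustration; the featureCounts call is the one from the question with only -t changed and -g made explicit.

```shell
# Print each feature type found in column 3 of a tab-delimited GTF,
# with the number of lines carrying it. Comment lines (#...) are skipped.
gtf_feature_types() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$1" | LC_ALL=C sort
}

# Call from the question with the feature type corrected:
# -t takes a feature type (column 3), -g takes the attribute used to
# group features (gene_id is already the default, shown for clarity).
# Wrapped in a function so nothing runs until it is called.
run_featurecounts() {
    featureCounts -F GTF -p -T 10 -t gene -g gene_id \
        -a "ref/${GTF_FILE}" \
        -o counts_Promastigote_vs_Haptomonas.txt \
        bams/P1.bam bams/P2.bam bams/P3.bam \
        bams/H1.bam bams/H2.bam bams/H3.bam
}
```

Running `gtf_feature_types "ref/${GTF_FILE}"` on a file like the excerpt should list CDS, gene, mRNA, start_codon and stop_codon. Note that the excerpt has no exon features, so the featureCounts default of `-t exon` would load nothing from it either.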


Thanks, but now it gives me another error:

the feature on the 776-th line has zero coordinate or zero lengths

And the line is:

jcf7180000024759    RFAM    gene    671 571 .   -   .   gene_id "LPASSIMC3V1_ncRNA_1"; product "Small nucleolar RNA TBR2"; db_xref "RFAM:RF02786";

Could this error be because the start is greater than the end? If so, how can I fix it? Those are the values Infernal gives me.


Yes, a start greater than the end is forbidden in both GTF and GFF.


As @Juke34 says, GTF files have conventions you have to follow. Coordinates are 1-based and start <= end, regardless of strand. The strand column tells you whether the start column is the 5' or the 3' end of the feature. Consider using a dedicated tool to generate the GTF from your source data rather than writing it yourself, so that the output obeys these conventions.
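If you just want to patch the existing file, something like the following might do. It is a rough sketch, assuming a tab-delimited GTF in which the strand column is already correct (as in the RFAM line above, which has "-"), so that only columns 4 and 5 need swapping. `gtf_bad_coords` and `gtf_fix_coords` are hypothetical helper names, not part of any tool.

```shell
# Report feature lines whose start (column 4) is greater than the end
# (column 5) or whose start is below 1, prefixed with their line number.
gtf_bad_coords() {
    awk -F'\t' '!/^#/ && NF >= 9 && ($4 + 0 > $5 + 0 || $4 + 0 < 1) {
                    print NR ": " $0
                }' "$1"
}

# Write a copy of the GTF to stdout with start and end swapped wherever
# start > end. Comment lines and all other columns are left untouched.
gtf_fix_coords() {
    awk 'BEGIN { FS = OFS = "\t" }
         !/^#/ && NF >= 9 && ($4 + 0 > $5 + 0) { tmp = $4; $4 = $5; $5 = tmp }
         { print }' "$1"
}
```

For example, `gtf_bad_coords annotation.gtf` lists the offending lines, and `gtf_fix_coords annotation.gtf > annotation.fixed.gtf` writes a corrected copy (both file names are placeholders). Check the reported lines by eye first: if the strand were wrong as well, swapping alone would not be enough.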


Thanks @Juke34 and @ATpoint. Your solutions work for me. I clearly need to read the GTF specification more carefully. For VCF files it is much easier to find the document that specifies the format, but for GFF / GTF I am not sure about the information I have found.

GFF: https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

GTF: https://mblab.wustl.edu/GTF22.html

Is that information the official one?

thanks


Yes, they are the official ones. If you want another source, have a look at https://agat.readthedocs.io/en/latest/gxf.html, where I wrote a review that clarifies the formats a bit more.
