Gffcompare issue: 0 reference transcripts loaded
3.9 years ago
nattzy94 ▴ 50

I am trying to use gffcompare to compare my assembled transcriptome to a reference gtf that contains information about small open reading frames (sORFs). The reference gtf was obtained by processing downloaded data from several sORF databases and is called all_38.gtf.

I used the command: gffcompare -r path/to/all_38.gtf -o /output_folder my_transcriptome.gtf

my_transcriptome.gtf was generated by assembling my BAM files against the Ensembl hg38 v99 reference.

Upon completion, the log file reads:

0 reference transcripts loaded.
  237788 query transfrags loaded.
  2714 duplicate query transfrags discarded.

The expected output files are generated, except for the .refmap file, which I need for downstream analysis. I assume this is because 0 reference transcripts were loaded into the program. Has anyone encountered a similar issue? Could something be wrong with my custom reference gtf file?

The reference gtf file can be found attached to this GitHub issue.

Assembly • 1.1k views

Give us a sample of your gtf file.


The gtf used as the reference can be found here.

Let me know if the link doesn't work.

The first few lines of my input my_transcriptome.gtf are:

# StringTie version 2.1.2
GL000213.1  StringTie   transcript  71696   71917   1000    -   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; 
GL000213.1  StringTie   exon    71696   71917   1000    -   .   gene_id "MSTRG.1"; transcript_id "MSTRG.1.1"; exon_number "1"; 
GL000218.1  StringTie   transcript  38800   41022   1000    -   .   gene_id "MSTRG.2"; transcript_id "MSTRG.2.1"; 
GL000218.1  StringTie   exon    38800   39203   1000    -   .   gene_id "MSTRG.2"; transcript_id "MSTRG.2.1"; exon_number "1";

Your problem is most likely that the sequence identifiers (1st column) in your file do not match any of the sequence identifiers in the reference.
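
A quick way to test this is to collect the distinct values of the first column from both files and see whether they overlap. The snippet below is only a rough sketch: the helper name `seqids` and the records in the in-memory sample are made up for illustration, and it assumes ordinary tab-separated GTF lines with `#` comment lines.

```python
import io

def seqids(handle):
    """Collect the distinct values of column 1 from a GTF/GFF text stream."""
    names = set()
    for line in handle:
        # Skip blank lines and comment lines such as "# StringTie version 2.1.2".
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names

# Tiny in-memory stand-ins for the two files (hypothetical records).
reference = io.StringIO(
    '1\tsorf_db\texon\t100\t200\t.\t+\t.\ttranscript_id "s1";\n'
)
query = io.StringIO(
    "# StringTie version 2.1.2\n"
    'GL000213.1\tStringTie\texon\t71696\t71917\t1000\t-\t.\ttranscript_id "MSTRG.1.1";\n'
    'chr1\tStringTie\texon\t11869\t12227\t1000\t+\t.\ttranscript_id "MSTRG.2.1";\n'
)

ref_ids = seqids(reference)
query_ids = seqids(query)
shared = ref_ids & query_ids

print("shared:", sorted(shared))
print("only in query:", sorted(query_ids - ref_ids))
```

For the real files, replace the `io.StringIO` objects with `open("path/to/all_38.gtf")` and `open("my_transcriptome.gtf")`. If the shared set is empty, or one file says `chr1` where the other says `1`, the two files are using different naming schemes.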


Sorry, I realised I pasted an older version of my_transcriptome.gtf.

The updated gtf looks like this:

# StringTie version 2.1.2
1       StringTie       transcript      11869   14409   1000    +       .       gene_id "MSTRG.1"; transcript_id "ENST00000456328"; gene_name "DDX11L1"; ref_gene_id "ENSG00000223972";
1       StringTie       exon    11869   12227   1000    +       .       gene_id "MSTRG.1"; transcript_id "ENST00000456328"; exon_number "1"; gene_name "DDX11L1"; ref_gene_id "ENSG00000223972";
1       StringTie       exon    12613   12721   1000    +       .       gene_id "MSTRG.1"; transcript_id "ENST00000456328"; exon_number "2"; gene_name "DDX11L1"; ref_gene_id "ENSG00000223972";
1       StringTie       exon    13221   14409   1000    +       .       gene_id "MSTRG.1"; transcript_id "ENST00000456328"; exon_number "3"; gene_name "DDX11L1"; ref_gene_id "ENSG00000223972";
1       StringTie       transcript      12010   14409   1000    +       .       gene_id "MSTRG.1"; transcript_id "ENST00000450305"; gene_name "DDX11L1"; ref_gene_id "ENSG00000223972";

The chromosome identifiers should now match those in the reference. I ran gffcompare with both the old and the new version of my_transcriptome.gtf, but ran into the same error both times.
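
Since the error persists either way, one thing worth checking is the structure of the reference file itself. The sketch below does not say what is actually wrong with all_38.gtf; it only tallies what is there, assuming the standard GTF layout (nine tab-separated columns, the feature type in column 3, and a `transcript_id` attribute in column 9). The function name `summarize_gtf`, the source name `sorf_db`, and the sample records are hypothetical.

```python
import io
from collections import Counter

def summarize_gtf(handle):
    """Tally column-3 feature types and count records that look malformed."""
    features = Counter()
    problems = Counter()
    for line in handle:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            # Typical cause: columns separated by spaces instead of tabs.
            problems["not 9 tab-separated columns"] += 1
            continue
        features[fields[2]] += 1
        if "transcript_id" not in fields[8]:
            problems["no transcript_id attribute"] += 1
    return features, problems

# Hypothetical records: one well-formed exon, one space-separated line,
# and one CDS record that lacks a transcript_id.
sample = io.StringIO(
    '1\tsorf_db\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "s1";\n'
    '1 sorf_db exon 300 400 . + . gene_id "g2"; transcript_id "s2";\n'
    '1\tsorf_db\tCDS\t100\t200\t.\t+\t0\tgene_id "g1";\n'
)

features, problems = summarize_gtf(sample)
print("feature types:", dict(features))
print("possible problems:", dict(problems))
```

Running it on the real file (pass `open("path/to/all_38.gtf")` instead of `sample`) would show at a glance which feature types the reference contains and whether any lines deviate from the expected layout.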
