Loading GTF with GenomicFeatures makeTxDb gives an empty TxDb object
2.4 years ago
nlehmann ▴ 140

Hello,

I am trying to load into GenomicFeatures a GTF file that was produced by a de novo annotation reconstruction tool (StringTie or Scallop) and then processed with Gffcompare. For that, I use txdb <- makeTxDbFromGFF("gffcmp.annotated.gtf", format="gtf"). The call completes and the file loads, but the resulting TxDb object is empty. E.g.:

> transcripts(txdb)
GRanges object with 0 ranges and 2 metadata columns:
seqnames    ranges strand |     tx_id     tx_name
  <Rle> <IRanges>  <Rle> | <integer> <character>
  -------
seqinfo: no sequences

If we have a look at the GTF:

$ head gffcmp.annotated.gtf
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XM_025152731.1"; class_code "n"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   33187   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "2";
chr1    scallop exon    33287   35838   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "3";
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XM_025152731.1"; class_code "j"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   35838   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; exon_number "2";
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XR_003076321.1"; class_code "c"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   32331   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; exon_number "2";

The file gffcmp.annotated.gtf has 1,218,427 lines. It contains no UTR features, only "transcript" and "exon" records.

Can you see a reason why the TxDb object is empty? What could I change?

gffcompare genomicfeatures maketxdb RNA-Seq • 1.0k views

Hey, could you please try the following:

makeTxDbFromGFF("gffcmp.annotated.gtf", format = "auto")

Also, how was the GTF produced? Is it genuinely tab-delimited?
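
One quick way to check the delimiters is to count the tab-separated fields on each line; a well-formed GTF record has exactly 9. The sketch below is only a diagnostic, not a fix, and the function name gtf_field_counts is made up for illustration. It assumes awk and sort are available.

```shell
# Tally how many tab-separated fields each line has (GTF expects exactly 9).
# Prints "<field count> <number of lines>" pairs, sorted by field count.
# Lines starting with '#' (comments/headers) are skipped.
# gtf_field_counts is a hypothetical helper name, not part of any tool.
gtf_field_counts() {
    awk -F'\t' '!/^#/ { n[NF]++ } END { for (k in n) print k, n[k] }' "$1" | sort -n
}

# Example call (file name taken from the question):
# gtf_field_counts gffcmp.annotated.gtf
```

If every line reports 9 fields, the delimiters are probably not the issue. Any other count (for example 1, which would mean the columns are separated by spaces rather than tabs) would be worth looking at before re-running makeTxDbFromGFF.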
