Question: Loading GTF with GenomicFeatures makeTxDb gives an empty TxDb object
Asked 11 weeks ago by nlehmann90 (France)

Hello,

I am trying to load a GTF file into GenomicFeatures. The file was produced by a de novo genome annotation reconstruction tool (StringTie or Scallop) and then processed with Gffcompare. For that, I use txdb <- makeTxDbFromGFF("gffcmp.annotated.gtf", format="gtf"). I can load the file, but the result is an empty TxDb object. For example:

> transcripts(txdb)
GRanges object with 0 ranges and 2 metadata columns:
seqnames    ranges strand |     tx_id     tx_name
  <Rle> <IRanges>  <Rle> | <integer> <character>
  -------
seqinfo: no sequences

If we have a look at the GTF:

> head gffcmp.annotated.gtf
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XM_025152731.1"; class_code "n"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   33187   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "2";
chr1    scallop exon    33287   35838   .   +   .   transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "3";
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XM_025152731.1"; class_code "j"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   35838   .   +   .   transcript_id "gene.3.0.6"; gene_id "gene.3.0"; exon_number "2";
chr1    scallop transcript  26467   35838   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; gene_name "CLC2DL5"; xloc "XLOC_000001"; cmp_ref "XR_003076321.1"; class_code "c"; tss_id "TSS1";
chr1    scallop exon    26467   27503   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; exon_number "1";
chr1    scallop exon    32230   32331   .   +   .   transcript_id "gene.3.0.12"; gene_id "gene.3.0"; exon_number "2";

The file gffcmp.annotated.gtf has 1,218,427 lines. I have no UTR regions, only "transcript" and "exon".

Can you see a reason why the TxDb object is empty? What could I change?

written 11 weeks ago by nlehmann90

Hey, could you please try the following:

makeTxDbFromGFF("gffcmp.annotated.gtf", format = 'auto')

Also, how was the GTF produced? Is it genuinely tab-delimited?
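
One way to check the delimiter question outside R is to count the tab-separated fields on each line; a well-formed GTF record has exactly nine. The following is only a rough sketch in Python: the helper name gtf_field_counts is made up for illustration, and the two sample lines are modelled on the excerpt in the question rather than read from the actual file.

```python
from collections import Counter


def gtf_field_counts(lines):
    """Tally how many tab-separated fields each non-comment line has.

    `lines` is any iterable of strings, e.g. an open file handle.
    (Hypothetical helper, not part of GenomicFeatures or any GTF library.)
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and header/comment lines starting with '#'.
        if not line or line.startswith("#"):
            continue
        counts[len(line.split("\t"))] += 1
    return counts


# Two illustrative records: one genuinely tab-delimited, one where the
# tabs have been replaced by spaces (as can happen after copy/paste).
tab_line = "\t".join([
    "chr1", "scallop", "exon", "26467", "27503", ".", "+", ".",
    'transcript_id "gene.3.0.7"; gene_id "gene.3.0"; exon_number "1";',
])
space_line = tab_line.replace("\t", "    ")

print(gtf_field_counts([tab_line, space_line]))
```

To run it on the real file, pass an open handle, for example `with open("gffcmp.annotated.gtf") as fh: print(gtf_field_counts(fh))`. If every record falls under the key 9, the file is tab-delimited and the cause lies elsewhere; any other key points to lines that a GTF parser may not read as intended.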

Reply written 11 weeks ago by Kevin Blighe