Question: Parsing GTF file - Help!
espop23 (Switzerland) wrote, 2.5 years ago:

I have data from GENCODE that looks like this:

     chr1    ENSEMBL    gene    17369    17436    .    -    .    gene_id "ENSG00000278267.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR6859-1"; level 3;
     chr1    ENSEMBL    gene    30366    30503    .    +    .    gene_id "ENSG00000274890.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR1302-2"; level 3;
     chr1    ENSEMBL    gene    157784    157887    .    -    .    gene_id "ENSG00000222623.1"; gene_type "snRNA"; gene_status "KNOWN"; gene_name "RNU6-1100P"; level 3;

 

I have tried using gffutils, but I get an error with this code: 

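    import gffutils

    # Build a SQLite database from the GTF file
    db = gffutils.create_db("sRNA.gene.gtf", dbfn='sRNA.gene.gtf.db')

    print(list(db.featuretypes()))
    # ['CDS', 'exon', 'gene', 'start_codon', 'stop_codon', 'transcript']

    # Here's how to write genes out to file.
    # The output name below is a placeholder; it differs from the input
    # so that the original GTF is not overwritten.
    with open('sRNA.genes_only.gtf', 'w') as fout:
        for gene in db.features_of_type('gene'):
            fout.write(str(gene) + '\n')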

 

The error I get is:

ImportError: cannot import name 'feature'. 

 

Can someone please offer suggestions on the best way to parse such GTF files? 


If I use your example GTF file and your example code, it works -- with the exception that the list of featuretypes is ['gene'] since only gene features are in your example GTF.

Can you provide a minimal example (complete code and input) that reproduces the error?

More generally, what is your end goal? It may not be necessary to create a database. For example, you can use gffutils just for parsing a GTF file (with the gffutils.FeatureIterator class).
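To illustrate parsing without a database, here is a minimal sketch in plain Python. It is not the gffutils API: the helper names (`parse_attributes`, `parse_gtf_line`, `iter_gtf`) are made up for this example, and it assumes tab-separated GTF lines with nine columns and attributes of the form `key "value";` or `key value;`, as in the GENCODE lines in the question.

```python
import re

# Names of the first eight GTF columns; the ninth is the attribute string.
GTF_COLUMNS = ("seqname", "source", "feature", "start", "end",
               "score", "strand", "frame")

# Matches either  key "quoted value";  or  key bare_value;
ATTR_RE = re.compile(r'\s*(\S+)\s+(?:"([^"]*)"|([^";]+?))\s*;')


def parse_attributes(text):
    """Turn the ninth GTF column into a dict of strings."""
    attrs = {}
    for match in ATTR_RE.finditer(text):
        key, quoted, bare = match.groups()
        # If a key repeats (e.g. several "tag" entries), the last one wins.
        attrs[key] = quoted if quoted is not None else bare
    return attrs


def parse_gtf_line(line):
    """Parse one tab-separated GTF line into a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(fields))
    record = dict(zip(GTF_COLUMNS, fields[:8]))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    record["attributes"] = parse_attributes(fields[8])
    return record


def iter_gtf(lines):
    """Yield parsed records, skipping blank lines and '#' comment lines."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        yield parse_gtf_line(line)


# Usage with the first line from the question (rebuilt with tabs).
example_line = "\t".join([
    "chr1", "ENSEMBL", "gene", "17369", "17436", ".", "-", ".",
    'gene_id "ENSG00000278267.1"; gene_type "miRNA"; '
    'gene_status "KNOWN"; gene_name "MIR6859-1"; level 3;',
])
for rec in iter_gtf([example_line]):
    print(rec["seqname"], rec["start"], rec["end"],
          rec["attributes"]["gene_name"])
```

For a real file, pass an open file handle to `iter_gtf`, since iterating over it yields lines. Keeping only the last value of a repeated key is a simplification, so a real parser may prefer to collect repeated keys in a list.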

Last, see the hints in the answer to "GFFutils very slow at creating database file. Any Idea why..?" for using GENCODE GTF files, which now already include features for genes and transcripts.

written 2.5 years ago by Ryan Dale

Hello espop23!

It appears that your post has been cross-posted to another site: https://www.reddit.com/r/bioinformatics/comments/3rvn3g/help_parsing_gtf_file/

This is typically not recommended as it runs the risk of annoying people in both communities.

written 2.5 years ago by Pierre Lindenbaum