Parsing GTF file - Help!
6.1 years ago
espop23 ▴ 60

I have data from GENCODE that looks like this:

     chr1    ENSEMBL    gene    17369    17436    .    -    .    gene_id "ENSG00000278267.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR6859-1"; level 3;
     chr1    ENSEMBL    gene    30366    30503    .    +    .    gene_id "ENSG00000274890.1"; gene_type "miRNA"; gene_status "KNOWN"; gene_name "MIR1302-2"; level 3;
     chr1    ENSEMBL    gene    157784    157887    .    -    .    gene_id "ENSG00000222623.1"; gene_type "snRNA"; gene_status "KNOWN"; gene_name "RNU6-1100P"; level 3;

 

I have tried using gffutils, but I get an error with this code: 

    import gffutils

    db = gffutils.create_db("sRNA.gene.gtf", dbfn='sRNA.gene.gtf.db')

    print(list(db.featuretypes()))
    # ['CDS', 'exon', 'gene', 'start_codon', 'stop_codon', 'transcript']

    # Here's how to write genes out to file
    with open('sRNA.gene.gtf', 'w') as fout:
        for gene in db.features_of_type('gene'):
            fout.write(str(gene) + '\n')

 

It fails with:

    ImportError: cannot import name 'feature'

 

Can someone please offer suggestions on the best way to parse such GTF files? 

parsing gtf python r gffutils • 3.6k views

If I use your example GTF file and your example code, it works -- with the exception that the list of featuretypes is ['gene'] since only gene features are in your example GTF.

Can you provide a minimal example (complete code and input) that reproduces the error?

More generally, what is your end goal? It may not be necessary to create a database. For example, you can use gffutils just for parsing a GTF file (with the gffutils.FeatureIterator class).
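If a database turns out to be unnecessary, the records can also be read with plain Python. The following is only a minimal sketch, not gffutils code: `parse_gtf_line` is a hypothetical helper, and it assumes the standard nine tab-separated GTF columns with attributes written as `key "value";` or `key value;`, as in the example lines in the question.

```python
import re

# Matches one `key "value";` or `key value;` pair in GTF column 9.
_ATTR_RE = re.compile(r'(\S+)\s+(?:"([^"]*)"|([^";\s]+))\s*;')


def parse_gtf_line(line):
    """Hypothetical helper: parse one GTF record into a dict.

    Returns None for blank lines and comment lines starting with '#'.
    """
    line = line.strip()
    if not line or line.startswith("#"):
        return None

    fields = line.split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqid, source, feature, start, end, score, strand, frame, attr_text = fields

    # Attribute values are kept as strings; if a key is repeated,
    # the last occurrence wins in this simple sketch.
    attributes = {}
    for match in _ATTR_RE.finditer(attr_text):
        quoted, bare = match.group(2), match.group(3)
        attributes[match.group(1)] = quoted if quoted is not None else bare

    return {
        "seqid": seqid,
        "source": source,
        "feature": feature,
        "start": int(start),
        "end": int(end),
        "score": score,
        "strand": strand,
        "frame": frame,
        "attributes": attributes,
    }


# One record from the question, rebuilt with real tab separators.
sample = "\t".join([
    "chr1", "ENSEMBL", "gene", "17369", "17436", ".", "-", ".",
    'gene_id "ENSG00000278267.1"; gene_type "miRNA"; '
    'gene_status "KNOWN"; gene_name "MIR6859-1"; level 3;',
])
record = parse_gtf_line(sample)
print(record["attributes"]["gene_name"])  # MIR6859-1
```

Filtering for genes is then a matter of keeping records whose `feature` field equals `"gene"`. For anything beyond simple filtering, a dedicated parser is probably the safer choice.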

Last, see the hints in the answer to "GFFutils very slow at creating database file. Any Idea why..?" for working with GENCODE GTF files, which now already include features for genes and transcripts.


Hello espop23!

It appears that your post has been cross-posted to another site: https://www.reddit.com/r/bioinformatics/comments/3rvn3g/help_parsing_gtf_file/

This is typically not recommended as it runs the risk of annoying people in both communities.
