Filtering GFF3 file
0
0
Entering edit mode
2.3 years ago
Ric ▴ 370

I have the following gff3 file. Some of the mRNA contains an attribute Note and some no.

...
NbV1Ch07    AUGUSTUS    gene    31690934    31699151    .   -   .   ID=g88996;Name=POLX_2333
NbV1Ch07    AUGUSTUS    mRNA    31690934    31699151    .   -   .   ID=g88996.t1;Parent=g88996;Note=Retrovirus-related Pol polyprotein from transposon TNT 1-94;Dbxref=CDD:cd09272,InterPro:IPR013103,InterPro:IPR029472,MOBIDBLITE:mobidb-lite,PFAM:PF07727,PFAM:PF14244,SUPERFAMILY:SSF56672
['Retrovirus-related Pol polyprotein from transposon TNT 1-94']
...

I wrote the below script which failed to filter genes into two files. If Note is in mRNA attribute then write gene, mRNA, exon, CDS into keep.gff3 file. If Note is not in mRNA attribute then write gene, mRNA, exon, CDS into reject.gff3 file.

#!/usr/bin/python3

import click
import gffutils

@click.command()
@click.option('--gff3', help="Provide GFF3 file", required=True)
@click.option('--keep', help="Keep GFF3 file", required=True)
@click.option('--reject', help="Reject GFF3 file", required=True)
def run(gff3, keep, reject):
    #db = gffutils.create_db(gff3, dbfn='data/test.db', force=True, keep_order=True, merge_strategy='merge',
    #                        sort_attribute_values=True)

    db = gffutils.FeatureDB('data/test.db', keep_order=True)

    with open(keep, 'w') as out_keep:
        with open(reject, 'w') as out_reject:
            for gene in db.features_of_type('gene', order_by='start'):
                print(gene)
                for i in db.children(gene, featuretype='mRNA', order_by='start'):
                    print(i)
                    try:
                        if i.attributes['Note']:
                            print(i.attributes['Note'])
                            out_keep.write(str(gene) + '\n')
                        else:
                            out_reject.write(str(gene) + '\n')
                    except KeyError:
                        continue


if __name__ == '__main__':
    run()

Unfortunately, the above script only prints in keep.gff3 file the genes and nothing else.

What did I miss?

Thank you in advance,

gffutils python gff3 annotation • 1.9k views
ADD COMMENT
0
Entering edit mode

Hello Ric ,

what your method is doing, is iterate over all gene's, finding all children of that genes that are mRNA and write all of these mRNA who have a Note attribute to keep.gff3. If Note is not a valid key, python will raise an KeyError exception, so the else block is never executed and nothing is written to reject.gff3. The correct way to check if a key exists would be if "Note" in i.attributes.

fin swimmer

ADD REPLY
0
Entering edit mode

Thank you, I changed it but both files contain only the features gene but missing mRNA, exon, CDS. How is it possible to add those ones, too?

ADD REPLY
0
Entering edit mode

I changed it but both files contain only the features gene

Of course, because you are writing only gene to the file

out_keep.write(str(gene) + '\n')
...
out_reject.write(str(gene) + '\n')

Could you please explain a little bit more about your goal? Also a larger example of your gff file would be helpful with lines you've expected in the one or in the other output file.

ADD REPLY
0
Entering edit mode

If you cross post your question please leave a note, so that other can see what is already answered!

ADD REPLY

Login before adding your answer.

Traffic: 2493 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6