Entering edit mode
6.2 years ago
Ric
▴
440
I have the following gff3 file. Some of the mRNA contains an attribute Note and some no.
...
NbV1Ch07 AUGUSTUS gene 31690934 31699151 . - . ID=g88996;Name=POLX_2333
NbV1Ch07 AUGUSTUS mRNA 31690934 31699151 . - . ID=g88996.t1;Parent=g88996;Note=Retrovirus-related Pol polyprotein from transposon TNT 1-94;Dbxref=CDD:cd09272,InterPro:IPR013103,InterPro:IPR029472,MOBIDBLITE:mobidb-lite,PFAM:PF07727,PFAM:PF14244,SUPERFAMILY:SSF56672
['Retrovirus-related Pol polyprotein from transposon TNT 1-94']
...
I wrote the below script which failed to filter genes into two files. If Note is in mRNA attribute then write gene, mRNA, exon, CDS into keep.gff3 file.
If Note is not in mRNA attribute then write gene, mRNA, exon, CDS into reject.gff3 file.
#!/usr/bin/python3
import click
import gffutils
@click.command()
@click.option('--gff3', help="Provide GFF3 file", required=True)
@click.option('--keep', help="Keep GFF3 file", required=True)
@click.option('--reject', help="Reject GFF3 file", required=True)
def run(gff3, keep, reject):
#db = gffutils.create_db(gff3, dbfn='data/test.db', force=True, keep_order=True, merge_strategy='merge',
# sort_attribute_values=True)
db = gffutils.FeatureDB('data/test.db', keep_order=True)
with open(keep, 'w') as out_keep:
with open(reject, 'w') as out_reject:
for gene in db.features_of_type('gene', order_by='start'):
print(gene)
for i in db.children(gene, featuretype='mRNA', order_by='start'):
print(i)
try:
if i.attributes['Note']:
print(i.attributes['Note'])
out_keep.write(str(gene) + '\n')
else:
out_reject.write(str(gene) + '\n')
except KeyError:
continue
if __name__ == '__main__':
run()
Unfortunately, the above script only prints in keep.gff3 file the genes and nothing else.
What did I miss?
Thank you in advance,
Hello Ric ,
what your method is doing, is iterate over all
gene's, finding all children of that genes that aremRNAand write all of these mRNA who have aNoteattribute tokeep.gff3. IfNoteis not a valid key, python will raise an KeyError exception, so theelseblock is never executed and nothing is written toreject.gff3. The correct way to check if a key exists would beif "Note" in i.attributes.fin swimmer
Thank you, I changed it but both files contain only the features
genebut missingmRNA, exon, CDS. How is it possible to add those ones, too?Of course, because you are writing only
geneto the fileCould you please explain a little bit more about your goal? Also a larger example of your gff file would be helpful with lines you've expected in the one or in the other output file.
If you cross post your question please leave a note, so that other can see what is already answered!