I am new to Python. I want to create a new GFF3 file from an existing one by filtering out features that are shorter than 1 kb.
Is there a way to parse a GFF3 file, in which each feature has a start, an end, and a column of semicolon-separated attribute tags, and write the filtered features to a new file?
My GFF file looks like this:
seq_chr1 S-MART match 158337 160567 . - . Superfamily=LINE;Target=RIL-Map20 356 2619;ID=ms1_seq_chr1_RIL-Map20;Order=TE;Class=Unknown;Identity=93.9881;Name=ms1_seq_chr1_RIL-Map20
Thanks for your reply. I implemented the method and it worked. Yes, I also need to keep everything with attribute SuperFamily=LINES and filter out everything else. I hoped there was an implementation with Biopython, but I guess I can work with re and search.
No need for regular expressions or the search function there: a simple change will suffice. Change
if length >= 1000:
to
if length >= 1000 and "SuperFamily=LINES" in values[8]:
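For context, here is a minimal sketch of how the whole filter might look with that condition in place. The code from the earlier answer is not shown in this thread, so the function names (`filter_gff3`, `filter_file`), the tab-splitting, and the length calculation `end - start + 1` (GFF3 coordinates are 1-based and inclusive) are assumptions, not the original implementation. Note that the `in` test is a case-sensitive substring match: the sample line above contains `Superfamily=LINE`, which would not match `"SuperFamily=LINES"`, so the tag should be copied exactly as it appears in the file.

```python
def filter_gff3(lines, min_length=1000, required_tag=None):
    """Yield GFF3 lines whose feature is at least min_length bp long.

    If required_tag is given, the attribute column must also contain
    that exact (case-sensitive) substring.
    """
    for line in lines:
        # Pass comments, directives and blank lines through unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        # GFF3 columns are tab-separated; attribute values may contain spaces.
        values = line.rstrip("\n").split("\t")
        if len(values) != 9:
            continue  # assumption: skip records without nine columns
        start, end = int(values[3]), int(values[4])
        length = end - start + 1  # coordinates are 1-based, inclusive
        if length >= min_length and (
            required_tag is None or required_tag in values[8]
        ):
            yield line


def filter_file(in_path, out_path, min_length=1000, required_tag=None):
    """Read in_path, write the surviving lines to out_path (hypothetical helper)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_gff3(src, min_length, required_tag))
```

With hypothetical file names, a call such as `filter_file("input.gff3", "filtered.gff3", 1000, "Superfamily=LINE")` would keep only LINE features of at least 1 kb. Splitting on tabs rather than on any whitespace matters here because attribute values such as `Target=RIL-Map20 356 2619` contain spaces.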