Question: Filter a GFF3 file for complete genes
4 weeks ago, arunprasanna83 wrote:

I have a GFF3 file containing complete-length sequences. However, a few of the complete sequences have multiple UTRs, and I wish to filter them out. Is there any utility available for this?

scaffold105size588288 transdecoder gene 130390 132407 . + .
scaffold105size588288 transdecoder mRNA 130390 132407 . + .
scaffold105size588288 transdecoder five_prime_UTR 130390 130818 . + .
scaffold105size588288 transdecoder exon 130390 132407 . + .
scaffold105size588288 transdecoder CDS 130819 131979 . + 0
scaffold105size588288 transdecoder three_prime_UTR 131980 132407 . + .

scaffold105size588288 transdecoder gene 278652 281390 . + .
scaffold105size588288 transdecoder mRNA 278652 281390 . + .
scaffold105size588288 transdecoder five_prime_UTR 278652 278776 . + .
scaffold105size588288 transdecoder exon 278652 278847 . + .
scaffold105size588288 transdecoder CDS 278777 278847 . + 0
scaffold105size588288 transdecoder exon 279283 280020 . + .
scaffold105size588288 transdecoder CDS 279283 279589 . + 1
scaffold105size588288 transdecoder exon 280311 280393 . + .
scaffold105size588288 transdecoder three_prime_UTR 280311 280393 . + .
scaffold105size588288 transdecoder three_prime_UTR 280593 280678 . + .
scaffold105size588288 transdecoder three_prime_UTR 280757 280812 . + .

In this trimmed example, I need to remove the second gene set, as it has three 3' UTRs, and retain the first one, which is a more complete set.

Thanks in advance.

Tags: next-gen, assembly, genome

Select those that have column 3 == "gene" (in GFF3, the feature type is the third column). Please use Google to find a solution for this using awk or sed.

written 4 weeks ago by ATpoint
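
For what it's worth, here is a rough Python sketch of the filter asked about in the question (not the awk/sed route suggested above). It groups lines into gene blocks, starting a new block at every line whose third column is `gene`, and keeps only blocks with at most one `five_prime_UTR` and at most one `three_prime_UTR`. It assumes the file is ordered so that each gene line is followed by its own features, and that each gene has a single mRNA, as in the example. The function names (`gene_blocks`, `has_single_utrs`, `filter_gff`) are made up for illustration and do not come from any existing tool.

```python
from collections import Counter


def gene_blocks(lines):
    """Yield one list of feature lines per gene.

    Assumes child features directly follow their 'gene' line.
    Blank lines and '#' comment/header lines are skipped (and so
    are not carried over to the output).
    """
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 3)  # only the first three columns are needed
        if len(fields) < 3:
            continue
        if fields[2] == "gene" and block:
            yield block
            block = []
        block.append(line)
    if block:
        yield block


def has_single_utrs(block, max_per_type=1):
    """True if the block has at most max_per_type UTR lines of each kind."""
    counts = Counter(line.split(None, 3)[2] for line in block)
    return (counts["five_prime_UTR"] <= max_per_type
            and counts["three_prime_UTR"] <= max_per_type)


def filter_gff(lines):
    """Return the lines of all gene blocks that pass has_single_utrs."""
    kept = []
    for block in gene_blocks(lines):
        if has_single_utrs(block):
            kept.extend(block)
    return kept
```

Calling `filter_gff` on an open file handle returns the retained lines, which can then be written back out. One caveat: a UTR that spans an intron is normally written as several UTR lines, so this criterion also discards genes with spliced UTRs. If that is too strict, raise `max_per_type`.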