Question

Filter with awk?

5.9 years ago

Sillpositive ▴ 20

Hello everyone, I have a FASTA file of predicted elements and I want to filter it by size: sequences of 120 bp or longer should be kept, and anything shorter than 120 bp should be eliminated. Here is an example record:

>NODE_19_length_97_cov_3.030928
TTTTGACCTAACCACGTATCGTAATGAAATTAGGTTCAAACAATCAAACCATATCATAAG
GATATCAATTATGAAATCAGGGCATCTAATTGAACTTAAAAGGACATCGTCGGCCCACTG
GCCag

Thank you!

score 3 · Answer 1 · 2018-05-30

5.9 years ago

GenoMax 141k

See the Biostars thread "How To Filter Multi Fasta By Length??" and this walkthrough:
http://itrylinux.com/use-awk-to-filter-fasta-file-by-minimum-sequence-length/

score 2 · Answer 2 · 2018-05-30

5.9 years ago

cpad0112 21k

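A non-awk solution with seqkit, which reads wrapped (multi-line) FASTA directly. The -m (--min-len) option keeps only sequences at least that long:

seqkit seq -m 120 test.fa -o test.filtered.fa

Download seqkit from: https://bioinf.shenwei.me/seqkit/download/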

Thank you so much! I will try this option too!

(reply by Sillpositive, 5.9 years ago)

score 1 · Answer 3 · 2018-05-30

5.9 years ago

Alex Reynolds 35k

Here's an awk-based approach, if your FASTA records are single-line (header on one line, the whole sequence on the next line, and so on):

$ awk '!/^>/ { next } { getline seq } length(seq) >= 120 { print $0 "\n" seq }' input.fa > output.fa
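To make the one-liner above easier to follow, here is a sketch of the same awk program spread over several lines with comments. The file names input.fa and output.fa come from the answer; the two demo records (keep_me at exactly 120 bp, drop_me at 5 bp) are hypothetical and only there so the script runs on its own. It still assumes single-line records.

```shell
#!/bin/sh
# Demo input (hypothetical records): keep_me is exactly 120 bp, drop_me is 5 bp.
awk 'BEGIN {
  s = ""
  for (i = 0; i < 120; i++) s = s "A"
  print ">keep_me"; print s
  print ">drop_me"; print "ACGTA"
}' > input.fa

# Same logic as the one-liner, one rule per line.
awk '
  !/^>/ { next }           # not a header: skip it (sequence lines are normally consumed by getline below)
  { getline seq }          # read the next line, the sequence, into "seq"; $0 still holds the header
  length(seq) >= 120 {     # keep the record only if the sequence is at least 120 bp
    print $0 "\n" seq      # print header, newline, sequence
  }
' input.fa > output.fa
```

Because the test is >= 120, a record of exactly 120 bp (keep_me) is kept; change it to > 120 if you want strictly longer sequences only.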

If the records are multi-line (header on one line, sequence wrapped across two or more lines, as in the example record in the question), input.fa needs preprocessing first to put each sequence on a single line. Other answers on Biostars explain how to do this.
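One way to do that preprocessing is sketched below: a first awk pass joins each record's wrapped sequence lines into one line, and its output is piped into the unchanged single-line one-liner above. The wrapped NODE_19 record is the one from the question; short_example is a hypothetical 12 bp record split over two lines. The linearizing step assumes every record has at least one sequence line (an empty record would leave a header with no sequence line after it and confuse getline).

```shell
#!/bin/sh
# Demo input: the wrapped record from the question plus a hypothetical short record.
cat > input.fa <<'EOF'
>NODE_19_length_97_cov_3.030928
TTTTGACCTAACCACGTATCGTAATGAAATTAGGTTCAAACAATCAAACCATATCATAAG
GATATCAATTATGAAATCAGGGCATCTAATTGAACTTAAAAGGACATCGTCGGCCCACTG
GCCag
>short_example
ACGTACGT
ACGT
EOF

# Pass 1 linearizes each record (header line, then the joined sequence);
# pass 2 is the single-line length filter, unchanged.
awk '
  /^>/ {
    if (seq != "") print seq   # flush the previous record before starting a new one
    print                      # print the header line itself
    seq = ""
    next
  }
  { seq = seq $0 }             # append this wrapped line to the current sequence
  END { if (seq != "") print seq }
' input.fa |
awk '!/^>/ { next } { getline seq } length(seq) >= 120 { print $0 "\n" seq }' > output.fa
```

Note that the kept sequences come out unwrapped, one line per record.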

Via: http://itrylinux.com/use-awk-to-filter-fasta-file-by-minimum-sequence-length/