Filter with awk?
5.9 years ago
Sillpositive ▴ 20

Hello everyone, I have a FASTA file of predicted elements and I want to filter it by size: keep the sequences longer than 120 bp and eliminate those shorter than 120 bp.

>NODE_19_length_97_cov_3.030928
 TTTTGACCTAACCACGTATCGTAATGAAATTAGGTTCAAACAATCAAACCATATCATAAG
 GATATCAATTATGAAATCAGGGCATCTAATTGAACTTAAAAGGACATCGTCGGCCCACTG
 GCCag

Thank you!

5.9 years ago

Non-awk solution:

seqkit seq -m 120 test.fa -o test.filtered.fa

Download seqkit from here: https://bioinf.shenwei.me/seqkit/download/


Thank you so much! I will try this option too!

5.9 years ago

Here's an awk-based approach, if your FASTA records are single-lined (header on one line, sequence on the next line, and so on):

$ awk '!/^>/ { next } { getline seq } length(seq) >= 120 { print $0 "\n" seq }' input.fa > output.fa

If the records are multi-lined (header on one line, sequence split across two or more lines), then input.fa needs preprocessing to make each record single-lined before running the command above. There are other answers on Biostars that explain how to do this.
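For multi-lined records, one option is to let awk join the sequence lines itself and then apply the length test. The following is only a sketch, not taken from the linked post: the function name `filter_fasta_min_len` and its two arguments (minimum length, input file) are made up for illustration. It assumes every record starts with a `>` header line, and it writes each kept sequence back out on a single line.

```shell
# Hypothetical helper (name and arguments are assumptions, not a standard tool):
#   filter_fasta_min_len MIN_LENGTH INPUT_FASTA
# Prints records whose total sequence length is >= MIN_LENGTH.
filter_fasta_min_len() {
    min="$1"
    file="$2"
    awk -v min="$min" '
        # Print the record collected so far if it is long enough.
        function emit() {
            if (header != "" && length(seq) >= min) print header "\n" seq
        }
        # A header line closes the previous record and starts a new one.
        /^>/ { emit(); header = $0; seq = ""; next }
        # Any other line is sequence: strip whitespace and append it.
        { gsub(/[ \t\r]/, ""); seq = seq $0 }
        # The last record has no following header, so emit it at the end.
        END { emit() }
    ' "$file"
}
```

Usage would look like `filter_fasta_min_len 120 input.fa > output.fa`. Like the one-liner above, this keeps sequences of exactly 120 bp; change `>=` to `>` if those should be dropped as well.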

Via: http://itrylinux.com/use-awk-to-filter-fasta-file-by-minimum-sequence-length/


Thank you so much, I will try this!
