Question: Extract Sequences From Fasta Using Awk One-Liner
6.7 years ago, Newvin wrote:

Hi all. Really basic question here. I'd like to grab the sequences from a FASTA file with an AWK one-liner. To grab the headers, I can do:

awk < seq.fasta '/^>/ { print $0 }'

How do I negate this, so that it grabs the lines that do NOT begin with the '>' character? Feel free to chime in with other methods to solve the problem, but I'd like to learn an AWK-specific solution, as I am trying to level up my AWK.

Thanks!


IMHO, you really want to "level up" in regular expressions, not awk specifically. The more experience you gain with regex, the better you'll be able to apply it to awk, sed, and grep (as well as most programming languages).

Reply written 6.7 years ago by Andrew Su
6.7 years ago, Aaronquinlan (United States) wrote:

awk < seq.fasta '!/^>/ { print $0 }'

or (preferred for clarity):

awk < seq.fasta '$0 !~ /^>/ { print $0 }'

or merely:

awk < seq.fasta '$0 !~ /^>/'

or grep:

grep -v '^>' seq.fasta

Some people prefer Perl one-liners for this sort of thing, because you can use Perl both for awk-ish filters and for your day-to-day scripting:

perl -lne 'print if !($_ =~ /^\>/)' seq.fasta
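
As a quick sanity check, the awk and grep variants above can be compared on a small test file. This is only a sketch: the file name `demo.fasta` and its contents are made up for illustration and are not from the original question.

```shell
# Build a tiny FASTA file for illustration (hypothetical name and contents).
printf '>seq1 example\nACGT\nTTGA\n>seq2\nGGCC\n' > demo.fasta

# Negated regex pattern; with no action given, awk prints the matching line.
awk '!/^>/' demo.fasta

# Explicit form: test the whole record ($0) with the "does not match" operator.
awk '$0 !~ /^>/' demo.fasta

# The same filter with grep; quoting keeps the shell from treating > as a redirect.
grep -v '^>' demo.fasta
```

On this sample all three commands should print the same three sequence lines and skip both headers.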

Thanks. I must have missed that in the documentation. How AWK-ward.

Reply written 6.7 years ago by Newvin

The Perl line is not quite right. Your command will print all lines that don't have '>' anywhere. To print just those lines that don't start with '>':

perl -lne 'print if !($_ =~ /^>/)' seq.fasta

Reply written 6.7 years ago by Chris Maloney

Right, thanks Chris. Updated the Perl example to match the awk regex.

Reply written 6.7 years ago by Aaronquinlan
6.7 years ago, Pierre Lindenbaum (France/Nantes/Institut du Thorax - INSERM UMR1087) wrote:

awk '($0 ~ /^[^>]/)' < file.fasta
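
One difference worth knowing: a positive match on `^[^>]` requires the line to have a first character, so it also drops empty lines, whereas the negated `!/^>/` form keeps them. A minimal sketch, assuming a made-up file `blank_demo.fasta` with a blank line between records:

```shell
# Hypothetical sample with an empty line between the two records.
printf '>seq1\nACGT\n\n>seq2\nGGCC\n' > blank_demo.fasta

# Negated match: keeps every non-header line, including the empty one
# (ACGT, the blank line, and GGCC).
awk '!/^>/' blank_demo.fasta | wc -l

# Positive match on "first character is not >": the empty line has no
# first character, so only ACGT and GGCC remain.
awk '/^[^>]/' blank_demo.fasta | wc -l
```

Which behaviour is preferable depends on whether blank lines in the input should survive the filter.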

6.7 years ago, User 4133 wrote:

You can also use this:

grep -v '>' file.fasta

On my blog you can find a comprehensive post about formatting and splitting FASTA files using Python scripts:

http://basicbioinformatics.blogspot.com/2011/10/split-fasta-file.html


A minor point, but you really want to ensure that the ">" starts at the beginning of the line, per the FASTA spec.

Reply written 6.7 years ago by Aaronquinlan

i.e. grep -v "^>"
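
The anchored and unanchored patterns only differ when a non-header line happens to contain '>', which well-formed sequence lines should not. A contrived sketch (the file `anchor_demo.fasta` and its odd third line are invented purely to show the difference):

```shell
# Contrived sample: the last line contains ">" but does not start with it.
printf '>seq1\nACGT\nAC>GT\n' > anchor_demo.fasta

# Unanchored: drops any line containing ">" anywhere, so AC>GT is lost too.
grep -v '>' anchor_demo.fasta

# Anchored: drops only the lines that begin with ">".
grep -v '^>' anchor_demo.fasta
```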

Reply written 6.7 years ago by Neilfws