Question

Fasta file manipulation

0

10.0 years ago

GP ▴ 10

Hi All,

I want to extract the sequences that don't contain a * (special character/stop codon) from a FASTA file that I have. Is there a one-liner or an easy way to do that through the command line (macOS)? Or, if anyone could point me to a similar post on this forum, that would be very helpful.

Thanks!!

sequence • 2.2k views

updated 2.6 years ago by Ram 43k • written 10.0 years ago by GP ▴ 10

Ram · Answer 1 · 2014-05-15

2

10.0 years ago

Pierre Lindenbaum 161k

Linearize, filter, convert back to FASTA:

awk '/^>/{printf("\n%s\t",$0);next;} {printf("%s",$0);} END {printf("\n");}' file.fa |\
awk -F '\t' '!($2 ~ /\*/)' |\
tr "\t" "\n"
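
As a rough sketch of the three stages above, the same pipeline can be wrapped in a shell function with one comment per step. The name drop_stop_records is made up for illustration, and reading from standard input is an assumption. One deliberate difference: the first printf in the command above writes a newline before the first header, so its output begins with an empty line; the variant below avoids that by counting headers.

```shell
# Hypothetical helper (name is illustrative): keep only FASTA records whose
# sequence contains no '*'. Reads FASTA on stdin, writes FASTA on stdout.
drop_stop_records() {
  # Step 1: linearize, one record per line as "header<TAB>sequence".
  # The counter n suppresses the newline before the very first header.
  awk '/^>/ { printf("%s%s\t", (n++ ? "\n" : ""), $0); next }
       { printf("%s", $0) }
       END { if (n) printf("\n") }' |
  # Step 2: keep lines whose second field (the sequence) has no '*'.
  awk -F '\t' '!($2 ~ /\*/)' |
  # Step 3: turn the tab back into a newline to restore the FASTA layout.
  tr '\t' '\n'
}
```

Usage would be something like drop_stop_records < file.fa > clean.fa. Each sequence comes back on a single line, and headers that themselves contain a tab would confuse the second stage; the sketch assumes they do not.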

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Pierre Lindenbaum 161k

0

Thanks for the fast response! It works perfectly :) By the way, what would I have to change in this command to print the sequences that do have a * in them?

updated 4.3 years ago by Ram 43k • written 10.0 years ago by GP ▴ 10

0

Remove the ! in the second awk command.
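
For reference, a minimal sketch of what the command looks like with that change, wrapped in a function so it can be reused. The function name keep_stop_records is hypothetical, and reading from standard input instead of file.fa is an assumption; the awk and tr stages are otherwise the ones from the answer.

```shell
# Hypothetical helper (name is illustrative): print only FASTA records whose
# sequence contains a '*'. Reads FASTA on stdin, writes FASTA on stdout.
keep_stop_records() {
  # Linearize: one record per line as "header<TAB>sequence".
  awk '/^>/{printf("\n%s\t",$0);next;} {printf("%s",$0);} END {printf("\n");}' |
  # Same test as before, without the leading '!': keep sequences WITH a '*'.
  awk -F '\t' '($2 ~ /\*/)' |
  # Convert the tab back to a newline to restore the FASTA layout.
  tr "\t" "\n"
}
```

Called as keep_stop_records < file.fa. The empty first line produced by the linearizing step contains no *, so this variant drops it automatically.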

updated 4.3 years ago by Ram 43k • written 10.0 years ago by Pierre Lindenbaum 161k

0

Great, thanks again Pierre!!

updated 4.3 years ago by Ram 43k • written 10.0 years ago by GP ▴ 10