Removing (stubborn) new line from Fasta file sequence?
4 months ago
Eliveri ▴ 340

I have a FASTA file in this format:

>WP_003850266.1 toxin [Corynebacterium diphtheriae]

I want it to look like this:

>WP_003850266.1 toxin [Corynebacterium diphtheriae]

However, for this particular FASTA file, the newlines cannot be removed no matter what I try.

I have already tried:

awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);}  END {printf("\n");}' < test.fasta > output.fasta

But the newlines remain.


Try bioawk or something similar, for example:

bioawk -c fastx '{print ">"$name" "$comment"\n"$seq}' test.fasta > out.fasta
4 months ago
seidel 11k

Your file has carriage returns (\r, often displayed as ^M) at the end of some lines, but not all of them, as od -c shows:

tail -2 test.fasta | od -c
0000000    S   T   N   S   R   L   C   A   V   F   V   R   S   G   Q   P
0000020    V   I   G   A   C   T   S   P   Y   D   G   K   Y   W   S   M
0000040    Y   S   R   L   R   K   M   L   Y   L   I   Y   V   A   G   I
0000060    S   V   R   V   H   V   S   K   E   E   Q   Y   Y   D   Y   E
0000100    D   A   T   F   E   T  \r  \n   Y   A   L   T   G   I   S   I
0000120    C   N   P   G   S   S   L   C  \n
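If od is not at hand, the same check can be sketched in Python. This is a minimal sketch, assuming the whole file fits in memory; the function name find_crlf_lines is hypothetical. The file has to be read in binary mode, because Python's text mode translates \r\n to \n by default and would hide exactly the characters we are looking for.

```python
def find_crlf_lines(data: bytes) -> list[int]:
    """Return the 1-based numbers of lines that end in a carriage return."""
    hits = []
    # Split on the newline byte only, so a trailing carriage return stays
    # attached to its line and can be detected.
    for number, line in enumerate(data.split(b"\n"), start=1):
        if line.endswith(b"\r"):
            hits.append(number)
    return hits
```

For example, find_crlf_lines(pathlib.Path("test.fasta").read_bytes()) returns the numbers of the lines carrying a stray \r; an empty list means every line ends in a plain \n.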

One easy solution is to run the file through sed first, deleting the carriage returns before awk sees them:

sed -e 's/\r//g' test.fasta | awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);}  END {printf("\n");}'

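The sed expression reads as s/pattern/replacement/g: substitute every (global) match of \r with nothing. Note that \r in a sed pattern is understood by GNU sed; BSD/macOS sed does not interpret it.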
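For comparison, here is a minimal Python sketch of the same two steps: delete carriage returns, then put each record's sequence on a single line. It assumes the file fits in memory, and the function name linearize_fasta is hypothetical, not taken from any library.

```python
def linearize_fasta(text: str) -> str:
    """Return the FASTA text with each record as one header line plus one sequence line."""
    records = []   # finished "header\nsequence" strings
    header = None  # header of the record currently being built
    chunks = []    # sequence lines of the record currently being built
    # Drop every carriage return first (the job sed does in the pipeline).
    for line in text.replace("\r", "").split("\n"):
        if line.startswith(">"):
            if header is not None:
                records.append(header + "\n" + "".join(chunks))
            # Any sequence lines seen before the first header are discarded here.
            header, chunks = line, []
        elif line:  # skip blank lines
            chunks.append(line)
    if header is not None:
        records.append(header + "\n" + "".join(chunks))
    return "\n".join(records) + "\n" if records else ""
```

Unlike the awk command, this does not emit a leading blank line. Usage, assuming the input is test.fasta: Path("output.fasta").write_text(linearize_fasta(Path("test.fasta").read_text())) with Path from pathlib.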

4 months ago
Carambakaracho ★ 3.2k

I still love to solve these things with Perl one-liners.

perl -nwe 'if(s/^>/\n>/){s/\r?\n$/\n/;}else{s/\r?\n$//};print $_; END{print "\n"}' test.fasta | tail -n +2

Explanation: if the line starts with >, replace it with a newline followed by > (\n>), then replace an optional carriage return \r? plus the newline \n with a plain \n. Otherwise, remove the optional carriage return and the newline entirely. Then print the line ($_). The END block prints a final newline after the last sequence. The tail is required because I didn't include a check for the first line, which is now an empty line.

I used to be convinced that Perl regex one-liners are much better than awk, mostly because I never cared to learn awk. After more and more time without active Perl development, I have come to acknowledge Perl's picket fencing.

