Question: How to delete the last 5 characters from a FASTA header?
angela1 wrote (6 days ago):

Hi,

I am trying to remove the last 5 characters from each FASTA header in my sequencing data. I have ≈400,000 sequences and have tried to use the sed command in the terminal to do this.

Input text:

>1-4-8.45  
TAGGGAGA

Expected Output:

>1-4           
TAGGGAGA

How can I use the sed command to remove the last 5 characters from my FASTA headers?

wm wrote (6 days ago):

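Using sed. Note that this solution does not account for trailing whitespace in the header: if the header line ends in spaces, those spaces count toward the 5 characters removed.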

$ sed '/^>/s/.\{5\}$//' in.fa
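If the headers may end in spaces or tabs, one possible variant is to let the pattern absorb the trailing whitespace as well. This is only a sketch: it assumes a sed that accepts -E for extended regular expressions (GNU and BSD sed both do), and the function name trim_header is made up for illustration.

```shell
# Hypothetical helper (not from the thread): on header lines only (/^>/),
# delete the 5 characters that precede any trailing whitespace, together
# with that whitespace. Sequence lines pass through unchanged.
trim_header() {
    sed -E '/^>/ s/.{5}[[:space:]]*$//' "$@"
}

# Sample record from the question, with two trailing spaces after the header.
printf '>1-4-8.45  \nTAGGGAGA\n' | trim_header
```

With a file, the same helper would be called as `trim_header in.fa > out.fa`, writing the result to a new file rather than editing in place.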

For FASTA and FASTQ files, bioawk (https://github.com/lh3/bioawk) is also a good option: it parses the header into separate $name and $comment fields.

$ bioawk -cfastx '{id=substr($name, 1, length($name) - 5); print ">"id"\n"$seq}' in.fa
RamRS wrote (6 days ago):

What have you tried? This sort of problem has been addressed on the site multiple times.

sed can match the first character of each line to select the lines on which an operation is performed, so you can restrict the operation to header lines only. You can also capture the last five characters with the regex (.{5})$ (this is extended-regex syntax, as used with sed -E).

Please use these hints to get to the solution.
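For readers who want to check their result, the two hints can be combined roughly as follows. This is a sketch rather than a definitive answer: it assumes sed -E is available, it does not handle trailing whitespace in the header, and the function name drop_last5 is hypothetical.

```shell
# Hint 1: the address /^>/ restricts the command to header lines.
# Hint 2: (.{5})$ matches the last five characters of the line; substituting
# an empty string deletes them. The capture group is optional for deletion.
drop_last5() {
    sed -E '/^>/ s/(.{5})$//' "$@"
}

# Sample record from the question.
printf '>1-4-8.45\nTAGGGAGA\n' | drop_last5
```

Because of the /^>/ address, sequence lines are left untouched even when they are longer than five characters.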
