Question: How to remove the tail of the headers in a multi-FASTA file with sed or another tool
5.8 years ago by
tremblayemilie90 wrote:


I have a multi-FASTA file with read headers such as:


Now I would like to remove the tail part of my headers, where the sequence ID appears. I do not know how to do this when the tail is different for each read. I thought of something like this:

sed s'/^.fastq/s/[^ ]* //'g

but it does not work for some reason.

I would like to get something like this:

modified 5 months ago by RamRS28k • written 5.8 years ago by tremblayemilie90

Hi again,

I also have to remove that sequence number from another file, but in that case the sequence ID sits in the middle of the header:

>barcodelabel= #ITS2_A_B10_R_2014_02_19_15_00_39_user_SN2-19-FUNGI_OOMYCETE-EMVSAMPLES_et_2014-02-19_RUN1_Fungi_oomycete_Run1_Ana140224.fastq_72JCK_00944_01804;size=52893;
>barcodelabel= #ITS1F_A_B21_R_2014_02_19_15_00_39_user_SN2-19-FUNGI_OOMYCETE-EMVSAMPLES_et_2014-02-19_RUN1_Fungi_oomycete_Run1_Ana140224.fastq_72JCK_03245_02705;size=33771;

So I want to keep the size=52893 part but remove the 72JCK_00944_01804 part.

modified 5 months ago by RamRS28k • written 5.8 years ago by tremblayemilie90

You might wanna start working on regular expressions more. These come best when you practice a bit. As long as you don't overwrite the file, nothing should go wrong in experimentation. 

In this case, you wanna match something that starts after a fastq_ and ends before the next ;

Should be easy enough to do that from the answer in your other question on the forum.
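As a rough sketch of the match described above (not necessarily what the other thread's answer looks like): the pattern below anchors on `.fastq_`, consumes everything up to the next `;`, and puts back `.fastq;`. The function name `strip_read_id` and the sequence line `ACGTACGT` are made up for illustration, and dropping the underscore along with the ID is an assumption about the desired output.

```shell
# Hypothetical helper: on header lines only (those starting with ">"),
# remove the read ID that sits between ".fastq_" and the next ";".
# [^;]* matches any run of characters that are not ";", so the match
# stops at the first ";" and the size=... annotation is left alone.
strip_read_id() {
    sed '/^>/ s/\.fastq_[^;]*;/.fastq;/' "$@"
}

# Demo: one header from the post plus a made-up sequence line.
demo_out=$(printf '%s\n' \
    '>barcodelabel= #ITS2_A_B10_R_2014_02_19_15_00_39_user_SN2-19-FUNGI_OOMYCETE-EMVSAMPLES_et_2014-02-19_RUN1_Fungi_oomycete_Run1_Ana140224.fastq_72JCK_00944_01804;size=52893;' \
    'ACGTACGT' | strip_read_id)
printf '%s\n' "$demo_out"
```

The function reads standard input when called without arguments, or a file when given one, e.g. `strip_read_id myseq.fa > myseq.clean.fa`. Writing to a new file keeps the original intact while experimenting.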

written 5.8 years ago by RamRS28k

Hey, I want to remove the headers from a multifasta file, except for the first header. Is that possible?

written 5 months ago by zhamouda0

This is not an answer to the top-level question and hence must not be added as an answer. I'm moving it to a comment.

Please open a new post describing your exact problem as well as what you've tried in your efforts to solve that problem.

written 5 months ago by RamRS28k
5.8 years ago by
WCIP | Glasgow | UK
dariober11k wrote:

What about:

sed 's/fastq_.*/fastq/' myseq.fa

This assumes the string "fastq_" occurs only at the end of the sequence name. Everything from the "_" onward is stripped, so the header ends in "fastq".
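A slightly more defensive variant, offered as a sketch rather than a replacement for the command above: restricting the substitution to header lines with the `/^>/` address means a sequence line can never be altered. The function name `strip_header_tail` and the sample header and sequence are invented for the demo, since the original example headers are not shown in the question.

```shell
# Hypothetical helper: apply the same substitution, but only on lines
# that start with ">" (FASTA headers). Sequence lines pass through as-is.
strip_header_tail() {
    sed '/^>/ s/fastq_.*/fastq/' "$@"
}

# Demo on a made-up record; real headers may look different.
tail_out=$(printf '%s\n' \
    '>run1_sample.fastq_72JCK_00944_01804' \
    'ACGTTGCA' | strip_header_tail)
printf '%s\n' "$tail_out"
```

On a real file this would be called as `strip_header_tail myseq.fa > myseq.stripped.fa`, leaving `myseq.fa` unchanged.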

modified 5 months ago by RamRS28k • written 5.8 years ago by dariober11k