Question: removing part of the FASTA header from a multifasta file
6.0 years ago, 2kg2523 (Italy) wrote:

I am trying to delete some words (strings) from the sequence headers of a multifasta file. I want to remove "len=" and "path=[...]" so that, in the end, I have only the sequence identifier and the length.

>comp2_c0_seq1 len=589 path=[1:0-588]
>comp2_c1_seq1 len=352 path=[1462:0-351]

What I want is the following, in two columns:

>comp2_c0_seq1 589
>comp2_c1_seq1 352

Thank you very much

 

6.0 years ago, Juke34 (Sweden) wrote:

Hey,

If you are on a Mac or Linux, you can use a bash command to do that:

while IFS= read -r line; do if [[ $line == ">"* ]]; then id=$(echo "$line" | cut -d' ' -f1); len=$(echo "$line" | cut -d' ' -f2 | cut -d'=' -f2); echo "$id $len"; else echo "$line"; fi; done < YOURFILE > outputFile

You just have to replace YOURFILE with the name of the multifasta file that you want to modify. The result will be in outputFile.

This is not the most efficient approach, but it should work.


sed -e '/^>/s/len=//' -e '/^>/s/path.*//'

 

modified 6.0 years ago • written 6.0 years ago by Pierre Lindenbaum

Thank you for the recommendation, but it does not work; it gave me an error message. Here is what I did:

sed -e '^>/s/len= //' -e '^>/s/path.*//' MultiFasta.txt > OutLength.txt

The error message:

sed: -e expression #1, char 1: unknown command: `^'

Where did I go wrong? Thank you again.

written 6.0 years ago by 2kg2523

I didn't even check my sed expression. I forgot the '/' prefix before '^'.

written 6.0 years ago by Pierre Lindenbaum

It worked with the following modifications.

Original, which does not work:

sed -e '^>/s/len= //' -e '^>/s/path.*//' MultiFasta.txt > OutLength.txt

I deleted "^>" and the space after "len=", and added "\" after "s/". The working syntax is the following:

sed -e 's/\len=//' -e 's/\path.*// MultiFasta.txt > OutLength.txt

written 6.0 years ago by 2kg2523

Correction: a closing ' must be added to the end of 's/\path.*// to make it 's/\path.*//'.

The working syntax is the following:

sed -e 's/\len=//' -e 's/\path.*//' MultiFasta.txt > OutLength.txt

modified 2.9 years ago • written 2.9 years ago by TrentGenomics
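
To see what the two sed expressions in this thread do, here is a rough Python equivalent that, like the '/^>/' address, only touches header lines. It is only an illustrative sketch: the function name strip_header_tags is made up here, and it assumes headers that look like the two examples in the question.

```python
import re


def strip_header_tags(line):
    """Roughly mimic sed -e '/^>/s/len=//' -e '/^>/s/path.*//' on one line.

    The line is expected without its trailing newline.
    """
    if not line.startswith(">"):
        return line  # sequence lines are left untouched
    line = re.sub(r"len=", "", line, count=1)    # first sed expression
    line = re.sub(r"path.*", "", line, count=1)  # second sed expression
    # Unlike the sed command, also drop the space left before the removed field.
    return line.rstrip()


for header in (">comp2_c0_seq1 len=589 path=[1:0-588]",
               ">comp2_c1_seq1 len=352 path=[1462:0-351]"):
    print(strip_header_tags(header))
# prints:
# >comp2_c0_seq1 589
# >comp2_c1_seq1 352
```

Restricting the substitution to lines starting with ">" means a sequence line that happens to contain "path" or "len=" is never modified.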
6.0 years ago, Matt Shirley (Cambridge, MA) wrote:

I like Pierre's solution using sed. Here is a Python solution:

You can install pyfaidx using "pip install --user pyfaidx"
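
For readers who would rather avoid an extra dependency, here is a minimal sketch of the same task using only the Python standard library. It does not use pyfaidx. The function name simplify_fasta_headers and the regular expression are assumptions made for illustration, and it assumes headers of the form ">id len=N path=[...]" as in the question.

```python
import re

# Matches headers such as ">comp2_c0_seq1 len=589 path=[1:0-588]"
# and captures the identifier (with ">") and the numeric length.
HEADER_RE = re.compile(r"^(>\S+)\s+len=(\d+)")


def simplify_fasta_headers(in_path, out_path):
    """Copy a FASTA file, rewriting each matching header as '>id length'."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            match = HEADER_RE.match(line)
            if match:
                dst.write(f"{match.group(1)} {match.group(2)}\n")
            else:
                # Sequence lines, and headers without a len= field,
                # are copied unchanged.
                dst.write(line)
```

With the file names used earlier in the thread, it could be called as simplify_fasta_headers("MultiFasta.txt", "OutLength.txt"). The file is streamed line by line, so memory use stays small even for large multifasta files.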
