removing part of the fasta header from multifasta file
6.9 years ago
2kg2523 ▴ 10

I am trying to delete some words (strings) from the sequence headers of a multifasta file. I want to remove len= and path=[...] so that in the end I have only the sequence identifier and the length. The headers currently look like this:

>comp2_c0_seq1 len=589 path=[1:0-588]
>comp2_c1_seq1 len=352 path=[1462:0-351]

What I want is the following, in two columns:

>comp2_c0_seq1 589
>comp2_c1_seq1 352

Thank you very much.

 

rna-seq sequence • 7.0k views
6.9 years ago
Juke34 ★ 5.7k

Hey,

If you are on a Mac or Linux, you can use a bash command to do that:

while IFS= read -r i || [[ -n $i ]]; do if [[ $i == ">"* ]]; then part1=$(echo "$i" | cut -d' ' -f1); part2=$(echo "$i" | cut -d' ' -f2); part2ok=$(echo "$part2" | cut -d'=' -f2); echo "$part1 $part2ok"; else echo "$i"; fi; done < YOURFILE > outputFile

Just replace YOURFILE with the name of the multifasta file that you want to modify. The result will be written to outputFile.

This is not the most efficient approach, but it should work.
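
To make the cut steps in the loop above concrete, here is a small sketch that applies them to a single header line. The header is copied from the question; the variable name header is only for illustration.

```shell
# One header line taken from the question (illustrative input).
header='>comp2_c0_seq1 len=589 path=[1:0-588]'

# First space-separated field: the sequence identifier.
part1=$(echo "$header" | cut -d' ' -f1)
# Second space-separated field: the "len=..." token.
part2=$(echo "$header" | cut -d' ' -f2)
# Text after the '=' sign: the length itself.
part2ok=$(echo "$part2" | cut -d'=' -f2)

echo "$part1 $part2ok"
```

The path=[...] part is the third field, so it is simply never printed.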


sed -e '/^>/s/len=//' -e '/^>/s/ *path.*//'

 


Thank you for the recommendation, but it does not work. It gave me an error message. Here is what I did:

sed -e '^>/s/len= //' -e '^>/s/path.*//' MultiFasta.txt > OutLength.txt

The error message:

sed: -e expression #1, char 1: unknown command: `^'

Where did I go wrong? Thank you again.


I didn't even check my sed expression. I forgot the prefix '/' before '^'.
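
For context: a sed command can be prefixed with an address such as /^>/, which restricts it to lines matching that regex. The regex needs a slash on both sides; without the leading slash, sed tries to read ^ as a command, which gives the error quoted above. A minimal sketch with made-up input (the two lines below are not from the original file):

```shell
# Made-up input: one header line and one sequence-like line that also contains "len=".
demo=$(printf '>seq1 len=10\nACGTlen=ACGT\n' | sed -e '/^>/s/len=//')

# Only the line starting with ">" is changed; the second line is left alone.
printf '%s\n' "$demo"
```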


It worked with the following modifications.

Original, which does not work:

sed -e '^>/s/len= //' -e '^>/s/path.*//' MultiFasta.txt > OutLength.txt

I deleted "^>/" and the space after "len=", and added "\" after "s/". The working syntax is the following:

sed -e 's/\len=//' -e 's/\path.*// MultiFasta.txt > OutLength.txt


Correction: a closing ' must be added to the end of 's/\path.*// to make it 's/\path.*//'

Working syntax is the following:

sed -e 's/\len=//' -e 's/\path.*//' MultiFasta.txt > OutLength.txt
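
As a hedged variant of the command above: the backslashes before len and path should not be needed, and a /^>/ address keeps the substitutions away from sequence lines. The sketch below runs on a small sample file; the file names sample.fa and sample_out.fa and the sequence lines are made up, only the two headers come from the question.

```shell
# Made-up sample input using the two headers from the question.
cat > sample.fa <<'EOF'
>comp2_c0_seq1 len=589 path=[1:0-588]
ACGTACGT
>comp2_c1_seq1 len=352 path=[1462:0-351]
TTGACCA
EOF

# On header lines only: drop "len=", then drop " path=[...]" together with its leading space.
sed -e '/^>/s/len=//' -e '/^>/s/ path=.*//' sample.fa > sample_out.fa

cat sample_out.fa
```

Removing the space in front of path= also avoids a trailing blank after the length.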

6.9 years ago

I like Pierre's solution using sed. Here is a Python solution:

You can install pyfaidx using "pip install --user pyfaidx"
