How to remove part of the description line in a multi-FASTA file?
6.3 years ago
majeedaasim ▴ 60

I have a sequence file with description lines like:

>AR_DN39_c0_g1_i1|m.1 AR_DN39_c0_g1_i1|g.1 type:5prime_partial len:209 gc:universal AR_DN39_c0_g1_i1:881-255(-)
RGQELNTFSLPALSSSLEDLFSMVVCSTGNSFSKEVSIRRRIVNIFNKREEDFPSLREYNDYLEEVEDMTFKLVEGIDVPAIEAKIAKYQEENAEQIINNRARKAEEVARSLKEHQEQPATGVANDTGLAQNSQAMGIGQYNPIFMQPRPPGLTQQPVPIGGSNAHSVPEDEATLRQRAERAARAGGWTNELCRKRAFEEAFSSLWVS*

I need to remove everything from the first "|" onward so that I can retain only AR_DN39_c0_g1_i1 in the header.

sed awk • 660 views

There are plenty of past threads on Biostars that cover this. Please take some time to search, using Google rather than the built-in Biostars search.

cut -f 1 -d '|' in.fa
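
The cut command above is applied to every line, which is harmless as long as no sequence line contains "|". A minimal awk sketch that only touches header lines might look like the following; the sample record is shortened for illustration and out.fa is an assumed output name, not something from the thread.

```shell
# Hypothetical sample input; in.fa matches the file name used in the cut command.
printf '%s\n' \
  '>AR_DN39_c0_g1_i1|m.1 AR_DN39_c0_g1_i1|g.1 type:5prime_partial' \
  'RGQELNTFSLPALSS' > in.fa

# Split on "|". For header lines print only the first field;
# all other lines (the sequence) are passed through unchanged.
awk -F'|' '/^>/ { print $1; next } { print }' in.fa > out.fa

cat out.fa
```

The header is reduced to >AR_DN39_c0_g1_i1 and the sequence line is left as it was.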

Hello majeedaasim!

This is a frequently asked question, and many past threads already answer it. There are two usable answers here.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree please tell us why in a reply below, we'll be happy to talk about it.

Cheers!

6.3 years ago
natallah ▴ 10

You can do

sed 's/|.*//' file.fasta > newfile.fasta

This will remove the first | and everything after it on each line.
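
One thing worth checking after shortening headers is whether two records end up with the same ID, since several entries could share the part before "|". The sketch below restricts the sed substitution to header lines and then lists any duplicated headers. The two-record input is made up for illustration (the second record and both short sequences are hypothetical), and duplicate_ids.txt is an assumed file name.

```shell
# Hypothetical two-record input that shares the same prefix before "|".
printf '%s\n' \
  '>AR_DN39_c0_g1_i1|m.1 type:5prime_partial' \
  'RGQELNTF' \
  '>AR_DN39_c0_g1_i1|m.2 type:internal' \
  'MVVCSTGN' > file.fasta

# Only on lines starting with ">", delete the first "|" and everything after it.
sed '/^>/ s/|.*//' file.fasta > newfile.fasta

# List every header that now occurs more than once.
grep '^>' newfile.fasta | sort | uniq -d > duplicate_ids.txt

cat duplicate_ids.txt
```

If duplicate_ids.txt is empty, the shortened IDs are still unique. If it is not, downstream tools that expect unique FASTA IDs may need a different truncation point.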
