Question: (Closed) Trim the ID line of a fasta
12 months ago, SaltedPork wrote:

I have a FASTA file with some very silly IDs (straight from NCBI) that look like this:

>KF859747.1 HIV-1 isolate DEURF09UG005 from Uganda gag protein (gag) gene, complete cds; pol protein (pol) gene, partial cds; vif protein (vif), vpr protein (vpr), tat protein (tat), rev protein (rev), vpu protein (vpu), and envelope glycoprotein (env) genes, complete cds; and nonfunctional nef protein (nef) gene, complete sequence

If all I want in my FASTA file is the >KF859747 part, are there any nifty one-liners to do this on the command line?


In case you are interested in obtaining such headers using a desktop GUI application, you may have a look at our SEDA software (http://www.sing-group.org/seda/). It has different options for processing headers as well as obtaining them using the "Statistics" option. Regards.

(Comment written 12 months ago by Hugo)

Hello SaltedPork!

Commonly asked question. Please search Biostars first in future.

For this reason we have closed your question. This allows us to keep the site focused on the topics that the community can help with.

If you disagree, please tell us why in a reply below; we'll be happy to talk about it.

Cheers!

(Comment written 12 months ago by genomax)
12 months ago, apeltzer (Tuebingen, Germany) wrote:

Something like

cat blabla.fasta | cut -f 1 -d " " > output.fasta

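That should keep your FASTA sequences intact, as it simply returns the first field (-f 1) of each line, using a single space as the delimiter (-d " "). Sequence lines contain no spaces, so they pass through unchanged. Note that the header keeps its version suffix (>KF859747.1).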
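
For anyone who wants to try this on a throwaway file first, here is a minimal sketch, assuming a POSIX shell. The sample records, the placeholder sequences, and the `workdir` variable are made up for illustration; only the first header prefix comes from the question. It passes the file straight to cut, so the extra cat is not needed.

```shell
# Build a tiny two-record FASTA file to experiment on.
# The sequences and the second record are placeholders, not real data.
workdir=$(mktemp -d)
printf '%s\n' \
  '>KF859747.1 HIV-1 isolate DEURF09UG005 from Uganda' \
  'ACGTACGTACGT' \
  '>seq2 another made-up description' \
  'ACGTACGTACGT' > "$workdir/input.fasta"

# Keep only the text before the first space on every line.
# Lines without a space (the sequence lines) are printed unchanged.
cut -d ' ' -f 1 "$workdir/input.fasta" > "$workdir/output.fasta"

cat "$workdir/output.fasta"
```

The headers in output.fasta should come out as >KF859747.1 and >seq2, with the sequence lines untouched.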

12 months ago, lieven.sterck (VIB, Ghent, Belgium) wrote:

Same principle, but with sed:

cat file.fasta | sed 's/ .*//g' > out.fasta
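
The question asks for >KF859747 without the .1 version suffix, which cutting at the first space leaves in place. A possible variant, as a sketch only: restrict sed to header lines and cut at the first dot or space, whichever comes first. This assumes the IDs follow the accession.version pattern and contain no other dots; the sample records and the `workdir` variable are again made up for illustration.

```shell
# Build a tiny two-record FASTA file to experiment on.
# The sequences and the second record are placeholders, not real data.
workdir=$(mktemp -d)
printf '%s\n' \
  '>KF859747.1 HIV-1 isolate DEURF09UG005 from Uganda' \
  'ACGTACGTACGT' \
  '>seq2 another made-up description' \
  'ACGTACGTACGT' > "$workdir/input.fasta"

# On header lines only (those starting with ">"), delete everything
# from the first dot or space to the end of the line.
sed '/^>/ s/[. ].*//' "$workdir/input.fasta" > "$workdir/out.fasta"

cat "$workdir/out.fasta"
```

Be aware that dropping the version can produce duplicate IDs if the file holds several versions of the same accession.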