How to replace a set of sequence IDs with another set of sequence IDs?
3
1
9.0 years ago
mjoyraj ▴ 80

Suppose I have a file (Notepad or MS Word) containing 10 sequences with the IDs A, B, C, D, E, F, G, H, I, J. I want to replace these IDs with K, L, M, N, O, P, Q, R, S, T, i.e. A is to be replaced with K, B with L, and so on. How can I do this? I am a biologist who has recently started analyzing large data sets and have run into this problem. Any advice or help will be highly appreciated.

gene genome sequence • 2.8k views
ADD COMMENT
2
9.0 years ago

Your question gives me the opportunity to answer using the Knime workbench and the 'fasta node' I've recently described: http://plindenbaum.blogspot.fr/2015/02/automatic-code-generation-for-knime.html

  1. Download Knime from https://www.knime.org/downloads/overview
  2. Download my fasta extension from http://cardioserve.nantes.inserm.fr/~lindenb/knime/fasta/ and put the *.jar file in the plugins directory of Knime, something like .../knime_2.11.1/plugins/com.github.lindenb.xsltsandbox_2015.02.18.jar
  3. Open Knime
  4. Create a node that reads the fasta file and a node that reads a file with two columns (old-name, new-name). Join the two tables on the old name and save the result as fasta. I've uploaded a full example on myexperiment.org: http://www.myexperiment.org/workflows/4643.html?version=1

http://www.myexperiment.org/workflows/4643/versions/1/previews/full

ADD COMMENT
0

How do I create the node?

ADD REPLY
0

Create a new project, then drag the nodes from the bottom-left pane to the project area.

You should read the Knime manual and/or watch a few videos.

ADD REPLY
0

Thanks a lot...

ADD REPLY
0

... and you can download a pre-defined workflow from http://www.myexperiment.org/workflows/4643.html?version=1 (click on [Download Workflow]).

ADD REPLY
1
9.0 years ago
Peter 6.0k

If you are already using Galaxy you could ask your Galaxy Admin to install this tool of mine:

https://github.com/peterjc/pico_galaxy/tree/master/tools/seq_rename

http://toolshed.g2.bx.psu.edu/view/peterjc/seq_rename

This takes a sequence file (FASTA, FASTQ or even SFF format) and a name mapping table, which is just a tabular file with the old names in one column and the new names in another. Of course, you may still have to do some work to get the renaming information into this format...
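For readers without access to Galaxy, the same idea (a sequence file plus a two-column mapping table) can be sketched in a few lines of Python. This is only an illustration and not the code of the seq_rename tool: the function names `load_mapping` and `rename_fasta` are made up for this example, the table is assumed to be tab-separated with the old name in the first column and the new name in the second, and the ID is assumed to be the first word after `>` in each FASTA header.

```python
import csv
import io


def load_mapping(table_lines):
    """Read old/new ID pairs from tab-separated lines into a dict (hypothetical helper)."""
    mapping = {}
    for row in csv.reader(table_lines, delimiter="\t"):
        if len(row) >= 2:  # skip blank or incomplete rows
            mapping[row[0].strip()] = row[1].strip()
    return mapping


def rename_fasta(fasta_lines, mapping):
    """Yield FASTA lines with header IDs replaced according to mapping.

    Assumption: the ID is the first whitespace-delimited word after '>'.
    Headers whose ID is not in the mapping are passed through unchanged,
    and sequence lines are never modified.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].rstrip("\n").split(None, 1)
            if parts and parts[0] in mapping:
                parts[0] = mapping[parts[0]]
                line = ">" + " ".join(parts) + "\n"
        yield line


if __name__ == "__main__":
    # Tiny in-memory example; real use would pass open file handles instead.
    table = io.StringIO("A\tK\nB\tL\n")
    fasta = io.StringIO(">A sample gene\nACGT\n>B\nGGCC\n>Z\nTTAA\n")
    print("".join(rename_fasta(fasta, load_mapping(table))), end="")
```

Because each header ID is looked up as a whole word in a dictionary, an ID such as `gene1` cannot accidentally match the start of `gene10`, and the file is processed one line at a time, so it should cope with thousands of sequences.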

ADD COMMENT
1
9.0 years ago
Prakki Rama ★ 2.7k

Large data! I think you will eventually have to use Linux. If the fasta file is bigger, you may have to write a script to change the header names.

The following command should work on Linux for your case.

sed -i -e 's/>A/>K/;s/>B/>L/;s/>C/>M/;s/>D/>N/;s/>E/>O/;s/>F/>P/;s/>G/>Q/;s/>H/>R/;s/>I/>S/;s/>J/>T/' filename.fasta

ADD COMMENT
0

I think `tr` may be useful here.

ADD REPLY
0

I am looking for a script, as my file is very big, with 3882 gene sequences...

ADD REPLY
0

Maybe showing a sample of what the headers in your fasta file look like, together with the IDs you want to change them to, would help us give a much clearer solution.
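One quick way to produce such a sample is to print the first few header lines of the file. The snippet below is a minimal sketch; the function name `first_headers` is hypothetical, and it simply assumes that header lines are the ones starting with `>`.

```python
from itertools import islice


def first_headers(fasta_lines, n=5):
    """Return the first n header lines (those starting with '>'), without line endings."""
    headers = (line.rstrip("\r\n") for line in fasta_lines if line.startswith(">"))
    return list(islice(headers, n))


if __name__ == "__main__":
    # In-memory example; for a real file, pass an open file handle instead of a list.
    example = [">A sample gene\n", "ACGT\n", ">B\n", "GGCC\n"]
    for header in first_headers(example, n=2):
        print(header)
```

Pasting the printed headers into the thread, alongside a few lines of the old-name/new-name list, is usually enough for others to suggest a precise command.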

ADD REPLY
