Question: Renaming Entries In A Fasta File
thiago84naka80 wrote:

Hello,

I have never written a script in my life.

The problem is how to change the FASTA names, as in this input file:

>Glyma04g14800|Glyma04g14800.3
MMLETVAAVPGMVAGMLLHCKSLRRFEHSGGWIKALLEEAENERMHLMTFMEVAKPKWYE
>Glyma05g24460|Glyma05g24460.1
SNVSIDLTKHHVPKNFLDKVAYRTVKLLRIPTDLFFKRRYGCRAMMLETVAAVPGMVGGM

into this output file (replacing the original names with numbers in ascending order, starting at 1):

>1
MMLETVAAVPGMVAGMLLHCKSLRRFEHSGGWIKALLEEAENERMHLMTFMEVAKPKWYE
>2
SNVSIDLTKHHVPKNFLDKVAYRTVKLLRIPTDLFFKRRYGCRAMMLETVAAVPGMVGGM

I would be very grateful for any help. Regards, Naka

fasta • 22k views
ADD COMMENTlink modified 20 months ago by noirot.celine50 • written 5.3 years ago by thiago84naka80

Thank you very much for all the answers!!!

ADD REPLYlink written 5.3 years ago by thiago84naka80

Welcome to Biostar. It's great that you received so many good answers to your question. Next time, post your thank-you as a comment on the existing answers, or as a comment on or edit to your original question, and not as a separate answer.

ADD REPLYlink written 5.3 years ago by Obi Griffith16k
France/Nantes/Institut du Thorax - INSERM UMR1087
Pierre Lindenbaum103k wrote:
 awk '/^>/{print ">" ++i; next}{print}' < file.fasta
ADD COMMENTlink written 5.3 years ago by Pierre Lindenbaum103k

Sir, can we modify the above awk command so that instead of printing

>1
>2
...

it prints

>chromosome1
>chromosome2
...

Where and how do I put the text "chromosome"? Please help me out.

ADD REPLYlink written 3.7 years ago by Raghav100

@Raghav: If you want "chromosome" in the header before the counter, simply add it to the ">" string in the one-liner:

 awk '/^>/{print ">chromosome" ++i; next}{print}' < file.fasta
ADD REPLYlink written 3.4 years ago by st.ph.n2.0k

How can we add "chr" just after >? I don't want to change anything else. For example, I want

>2L

to become

>chr2L

ADD REPLYlink written 18 months ago by saswati.s20100
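
For the "chr" question above, one possible approach is a single sed substitution that touches only the leading > of header lines. This is a minimal sketch, assuming the headers look like >2L; the file names and the two sample records are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical two-record input.
printf '>2L\nACGT\n>2R\nTTGA\n' > "$workdir/genome.fa"

# Replace the leading ">" of each header with ">chr". Sequence lines never
# start with ">", so they pass through unchanged.
sed 's/^>/>chr/' "$workdir/genome.fa" > "$workdir/genome_chr.fa"

cat "$workdir/genome_chr.fa"
```

Because the pattern is anchored with ^, the rest of each header and every sequence line are left as they are.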

Hello there, (I already solved this)

I am trying to understand your one-liner so that I can modify it. So, is the script saying: for every line where you find a > at the start (/^>/), print the > followed by a counter (++i), then skip to the next line, and print everything else as it is?

In my case the names are like:

M02137:143:000000000-APU54:1:1101:21985:13014 1:N:0:10

M02137:143:000000000-APU54:1:1112:18691:9995 1:N:0:10

etc. I want to keep only the part that differs between the names.

awk '/^>/{print ">" remove "M02137:143:000000000-APU54:1:"; next}{print}' < file.fasta

And can I do this in ssh? ( I don't think I have awk installed)

Many thanks in advance for your time,

Caro PS: I am new to HTS/NGS and don't know much about programming

ADD REPLYlink modified 17 months ago • written 17 months ago by cdiaza0
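
On the reading of the one-liner: /^>/ selects lines that start with >, ++i increments a counter and prints it after the ">", next skips to the following input line, and the final {print} prints every other line unchanged. awk has no "remove" keyword, but its sub() function can delete text. The sketch below assumes that every header starts with > followed by the shared prefix quoted in the comment; the file names and sequences are hypothetical. awk is a standard POSIX utility, so it is very likely already available on the machine you reach over ssh.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical input using the read names from the comment above.
cat > "$workdir/reads.fasta" <<'EOF'
>M02137:143:000000000-APU54:1:1101:21985:13014 1:N:0:10
ACGTACGT
>M02137:143:000000000-APU54:1:1112:18691:9995 1:N:0:10
TTGACCAA
EOF

# On header lines, delete the shared prefix right after ">"; the bare "1"
# then prints every line, edited or not.
awk '/^>/ { sub(/^>M02137:143:000000000-APU54:1:/, ">") } 1' \
    "$workdir/reads.fasta" > "$workdir/reads_short.fasta"

cat "$workdir/reads_short.fasta"
```

Headers that do not begin with that exact prefix are left untouched, so it is worth checking the result with grep '^>' before using it downstream.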

Does this not work if a sequence spans multiple lines?

ADD REPLYlink written 14 months ago by Picasa320
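
The one-liner should also cope with sequences wrapped over several lines, because it rewrites only the lines that start with > and prints all others unchanged. A small check on made-up records (file names are hypothetical):

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Two hypothetical records whose sequences are wrapped over several lines.
printf '>seqA\nACGT\nTTGA\n>seqB\nGGCC\nAATT\nCC\n' > "$workdir/wrapped.fa"

# Same command as in the answer: only header lines get a new name.
awk '/^>/{print ">" ++i; next}{print}' "$workdir/wrapped.fa" \
    > "$workdir/wrapped_renamed.fa"

cat "$workdir/wrapped_renamed.fa"
```

The line count of the output matches the input, and each wrapped sequence stays attached to its renumbered header.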

@Pierre Lindenbaum Hi,

1. How can I modify this command to add a genus_species name after > in every entry while keeping most of the information in the header? My entries look like this:

>lcl|HF546977.1_cds_CCO27433.1_1 [gene=cox1] [protein=cytochrome c oxidase subunit 1] [protein_id=CCO27433.1] [location.......]

and I want the entry names to look like this:

>genus_species HF546977.1_cds_CCO27433.1_1 [gene=cox1] [protein=cytochrome c oxidase subunit 1]

By using

awk '/^>/{print ">genus_species gene." ++i; next}{print}' < file.fa

I got

>genus_species gene.1

and so on.

2. How can I specify an output file on the command line?

Having the genus_species name at the beginning is required because I'll be comparing different species, and I don't want to lose the IDs and protein names, for ease of downstream analysis.

ADD REPLYlink modified 13 months ago • written 13 months ago by mirza80
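
One way to approach both points is to swap the ">lcl|" prefix for ">genus_species " with sed and redirect the result to a new file with the shell's > operator. This is a sketch under assumptions: the sample header is abbreviated from the comment, the file names are hypothetical, and the rest of each header is kept in full instead of being trimmed.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical record with an abbreviated version of the header above.
cat > "$workdir/cds.fa" <<'EOF'
>lcl|HF546977.1_cds_CCO27433.1_1 [gene=cox1] [protein=cytochrome c oxidase subunit 1]
ATGTTTACTAAT
EOF

# Replace the leading ">lcl|" with ">genus_species " and keep everything
# after it. The "> file" redirection at the end names the output file.
sed 's/^>lcl[|]/>genus_species /' "$workdir/cds.fa" > "$workdir/cds_named.fa"

cat "$workdir/cds_named.fa"
```

The same redirection works with the awk one-liner: append "> output.fa" (any file name other than the input) to the end of the command.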

Hello Pierre, thank you for your useful code. May I ask how I can modify it to keep everything else in the header and just add the sample name in front, and to do that for a batch of files?

For example, my file looks like this:

>M03691:51:000000000-BD94Y:1:1101:14841:1381 1:N:0:1
ACTGGGTGTAAAGGGCGTGTAGGCGGAGAAGCAAGTCAGAAGTGAAATCCATGGGCTTAACCCATGAACTGCTTTTGAAACTGTTTCCCTTGAGTATCGGAGAGGCAGGCGGAATTCCTAGTGTAGCGGTGAAATGCGTAGATATTAGGAGGAACACCAGTGGCGAAGGCGGCCTGCTGGACGACAACTGACGCTGAGGCGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCCGGT
>M03691:51:000000000-BD94Y:1:1101:15960:1389 1:N:0:1
TACTGGGGTATCTAATCCTATTTGCTCCCCACGCTTTCGGGACTGAGCGTCAGTTATGCGCCAGATCGTCGCCTTCGCCACTGGTGTTCCTCCATATATCTACGCATTTCACCGCTACACATGGAATTCCACGATCCTCTCACACACTCTAGCTCTACGGTTTCCATGGCTTACCGAAGTTAAGCTTCGATCTTTCACCACAGACCCTTAGTGCCGCCTGCTCCCTCTTTACACCCAGT
>M03691:51:000000000-BD94Y:1:1101:15662:1415 1:N:0:1
ACTGGGTGTAAAGGGCTCGTAGGCGGTTCGTCGCGTCCGGTGTGAAAGTCCATCGCTTAACGGTGGATCTGCGCCGGGTACGGGCGGGCTGGAGTGCGGTAGGGGAGACTGGAATTCCCGGTGTAACGGTGGAATGTGTAGATATCGGGAAGAACACCAATGGCGAAGGCAGGTCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGATTAGATACCCCCGTA

Now I want to add the sample name after > and keep everything else as it is.

I want to do this for a batch of files. Any help would be really great. Thanks, Mitra

ADD REPLYlink written 4 months ago by Mitra0
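
For a batch of files, a shell loop around a small awk command is one option. The sketch below assumes that each sample name can be taken from its file name (SampleA.fasta gives SampleA) and that an underscore is an acceptable separator; both are illustrative choices, as are the file names and the shortened sequences.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Two hypothetical input files, one per sample.
printf '>M03691:51:000000000-BD94Y:1:1101:14841:1381 1:N:0:1\nACTG\n' \
    > "$workdir/SampleA.fasta"
printf '>M03691:51:000000000-BD94Y:1:1101:15960:1389 1:N:0:1\nTACT\n' \
    > "$workdir/SampleB.fasta"

# Write results to a separate directory so the inputs stay intact.
mkdir "$workdir/renamed"

for f in "$workdir"/*.fasta; do
    # Derive the sample name from the file name (assumption).
    sample=$(basename "$f" .fasta)
    # On header lines, insert "<sample>_" right after ">"; the bare "1"
    # prints every line, so sequences are copied unchanged.
    awk -v s="$sample" '/^>/ { $0 = ">" s "_" substr($0, 2) } 1' "$f" \
        > "$workdir/renamed/$sample.fasta"
done

head -n 1 "$workdir"/renamed/*.fasta
```

Writing into a separate output directory avoids overwriting an input file while it is still being read.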
University Park, USA
Istvan Albert ♦♦ 75k wrote:

The Fastx Renamer tool can do this as well: http://hannonlab.cshl.edu/fastx_toolkit/commandline.html#fastx_renamer_usage

$ more test.fa 
>GS6SIDE04J1T1R xy=4004_1485
CATAGTAGTGAGAGTTGATCATGGCTCAGCCATCTCATCCAGCAGCCGCGGTAATCACTACTAT
>GS6SIDE04J0352 xy=3996_712
ACGAGTGCGTAGAGTTGATCATGGCTCAGCAGCCTCCTCGTGCCAGCAGCCGCGGTAATACGCACTCG
>GS6SIDE04JM7EM xy=3837_2988
AGCACTGTAGAGAGTTGATCCTGGCTCAGGGATAGGCCAGCAGCCGCGGTAATCTACAGTGC

$ ~/Downloads/bin/fastx_renamer -i test.fa -n COUNT 
>1
CATAGTAGTGAGAGTTGATCATGGCTCAGCCATCTCATCCAGCAGCCGCGGTAATCACTACTAT
>2
ACGAGTGCGTAGAGTTGATCATGGCTCAGCAGCCTCCTCGTGCCAGCAGCCGCGGTAATACGCACTCG
>3
AGCACTGTAGAGAGTTGATCCTGGCTCAGGGATAGGCCAGCAGCCGCGGTAATCTACAGTGC
ADD COMMENTlink written 5.3 years ago by Istvan Albert ♦♦ 75k
Deutschland
David Langenberger8.0k wrote:

Try this:

cat youFile.fa | perl -ane 'if(/\>/){$a++;print ">$a\n"}else{print;}' > youFile_new.fa
ADD COMMENTlink written 5.3 years ago by David Langenberger8.0k

Instead of the useless cat, try:

perl -ane 'if(/\>/){$a++;print ">$a\n"}else{print;}' youFile.fa > youFile_new.fa

ADD REPLYlink modified 5.3 years ago • written 5.3 years ago by Matt Shirley8.0k
noirot.celine50 wrote:

Here is a generic way to convert NCBI headers to simple headers, from

>gi|1002620271|ref|NC_029525.1| Coturnix japonica isolate 7356 chromosome 10, Coturnix japonica 2.0, whole genome shotgun sequence
TACTCCCCAAGAA

to

>NC_029525.1
TACTCCCCAAGAA

Using sed:

sed 's/^[^ ]*[|]\([^|]*\)[|] .*$/>\1/' Coturnix_japonica.fasta > Coturnix_japonica_rename.fasta
ADD COMMENTlink written 20 months ago by noirot.celine50

Great. Thank you for this creative usage of sed.

ADD REPLYlink written 19 months ago by SomeoneElse0

Thank you so much! Just what I was looking for. It worked a treat!

ADD REPLYlink written 15 months ago by kwathen-dunn0

@noirot.celine Hi, I have a similar problem, and your command above didn't work for me (I am really new to the Linux environment). I have several FASTA files. Some of my FASTA headers look like this (Augustus output file):

>g1134t1 geneg1134

I want to keep the header and just add the species_genus name after >, or better, have it like this:

>Species_genus gene1134

Similarly, for a file with headers like this,

>AG1IA_00006 contig1:1338:4722:+ [translate_table: standard]

I want to keep only >AG1IA_00006. The IDs in these files are not consecutive, so simply renumbering the entries in series won't help.

P.S. My OS is Ubuntu 16.04.

ADD REPLYlink modified 11 months ago • written 11 months ago by mirza80
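
The sed command in the answer expects "|" characters in the header; when its pattern does not match, sed leaves the line unchanged, which may be why it appeared to do nothing here. For headers like these, two simpler commands might be enough. This is a sketch that assumes the ID is the first whitespace-delimited word of the header; the file names and sequences are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)

# Hypothetical inputs built from the two header styles in the comment.
printf '>AG1IA_00006 contig1:1338:4722:+ [translate_table: standard]\nMKV\n' \
    > "$workdir/proteins.fa"
printf '>g1134t1 geneg1134\nMSS\n' > "$workdir/augustus.fa"

# Keep only the first word of each header (the ">ID" part); all other
# lines are printed unchanged by the bare "1".
awk '/^>/ { print $1; next } 1' "$workdir/proteins.fa" \
    > "$workdir/proteins_short.fa"

# Keep the whole header and insert a species label right after ">".
sed 's/^>/>Species_genus /' "$workdir/augustus.fa" \
    > "$workdir/augustus_named.fa"

cat "$workdir/proteins_short.fa" "$workdir/augustus_named.fa"
```

Neither command renumbers anything, so the original IDs survive even when they are not consecutive.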

Oh, god... Thank Handbook for linking me here to see the magic!!!

ADD REPLYlink written 3 months ago by mckf1111120
Brooklyn, ny
AGS230 wrote:

I'd use faSimplify

ADD COMMENTlink written 5.3 years ago by AGS230