I have some code that generates a script for renaming the sequence titles in a FASTA file.
This is the text file I'm using:
#Assembly   Genome Center name  RefSeq Accession.version    GenBank Accession.version   NCBI name
GeoFor_1.0  scaffold40  NW_005054297    JH739887    GPS_002009865
GeoFor_1.0  scaffold112 NW_005054298    JH739888    GPS_002009866
GeoFor_1.0  scaffold41  NW_005054299    JH739889    GPS_002009867
GeoFor_1.0  scaffold130 NW_005054300    JH739890    GPS_002009868
GeoFor_1.0  scaffold54  NW_005054301    JH739891    GPS_002009869
GeoFor_1.0  scaffold16  NW_005054302    JH739892    GPS_002009870
This is the FASTA file I'm using, whose sequence names I want to change. As you can see, I want to find each scaffold name and replace it with the matching JH###### accession.
>Scaffold410    275
TGCATTAATATGAGTGTGTGCTGCAAAAGTTCAGGTCATGGTCCGATCATACTTCACATTTTGGTAGCACTTTAAGCAGAGATCGGTTATCCCATTCTGTGGAAGACTCAACACTATCATAAGGTCCCACAGTTTTATTATCCCTCTGCCTCCCGGAATGCCCCCGGCAGTGAGGGGTACCATCTTCTCAGCAGTAAGGATATTCTTCAGGAGTTCCGTGTGAGCTTTCCCGGATTTAGTTCCATTTTTTAAATACTTCCCAATTCTTTGCTTTG
>Scaffold430    374
CTTTGTTAACTGAAAGAGCCTCTAAGTAGATGACCAGTGCTCAGTTAGTACAGTATGAATTTTGTTTAATGGAACAGGAAGATTTAGTATTGAGAAGCGGTTAAGGGTTTAACCCAGCCTCCTGTCTGAATGGACCTGAAGAGGGGGGCCGGGAAGAAACCCATGACTGCATTAAAGTGATAGATCTCCAGACATGGGCTAGGGAAGATTTACAAGACACTCCCTGGCCTGAGGGAGAAAATATGTTTATTGATGAGTCTTCAAGGGTGGCAGAAGGGAAGCGATTTACAGGATACACAATCATTAATGGAAGGAAATTAAAGGAAGGGGGGAGATTGTCACCCACCTGGTCAGTTCAGACAGCAGAGCTGTAT
>Scaffold1010   597
GGAACACACCTGGGCACACCTGGATGGAGCAGGAACACACCTGGATGGGGTTAGGACACATCTGGATGGCGTTGGGACACACCTGGATGCGCTCAGGGTACACCTG...
This is the command I use to create a script to change the names:
tail -n +2 scaffold_names_2.txt | while read assemb gcenter refseq genbank ncbi; do echo -ne "sed 's/[[:<:]]$gcenter[[:>:]]/$genbank/g' | " >>script.sh; done
The problem is that I'm not able to save the FASTA file with the new names.
This is the last line of my script:
... sed 's/[[:<:]]scaffold4469[[:>:]]/JH767125/g'  name.fasta.fa
The script runs without error, but it does nothing.
Do you know why? How can I change all the titles and save the result as a new FASTA file with another name?
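One thing worth noting from the files above: the table lists scaffold40 in lowercase while the FASTA headers start with a capital S (Scaffold410), so a case-sensitive sed pattern would not match them. As an alternative to a long chain of sed commands, here is a minimal sketch that does the renaming in one pass with awk. It assumes the table is whitespace-separated with the scaffold name in column 2 and the GenBank accession in column 4, and that matching should ignore case. The function name rename_fasta and the output file name are made up for illustration.

```shell
#!/bin/sh
# Sketch only: rename FASTA headers using a lookup table.
# Assumptions (not from the original post): column 2 = scaffold name,
# column 4 = GenBank accession, and names are compared case-insensitively.
rename_fasta() {
    # $1 = mapping table, $2 = input FASTA; the renamed FASTA goes to stdout
    awk '
        # First file: the mapping table. Skip the "#" header line and
        # store lowercase scaffold name -> GenBank accession.
        NR == FNR {
            if ($0 !~ /^#/) map[tolower($2)] = $4
            next
        }
        # Second file: the FASTA. Only header lines are rewritten.
        /^>/ {
            name = tolower(substr($1, 2))      # drop ">" and ignore case
            if (name in map) {
                rest = $0
                sub(/^>[^ \t]+/, "", rest)     # keep whatever follows the name
                $0 = ">" map[name] rest
            }
        }
        { print }
    ' "$1" "$2"
}
```

It could be used like this, writing the result to a new file instead of editing in place:

rename_fasta scaffold_names_2.txt name.fasta.fa > renamed.fasta.fa

Because the whole name is looked up in the table, scaffold4 cannot accidentally match scaffold40, which is what the word-boundary markers in the sed version were guarding against. Headers with no entry in the table are left unchanged.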
The only problem here is that you are not getting the same format as the original fasta file.
will give you something like this (see the line number):
Whereas the original file is organized like this:
The solution is to use these options:
--line-width 0 will not wrap the text!
--keep-key will keep a string that wasn't matched
See seqkit replace -h
glad you found it!!!
Hi all! I'm having sort of the same problem, but in my case the keys must only partially match the names in the FASTA file, since the FASTA names also include numbers, and I need to match by the species code.
Example FASTA
Example key
I tried several things, from including special characters (* and .*) in the key to playing around with the "^(\w+)" expression, in which I suspect the problem lies, to no avail. I'm sure the solution is close, but I haven't got it yet. Any clues? P.S. It correctly loads the kv, yet it cannot match anything in the FASTA file.
Thank you!
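A possible cause, offered as a guess since the example files are not shown here: \w matches letters, digits and underscores, so "^(\w+)" captures the whole name including the numbers, and that longer string is never a key in the table. Capturing only the leading letters would yield just the species code. The sketch below illustrates the idea with plain awk rather than seqkit. Everything about the input format is hypothetical: it assumes headers like ">ABCD0123 ..." (letters, then digits) and a two-column tab-separated key file of code and replacement. The function name rename_by_prefix is also made up.

```shell
#!/bin/sh
# Sketch only, under assumed formats (hypothetical, not from the post):
#   key file:  species_code<TAB>replacement
#   FASTA:     headers that begin with the species code followed by digits
rename_by_prefix() {
    # $1 = key file, $2 = input FASTA; the renamed FASTA goes to stdout
    awk -F '\t' '
        # First file: store species code -> replacement.
        NR == FNR { map[$1] = $2; next }
        # Second file: rewrite header lines whose leading letters are a key.
        /^>/ {
            name = substr($0, 2)
            if (match(name, /^[A-Za-z]+/)) {
                code = substr(name, RSTART, RLENGTH)   # letters only
                if (code in map) {
                    # Replace the code, keep the digits and the rest of the line
                    $0 = ">" map[code] substr(name, RLENGTH + 1)
                }
            }
        }
        { print }
    ' "$1" "$2"
}
```

With seqkit itself, the same idea would presumably mean using a pattern such as "^([A-Za-z]+)" in place of "^(\w+)", though I have not checked that against your files.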
Also, the info lines
won't be printed if you decide to save the files! Super handy trick!