How to replace a list of headers in a FASTA file when the new names are not in the same order
7 weeks ago
mthm

This is what my FASTA file looks like:

>monCan3F9-B-G1795-Map9
TTTATTATACCCTGAACCCATTAAAA(multiple lines)
>monJX13F48-L-B718-Map1
AAAATTAATTCAGAATTATGTTTG(multiple lines)
.
.
.


The list of new names is not in the same order as the sequences in the FASTA file, so I have to define each old-to-new pair explicitly, e.g.:

monCan3F9-B-G1795-Map9 > BARI1#DNA/Tc1-Mariner
monJX13F48-L-B718-Map1 > PARIS#LTR
.
.
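
If the pairs are kept in the "old > new" form shown above, a few lines of Python can turn them into a two-column tab-separated file, which is the format that key-value options such as seqkit's --kv-file expect. This is a minimal sketch: mapping_to_tsv is a hypothetical helper name, and it assumes the first " > " on each line separates the old name from the new one.

```python
def mapping_to_tsv(lines):
    """Turn 'old > new' lines into 'old<TAB>new' rows.

    Assumes the first ' > ' on a line separates old and new names;
    blank lines and '.' placeholder lines are skipped.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if " > " not in line:
            continue  # blank line or '.' placeholder
        old, new = line.split(" > ", 1)
        rows.append(f"{old.strip()}\t{new.strip()}")
    return rows


if __name__ == "__main__":
    example = [
        "monCan3F9-B-G1795-Map9 > BARI1#DNA/Tc1-Mariner",
        "",
        "monJX13F48-L-B718-Map1 > PARIS#LTR",
        ".",
    ]
    for row in mapping_to_tsv(example):
        print(row)
```

Redirecting the printed rows to a file (for example rename.txt) gives a plain two-column mapping file.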


For a few names I could do this manually with 'sed', but I don't know how to do it for about 1000 of them! I checked the samtools manual, but as far as I understand it requires SAM/BAM files. Is there another toolkit that can do this?

for a few names, I could do that manually using 'sed' but I don't know how to do it when I have about 1000 of them!

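Use sed with the -f option: write one substitution command per header into a script file and pass that file to sed with -f.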
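
With 1000 names, the script for -f can be generated rather than typed: one s command per old/new pair, anchored to the whole header line. The Python sketch below builds such a script from a list of pairs. It assumes every header line is exactly '>' followed by the old name (no description after it) and that no name contains '|', which is used as the s delimiter so the '/' in names like BARI1#DNA/Tc1-Mariner needs no escaping. build_sed_script and the escape helpers are hypothetical names.

```python
# Characters that are special in a sed basic regular expression (BRE).
BRE_SPECIAL = set("\\.*[^$")
DELIM = "|"  # assumed not to occur in any old or new name


def escape_pattern(text):
    """Backslash-escape BRE metacharacters so the name matches literally."""
    return "".join("\\" + c if c in BRE_SPECIAL else c for c in text)


def escape_replacement(text):
    """Escape backslash and '&', which sed treats specially in replacements."""
    return "".join("\\" + c if c in "\\&" else c for c in text)


def build_sed_script(pairs):
    """Return one 's|^>OLD$|>NEW|' command per (old, new) pair."""
    commands = []
    for old, new in pairs:
        if DELIM in old or DELIM in new:
            raise ValueError(f"name contains the delimiter {DELIM!r}: {old!r} / {new!r}")
        commands.append(
            f"s{DELIM}^>{escape_pattern(old)}${DELIM}>{escape_replacement(new)}{DELIM}"
        )
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    pairs = [
        ("monCan3F9-B-G1795-Map9", "BARI1#DNA/Tc1-Mariner"),
        ("monJX13F48-L-B718-Map1", "PARIS#LTR"),
    ]
    # Save the output as e.g. rename.sed, then: sed -f rename.sed test.fa > output.fa
    print(build_sed_script(pairs), end="")
```

Because sed tries every command on every input line, a script with about 1000 commands may be slow on a very large FASTA file; a lookup-table approach avoids that.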

7 weeks ago

Use seqkit replace with the --kv-file option. It takes two input files: the FASTA file and a tab-separated key-value file. The tab-separated file should have two columns: the first holds the current name/ID/pattern (FASTA header) from the FASTA file, and the second holds the new header.
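
For comparison, the same key-value idea can be sketched without seqkit. This Python code is not seqkit's implementation; it only imitates the approach: read the two-column tab-separated file into a dictionary, then look up the ID of each header, so the order of the list does not matter. load_mapping and rename_fasta are hypothetical names, and IDs with no entry are kept unchanged, similar to seqkit's --keep-key.

```python
import re


def load_mapping(lines):
    """Read 'old<TAB>new' lines into a dict; lines without a tab are ignored."""
    mapping = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) >= 2:
            mapping[parts[0]] = parts[1]
    return mapping


def rename_fasta(fasta_lines, mapping):
    """Yield FASTA lines with the ID of each header replaced via mapping.

    The ID is the first run of non-whitespace after '>', like the
    pattern ^(\\S+). IDs missing from mapping are left as they are.
    Sequence lines pass through unchanged, so multi-line records work.
    """
    for line in fasta_lines:
        if not line.startswith(">"):
            yield line
            continue
        body = line[1:].rstrip("\n")
        ending = line[len(body) + 1:]  # the original newline, if any
        m = re.match(r"(\S+)(.*)", body)
        if m is None:
            yield line  # empty header: nothing to rename
            continue
        seq_id, rest = m.group(1), m.group(2)
        yield ">" + mapping.get(seq_id, seq_id) + rest + ending


if __name__ == "__main__":
    mapping = load_mapping(["monCan3F9-B-G1795-Map9\tBARI1#DNA/Tc1-Mariner\n"])
    fasta = [
        ">monJX13F48-L-B718-Map1\n",
        "AAAATTAATTCAGAATTATGTTTG\n",
        ">monCan3F9-B-G1795-Map9\n",
        "TTTATTATACCCTGAACCCATTAAAA\n",
    ]
    print("".join(rename_fasta(fasta, mapping)), end="")
```

To run it on real files, pass open file objects (for example open("rename.txt") and open("test.fa")) instead of the in-memory lists, and write the yielded lines to the output file.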

Thanks for your suggestion. I tried this command:

./seqkit replace -p "(>\S+)" --replacement "{kv}" --kv-file rename.txt test.fa --keep-key > output.fa


The output is:

[INFO] read key-value file: rename
[INFO] 170 pairs of key-value loaded


But when I check my output file, the names have not changed. I don't understand what I am doing wrong.

The > is not part of the sequence name, so leave it out of the pattern:

-p "^(\S+)"
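
Why the first command changed nothing can be reproduced with Python's re module (seqkit itself is written in Go, but these two simple patterns behave the same way in both regex engines). The sketch imitates the {kv} replacement with a hypothetical one-entry dictionary kv; it is an illustration, not seqkit's code.

```python
import re

# Hypothetical one-entry mapping standing in for rename.txt.
kv = {"monCan3F9-B-G1795-Map9": "BARI1#DNA/Tc1-Mariner"}

# The text the pattern is applied to: the header WITHOUT its leading '>'.
name = "monCan3F9-B-G1795-Map9"


def replace_with_kv(pattern, text):
    """Replace each match with kv[group 1], keeping the key if it is missing."""
    return re.sub(pattern, lambda m: kv.get(m.group(1), m.group(1)), text)


print(replace_with_kv(r"(>\S+)", name))  # monCan3F9-B-G1795-Map9 (no '>' to match)
print(replace_with_kv(r"^(\S+)", name))  # BARI1#DNA/Tc1-Mariner
```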

Thanks, it worked!

