Hey all,
I'm working with a lot of data from NCBI and at the moment I'm kind of stuck.
I have a ton of fasta files, either containing genomic contigs or the 16S sequences I extracted from those genomes using RNAmmer. The files were automatically downloaded from NCBI and are named like this:
GCF_000284355.1_ASM28435v1_genomic.fasta (for contigs)
GCF_000284355.1_ASM28435v1_genomic_16S.fasta (for 16S retrieved from genomes)
Another example is a file named like this: GCF_000284235.1_ASM28423v1_genomic_16S.fasta
When looking up either GCF_000284355.1 or ASM28435v1 in NCBI, one finds that the organism name is Arcobacter_butzleri_ED-1
I would like to rename all my files, and preferably also each sequence inside them, so that they contain the organism name and the strain designation, but I can't figure out an easy way to do it. This is why I'm asking here for help.
For example, the filename mentioned above should become Arcobacter_butzleri_ED-1_GCF_000284355.1_ASM28435v1_genomic.fasta
The fasta header of each sequence inside should change from, for example, this (in the case of the 16S file):
>rRNA_NZ_JABW01000042.1_198-1703_DIR+ /molecule=16s_rRNA /score=1786.1
to this:
>Arcobacter_butzleri_ED-1_rRNA_NZ_JABW01000042.1_198-1703_DIR+ /molecule=16s_rRNA /score=1786.1
Many thanks in advance!!
This is one of the most frequently asked questions on Biostars. Search externally using Google with the keywords "rename fasta file". You should find something that will work. If you run into problems, post your code and we can go from there.
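As a starting point, here is a minimal Python sketch of the kind of script you could adapt. It is only an illustration under a few assumptions: every file name starts with the assembly accession (GCF_... or GCA_...), and you already have a lookup from accession to organism label. The NAMES dictionary and the rename_fasta function are hypothetical names made up for this example, and the single entry in NAMES is the one from the question; you would have to build the full table yourself from metadata you trust. The script writes renamed copies into a separate directory and leaves the originals untouched.

```python
import re
from pathlib import Path

# Hypothetical lookup table: assembly accession -> organism/strain label.
# Only the example from the question is filled in here.
NAMES = {"GCF_000284355.1": "Arcobacter_butzleri_ED-1"}

# Assumption: every file name begins with a GCF_/GCA_ assembly accession.
ACCESSION = re.compile(r"^(GC[AF]_\d+\.\d+)")


def rename_fasta(path, names, out_dir):
    """Copy `path` into `out_dir`, prefixing the organism label to the
    file name and to every FASTA header. Returns the new path, or None
    if the accession cannot be found in `names`."""
    path = Path(path)
    match = ACCESSION.match(path.name)
    if match is None or match.group(1) not in names:
        return None  # unknown accession: skip rather than guess
    label = names[match.group(1)]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{label}_{path.name}"

    with path.open() as src, out_path.open("w") as dst:
        for line in src:
            if line.startswith(">"):
                # Header line: insert the label right after the ">".
                line = f">{label}_{line[1:]}"
            dst.write(line)
    return out_path
```

To process a whole folder, you could loop over Path("genomes").glob("*.fasta") and call rename_fasta(fasta, NAMES, "renamed") on each file, where "genomes" and "renamed" are placeholder directory names. Files whose accession is missing from the table return None, so you can collect and check those by hand.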