Rename fasta file after organism name obtained from NCBI
5.0 years ago

Hey all,

I'm working with a lot of data from NCBI and at the moment I'm kind of stuck.

I have a ton of fasta files, containing either genomic contigs or the 16S sequences I extracted from those genomes using RNAmmer. The files were automatically downloaded from NCBI and are named like this:

GCF_000284355.1_ASM28435v1_genomic.fasta (for contigs)
GCF_000284355.1_ASM28435v1_genomic_16S.fasta (for 16S retrieved from genomes)

Another example is a file named like this:

GCF_000284235.1_ASM28423v1_genomic_16S.fasta

When looking up either GCF_000284355.1 or ASM28435v1 in NCBI, one finds that the organism name is Arcobacter_butzleri_ED-1

I would like to rename all my files, and preferably also each sequence inside them, so that they contain the organism name and the strain designation, but I can't figure out an easy way to do it. That is why I'm asking for help here.

For example, the filename mentioned above should become Arcobacter_butzleri_ED-1_GCF_000284355.1_ASM28435v1_genomic.fasta

The fasta header of each sequence inside should change from, for example, this (in the case of the 16S file):

>rRNA_NZ_JABW01000042.1_198-1703_DIR+ /molecule=16s_rRNA /score=1786.1

to this:

>Arcobacter_butzleri_ED-1_rRNA_NZ_JABW01000042.1_198-1703_DIR+ /molecule=16s_rRNA /score=1786.1

Many thanks in advance!!

genome Assembly sequence • 1.0k views

This is one of the most frequently asked questions on Biostars. Search with Google using keywords such as "rename fasta file" and you should find something that will work. If you run into problems, post your code and we can go from there.
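
As a starting point, here is a minimal Python sketch of one possible approach, not a definitive solution. It assumes you have already prepared a two-column, tab-separated lookup table (hypothetically named accession_to_name.tsv) that maps each assembly accession to an organism label, for example built from NCBI assembly metadata. The function names load_names, prefix_headers and rename_fasta are made up for illustration. The sketch writes a renamed copy of each file and leaves the original untouched.

```python
import re
from pathlib import Path

# Matches the assembly accession at the start of a file name,
# e.g. "GCF_000284355.1" in "GCF_000284355.1_ASM28435v1_genomic.fasta".
ACCESSION_RE = re.compile(r"^(GC[AF]_\d+\.\d+)")


def load_names(table_path):
    """Read a tab-separated table: accession <TAB> organism label.

    The table itself is an assumption of this sketch; it has to be
    prepared beforehand.
    """
    names = {}
    with open(table_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 2:
                continue
            # Replace whitespace so the label is safe in file names and headers.
            names[fields[0]] = re.sub(r"\s+", "_", fields[1].strip())
    return names


def prefix_headers(lines, label):
    """Yield FASTA lines with the label prepended to every header line."""
    for line in lines:
        if line.startswith(">"):
            yield ">" + label + "_" + line[1:]
        else:
            yield line


def rename_fasta(path, names):
    """Write a renamed copy next to the original file.

    Returns the new path, or None if the file name carries no known accession.
    """
    path = Path(path)
    match = ACCESSION_RE.match(path.name)
    if match is None or match.group(1) not in names:
        return None
    label = names[match.group(1)]
    new_path = path.with_name(label + "_" + path.name)
    with open(path) as src, open(new_path, "w") as dst:
        dst.writelines(prefix_headers(src, label))
    return new_path
```

To process a whole directory, call load_names once and then rename_fasta for every file matched by something like Path(".").glob("GC*_genomic*.fasta"). Writing copies rather than renaming in place makes it easy to check the result before deleting the originals.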
