20 months ago
BATMAN
Hello people, I hope you are well. I was wondering if you could help me: I need to batch rename a large number of RefSeq genome files in ".fna" format.
Below are examples of the file names, followed by some of the FASTA headers they contain:
GCA_000007685.1_ASM768v1_genomic.fna
GCA_000092565.1_ASM9256v1_genomic.fna
GCF_000941035.1_ASM94103v1_genomic.fna
>AFKF01000245.1 Leptospira interrogans serovar Copenhageni str. Fiocruz LV2772 contig_245, whole genome shotgun sequence
ACAGATTTAGTCACATTATAAGTTATTAAAGTGCTCCTTTTCATGATTGGAATTTGTAATAATTCCTACATTTTCAAAAT
CCAACCGTAAAACTTAGTTCCCACCCCACAACGCGATTCATAGATTAGTATGATTCTTGGACGTAAGACAAGTCGAAGTG
>AFLQ01000338.1 Leptospira interrogans serovar Canicola str. HAI0024 contig_338, whole genome shotgun sequence
ATTATAGCAAAAGAACAAATTTTCTTTCTATACCCTTGAAAACGAATACCTACTTCATTTAAAAACAATCCTTAGTTTGT
TAGGTTGAGGAAATTCGAGGAAAAGTGAGGGAAGAATCCGAGATTCTAAATCCAGCTAAAGAGTATGATTCACCATAATA
>AE016823.1 Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130, chromosome I, complete sequence
TTCTTTTAAAAATACTGTATTCTCATTCATTGAATTCTAGAATTGAAAAGTCTGTCAAATGTAGGAATTTTACATAGAAA
TCTCAGAGTCGAAAGTCATCCGAAAAAGTATAACCATATAAGAATTTAGTTTTTCGGAACACTATGTTTTGATAAAAATC
>AOUX01000001.1 Leptospira interrogans serovar Naam str. Naam ctg7180000007300, whole genome shotgun sequence
ACGGATCGCAGCATAATAATCGCTTTCGCATTTGTTATACCGAATTCACGTTAATTAATGTTTAGTTTAAAATGGATGTT
TTGTTTTTACTACTTGGATATATGAAAAAATACTTTGGAACTTGTTTCAAAAGTTAGAATGTGGGGTCTTCTTCAAAAAA
My idea is to rename them to look like this:
Li_serovar_Copenhageni_Fiocruz_L1-130.fna
Li_serovar_Canicola__HAI0024.fna
Li_serovar_Copenhageni_Fiocruz_LV2772.fna
I am new to the field of bioinformatics.
Best regards
Search the forum (and Google) for "FASTA header edit"; this topic has been covered in many variations. You can use sed, awk, or bioawk.
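If a scripting language is easier to follow than sed or awk, here is a minimal Python sketch of the same idea. It is an illustration, not a tested pipeline. It assumes that the first header line of each file carries the organism description, that the description ends at the first comma, and that contig labels look like `contig_245` or `ctg7180000007300`, as in the headers posted above. The function names `name_from_header` and `rename_all` are made up for this example. Note that it produces a single underscore where "str." is removed (`Li_serovar_Canicola_HAI0024.fna`).

```python
import re
from pathlib import Path

# Assumed contig label shapes, based only on the headers in the question.
CONTIG_TOKEN = re.compile(r"^(contig_?\d+|ctg\d+)$", re.IGNORECASE)


def name_from_header(header: str) -> str:
    """Build a file name such as Li_serovar_Canicola_HAI0024.fna from a FASTA header."""
    # Keep only the text before the first comma, e.g.
    # "AE016823.1 Leptospira interrogans serovar Copenhageni str. Fiocruz L1-130"
    description = header.lstrip(">").split(",", 1)[0]
    words = description.split()[1:]  # drop the accession (first token)
    # Drop the "str." marker and any contig label.
    words = [w for w in words if w != "str." and not CONTIG_TOKEN.match(w)]
    genus, species, *rest = words  # raises ValueError on very short headers
    prefix = genus[0].upper() + species[0].lower()  # "Leptospira interrogans" -> "Li"
    return "_".join([prefix, *rest]) + ".fna"


def rename_all(directory: str, dry_run: bool = True) -> None:
    """Print (and optionally apply) the renaming for every *_genomic.fna file."""
    for path in sorted(Path(directory).glob("*_genomic.fna")):
        with path.open() as handle:
            header = handle.readline().strip()
        target = path.with_name(name_from_header(header))
        print(f"{path.name} -> {target.name}")
        # Never overwrite: two assemblies of the same strain would collide.
        if not dry_run and not target.exists():
            path.rename(target)
```

Run `rename_all(".")` first to see the planned names, and only then `rename_all(".", dry_run=False)`. It is worth keeping a copy of the printed mapping, since the original GCA/GCF accession is lost from the file name.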