10 weeks ago
iankeetkumar
I have a FASTA file with the CDS of a viral genome. The sequences are in order, identified by IDs such as "fig|11292.9703.CDS.1".
I have two separate problems:
1. I want to merge these genes to form a whole genome. That is, I first want to concatenate all the CDS belonging to one genome into one big sequence, so that the new FASTA file looks like this:
>Genome_1
ALL THE CDS COMBINED
>Genome_2
ALL THE CDS COMBINED
2. I want to replace their names with the names stored in another text file.
Please help me!
The FASTA file looks like this:
>fig|11292.9703.CDS.1|
atgagcaagatttttgtcaacccgagtgctatcagagccggtctggccgatctagagatg
gctgaagagactgttgatctgatcaatagaaacatagaagataatcaagctcatctccag
ggggaacccatagaagtggacaatctccctgaggacatgaggagacttcacttggatgac
ggaaaatcgtctaaccttgatgagatggccagagcgggggaaggcaagtatcgggaagac
>fig|11292.9703.CDS.2|
atgagcaagatttttgtcaacccgagtgctatcagagccggtctggccgatctagagatg
gctgaagagactgttgatctgatcaatagaaacatagaagataatcaagctcatctccag
ggggaacccatagaagtggacaatctccctgaggacatgaggagacttcacttggatgac
ggaaaatcgtctaaccttgatgagatggccagagcgggggaaggcaagtatcgggaagac
The text file with the names looks like this:
fig|11292.9703.CDS.1| Name_of_organism
fig|11292.9703.CDS.2| Name_of_organism
If you need any additional information, I would be happy to provide it.
For 2), there are lots of examples in the search results, e.g. "replace fasta headers with another name in a text file", "Renaming fasta headers according to a matching name list", etc.
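If you would rather not depend on an external tool, a minimal Python sketch of 2) could look like the following. It assumes the name file has the ID in the first whitespace-separated column and the name in the rest of the line, and that the IDs match the FASTA headers exactly (including the trailing `|`). `load_names` and `rename_headers` are hypothetical helpers, not part of any library; headers without a match are left unchanged.

```python
def load_names(lines):
    """Map the ID in the first column to the rest of the line (the new name)."""
    names = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            names[parts[0]] = parts[1].strip()
    return names


def rename_headers(fasta_lines, names):
    """Replace each FASTA header whose first word is a known ID with its new name."""
    out = []
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            header = line[1:].strip()
            key = header.split()[0] if header else ""
            # Keep the original header when the ID is not in the name list.
            line = ">" + names.get(key, header)
        out.append(line)
    return out


if __name__ == "__main__":
    # Hypothetical file names.
    with open("names.txt") as fh:
        names = load_names(fh)
    with open("cds.fasta") as fh:
        print("\n".join(rename_headers(fh, names)))
```

Note that if several CDS share the same organism name, the output will contain duplicate headers; many tools expect unique IDs, so you may prefer to keep the original ID after the name.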
You can merge the CDS of a genome with several commands, as others have suggested, but the merged CDS do not constitute the whole genome. What about the intergenic regions?
Following your approach, you can build an artificial sequence consisting of all the CDS joined side by side.
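A minimal Python sketch of that concatenation, assuming every header follows the `fig|<genome>.CDS.<n>|` pattern from the question and that the CDS should be joined in order of `<n>`. The `Genome_N` numbering follows the order in which each genome first appears, and the original genome ID is kept after a space. `read_fasta`, `merge_cds`, `write_fasta` and the file name are hypothetical. The result is an artificial sequence, not the real genome: intergenic regions are missing.

```python
import re

# Assumption: headers look like "fig|<genome_id>.CDS.<n>|", as in the question.
HEADER_RE = re.compile(r"^fig\|(?P<genome>.+)\.CDS\.(?P<n>\d+)\|?")


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def merge_cds(lines):
    """Concatenate the CDS of each genome, ordered by their CDS number."""
    genomes = {}  # dicts keep insertion order (Python 3.7+)
    for header, seq in read_fasta(lines):
        m = HEADER_RE.match(header)
        if m is None:
            raise ValueError(f"unexpected header: {header}")
        genomes.setdefault(m["genome"], []).append((int(m["n"]), seq))
    merged = []
    for i, (genome_id, cds) in enumerate(genomes.items(), start=1):
        cds.sort(key=lambda t: t[0])  # order by CDS number, not file order
        merged.append((f"Genome_{i} {genome_id}", "".join(s for _, s in cds)))
    return merged


def write_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text wrapped at `width` columns."""
    out = []
    for header, seq in records:
        out.append(f">{header}")
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    with open("cds.fasta") as fh:  # hypothetical input file
        print(write_fasta(merge_cds(fh)), end="")
```

Sorting by the number parsed from the ID, rather than trusting the file order, keeps the concatenation correct even if the CDS records are shuffled.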
That's what I want. I was doing the analysis on WGS data, but the start and end positions did not match, and the MSA algorithm I was using did not seem able to align gene to gene. I always saw a frame shift.
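Because the CDS are joined end to end, each gene in the concatenated sequence stays in the same reading frame as the first one only if every CDS before it has a length divisible by 3. A quick sanity check before aligning could be this minimal sketch (`cds_not_in_frame` is a hypothetical helper; it assumes unique headers, since records sharing a header would be summed together):

```python
def cds_not_in_frame(lines):
    """Return (header, length) for every FASTA record whose length is not a multiple of 3."""
    lengths, header = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            header = line[1:]
            lengths[header] = 0
        elif line and header is not None:
            lengths[header] += len(line)
    return [(h, n) for h, n in lengths.items() if n % 3 != 0]


if __name__ == "__main__":
    with open("cds.fasta") as fh:  # hypothetical input file
        for header, length in cds_not_in_frame(fh):
            print(f"{header}\t{length}")
```

Any record it reports would shift the frame of every CDS concatenated after it.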