Question

I want to merge gene sequences in FASTA to create one genome from all CDS regions.

15 months ago

iankeetkumar • 0

I have a FASTA file with the CDS of a viral genome. The sequences are in order. I want to do two things by making use of the IDs, e.g. "fig|11292.9703.CDS.1".

The two problems are separate.

1. I want to merge these genes to form a whole genome.

That is, I first want to merge all the CDS belonging to a genome into one long sequence, so that the resulting FASTA file looks like:

>Genome_1
ALL THE CDS COMBINED
>Genome_2
ALL THE CDS COMBINED

2. I want to replace the sequence names with names that are stored in a separate text file.

Please help me!

The fasta file looks like this.

>fig|11292.9703.CDS.1|
atgagcaagatttttgtcaacccgagtgctatcagagccggtctggccgatctagagatg
gctgaagagactgttgatctgatcaatagaaacatagaagataatcaagctcatctccag
ggggaacccatagaagtggacaatctccctgaggacatgaggagacttcacttggatgac
ggaaaatcgtctaaccttgatgagatggccagagcgggggaaggcaagtatcgggaagac
>fig|11292.9703.CDS.2|
atgagcaagatttttgtcaacccgagtgctatcagagccggtctggccgatctagagatg
gctgaagagactgttgatctgatcaatagaaacatagaagataatcaagctcatctccag
ggggaacccatagaagtggacaatctccctgaggacatgaggagacttcacttggatgac
ggaaaatcgtctaaccttgatgagatggccagagcgggggaaggcaagtatcgggaagac

The text file looks like this.

fig|11292.9703.CDS.1|        Name_of_organism
fig|11292.9703.CDS.2|        Name_of_organism

If you require any additional information, I would be happy to provide it.

fasta linux genome • 1.0k views

updated 13 months ago by Ram 43k • written 15 months ago by iankeetkumar • 0

1) Use cat: https://man7.org/linux/man-pages/man1/cat.1.html

2) See earlier posts such as "replace fasta headers with another name in a text file" and "Renaming fasta headers according to a matching name list", among others.

15 months ago by Pierre Lindenbaum 161k
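
For the renaming step, a minimal Python sketch could look like the one below. It assumes the text file has two whitespace-separated columns (old ID, then new name), as in the question, and that the first word of each FASTA header is the old ID. The function names `load_names` and `rename_headers` are made up for illustration and do not come from any library.

```python
def load_names(lines):
    """Build {old_id: new_name} from lines of 'old_id <whitespace> new_name'."""
    names = {}
    for line in lines:
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            names[parts[0]] = parts[1]
    return names


def rename_headers(fasta_lines, names):
    """Yield FASTA lines with header IDs replaced where a mapping exists."""
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            words = line[1:].split()
            old_id = words[0] if words else ""
            # Assumption: headers without a mapping are left unchanged.
            yield ">" + names.get(old_id, line[1:])
        else:
            yield line
```

Something like `names = load_names(open("names.txt"))`, followed by writing each line from `rename_headers(open("cds.fasta"), names)` to a new file, should work; both file names are placeholders. Note that if every CDS maps to the same organism name, as in the example above, the renamed file will contain duplicate headers.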

There are lots of examples in the search results:

https://www.biostars.org/post/search/?query=rename+fasta+header

https://www.biostars.org/post/search/?query=merge+fasta

15 months ago by barslmn ★ 2.1k

You can merge the CDS from a genome with several commands, as others have suggested, but the merged CDS do not constitute the whole genome. What about the intergenic regions?

Following your approach, you would get an arbitrary sequence that consists of all the CDS joined side by side.

15 months ago by kashiff007 ★ 1.9k
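
The side-by-side concatenation discussed in this thread could be sketched in Python as follows. It assumes that everything before `.CDS.` in a header (for example `fig|11292.9703`) identifies the genome and that the CDS are already in the desired order. `read_fasta`, `genome_key`, `merge_cds` and `format_fasta` are hypothetical helper names, not part of any library. Plain `cat` joins files but keeps every record separate, so the sequences are joined explicitly here.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def genome_key(header):
    # Assumption: "fig|11292.9703.CDS.1|" belongs to genome "fig|11292.9703".
    return header.split(".CDS.")[0]


def merge_cds(lines):
    """Concatenate all CDS of each genome, keeping the order found in the file."""
    genomes = {}
    for header, seq in read_fasta(lines):
        genomes.setdefault(genome_key(header), []).append(seq)
    return {key: "".join(seqs) for key, seqs in genomes.items()}


def format_fasta(records, width=60):
    """Return FASTA text for {name: sequence}, wrapped at `width` characters."""
    out = []
    for name, seq in records.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

Calling `format_fasta(merge_cds(open("cds.fasta")))` (placeholder file name) would give one record per genome. The record names will be the genome keys rather than `Genome_1`, `Genome_2`; renaming them is a separate step.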

That's what I want. I was doing the analysis based on WGS, but the start and end positions were not matching, and it seems the MSA algorithm I am using was not able to align gene to gene. I always saw a frameshift.

15 months ago by iankeetkumar • 0