Aligning genome-wide coding sequences of two species for Ka/Ks calculation


8.1 years ago

shreyasibiswas88 ▴ 30

Hi,

I have two FASTA files, one per species, each containing ~300,000 coding-exon sequences. Both species are mapped to the same reference and hence have the same coordinates (synteny cannot be detected). Example:

File 1:
>Rnor_chr1:268298874-268299113
TTACGCACACGGGGCACAGCCGCACTTGGTGGGCTTCT.....

File 2:
>Rrat_chr1:268298874-268299113
TTACGCACACGGGGCACAGCCGCACTTGGTGGGCTTCTCCACCTC....

I want to align each pair of matching entries and then calculate Ka/Ks using PAML. Basically, this is pairwise alignment of thousands of pairs. From what I can tell, MUSCLE and PRANK each require an input file that contains both sequences to be aligned, one from each species. Would that mean I need 300,000 input FASTA files? Can anybody give me an idea of how this can be solved?

Thank you.
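
To make the pairing step concrete, here is a minimal Python sketch, not a tested pipeline. It assumes that every header has the form `<species>_<chr>:<start>-<end>` as in the example above, and that matching entries share identical coordinates after the species prefix. The function names (`read_fasta`, `coord_key`, `paired_records`, `pair_to_fasta`) are hypothetical helpers made up for illustration. The second file is held in memory in a dictionary keyed by coordinates, and the first file is streamed against it.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            header, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def coord_key(header):
    """Drop the assumed species prefix: 'Rnor_chr1:1-6' becomes 'chr1:1-6'."""
    prefix, sep, rest = header.partition("_")
    return rest if sep else header


def paired_records(lines_a, lines_b):
    """Yield (record_a, record_b) for entries whose coordinates match."""
    by_coord = {coord_key(h): (h, s) for h, s in read_fasta(lines_b)}
    for header, seq in read_fasta(lines_a):
        match = by_coord.get(coord_key(header))
        if match is not None:
            yield (header, seq), match


def pair_to_fasta(rec_a, rec_b):
    """Format one pair as a two-record FASTA string for an aligner."""
    return f">{rec_a[0]}\n{rec_a[1]}\n>{rec_b[0]}\n{rec_b[1]}\n"


if __name__ == "__main__":
    # Tiny in-memory demo; with real data, pass open file handles instead.
    file_1 = [">Rnor_chr1:1-6", "ATGAAA"]
    file_2 = [">Rrat_chr1:1-6", "ATGAAG"]
    for rec_a, rec_b in paired_records(file_1, file_2):
        print(pair_to_fasta(rec_a, rec_b), end="")
```

Each two-record string could be written to a temporary file, aligned, and then discarded inside the loop, so 300,000 input files never need to exist on disk at the same time. The aligner call itself is left out on purpose, because the command-line options depend on the tool and its version. Note also that PAML expects codon-aware, in-frame alignments, which individual exons may not satisfy without further processing.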

alignment next-gen • 1.7k views

updated 8.0 years ago by Biostar 20 • written 8.1 years ago by shreyasibiswas88 ▴ 30
