How to obtain protein-coding sequences from an assembled genome/exome dataset?
4.8 years ago
DNAngel ▴ 250

I use bwa-mem to map my genome and exome reads against CDS reference sequences (a reference-guided assembly), so that I can work with just the CDS of my various species. So far, I have only been able to do this one CDS at a time, using individual CDS reference sequences from different reference species.

Of course, this is not feasible when I want to explore the entire genomic/exonic dataset and test for selection on all the protein-coding genes obtained in my species. I am not sure how to assemble my raw single-end reads: should I download all the CDS sequences for a given reference species and use them together as one reference file? My custom script ends by producing a single MSA file when I use a single CDS gene as my reference, so would a combined reference produce one giant MSA? I would then have to run various models on each gene individually, or BLAST them, so I need individual MSAs.

Any advice on how to do this most efficiently? End goal: obtain MSAs for all protein-coding genes in my genomic/exonic datasets so I can run various models testing for selection pressures on each gene.
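To make the "one giant MSA versus individual MSAs" question concrete, here is a minimal Python sketch of one possible regrouping step. It assumes (this is not part of my current script) that each species has been mapped against a single multi-FASTA of all reference CDS, and that a consensus FASTA has been produced per species in which each record ID is the gene ID of the reference CDS. The function names `read_fasta` and `split_by_gene`, the species labels, and the file names are all hypothetical. The per-gene files are only true alignments if every consensus keeps the reference coordinates (no insertions or deletions applied); otherwise each file would still need to be aligned before running selection models.

```python
from collections import defaultdict
from pathlib import Path


def read_fasta(path):
    """Yield (record_id, sequence) pairs from a FASTA file."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Keep only the first word of the header as the gene ID.
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def split_by_gene(consensus_files, out_dir):
    """Regroup per-species consensus FASTAs into one FASTA per gene.

    consensus_files maps a species label to the path of that species'
    consensus FASTA (record IDs are assumed to be reference gene IDs).
    Returns a dict mapping gene ID to the path of the file written.
    """
    per_gene = defaultdict(dict)
    for species, path in consensus_files.items():
        for gene_id, seq in read_fasta(path):
            per_gene[gene_id][species] = seq

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for gene_id, seqs in per_gene.items():
        # Assumes gene IDs are safe to use as file names.
        out_path = out_dir / f"{gene_id}.fasta"
        with open(out_path, "w") as out:
            for species, seq in sorted(seqs.items()):
                out.write(f">{species}\n{seq}\n")
        written[gene_id] = out_path
    return written
```

Called as, for example, `split_by_gene({"speciesA": "speciesA.consensus.fa", "speciesB": "speciesB.consensus.fa"}, "per_gene")`, this would write one file per gene ID, each holding one sequence per species, which could then be fed to the per-gene models one at a time.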

PAML bwa • 801 views
