Question: How to obtain protein-coding sequences from an assembled genome/exome dataset?
9 months ago by
DNAngel40 wrote:

I use bwa-mem to assemble my genome and exome datasets so that I can work with just the CDS of my various species. So far, though, I have only been able to do this one CDS at a time, using individual CDS reference sequences from different reference species.

Of course, this is not feasible when I want to explore the entire genomic/exonic dataset and test for selection on all the protein-coding genes obtained in my species. I am not sure how to assemble my raw single-end reads: should I download all the CDS sequences for the specific species and run them all in one file? My custom script ends by producing a single MSA file when I use a single CDS gene as my reference, so would this produce one giant MSA? I would then have to run various models on each gene individually, or BLAST them, so I need individual MSAs.

Any advice on how to do this efficiently? End goal: obtain MSAs for all protein-coding genes in my genomic/exonic datasets so I can run various models testing for selection pressures on each gene.
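
To illustrate the "individual MSAs" part of the goal, here is a minimal sketch of one possible way to organise the data. It assumes (this is not stated above) that each sample has already been mapped against a single combined CDS reference and that a per-sample consensus FASTA exists in which every record header is the reference gene ID. The function names `parse_fasta` and `group_by_gene`, and the sample and gene names, are hypothetical. The sketch only regroups sequences into one FASTA per gene; it does not map reads, call consensus, or align anything.

```python
from collections import defaultdict


def parse_fasta(text):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            # Keep only the first word of the header as the record ID.
            header, chunks = line[1:].split()[0], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_gene(consensus_by_sample):
    """Regroup per-sample consensus FASTA text into per-gene FASTA text.

    consensus_by_sample maps a sample name to FASTA text whose record IDs
    are gene IDs (an assumption of this sketch). The result maps each gene
    ID to FASTA text with one record per sample, named after the sample.
    """
    per_gene = defaultdict(list)
    for sample, fasta_text in consensus_by_sample.items():
        for gene_id, seq in parse_fasta(fasta_text):
            per_gene[gene_id].append((sample, seq))
    return {
        gene: "".join(f">{sample}\n{seq}\n" for sample, seq in records)
        for gene, records in per_gene.items()
    }


# Toy input: two hypothetical samples, two hypothetical genes.
example = {
    "speciesA": ">gene1\nATGAAA\nTAA\n>gene2\nATGCCCTAG\n",
    "speciesB": ">gene1\nATGAAGTAA\n",
}
per_gene = group_by_gene(example)
for gene, fasta in per_gene.items():
    print(f"--- {gene} ---")
    print(fasta, end="")
```

Each per-gene FASTA could then be written to its own file, aligned if needed, and passed to the selection models one gene at a time, which avoids the single giant MSA.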

bwa paml • 258 views