Question: How to obtain protein-coding sequences from assembled genome/exome dataset?
13 months ago by
DNAngel80 wrote:

I use bwa-mem to assemble my genome and exome datasets against reference sequences so that I can work with just the CDS of my various species. So far, though, I have only been able to do this one CDS at a time, using individual CDS reference sequences from different reference species.

Of course, this is not feasible when I want to explore the entire genomic/exonic dataset and test for selection on all the protein-coding genes obtained in my species. I am not sure how to assemble my raw single-end reads in that case: should I download all the CDS sequences for the specific species and run them all in one file? My custom script ends by producing a single MSA file when I use a single CDS gene as my reference, so would this produce one giant MSA? I would then have to run various models on each gene individually, or BLAST them, so I need individual MSAs.

Any advice on how to do this most efficiently? End goal: obtain MSAs for all protein-coding genes in my genomic/exonic datasets so I can run various models testing for selection pressures on each gene.
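To make the "individual MSAs" requirement concrete, here is a minimal sketch of the regrouping step I have in mind. It assumes (this is not something my current script does) that each sample has already been mapped to one combined multi-CDS reference and reduced to a consensus FASTA whose record IDs are the reference CDS IDs. The function names and the `consensus_by_sample` layout are hypothetical, and the sequences in each output file would still need a proper alignment step before running any selection models.

```python
from collections import defaultdict
from pathlib import Path


def parse_fasta(text):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    record_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # Keep only the first whitespace-delimited token as the ID.
            tokens = line[1:].split()
            record_id, chunks = (tokens[0] if tokens else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def group_by_gene(consensus_by_sample):
    """Regroup per-sample consensus records into per-gene lists.

    consensus_by_sample: hypothetical dict of sample name -> FASTA text,
    where each record ID is assumed to be a reference CDS ID.
    Returns a dict of gene ID -> list of (sample, sequence).
    """
    groups = defaultdict(list)
    for sample, fasta_text in consensus_by_sample.items():
        for gene_id, seq in parse_fasta(fasta_text):
            groups[gene_id].append((sample, seq))
    return dict(groups)


def write_per_gene_fastas(groups, out_dir):
    """Write one FASTA per gene, with sample names as record headers.

    Assumes gene IDs are safe to use as file names (no path separators).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for gene_id, records in sorted(groups.items()):
        path = out_dir / f"{gene_id}.fasta"
        with path.open("w") as handle:
            for sample, seq in records:
                handle.write(f">{sample}\n{seq}\n")
        paths.append(path)
    return paths
```

With something like this, a single run against all CDS would yield one file per gene (for example `geneA.fasta` holding the `geneA` sequence from every sample) rather than one giant file, and each file could then be aligned and analysed on its own.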

bwa paml • 292 views