I'm trying to get the coding sequences from several reference-guided genome assemblies (consensus genomes). The assemblies were obtained with GATK, samtools mpileup, bcftools, vcfutils.pl and seqtk.
I can extract the CDS regions with bedtools, using the GFF file from the reference genome, but I think I could lose some coding regions if I only extract the CDS based on the reference genome.
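To make the reference-based approach concrete, here is a minimal Python sketch of what pulling CDS features out of a sequence by GFF coordinates amounts to. It is only an illustration, not the bedtools implementation: the function names (`extract_cds`, `reverse_complement`) and the toy data are hypothetical, and it assumes the consensus keeps exactly the same coordinates as the reference (no indels applied).

```python
# Translation table for complementing DNA (N is kept as N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_cds(gff_lines, genome):
    """Yield (seqid, start, end, strand, sequence) for every CDS row.

    gff_lines: iterable of GFF text lines (tab-separated, 9 columns)
    genome:    dict mapping sequence ID -> sequence string
    """
    for line in gff_lines:
        # Skip blank lines, comments and directives such as "##gff-version 3".
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # GFF is 1-based and end-inclusive; Python slices are 0-based, end-exclusive.
        seq = genome[seqid][start - 1:end]
        if strand == "-":
            seq = reverse_complement(seq)
        yield seqid, start, end, strand, seq


# Toy data (hypothetical): one short contig and three annotation rows.
genome = {"chr1": "AAATGGCCTAAGGG"}
gff = [
    "##gff-version 3",
    "chr1\ttoy\tCDS\t3\t11\t.\t+\t0\tID=cds1",
    "chr1\ttoy\tgene\t1\t14\t.\t+\t.\tID=gene1",
    "chr1\ttoy\tCDS\t3\t11\t.\t-\t0\tID=cds2",
]
cds_records = list(extract_cds(gff, genome))
for record in cds_records:
    print(record)
```

Anything that is not annotated in the reference GFF can never appear in this output, which is the limitation I am worried about.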
I would like to find and extract the coding sequences of each consensus genome without using the genomic information of the reference genome.
I have been trying to get the CDS using ESTScan and Transeq, but I would like to know if there is a better strategy for this.
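For illustration, the reference-free idea can be sketched as a naive six-frame open reading frame (ORF) scan in Python. This is not how ESTScan or Transeq work internally; `find_orfs` and `min_len` are hypothetical names, only ATG start codons and the standard stop codons are assumed, and introns are ignored, so it would only be meaningful for unspliced genes or transcript-like sequences.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=90):
    """Return ATG..stop ORFs found in all six reading frames.

    Each hit is (strand, start, end, orf_sequence). Coordinates are
    0-based, end-exclusive, and relative to the strand that was scanned
    (for "-" they refer to the reverse-complemented sequence).
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG of the current open frame
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    # Keep the ORF (stop codon included) if it is long enough.
                    if i + 3 - start >= min_len:
                        orfs.append((strand, start, i + 3, s[start:i + 3]))
                    start = None
    return orfs


# Toy sequence (hypothetical) with a single 15 bp ORF on the forward strand.
toy = "CCATGAAATTTGGGTAACC"
for hit in find_orfs(toy, min_len=15):
    print(hit)
```

A scan like this reports every long ORF, coding or not, so it is only a starting point for comparison with the reference-based CDS set.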
Thank you so much