Question: Get CDS from consensus genome assembly
14 months ago by
yoce_pf40
Universidad Nacional Autónoma de México - University of Bath
yoce_pf40 wrote:

Hello!!

I'm trying to get the coding sequences from several reference-genome assemblies. The reference-genome assemblies were obtained with GATK, samtools mpileup, bcftools, vcfutils.pl and seqtk.

I can extract the CDS regions with bedtools using the GFF file from the reference genome, but I think I could lose some coding regions if I extract the CDS based only on the reference genome annotation.
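For reference, the annotation-based extraction described above can be sketched in plain Python. This is only an illustration of the idea, not the bedtools implementation: the function names (`parse_cds_features`, `extract_cds`) and the toy sequences are hypothetical, and it assumes the consensus genome keeps the same coordinates as the reference (SNP-only consensus, no indels applied) and that the GFF3 CDS features carry a `Parent` attribute.

```python
from collections import defaultdict

# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def parse_cds_features(gff_lines):
    """Group CDS features from GFF3 lines by their Parent attribute."""
    cds_by_parent = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments/directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue  # keep only well-formed CDS rows
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        # Assumption: fall back to ID when no Parent attribute is present.
        parent = attrs.get("Parent", attrs.get("ID", "unknown"))
        cds_by_parent[parent].append(
            (cols[0], int(cols[3]), int(cols[4]), cols[6])
        )
    return cds_by_parent


def extract_cds(genome, gff_lines):
    """Splice CDS segments per parent feature.

    genome: dict mapping sequence ID -> sequence string.
    Returns a dict mapping parent ID -> spliced coding sequence.
    """
    result = {}
    for parent, parts in parse_cds_features(gff_lines).items():
        parts.sort(key=lambda p: p[1])  # order segments by start coordinate
        strand = parts[0][3]
        # GFF3 coordinates are 1-based and inclusive.
        spliced = "".join(
            genome[seqid][start - 1:end] for seqid, start, end, _ in parts
        )
        # Minus-strand genes: reverse-complement the joined segments.
        result[parent] = reverse_complement(spliced) if strand == "-" else spliced
    return result


if __name__ == "__main__":
    # Toy data, purely illustrative.
    genome = {"chr1": "AAATGGCCTTTGGGTAACCC"}
    gff = [
        "chr1\tsrc\tCDS\t3\t8\t.\t+\t0\tID=cds1;Parent=tx1",
        "chr1\tsrc\tCDS\t12\t17\t.\t+\t0\tID=cds2;Parent=tx1",
    ]
    print(extract_cds(genome, gff))
```

Because the coordinates come from the reference annotation, anything absent from that annotation is never extracted, which is the limitation raised here.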

I would like to find and extract the coding sequences of each consensus genome without using the genomic information of the reference genome.

I have been trying to get the CDS using ESTScan and Transeq, but I would like to know if there is a better strategy for this.

Thank you so much

written 14 months ago by yoce_pf40

"The reference-genome assemblies were obtained with GATK, samtools mpileup, bcftools, vcfutils.pl and seqtk."

This really doesn't explain what you have done. Judging by the list of tools used, I suspect you have several resequenced genomes. And you suspect some of these genomes will have additional genes relative to the reference annotation?

Are you extracting CDS with Transeq and ESTScan from the whole genome sequence? That is not how they should be used; they are not appropriate tools for this task.

written 14 months ago by h.mon28k