Question: Get CDS from consensus genome assembly
6 months ago by
yoce_pf30
Universidad Nacional Autónoma de México - University of Bath
yoce_pf30 wrote:

Hello!!

I'm trying to get the coding sequences from several reference-genome assemblies. The reference-genome assemblies were obtained with GATK, samtools mpileup, bcftools, vcfutils.pl and seqtk.

I can extract the CDS regions with bedtools using the GFF file from the reference genome, but I think I could lose some coding regions if I only extract the CDS based on the reference genome annotation.
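For illustration, the annotation-based extraction described above can be sketched in plain Python. This is only a minimal sketch, not the bedtools workflow itself: it assumes a GFF3 file whose CDS features carry a `Parent` attribute, and it assumes the consensus sequence keeps the reference coordinate system (no indels applied). The function names `read_fasta` and `extract_cds` are hypothetical helpers introduced here.

```python
from collections import defaultdict

# Translation table used to complement a DNA string (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Parse an iterable of FASTA lines into {sequence_id: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            # Keep only the first word of the header as the sequence ID.
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def extract_cds(genome, gff_lines):
    """Return {parent_id: spliced CDS} built from GFF3 CDS features.

    Assumption: CDS segments of one transcript share a Parent attribute
    and all lie on the same sequence and strand.
    """
    segments = defaultdict(list)
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "CDS":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent", attrs.get("ID", "unknown"))
        # GFF coordinates are 1-based and inclusive.
        segments[parent].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))

    cds = {}
    for parent, segs in segments.items():
        segs.sort(key=lambda s: s[1])  # order segments by start coordinate
        seq = "".join(genome[c][start - 1:end] for c, start, end, _ in segs)
        if segs[0][3] == "-":
            # Minus-strand genes: reverse-complement the joined segments.
            seq = seq.translate(COMPLEMENT)[::-1]
        cds[parent] = seq
    return cds
```

With real data one would pass open file handles, e.g. `extract_cds(read_fasta(open("consensus.fa")), open("reference.gff3"))`, where both file names are placeholders. Note that this, like the bedtools approach, can only recover genes that are present in the reference annotation.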

I would like to find and extract the coding sequences of each consensus genome without using the genomic information of the reference genome.

I have been trying to get the CDS using ESTScan and Transeq, but I would like to know if there is a better strategy for this.

Thank you so much

written 6 months ago by yoce_pf

The reference-genome assemblies were obtained with GATK, samtools mpileup, bcftools, vcfutils.pl and seqtk.

This really doesn't explain what you have done. Judging by the list of tools used, I suspect you have several resequenced genomes. And do you suspect that some of these genomes have additional genes relative to the reference annotation?

Are you extracting CDS with Transeq and ESTScan from the whole genome sequence? That is not how they should be used; they are not the appropriate tools for this task.
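To illustrate the point, here is a minimal Python sketch of a six-frame translation, which is roughly the kind of output a translation tool gives when run on raw genomic sequence. It is not Transeq's actual implementation; it assumes the standard genetic code, and `translate` and `six_frames` are hypothetical helper names. The result is six strings of amino acids and stop codons with no information about where genes start, end, or are interrupted by introns.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate a DNA string codon by codon; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Return translations of the three forward and three reverse frames."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rev[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frames("ATGGCCTAA").items():
        print(name, protein)
```

Every frame gets translated regardless of whether it is coding, so deciding which stretches are real CDS still requires gene models or an annotation.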

written 6 months ago by h.mon