Question: How to extract coding sequences (CDS) from a BAM or VCF file?
sunnykevin97 wrote, 12 days ago:

Hi, is it possible to extract the CDS (coding sequences) from an aligned BAM file or from a VCF file? If not, what is the best way to extract the CDS from a WGS dataset?

I'm interested in scanning for positive selection by comparing different subgroups.

Suggestions please.

snp alignment • 148 views

If you have a reference FASTA, a corresponding annotation file with CDS coordinates, and a VCF, you can use getfasta from the bedtools suite to get the CDS sequence. You can also use the bcftools consensus command to build the sequence with the variants from the VCF applied. samtools or bamutils can help you extract regions of interest from a BAM file.
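To make the idea concrete, here is a minimal Python sketch of what the consensus-then-extract approach does conceptually: substitute the called SNVs into the reference, then slice out the CDS intervals. It is not how bcftools or bedtools are implemented. The function names, the toy sequence, and the restriction to single-base substitutions (no indels, no heterozygous calls) are assumptions made purely for illustration.

```python
def apply_snvs(reference, snvs):
    """Return a copy of `reference` with single-base substitutions applied.

    `snvs` is a list of (pos, ref, alt) tuples with 1-based positions,
    as in a VCF. Indels are deliberately not handled in this sketch.
    """
    seq = list(reference)
    for pos, ref, alt in snvs:
        if len(ref) != 1 or len(alt) != 1:
            raise ValueError(f"only SNVs are supported, got {ref}>{alt} at {pos}")
        if seq[pos - 1] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        seq[pos - 1] = alt
    return "".join(seq)


def extract_intervals(sequence, intervals):
    """Concatenate BED-style intervals (0-based start, exclusive end)."""
    return "".join(sequence[start:end] for start, end in intervals)


# Toy data (hypothetical): a 20 bp reference carrying a two-exon CDS.
reference = "GGATGAAACCCGGTTTTTAA"
cds_intervals = [(2, 8), (14, 20)]             # BED coordinates
sample_snvs = [(6, "A", "G"), (16, "T", "C")]  # VCF coordinates

ref_cds = extract_intervals(reference, cds_intervals)
sample_cds = extract_intervals(apply_snvs(reference, sample_snvs), cds_intervals)
print(ref_cds)
print(sample_cds)
```

Because only substitutions are applied, the sample sequence keeps the reference coordinates, so the same CDS intervals can be reused for every sample. This sketch ignores strand; a CDS on the minus strand would also need to be reverse-complemented.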

written 12 days ago by cpad0112 (modified 12 days ago)

Thanks for the suggestions. I don't have an annotation file with CDS for all genomes (only for the reference genome). This is what I have done so far:

1) I downloaded BAM files of the different subgroups, called variants using GATK4, and generated VCFs.
2) So far I have only one BED file, with the CDS coordinates of my reference genome; using bedtools I extracted the CDS for the reference genome.
3) I don't have annotation files for the other genomes. How should I proceed with the analysis?

"I need to extract the CDS from 22 subgroup genomes; I have only BAM files for all of these genomes."

My plan:

1) Extract the CDS from the 22 different subgroup populations.
2) Perform an MSA of these CDS.
3) Use the MSA file as input to PAML to estimate the dN/dS ratio and build a positive selection scan model.
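As a rough illustration of the hand-off between steps 2 and 3 of this plan, the sketch below writes a set of already aligned CDS sequences in sequential PHYLIP format, which PAML can read. It assumes the sequences all have the same length (for example because they were built on the same reference coordinates with substitutions only). The function name and the sample names are hypothetical.

```python
def to_phylip(sequences):
    """Format {name: sequence} as sequential PHYLIP text.

    All sequences must have the same length, and for a codon-based
    analysis that length must be a multiple of three.
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("sequences are not aligned (lengths differ)")
    length = lengths.pop()
    if length % 3 != 0:
        raise ValueError("alignment length is not a multiple of three")
    # Header line: number of sequences, then alignment length.
    lines = [f" {len(sequences)} {length}"]
    for name, seq in sequences.items():
        # Two spaces between name and sequence (assumed here to match
        # the name/sequence separator PAML expects).
        lines.append(f"{name}  {seq}")
    return "\n".join(lines) + "\n"


# Hypothetical per-subgroup CDS sequences for one gene.
alignment = to_phylip({
    "reference": "ATGAAATTTTAA",
    "subgroup1": "ATGGAATCTTAA",
    "subgroup2": "ATGAAGTCTTAA",
})
print(alignment)
```

If the sequences differ in length (indels were applied, or the CDS came from independent assemblies), a real codon-aware alignment step is still needed before this formatting step.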

Suggestions please.

written 11 days ago by sunnykevin97 (modified 11 days ago)

Hello sunnykevin97,

are you interested in:

  • variants that are located in a CDS?
  • the consensus sequence of the CDS, which is made by integrating the called variants into the reference sequence?
  • something different?

In any case, you need the coordinates of your CDS before you can start.
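For the first option (variants located in a CDS), a small Python sketch of the coordinate check might look like this. It assumes the CDS coordinates come from a BED file (0-based start, exclusive end) and the variant positions from a VCF (1-based). The function names and the example records are made up for illustration.

```python
def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}."""
    intervals = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    return intervals


def in_cds(chrom, pos, intervals):
    """True if the 1-based VCF position falls inside any CDS interval."""
    # A 1-based position p corresponds to the 0-based index p - 1.
    return any(start <= pos - 1 < end for start, end in intervals.get(chrom, []))


# Hypothetical CDS intervals and variant positions.
bed_lines = ["chr1\t100\t200\tgeneA_cds1", "chr1\t300\t350\tgeneA_cds2"]
variants = [("chr1", 100), ("chr1", 101), ("chr1", 200), ("chr1", 201), ("chr2", 150)]

cds = parse_bed(bed_lines)
cds_variants = [v for v in variants if in_cds(v[0], v[1], cds)]
print(cds_variants)
```

The linear scan over intervals is fine for a toy example; for whole-genome data a dedicated tool or an interval index would be the more realistic choice.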

fin swimmer

written 12 days ago by finswimmer

Thanks for the suggestions.

I'm looking for variants in the CDS among ~22 different subgroups.

1) Extract the CDS from the different subgroup populations.
2) Perform an MSA of these CDS.
3) Use the MSA file as input to PAML to estimate the dN/dS ratio and build a positive selection scan model.

Is the approach I'm taking correct, or is there a simpler way to do it?
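To illustrate what the dN/dS step in the plan above is ultimately distinguishing, here is a naive Python sketch that compares two codon-aligned sequences and counts codons that differ synonymously versus nonsynonymously. This is only a raw count under the standard genetic code; it is not the dN/dS estimate PAML produces, which also accounts for the number of synonymous and nonsynonymous sites and is fitted by maximum likelihood. The function name and example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both sequences must be in frame, equally long, and contain only
    unambiguous A/C/G/T bases (no gaps or N in this sketch).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be codon-aligned and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# AAA>AAG keeps lysine (synonymous); TTT>TCT changes Phe to Ser.
counts = count_codon_differences("ATGAAATTTTAA", "ATGAAGTCTTAA")
print(counts)
```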

written 11 days ago by sunnykevin97 (modified 11 days ago)