extract CDS sequence between two genomic positions
Entering edit mode
5.6 years ago

Dear all,

given a start and end position in genomic coordinates (hg19), known to fall within coding regions of the same gene, how does one extract the coding sequence between those positions?

Some background info: I'm trying to use deep sequencing as a readout for a method in which we enrich a library of cells for those with a particular phenotype. Each cell in the library contains an unknown partial cDNA sequence (+/-500bp each, covering the entire transcriptome).  After enrichment, the pool of cells is subjected to PCR to extract the partial cDNA sequences, and these are paired-end sequenced on a MiSeq. After QC and mapping to the human genome, I thus obtain BAM files of mapped 2x150bp paired-end reads.

I know need to know what the amino acid sequence is of all these partial cDNAs. I've played around with bedtools, and R packages like GRanges and GenomicFeatures, but I just cannot seem to find a solution as most tools extract the entire transcript or CDS sequence, not the partial one between specific positions. I can turn the BAM file to a GRanges object with the start and stop position of each pair as genomic coordinates, but that's it.

Any help would be greatly appreciated.

sequencing R bam cds coordinates • 1.9k views
Entering edit mode
5.6 years ago
abascalfederico ★ 1.2k

I don't think there is an easy solution. May be you could use Ensembl's REST service. The following service tells you the genomic coordinate of a given protein site:


Trying different protein sites (CDS) you could find which part of a protein is coded between those coordinates. 

Alternatively, the most elegant and efficient would be to do some parsing of CDS coordinates. From Ensembl/Biomart you can retrieve data on exons, CDS, protein sites, etc. from which you can make a bed file. Then intersect your genomic coordinates with the bed file and parse the intersecting CDS lines


Entering edit mode

Thank you for your answer abascalfederico. It's more or less what I thought. I'll try to make it work using the exon coordinates and intersections with my data. Thanks!


Login before adding your answer.

Traffic: 1688 users visited in the last hour
Help About
Access RSS

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6