Question: extract CDS sequence between two genomic positions
0
gravatar for morganeboone
3.0 years ago by
morganeboone0 wrote:

Dear all,

given a start and end position in genomic coordinates (hg19), known to fall within coding regions of the same gene, how does one extract the coding sequence between those positions?

Some background info: I'm trying to use deep sequencing as a readout for a method in which we enrich a library of cells for those with a particular phenotype. Each cell in the library contains an unknown partial cDNA sequence (+/-500bp each, covering the entire transcriptome).  After enrichment, the pool of cells is subjected to PCR to extract the partial cDNA sequences, and these are paired-end sequenced on a MiSeq. After QC and mapping to the human genome, I thus obtain BAM files of mapped 2x150bp paired-end reads.

I know need to know what the amino acid sequence is of all these partial cDNAs. I've played around with bedtools, and R packages like GRanges and GenomicFeatures, but I just cannot seem to find a solution as most tools extract the entire transcript or CDS sequence, not the partial one between specific positions. I can turn the BAM file to a GRanges object with the start and stop position of each pair as genomic coordinates, but that's it.

Any help would be greatly appreciated.

cds bam coordinates R sequencing • 1.3k views
ADD COMMENTlink modified 3.0 years ago by abascalfederico1.0k • written 3.0 years ago by morganeboone0
2
gravatar for abascalfederico
3.0 years ago by
abascalfederico1.0k
Spain
abascalfederico1.0k wrote:

I don't think there is an easy solution. May be you could use Ensembl's REST service. The following service tells you the genomic coordinate of a given protein site:

http://rest.ensembl.org/documentation/info/assembly_translation

Trying different protein sites (CDS) you could find which part of a protein is coded between those coordinates. 

Alternatively, the most elegant and efficient would be to do some parsing of CDS coordinates. From Ensembl/Biomart you can retrieve data on exons, CDS, protein sites, etc. from which you can make a bed file. Then intersect your genomic coordinates with the bed file and parse the intersecting CDS lines

 

ADD COMMENTlink modified 3.0 years ago • written 3.0 years ago by abascalfederico1.0k

Thank you for your answer abascalfederico. It's more or less what I thought. I'll try to make it work using the exon coordinates and intersections with my data. Thanks!

ADD REPLYlink written 3.0 years ago by morganeboone0
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 822 users visited in the last hour