I have a data set of mutation calls that I would like to map onto a reference human genome in order to infer the ratio of non-synonymous to synonymous substitutions (dN/dS).
To proceed, I took a list of HGNC gene symbols from the human genome and attempted to obtain their coordinates using biomaRt, i.e.
library("BSgenome.Hsapiens.UCSC.hg19")
library("biomaRt")
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
f = getBM(attributes = c("genomic_coding_start", "genomic_coding_end"),
          filters = "hgnc_symbol",
          values = genes,
          mart = mart)
I selected the "coding" attributes because for dN/dS I'm only concerned with matching identified mutations to coding regions. What I found was that the span from genomic_coding_start to genomic_coding_end did NOT correspond to the length of the CDS returned by the "coding" sequence attribute.
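One way to make this comparison concrete is sketched below. It is only a minimal sketch with made-up values standing in for the getBM() output, so it runs without querying Ensembl. It assumes the coordinates come back as one row per coding segment (exon) per transcript, so that the CDS length would be the sum of the segment lengths rather than a single end minus start. The ensembl_transcript_id column, the toy coordinates, and the toy sequences are all hypothetical and used here only for illustration.

```r
# Toy data standing in for getBM() output; all values are made up.
# Assumption: one row per coding segment (exon) per transcript.
coords <- data.frame(
  ensembl_transcript_id = c("T1", "T1", "T1", "T2", "T2"),
  genomic_coding_start  = c(100, 300, NA, 1000, 1200),
  genomic_coding_end    = c(149, 399, NA, 1029, 1259)
)

# Toy CDS sequences standing in for the "coding" sequence attribute.
cds <- c(T1 = strrep("ACG", 50), T2 = strrep("ACG", 30))

# Drop rows without coding coordinates (e.g. fully untranslated exons).
coords <- coords[!is.na(coords$genomic_coding_start), ]

# Coordinates are inclusive, so each segment spans end - start + 1 bases.
coords$seg_len <- coords$genomic_coding_end - coords$genomic_coding_start + 1

# Sum the segment lengths per transcript and compare with the CDS length.
summed <- tapply(coords$seg_len, coords$ensembl_transcript_id, sum)
comparison <- data.frame(
  transcript = names(summed),
  summed_len = as.vector(summed),
  cds_len    = nchar(cds[names(summed)])
)
print(comparison)
```

If the real output behaves like this toy table, the summed segment lengths should match the CDS lengths; if they still differ, the mismatch lies elsewhere (for example, several transcripts per gene being mixed together).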
Is there a simple way to obtain both the CDS sequence and its chromosomal location using biomaRt? If not, I would greatly appreciate an alternative way to obtain this information.
Thank you in advance for your feedback and assistance.