Question: Find gene using nucleotide sequence position in R
4 weeks ago, swamyvinny wrote:

Hi, I am trying to find which gene, if any, a specific nucleotide position belongs to. For example, I'd like to know which gene nucleotide 100004651 is in. I'm currently using R and would be grateful if anyone could point me towards packages that could help.

I'm working with the human genome, if that helps, and I also know which chromosome each position is on and which strand (+/-) it is on.

Thanks in advance

sequence R genome • 212 views
modified 4 weeks ago by Charles Plessy • written 4 weeks ago by swamyvinny

Use bedtools to intersect your coordinates with gene coordinates.

written 4 weeks ago by cpad0112
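
The interval-intersection idea behind this suggestion can be sketched in base R, without any external tool. The gene table below is invented for illustration (the names, the coordinates, and the helper `genes_at_position` are all hypothetical); in practice the gene coordinates would come from an annotation file such as a GTF or BED file.

```r
# Toy gene table; names and coordinates are invented for illustration.
genes <- data.frame(
  gene   = c("geneA", "geneB", "geneC"),
  chrom  = c("1", "1", "2"),
  start  = c(100, 450, 100),
  end    = c(500, 900, 300),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)

# Return the names of all genes whose interval contains `pos` on `chrom`.
# If `strand` is given ("+" or "-"), only genes on that strand are kept.
genes_at_position <- function(genes, chrom, pos, strand = NULL) {
  hit <- genes$chrom == chrom & genes$start <= pos & genes$end >= pos
  if (!is.null(strand)) hit <- hit & genes$strand == strand
  genes$gene[hit]
}

genes_at_position(genes, "1", 470)        # inside both geneA and geneB
genes_at_position(genes, "1", 470, "-")   # only geneB is on the minus strand
genes_at_position(genes, "2", 999)        # no gene at this position
```

A position can fall in more than one gene (overlapping genes, or genes on opposite strands), which is why the helper returns a vector rather than a single name.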
4 weeks ago, Emily_Ensembl (EMBL-EBI) wrote:

You could use biomaRt, filtering by your loci.

written 4 weeks ago by Emily_Ensembl

Since I did this recently, I can share some code (the example queries an arbitrary position in the human genome):

library("biomaRt")
# Connect to the Ensembl human gene dataset
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")

# Retrieve the genes overlapping the queried interval on chromosome 5
all.genes <- getBM(
    attributes=c("ensembl_gene_id","hgnc_symbol","chromosome_name","start_position","end_position"),
    filters=c("chromosome_name", "start", "end"),
    values=list(chromosome="5", start="1565669",end="1565670"),
    mart=mart)
written 4 weeks ago by WouterDeCoster
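
Since the question also mentions strand, here is a small sketch of how a `+`/`-` strand and a single position could be turned into filter values for `getBM()`. It assumes that the Ensembl gene dataset encodes strand as `1`/`-1` and offers a `strand` filter, which is worth confirming with `listFilters(mart)`. The helper `locus_values` is hypothetical and not part of biomaRt.

```r
# Hypothetical helper (not part of biomaRt): build the `values` list for
# getBM() from a chromosome, a single position and a "+"/"-" strand.
locus_values <- function(chrom, pos, strand = c("+", "-")) {
  strand <- match.arg(strand)
  # Avoid scientific notation such as "1e+05" for round positions.
  pos_chr <- format(pos, scientific = FALSE, trim = TRUE)
  list(
    chromosome_name = sub("^chr", "", chrom),  # Ensembl uses "5", not "chr5"
    start           = pos_chr,
    end             = pos_chr,
    strand          = if (strand == "+") "1" else "-1"  # assumed Ensembl coding
  )
}

vals <- locus_values("chr5", 1565669, "-")
str(vals)

# Assumed usage with the `mart` object created above (needs network access):
# getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "strand"),
#       filters    = names(vals),
#       values     = vals,
#       mart       = mart)
```

`getBM()` matches `values` to `filters` by order, so taking the filter names from the same list keeps the two aligned.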

This is exactly what I need! Thank you so much.

written 29 days ago by swamyvinny

Glad that it was useful. I have moved my comment to an answer so you can accept it and mark this thread as closed.


written 29 days ago by WouterDeCoster
4 weeks ago, Charles Plessy (Japan) wrote:

Here is an internal function that I am about to add to Bioconductor's CAGEr package, to solve the same problem.

#' ranges2genes
#' 
#' Assign gene symbol(s) to Genomic Ranges.
#' 
#' @param ranges A \code{GRanges} object, for example extracted from a
#'               RangedSummarizedExperiment object with the \code{rowRanges}
#'               command.
#' 
#' @param genes A \code{\link{GRanges}} object containing \code{gene_name} metadata.
#' 
#' @return An \code{Rle}-encoded character vector of the same length as the
#'         GRanges object, indicating one gene symbol or a semicolon-separated
#'         list of gene symbols for each range (empty where no gene overlaps).
#' 
#' @importFrom GenomicRanges findOverlaps
#' @importFrom S4Vectors List Rle unstrsplit
#' @importFrom IRanges extractList
#' 
#' @examples
#' # Example for Biostars
#' # Imagine that nucleotides are represented by a GRanges object called "gr"
#' # Download GENCODE from Bioc's annotation hub:
#' ah <- AnnotationHub::AnnotationHub()
#' gff <- ah[["AH49556"]]
#' # Instead one can also use rtracklayer::import.gff, etc...
#' 
#' # Annotate the GRanges:
#' gr$geneSymbols <- ranges2genes(gr, gff)

ranges2genes <- function(ranges, genes) {
  if (is.null(genes$gene_name))
    stop("Annotation must contain ", dQuote("gene_name"), " metadata.")
  gnames <- findOverlaps(ranges, genes)
  gnames <- as(gnames, "List")
  gnames <- extractList(genes$gene_name, gnames)
  gnames <- unique(gnames)
  gnames <- unstrsplit(gnames, ";")
  Rle(gnames)
}
written 4 weeks ago by Charles Plessy
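
To make the `gr` object mentioned in the example more concrete, here is a minimal sketch of how a single nucleotide could be represented as a `GRanges` and overlapped with gene ranges. It assumes the Bioconductor package GenomicRanges is installed; the two gene ranges and their names (`toyA`, `toyB`) are invented for illustration and are not real annotation.

```r
library(GenomicRanges)

# Toy annotation with a gene_name metadata column (invented coordinates).
genes <- GRanges(
  seqnames  = c("chr5", "chr5"),
  ranges    = IRanges(start = c(1500000, 1565000), end = c(1560000, 1570000)),
  strand    = c("+", "-"),
  gene_name = c("toyA", "toyB")
)

# The same nucleotide, once on each strand.
pos_minus <- GRanges("chr5", IRanges(start = 1565669, width = 1), strand = "-")
pos_plus  <- GRanges("chr5", IRanges(start = 1565669, width = 1), strand = "+")

# findOverlaps() takes strand into account unless ignore.strand = TRUE.
on_minus   <- genes$gene_name[subjectHits(findOverlaps(pos_minus, genes))]
on_plus    <- genes$gene_name[subjectHits(findOverlaps(pos_plus, genes))]
any_strand <- genes$gene_name[subjectHits(
  findOverlaps(pos_plus, genes, ignore.strand = TRUE))]

on_minus    # gene on the minus strand at this position
on_plus     # nothing on the plus strand
any_strand  # strand ignored
```

Since `ranges2genes` calls `findOverlaps` with its default arguments, it should be strand-aware in the same way; setting the strand of the query to `*` is one way to match genes on either strand.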