Question: Annotate .bed file with gene names and exon
mariab wrote:

I have a BED file with the following structure (the numbers 1 to 6 are row numbers):

   chr.number     start      stop          V4
1        chr1   4131635   4131815   rs3936238
2        chr1  11489587  11489767    rs877309
3        chr1  21652120  21652300    rs213028
4        chr1  25277819  25278166  rs11249206
5        chr1  27022864  27022985   NM_139135
6        chr1  27023022  27023312   NM_139135

I would like to annotate it, preferably in R, with the following exon information in columns 5 and 6:

- gene symbol
- exon number

Example of the desired output:

chr.number  start     end       GEN_SYMBOL  exon
chr13       32972272  32972941  BRCA2       exon27

I need this for the entire BED file. If possible, I would also like the start and end positions of each gene and each exon.

I have tried biomaRt, but I do not know how to get all of these fields as output. Additionally, I only managed to annotate with the Ensembl annotation, not the one I need.

Thank you in advance!

2nelly wrote:

Hi mariab,

In my opinion, the easiest solution is the bedtools intersect command.

You can intersect your BED file with a GTF file (clean the GTF first so that it keeps only the exon coordinates) to get the two extra columns you want.

If you don't have a GTF file, you can build one from the UCSC refGene table (mouse example) with the code below:

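# Download the UCSC RefSeq gene table (for human data, replace mm10 with hg19 or hg38)
wget http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/refGene.txt.gz
gzip -d refGene.txt.gz
# Drop the first column ("bin") so the file is plain genePred format
cut -f 2- refGene.txt > mm10refGene.input
# UCSC utility; the "file" argument makes it read a genePred file instead of a database table
genePredToGtf file mm10refGene.input mm10refGene.gtf
# Sort by chromosome, then by start and end coordinate
sort -k1,1V -k4,4n -k5,5n mm10refGene.gtf > mm10refGene
mv mm10refGene mm10refGene.gtf
# refGene.txt.gz no longer exists after gzip -d, so only the intermediates are removed
rm mm10refGene.input refGene.txt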
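Once the GTF is available, the cleaning and intersect steps could look roughly like the sketch below. It is not the poster's exact code. The helper names `gtf_exons_to_bed` and `annotate_bed` are hypothetical, the script assumes awk and bedtools are on the PATH, the attribute parsing assumes the GTF carries `gene_name` and `exon_number` tags (falling back to `gene_id`), and the column indices in `annotate_bed` assume your BED has exactly four columns.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helper names; assumes awk and bedtools are installed.

# Convert GTF exon records to BED6+1:
#   chrom, start (0-based), end, gene symbol, score (0), strand, exon label
# ASSUMPTION: attributes include gene_name and exon_number;
# gene_id is used when gene_name is missing.
gtf_exons_to_bed() {
  awk 'BEGIN { FS = OFS = "\t" }
       $3 == "exon" {
         gene = ""; gid = ""; exon = "."
         n = split($9, attrs, ";")          # one key "value" pair per element
         for (i = 1; i <= n; i++) {
           split(attrs[i], kv, " ")         # kv[1] = key, kv[2] = quoted value
           val = kv[2]; gsub(/"/, "", val)
           if (kv[1] == "gene_name")   gene = val
           if (kv[1] == "gene_id")     gid = val
           if (kv[1] == "exon_number") exon = "exon" val
         }
         if (gene == "") gene = gid
         print $1, $4 - 1, $5, gene, 0, $7, exon   # GTF is 1-based, BED 0-based
       }'
}

# Annotate a 4-column BED (chrom, start, end, name) with the exon BED above.
# Output: chrom, start, end, name, gene symbol, exon, exon start (0-based), exon end
annotate_bed() {
  bedtools intersect -a "$1" -b "$2" -wa -wb |
    awk 'BEGIN { FS = OFS = "\t" } { print $1, $2, $3, $4, $8, $11, $6, $7 }'
}

# Example usage (file names are placeholders):
#   gtf_exons_to_bed < mm10refGene.gtf | sort -k1,1 -k2,2n > exons.bed
#   annotate_bed my_regions.bed exons.bed > my_regions.annotated.bed
```

A region is reported once per overlapping exon record, so a region hitting an exon shared by several transcripts appears several times; piping through `sort -u` collapses identical lines. Regions that overlap no exon (for example intronic SNPs) are dropped. Adding `-loj` to the `bedtools intersect` call keeps them, with `.` in the annotation fields.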
Lila M wrote:

Hope this helps.

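library(biomaRt)

# Ensembl mouse genes; use "hsapiens_gene_ensembl" for human data
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))

# Region query (Ensembl chromosome names have no "chr" prefix); the values are an
# illustrative region, chr1:4131635-4131815 from the question. All exons of genes
# overlapping the region are returned, so filter by exon coordinates afterwards.
annotated <- getBM(filters    = c("chromosome_name", "start", "end"),
                   values     = list("1", 4131635, 4131815),
                   attributes = c("chromosome_name", "exon_chrom_start", "exon_chrom_end",
                                  "strand", "ensembl_gene_id", "ensembl_exon_id",
                                  "external_gene_name", "rank"),  # gene symbol, exon rank in transcript
                   mart       = mart)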