Annotate .bed file with gene names and exon
3.2 years ago
mariab • 0

I have a bed file with the following structure (1 to 6 indicate the row number of the bed):

  chr.number     start      stop          V4
1       chr1   4131635   4131815   rs3936238
2       chr1  11489587  11489767    rs877309
3       chr1  21652120  21652300    rs213028
4       chr1  25277819  25278166  rs11249206
5       chr1  27022864  27022985   NM_139135
6       chr1  27023022  27023312   NM_139135

I would like to annotate it in columns 5 and 6 with the following exonic information, preferably in R:

gene symbol
exon number

Example desired output:

chr.number  start       end         GEN_SYMBOL  exon
chr13       32972272    32972941    BRCA2       exon27

I need this for the entire bed file. If possible, I would also like the start and end positions of each gene and each exon.

I have tried biomaRt, but I do not know how to get all of those fields as output. Additionally, I only found out how to annotate with the Ensembl annotation, not the one I need...

Thank you in advance!

3.2 years ago
2nelly ▴ 310

Hi mariab,

For me, the easiest solution would be to use the bedtools intersect command.

You can intersect your bed file with a GTF file (clean it first by keeping only the exon coordinates) and get the 2 extra columns you want.

In case you don't have a GTF file, you can obtain one (mouse example) using the code below:

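# Download refGene.txt.gz for mm10 (fill in the full download URL)
wget ./
gzip -d refGene.txt.gz
# Drop the leading "bin" column so the table is in genePred format
cut -f 2- refGene.txt > mm10refGene.input
# Convert genePred to GTF (genePredToGtf is one of the UCSC utilities)
genePredToGtf file mm10refGene.input mm10refGene.gtf
# Sort by chromosome, then start, then end
sort -V -k1,1 -k4,4 -k5,5 mm10refGene.gtf > mm10refGene
mv mm10refGene mm10refGene.gtf
# Remove intermediate files (gzip -d has already removed refGene.txt.gz)
rm mm10refGene.input refGene.txt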
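
As a rough sketch of the "clean, then intersect" step described above: the example below keeps only the exon rows of a GTF, converts them to BED with the gene name and exon number as extra columns, and then intersects a query bed against them. The toy file names (toy.gtf, toy.exons.bed, toy.query.bed) and the toy coordinates are made up for illustration. It assumes the GTF attribute column carries gene_id and exon_number fields; check your own file and adjust the patterns if it differs.

```shell
set -eu

# Hypothetical toy GTF standing in for mm10refGene.gtf (made-up coordinates).
printf 'chr1\trefGene\ttranscript\t1000\t2000\t.\t+\t.\tgene_id "GeneA"; transcript_id "NM_000001";\n' > toy.gtf
printf 'chr1\trefGene\texon\t1000\t1200\t.\t+\t.\tgene_id "GeneA"; transcript_id "NM_000001"; exon_number "1";\n' >> toy.gtf
printf 'chr1\trefGene\texon\t1800\t2000\t.\t+\t.\tgene_id "GeneA"; transcript_id "NM_000001"; exon_number "2";\n' >> toy.gtf

# Hypothetical query interval in the same 4-column layout as the question's bed.
printf 'chr1\t1100\t1150\trs_toy\n' > toy.query.bed

# Keep only exon rows and write BED: chrom, start, end, gene, exon label.
# GTF is 1-based inclusive, BED is 0-based half-open, hence start - 1.
awk -F'\t' -v OFS='\t' '$3 == "exon" {
    gene = "."; exon = "."
    if (match($9, /gene_id "[^"]+"/))
        gene = substr($9, RSTART + 9, RLENGTH - 10)
    if (match($9, /exon_number "?[0-9]+/)) {
        num = substr($9, RSTART, RLENGTH)
        gsub(/[^0-9]/, "", num)
        exon = "exon" num
    }
    print $1, $4 - 1, $5, gene, exon
}' toy.gtf > toy.exons.bed

# Intersect only if bedtools is installed.
# -wa -wb writes the query columns (1-4) followed by the exon columns (5-9);
# columns 6 and 7 hold the exon start and end if you want them too.
if command -v bedtools >/dev/null 2>&1; then
    bedtools intersect -a toy.query.bed -b toy.exons.bed -wa -wb \
        | cut -f1-3,8,9 > toy.annotated.bed
else
    echo "bedtools not found; skipping the intersect step" >&2
fi
```

Intervals that overlap no exon are dropped with -wa -wb; bedtools intersect also has a -loj option that keeps them. An interval that overlaps exons from several transcripts will appear once per overlap.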
3.2 years ago
Lila M ★ 1.2k

Hope this helps.


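library(biomaRt)

# Connect to the Ensembl mouse gene dataset
mart <- useDataset("mmusculus_gene_ensembl", useMart("ensembl"))

# Replace "yourfilter" and yourvalue with the filter name and the values to query
annotated <- getBM(filters = "yourfilter",
                   attributes = c("chromosome_name", "exon_chrom_start", "exon_chrom_end", "strand", "ensembl_gene_id", "ensembl_exon_id"),
                   values = yourvalue, mart = mart)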

