biomaRt LRG region to position giving different results than I expect (using R)
8.6 years ago
Niek De Klein ★ 2.6k

I am trying to map some ensembl gene IDs and LRG regions to position on the genome. I got the gene IDs and LRG regions by mapping transcript IDs to gene IDs with

mapping <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol", "chromosome_name", "start_position", "end_position"), filters = "ensembl_transcript_id", values = rownames(tpm_transcripts), mart = mart)
tpm_transcripts$ensembl_gene_id <- mapping$ensembl_gene_id[match(rownames(tpm_transcripts), mapping$ensembl_transcript_id)]

Then to map from gene ID to gene position I do:

mapping <- getBM(attributes = c("ensembl_gene_id", "chromosome_name", "start_position", "end_position"), filters = "ensembl_transcript_id", values = rownames(tpm_transcripts), mart = mart)

And then get the position with

mapping[mapping$ensembl_gene_id=="LRG_195",]

Which for this particular LRG region gives me:

     ensembl_gene_id chromosome_name   start_position  end_position
     LRG_195         LRG_195           5001            10472

So it doesn't give me a chromosome, and when I look at http://www.ensembl.org/Homo_sapiens/LRG/Summary?lrg=LRG_195 it says the position is Chromosome 16: 30,178,605-30,191,076, so the start and end positions also look wrong. Is there a different way of doing this?

ensembl biomaRt • 2.2k views
8.6 years ago

Both are correct. Please check the information displayed on the Ensembl page, especially the scale bar. biomaRt is giving you the correct result.

http://asia.ensembl.org/Homo_sapiens/LRG/Summary?lrg=LRG_195;redirect=no

You were retrieving the coordinates of the LRG transcript (I assume it is LRG_195t1). biomaRt reports LRG transcript coordinates within the LRG record itself, i.e. LRG_195. The example below retrieves the genomic gene coordinates from biomaRt instead:

code:

getBM(attributes = c("ens_lrg_gene", "ensembl_gene_id", "hgnc_symbol", "chromosome_name", "start_position", "end_position"), filters = "ens_lrg_gene", values = "LRG_195", mart = ensembl)

output:

    ens_lrg_gene ensembl_gene_id hgnc_symbol chromosome_name start_position end_position
1      LRG_195 ENSG00000102879      CORO1A              16       30182827     30189076
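
If your mapping table mixes LRG records with ordinary Ensembl genes, one way to find the IDs that need the query above is to split on the "LRG_" prefix. This is only a sketch: the toy table is assembled from the two rows shown in this thread, and the names is_lrg, lrg_ids and genomic_rows are made up for illustration.

```r
# Toy mapping table built from the rows shown in this thread
# (in practice this would be the data frame returned by getBM).
mapping <- data.frame(
  ensembl_gene_id = c("LRG_195", "ENSG00000102879"),
  chromosome_name = c("LRG_195", "16"),
  start_position  = c(5001, 30182827),
  end_position    = c(10472, 30189076),
  stringsAsFactors = FALSE
)

# Rows whose gene ID starts with "LRG_" are in LRG-local coordinates.
is_lrg <- grepl("^LRG_", mapping$ensembl_gene_id)

# IDs to re-query with the LRG gene filter, and rows that are already genomic.
lrg_ids      <- unique(mapping$ensembl_gene_id[is_lrg])
genomic_rows <- mapping[!is_lrg, ]
```

lrg_ids can then be passed as the values argument of the getBM call above.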
8.6 years ago
Emily 23k

The purpose of LRGs is to be independent of genome assemblies. An LRG has no chromosomal location; it is its own coordinate system. Its coordinates are therefore relative to the LRG sequence itself, which includes 5 kb of upstream sequence, so that you can map variants to the gene and its likely regulatory regions. Have a look at some more LRGs: they all have themselves as the coordinate system name and they all have a start coordinate of 5001.
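
To see how the two sets of numbers in the question relate, here is a rough sketch of the offset arithmetic. It assumes the LRG aligns to the chromosome without gaps or mismatches, which is not guaranteed in general, and it takes the chromosomal span of the LRG (16:30,178,605-30,191,076) from the Ensembl page quoted in the question. The function name lrg_to_genomic is hypothetical, not part of biomaRt.

```r
# Convert LRG-local positions to chromosomal positions.
# Assumption: ungapped alignment between the LRG and the chromosome.
# strand = 1 means the LRG lies on the forward strand, -1 on the reverse.
lrg_to_genomic <- function(lrg_pos, lrg_chr_start, lrg_chr_end, strand = 1) {
  if (strand == 1) {
    lrg_chr_start + lrg_pos - 1
  } else {
    lrg_chr_end - lrg_pos + 1
  }
}

# LRG_195 spans 16:30178605-30191076 according to the Ensembl page.
# Under the assumptions above: 5001 -> 30183605 and 10472 -> 30189076.
genomic <- lrg_to_genomic(c(5001, 10472), 30178605, 30191076)
```

Under these assumptions the LRG end coordinate 10472 lands on 30,189,076, the same end_position that biomaRt reports for CORO1A in the other answer. The start differs from the Ensembl gene start, so treat this as a sanity check and not as a replacement for the proper LRG mapping.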
