Question: biomaRt LRG region to position gives different results than I expect (using R)
Niek De Klein (Netherlands) wrote, 3.5 years ago:

I am trying to map some Ensembl gene IDs and LRG regions to positions on the genome. I got the gene IDs and LRG regions by mapping transcript IDs to gene IDs with:

    mapping <- getBM(attributes = c("ensembl_transcript_id", "ensembl_gene_id", "hgnc_symbol",
                                    "chromosome_name", "start_position", "end_position"),
                     filters = "ensembl_transcript_id",
                     values = rownames(tpm_transcripts), mart = mart)
    tpm_transcripts$ensembl_gene_id <- mapping[match(rownames(tpm_transcripts), mapping$ensembl_transcript_id), ]$ensembl_gene_id

Then to map from gene ID to gene position I do:

    mapping <- getBM(attributes = c("ensembl_gene_id", "chromosome_name", "start_position", "end_position"),
                     filters = "ensembl_transcript_id",
                     values = rownames(tpm_transcripts), mart = mart)

And then get the position with

    mapping[mapping$ensembl_gene_id=="LRG_195",]

Which for this particular LRG region gives me:

     ensembl_gene_id chromosome_name   start_position  end_position
     LRG_195         LRG_195           5001            10472

So it doesn't give me a chromosome, and when I look at http://www.ensembl.org/Homo_sapiens/LRG/Summary?lrg=LRG_195 it says the position is Chromosome 16: 30,178,605-30,191,076, so the start and end positions also look wrong. Is there a different way of doing this?

cpad0112 (India) wrote, 3.5 years ago:

Both are correct. Please refer to the information displayed on the LRG page (especially the scale bar). BioMart is giving you the correct result.

http://asia.ensembl.org/Homo_sapiens/LRG/Summary?lrg=LRG_195;redirect=no

You were trying to get LRG transcript coordinates (I assume for LRG_195t1). BioMart reports LRG transcript coordinates within the LRG record itself, i.e. LRG_195. For genomic gene coordinates from BioMart, look at the example code below:

Code:

    getBM(attributes = c("ens_lrg_gene", "ensembl_gene_id", "hgnc_symbol",
                         "chromosome_name", "start_position", "end_position"),
          filters = "ens_lrg_gene", values = "LRG_195", mart = ensembl)

Output:

      ens_lrg_gene ensembl_gene_id hgnc_symbol chromosome_name start_position end_position
    1      LRG_195 ENSG00000102879      CORO1A              16       30182827     30189076
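
When the gene ID column mixes LRG and ENSG identifiers, as in the question, the same idea can be applied to the whole vector. The following is only a sketch under assumptions: `split_gene_ids` and `lookup_positions` are hypothetical helper names, `mart` is assumed to have been created beforehand, and the `ens_lrg_gene` filter and attribute are taken from the call above and may not be available in every Ensembl release.

```r
# Hypothetical helper: split a mixed vector of gene IDs into LRG and Ensembl IDs.
split_gene_ids <- function(ids) {
  ids <- unique(ids[!is.na(ids)])
  is_lrg <- grepl("^LRG_", ids)
  list(lrg = ids[is_lrg], ensembl = ids[!is_lrg])
}

# Hypothetical helper: query genomic positions with the filter that fits each ID type.
# Assumes `mart` already exists, e.g.
#   mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
lookup_positions <- function(ids, mart) {
  attrs <- c("ensembl_gene_id", "chromosome_name", "start_position", "end_position")
  parts <- split_gene_ids(ids)
  result <- list(ensembl = NULL, lrg = NULL)
  if (length(parts$ensembl) > 0) {
    result$ensembl <- biomaRt::getBM(attributes = attrs,
                                     filters = "ensembl_gene_id",
                                     values = parts$ensembl, mart = mart)
  }
  if (length(parts$lrg) > 0) {
    # Filtering on ens_lrg_gene returns the Ensembl gene linked to each LRG,
    # so the positions are genomic rather than LRG-local.
    result$lrg <- biomaRt::getBM(attributes = c("ens_lrg_gene", attrs),
                                 filters = "ens_lrg_gene",
                                 values = parts$lrg, mart = mart)
  }
  result
}
```

Usage would be along the lines of `lookup_positions(tpm_transcripts$ensembl_gene_id, mart)`; the two data frames are returned separately because the LRG one carries the extra `ens_lrg_gene` column.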

Emily_Ensembl (EMBL-EBI) wrote, 3.5 years ago:

The purpose of LRGs is to be independent of genomes. An LRG has no chromosomal location; it is its own coordinate system. Its sequence is the gene plus 5 kb upstream, which allows you to map variants to the gene and its likely regulatory regions. Have a look at some more LRGs: they all have themselves as the coordinate system name, and their genes all have a start coordinate of 5001.
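
To see how the two sets of numbers in this thread relate, the LRG-local coordinates can be shifted by the genomic start of the LRG record. This is a minimal sketch, assuming the LRG lies on the forward strand and aligns to the assembly without gaps (neither is guaranteed for every LRG); `lrg_to_genomic` is a hypothetical helper, and the genomic start is the one quoted from the Ensembl LRG page in the question.

```r
# Hypothetical helper: convert LRG-local positions to genomic positions.
# Assumptions: forward strand, gapless alignment of the LRG to the assembly.
lrg_to_genomic <- function(local_pos, lrg_genomic_start) {
  # Position 1 of the LRG corresponds to lrg_genomic_start on the chromosome.
  lrg_genomic_start + local_pos - 1
}

lrg_genomic_start <- 30178605  # LRG_195 on chromosome 16, from the Ensembl LRG page
lrg_to_genomic(c(5001, 10472), lrg_genomic_start)
# 30183605 30189076
```

Under these assumptions the converted end matches the `end_position` in cpad0112's output (30189076), while the converted start (30183605) differs from the Ensembl gene start (30182827), so the LRG gene and the Ensembl gene need not span exactly the same region.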
