Get Chromosome Annotation
14 months ago
Bine ▴ 40

Dear all,

So far I was able to fetch the Gene Name and Gene Entrez for my genes of interest with

top0$Gene_Name <- mapIds(org.Hs.eg.db,
                         keys=ens.str,
                         column="SYMBOL",
                         keytype="ENSEMBL",
                         multiVals="first")
top0$Gene_Entrez <- mapIds(org.Hs.eg.db,
                           keys=ens.str,
                           column="ENTREZID",
                           keytype="ENSEMBL",
                           multiVals="first")


But I am struggling to get the chromosome information for these genes. Does anyone have an idea how I could do that?

Thank you very much, Bine

R annotation • 1.1k views
2
14 months ago

Why don't you try the BioMart API?

mart <- useMart('ENSEMBL_MART_ENSEMBL', host='www.ensembl.org')
mart <- useDataset("hsapiens_gene_ensembl", mart)
annotLookup <- getBM(
  mart = mart,
  attributes = c('chromosome_name', 'start_position', 'end_position', 'strand',
                 'ensembl_gene_id', 'gene_biotype', 'hgnc_symbol'),
  filters = 'ensembl_gene_id',
  values = listOfInputENSIds,
  uniqueRows = TRUE)

0

Thank you, but running this gives me

Error in useMart("ENSEMBL_MART_ENSEMBL", host = "www.ensembl.org") :
could not find function "useMart"


Any idea why I am getting this error?

0

It worked now!!! I don't know why I got this error earlier.

Thank you so much :) :)

0

Probably you forgot to load the package

library(biomaRt)

0

One additional question on this: I am now getting the following error with the above code. Do you know what the reason could be?

list$annotLookup <- getBM(mart = mart, attributes = c('chromosome_name', 'hgnc_symbol'), filters = 'ensembl_gene_id', values = ens.str, uniqueRows = TRUE)
Error in [[<-(*tmp*, name, value = list(chromosome_name = c("Y", "19", : 49 elements in value to replace 50 elements

Thanks so much, Bine

1

I am not sure about this error, it seems a very generalized error to me. Could you run the command without assigning it to any variable?

0

Thank you very much. Interestingly the error does not appear then. But somehow I need to add these values to my "list".. I wonder how else I could do that..

0

If you don't mind can I see your complete code? I mean at least from the list declaration to the assignment of annotation dataframe.

0

Ah it seems that my "list" has 50 rows whereas my "annotLookup" list

annotLookup <- getBM(mart = mart, attributes = c('chromosome_name', 'start_position', 'end_position', 'strand', 'ensembl_gene_id', 'gene_biotype', 'hgnc_symbol'), filters = 'ensembl_gene_id', values = ens.str, uniqueRows = TRUE)

has 49 rows. But annotLookup should have 50 rows since I said

ens.str <- substr(top0$Gene_ID, 1, 50)


I don't understand why it has only 49...

Do you have an idea?

Thanks so much!

1

Honestly, I do not have any clue why you are doing this,

ens.str <- substr(top0$Gene_ID, 1, 50)

And for creating a list of annotations I would use something like this:

listOfAnno = list()
listOfAnno$anno1 = annotLookup
listOfAnno$anno2 = annotLookup
.....
listOfAnno$annoN = annotLookup


If you are using a for loop, and assuming you have a list of ens_ids, then:

listOfAnno = list()

# Loop over indices; with a named list you could loop over names(listOfEnsIds) instead.
# seq_along() is safe even when the list is empty, unlike seq(1, length(x)).
for (i in seq_along(listOfEnsIds)) {
  annotLookup <- getBM(
    mart = mart,
    attributes = c('chromosome_name', 'start_position', 'end_position', 'strand',
                   'ensembl_gene_id', 'gene_biotype', 'hgnc_symbol'),
    filters = 'ensembl_gene_id',
    values = listOfEnsIds[[i]],
    uniqueRows = TRUE)
  listOfAnno[[i]] = annotLookup
}


OR you can run this operation using apply family functions (lapply).
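A minimal sketch of the lapply variant follows. So that it runs offline, the getBM() call is replaced by a hypothetical stand-in, lookup_annotation(), which filters a small made-up table; the IDs and chromosome values are placeholders, not real Ensembl records. In real use the body of the function would be the getBM() call from the loop above.

```r
# Made-up annotation table standing in for what BioMart would return.
toy_annotation <- data.frame(
  ensembl_gene_id = c("ID_A", "ID_B", "ID_C"),
  chromosome_name = c("1", "19", "Y"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for getBM(): returns the rows whose ID was requested.
lookup_annotation <- function(ids) {
  toy_annotation[toy_annotation$ensembl_gene_id %in% ids, , drop = FALSE]
}

# A named list of ID vectors; "ID_MISSING" has no entry in the toy table.
listOfEnsIds <- list(
  setA = c("ID_A", "ID_B"),
  setB = c("ID_C", "ID_MISSING")
)

# lapply() returns one data frame per input element and keeps the list names.
listOfAnno <- lapply(listOfEnsIds, lookup_annotation)
```

Note that the second result has one row for two requested IDs, which is the same kind of shortfall as the 49/50 mismatch discussed in this thread.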

0

I do this ens.str <- substr(top0$Gene_ID, 1, 50) to limit my list to the top 50 genes. The list is much longer.

I still don't understand why I only get 49 then... I am still not able to combine list and annotLookup due to the difference (49/50).

0

The same happens if I use another variable. I have only 48 genes in annotLookup, even though I am asking for the top 50 genes...

0

I don't think substr will help you select the top 50 genes. substr extracts a substring from each element of a character vector: it trims every string to the given character positions, but it does not change how many elements the vector has.
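To illustrate the difference, here is a small sketch with made-up placeholder IDs (not real Ensembl IDs). It contrasts substr(), which works on the characters inside each string, with head(), which is one common way to take the first n elements.

```r
# Placeholder IDs for illustration only.
gene_ids <- c("GENE_EXAMPLE_0001", "GENE_EXAMPLE_0002", "GENE_EXAMPLE_0003")

# substr() keeps every element and cuts each string to characters 1..4.
trimmed <- substr(gene_ids, 1, 4)

# With a stop position beyond the string length, nothing is cut at all,
# so substr(x, 1, 50) simply returns these short IDs unchanged.
unchanged <- substr(gene_ids, 1, 50)

# head() selects the first n elements of the vector.
first_two <- head(gene_ids, 2)
```

Under this reading, substr(top0$Gene_ID, 1, 50) would return every gene ID in top0 rather than the first 50, so the number of query IDs depends on how many rows top0 has.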

As for getting fewer records than input IDs: a few of the Ensembl IDs could be obsolete. Did you check which assembly version you are using?
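One way to check this, and to attach the annotation despite the row-count mismatch, is sketched below with made-up data. The names query_ids, annot, and top are hypothetical stand-ins for ens.str, annotLookup, and top0. setdiff() lists the IDs that came back without a record, and match() aligns the annotation to the original row order, leaving NA where nothing was returned.

```r
# Placeholder query IDs and a made-up result with one ID missing,
# returned in a different order than the query.
query_ids <- c("ID_1", "ID_2", "ID_3", "ID_4")
annot <- data.frame(
  ensembl_gene_id = c("ID_3", "ID_1", "ID_4"),
  chromosome_name = c("Y", "19", "2"),
  stringsAsFactors = FALSE
)

# Which requested IDs have no annotation row?
missing_ids <- setdiff(query_ids, annot$ensembl_gene_id)

# Table to annotate, one row per requested ID.
top <- data.frame(Gene_ID = query_ids, stringsAsFactors = FALSE)

# match() gives, for each Gene_ID, the row position in annot (NA if absent),
# so the new column always has as many entries as top has rows.
idx <- match(top$Gene_ID, annot$ensembl_gene_id)
top$chromosome <- annot$chromosome_name[idx]
```

match() uses the first hit when an ID appears more than once in the annotation, so this sketch assumes one row per ID; merge() with all.x = TRUE is an alternative when several rows per ID should be kept.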

0

Please use ADD REPLY when responding to existing comments to keep threads in logical order.