7.1 years ago
sahar850
Hi,
I need to convert TCGA data from versioned Ensembl gene IDs to HGNC symbols using the biomaRt R package. After creating a data frame containing all the Ensembl gene IDs, I tried this loop:
for (i in 1:length(data[,1])) {
  data[i,1] <- getBM(attributes=c('hgnc_symbol'), filters = 'ensembl_gene_id',
                     values = sub("\\..*", "", data[i,1]), mart = ensembl)
}
But I keep getting this error message:
Error in x[[jj]][iseq] <- vjj : replacement has length zero
I also tried this code:
hgnc_id <- getBM(attributes=c('hgnc_symbol'),filters = 'ensembl_gene_id_version', values = data[,1], mart = ensembl)
In this case I only get 15000 of the 60000 genes.
hgnc_id <- getBM(attributes=c('hgnc_symbol'),filters = 'ensembl_gene_id', values = sub("\\..*", "", data[,1]), mart = ensembl)
In this case I only get 30000 of the 60000 genes.
Has anyone had a similar problem, or can anyone offer a solution?
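A minimal offline sketch of one likely cause and a possible workaround, under stated assumptions. getBM() returns a data frame, and that data frame can have zero rows when nothing matches the queried ID; assigning a zero-row data frame into a single cell raises the error quoted above. The gene IDs, the symbols and the bm_result table below are made up for illustration and stand in for a real query result; the commented-out getBM() call shows how such a table might be requested, with the ID returned alongside the symbol so results can be matched back to the input.

```r
# Toy stand-in for the TCGA table; these IDs are illustrative only.
data <- data.frame(
  gene = c("ENSG00000000001.5", "ENSG00000000002.11", "ENSG00000000003.2"),
  stringsAsFactors = FALSE
)

# Strip the version suffix; sub() is vectorized and returns a character vector.
ids <- sub(pattern = "\\..*", replacement = "", x = data$gene)

# Hypothetical stand-in for the result of a single query such as:
# bm_result <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol"),
#                    filters = "ensembl_gene_id", values = ids, mart = ensembl)
# Only two of the three IDs come back, and not in input order.
bm_result <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000001"),
  hgnc_symbol     = c("GENE_C", "GENE_A"),
  stringsAsFactors = FALSE
)

# 1. Assigning a zero-row result into one cell reproduces the error.
empty <- bm_result[0, "hgnc_symbol", drop = FALSE]
err <- tryCatch({
  tmp <- data
  tmp[2, 1] <- empty
  NA_character_
}, error = function(e) conditionMessage(e))
print(err)

# 2. Map symbols back by ID; unmatched IDs become NA instead of failing.
data$hgnc_symbol <- bm_result$hgnc_symbol[match(ids, bm_result$ensembl_gene_id)]
print(data)
```

Because the result holds only the IDs that were found, its row count can be smaller than the input, which may also account for part of the gap between the returned rows and the 60000 queried genes.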
Side note: It's ensembl, there's no "e" at the end of the word.
First off, I'd recommend using parameter names when you call functions, so commands are explicit. This is especially useful with sub and gsub, as x, pattern and replacement are really weirdly positioned in these functions.
Does sub(pattern="\\..*", replacement="", x=data[1:15,1]) give you the expected output in the expected format (vector)? I recall needing to use sapply to get an unlisted vector of results from gsub.
Thanks for the tip, I will add the parameter names (it's actually the first time I'm using R). On the current subject, sub works fine. I just tried running the code on parts of the data, and the error occurs at element 532, which is ENSG00000036549.11 (ENSG00000036549 after the sub). I really can't see why it stopped specifically there; all the elements before it actually got an HGNC symbol.
I will try to use tryCatch so it skips the indices that trigger this error (if that's possible in R...), but if someone has a better solution it would be helpful.
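Skipping failing elements is possible in R with tryCatch. A minimal sketch, assuming a per-ID lookup function: lookup_symbol below is a hypothetical stand-in that fails for unknown IDs (in the real script it would wrap the getBM() call), and the IDs and symbols are made up for illustration.

```r
# Hypothetical per-ID lookup; it raises an error when the ID is unknown.
lookup_symbol <- function(id) {
  known <- c(ENSG00000000001 = "GENE_A", ENSG00000000003 = "GENE_C")
  if (!id %in% names(known)) stop("no result for ", id)
  unname(known[id])
}

# Wrap the lookup so that any error yields NA instead of stopping the loop.
safe_lookup <- function(id) {
  tryCatch(lookup_symbol(id), error = function(e) NA_character_)
}

ids <- c("ENSG00000000001", "ENSG00000000002", "ENSG00000000003")

# vapply() guarantees one character value per ID, so the output stays aligned
# with the input.
symbols <- vapply(ids, safe_lookup, character(1), USE.NAMES = FALSE)
print(symbols)
```

Note that one query per gene is likely to be slow for 60000 IDs; a single vectorized query is usually the quicker route.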
What happens when you query with just ENSG00000036549? Compare that to a couple of calls made with different gene IDs, and you should see where your code breaks.
It was the first thing I did; I get the same error message listed above...
What is your R version?
My R version is 3.5.0.
IMO 3.5 might not be mature yet - I've had problems working on 3.5 too. Can you try working on 3.4.1 maybe? You can use conda to install 3.4.1 without affecting your 3.5 installation:
Once done, you can check which R, ensure it points to the conda environment-specific R, and install Bioconductor.