I need to find a gene name using the sseqid from blastn results, for example gi|19698730|gb|AC079789.7|
I want to use either the gi or the gb ID to look up the gene name. I have parsed my data into an R data frame in which sseqid1 contains the gi and sseqid2 contains the gb.
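For illustration, a split like this can be sketched in base R as follows. This is only a minimal sketch: the one-row data frame is made up from the example ID above, the `sseqid` column name is an assumption, and the actual parsing may have been done differently.

```r
# Hypothetical one-row data frame built from the example ID above.
res <- data.frame(sseqid = "gi|19698730|gb|AC079789.7|",
                  stringsAsFactors = FALSE)

# Split each ID on the literal "|" character.
# fixed = TRUE stops "|" from being treated as regex alternation.
parts <- strsplit(res$sseqid, "|", fixed = TRUE)

# Each element is c("gi", <gi value>, "gb", <gb value>).
res$sseqid1 <- sapply(parts, `[`, 2)  # the gi value
res$sseqid2 <- sapply(parts, `[`, 4)  # the gb value
```

Both new columns are character vectors, so the later as.character() call on sseqid1 leaves them unchanged.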
This is my R code:
ensembl <- useMart('ENSEMBL_MART_ENSEMBL', dataset="cporcellus_gene_ensembl")
keys = as.character(res$sseqid1)
res$genename = getBM(
    attributes=c('external_gene_name', 'protein_id', 'refseq_mrna_predicted', 'entrezgene'),
    values=keys, mart=ensembl)
The last statement results in an error because of the biomart result set. My guess is that it returns more than one result per gi, but I can't figure out what biomart is doing, why it is doing it, or what I need to do to fix it.
What I want is a gene name for each sseqid. I don't care whether the gi or the gb is used. I can't find exact definitions for the abbreviations gi and gb, and I haven't been able to find out which of the biomart attributes they relate to. If this information is in the documentation, I have not been able to find it.
Can anyone help me get this done?
Thanks in advance. Jannetta