How can I download the UniProt IDs for protein-coding genes? I have about 20,000 genes, so searching for them one by one is impractical. Is there any code for this?
I have code that adds gene names to my results in the DESeq2 workflow, but it does not return UniProt IDs:
    library(biomaRt)
    library(dplyr)
    library(tibble)

    # Annotation columns to fetch for each Ensembl gene ID in the DESeq2 results
    attributeNames <- c("ensembl_gene_id", "external_gene_name", "hgnc_symbol",
                        "chromosome_name", "description", "entrezgene_id")
    filterValues <- rownames(result_DESeq2)

    Annotations <- getBM(attributes = attributeNames,
                         filters = "ensembl_gene_id",
                         values = filterValues,
                         mart = useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl"))

    # `res` is assumed to be the same DESeq2 results object as result_DESeq2
    resAnnotated <- as.data.frame(res) %>%
      rownames_to_column("ensembl_gene_id") %>%
      left_join(Annotations, by = "ensembl_gene_id") %>%
      dplyr::rename(logFC = log2FoldChange, FDR = padj)
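One thing to watch with the `left_join` above: `getBM()` returns one row per distinct combination of the requested attributes, so a gene with several Entrez IDs (or, once UniProt attributes are added, several UniProt accessions) occupies several rows, and the join duplicates that gene's DESeq2 result. A minimal sketch of collapsing multiple values into one semicolon-separated string per gene before joining; the table contents below are hypothetical placeholders (not real IDs), and `uniprotswissprot` is assumed to be the attribute name you requested:

```r
library(dplyr)

# Hypothetical getBM()-style output: ENSG_A maps to two accessions,
# so it appears on two rows (placeholder IDs, not real ones)
annotations <- data.frame(
  ensembl_gene_id  = c("ENSG_A", "ENSG_A", "ENSG_B"),
  uniprotswissprot = c("P_1", "P_2", "P_3"),
  stringsAsFactors = FALSE
)

# Collapse to one row per gene: drop NA/empty values, keep distinct IDs, join with ";"
collapsed <- annotations %>%
  group_by(ensembl_gene_id) %>%
  summarise(
    uniprotswissprot = paste(
      sort(unique(uniprotswissprot[!is.na(uniprotswissprot) & uniprotswissprot != ""])),
      collapse = ";"
    ),
    .groups = "drop"
  )

# Hypothetical DESeq2-like table: the join now keeps exactly one row per gene
res_df <- data.frame(
  ensembl_gene_id = c("ENSG_A", "ENSG_B", "ENSG_C"),
  log2FoldChange  = c(1.2, -0.5, 0.3),
  stringsAsFactors = FALSE
)
joined <- left_join(res_df, collapsed, by = "ensembl_gene_id")
print(joined)
```

Genes with no match (ENSG_C here) get `NA` rather than disappearing, because `left_join` keeps every row of the DESeq2 table.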
You can download UniProt IDs from the UniProt website. Customize the columns you need, or filter further using the options in the left-hand column.
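If you would rather script the download than use the web interface, UniProt also offers a REST API. A minimal sketch, assuming the `https://rest.uniprot.org/uniprotkb/stream` endpoint, the query `organism_id:9606 AND reviewed:true` (reviewed human entries), and the return fields `accession` and `gene_primary`; check these against the UniProt REST documentation before relying on them. It needs network access and returns the accession plus primary gene symbol for each entry:

```r
# Assumed UniProt REST query: reviewed (Swiss-Prot) human entries, as TSV
query  <- "organism_id:9606 AND reviewed:true"
fields <- "accession,gene_primary"

url <- paste0(
  "https://rest.uniprot.org/uniprotkb/stream?format=tsv",
  "&query=", URLencode(query, reserved = TRUE),
  "&fields=", fields
)

# read.delim() can read directly from an https URL
uniprot <- read.delim(url, check.names = FALSE, stringsAsFactors = FALSE)
names(uniprot) <- c("uniprot_id", "gene_symbol")
head(uniprot)
```

You could then match `gene_symbol` to `external_gene_name` or `hgnc_symbol` in `resAnnotated`, for example with `left_join(resAnnotated, uniprot, by = c("external_gene_name" = "gene_symbol"))`. Matching on symbols is looser than matching on Ensembl IDs, since symbols change over time and some map to more than one entry.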
Thank you, GenoMax.
The workflow uses the biomaRt library (as @Hamid points out below). Once you have created the mart, you can list the attributes it can return with
listAttributes(mart). There are roughly 3000 of them, including UniProt IDs.
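Rather than scrolling through all ~3000 attributes, you can search them and then add the UniProt column to the `getBM()` call. A minimal sketch: `searchAttributes()` is part of biomaRt, but the attribute name `uniprotswissprot` (reviewed UniProtKB/Swiss-Prot accessions) is an assumption to confirm in the search output, and the two Ensembl IDs (TP53, BRCA1) only stand in for `rownames(result_DESeq2)`. It needs network access to Ensembl:

```r
library(biomaRt)

# Same mart as in the question's getBM() call
mart <- useMart(biomart = "ensembl", dataset = "hsapiens_gene_ensembl")

# Narrow the attribute list down to UniProt-related entries
uniprot_attrs <- searchAttributes(mart, pattern = "uniprot")
print(uniprot_attrs[, c("name", "description")])

# Example IDs (TP53, BRCA1); replace with rownames(result_DESeq2)
filterValues <- c("ENSG00000141510", "ENSG00000012048")

# Assumed attribute name; confirm it appears in uniprot_attrs$name
uniprot_map <- getBM(
  attributes = c("ensembl_gene_id", "uniprotswissprot"),
  filters    = "ensembl_gene_id",
  values     = filterValues,
  mart       = mart
)
print(uniprot_map)
```

Genes without a Swiss-Prot entry come back with an empty string, and some genes return several accessions, so expect more than one row per gene in `uniprot_map` before joining it to the DESeq2 results.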