I'm trying to use biomaRt to filter Birdsuite genotype calls (Affymetrix SNP IDs) down to the SNPs in the genes encoding a list of 62 proteins (Ensembl peptide IDs). Because the unfiltered set of genotype calls is so large, I cannot query the gene of each SNP; instead I have to query the SNPs of each gene on my list.
Therefore, I first get the list of gene IDs for the ENSP IDs:
ensembl_mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL", dataset = "hsapiens_gene_ensembl")
ensembl_genes <- biomaRt::getBM(attributes = c("ensembl_gene_id", "ensembl_peptide_id"),
                                filters = "ensembl_peptide_id",
                                values = ensembl_peptide_ids,
                                mart = ensembl_mart)
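Because one gene can encode several peptides, this query can return the same gene ID on several rows. A minimal sketch, assuming the `getBM()` result has the `ensembl_gene_id` column requested above, that collapses it to unique, non-empty gene IDs before they are used as filter values for the SNP mart; `unique_gene_ids` and the toy IDs are hypothetical placeholders:

```r
# Collapse the gene IDs returned by the peptide -> gene query so that each
# gene is passed to the SNP mart only once (drops NA and empty IDs).
unique_gene_ids <- function(gene_table) {
  ids <- gene_table$ensembl_gene_id
  unique(ids[!is.na(ids) & ids != ""])
}

# Toy stand-in for the getBM() result (hypothetical IDs)
ensembl_genes <- data.frame(
  ensembl_gene_id    = c("ENSG_A", "ENSG_A", "ENSG_B"),
  ensembl_peptide_id = c("ENSP_1", "ENSP_2", "ENSP_3"),
  stringsAsFactors = FALSE
)
gene_ids <- unique_gene_ids(ensembl_genes)
```

This only shortens the list of filter values; on its own it will not avoid the timeout.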
Then I try to fetch all SNPs in those genes (which I later want to match with my genotype calls):
snp_mart <- biomaRt::useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
snps <- biomaRt::getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                       filters = "ensembl_gene",
                       values = ensembl_genes$ensembl_gene_id,
                       mart = snp_mart)
However, the query times out after 5 minutes with the following error message:
Error in curl::curl_fetch_memory(url, handle = handle) : Timeout was reached: Operation timed out after 300001 milliseconds with 380749 bytes received
I tried submitting separate queries for each gene, but for some genes (presumably the ones with many SNPs) I still get a timeout. My internet connection is fine and I can access the Ensembl homepage without problems. Also, I tried different Ensembl hosts at different times of the day.
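The per-gene strategy can be written so that a timeout for one gene does not abort the whole run, and the problematic genes are collected for separate handling. A minimal sketch; `query_per_gene`, `snps_for_gene` and the toy `fake_query` are hypothetical names, and `snp_mart` is assumed to be the mart object created for the SNP query:

```r
# Run `query_fun` once per gene ID. An error (e.g. a curl timeout) for one
# gene is caught and the ID is recorded, instead of aborting the whole loop.
query_per_gene <- function(gene_ids, query_fun) {
  results <- list()
  failed <- character(0)
  for (id in gene_ids) {
    res <- tryCatch(query_fun(id), error = function(e) NULL)
    if (is.null(res)) {
      failed <- c(failed, id)
    } else {
      results[[id]] <- res
    }
  }
  # Stack the successful per-gene data frames into one table
  list(snps = do.call(rbind, unname(results)), failed = failed)
}

# Per-gene version of the SNP query; assumes `snp_mart` was created with
# biomaRt::useMart("ENSEMBL_MART_SNP", "hsapiens_snp"). Defined, not run here.
snps_for_gene <- function(id) {
  biomaRt::getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                 filters = "ensembl_gene", values = id, mart = snp_mart)
}
# out <- query_per_gene(unique(ensembl_genes$ensembl_gene_id), snps_for_gene)
# out$failed then lists the genes that still timed out

# Offline demo with a fake query that fails for one gene
fake_query <- function(id) {
  if (id == "ENSG_B") stop("Timeout was reached")
  data.frame(refsnp_id = paste0("rs_", id), ensembl_gene_stable_id = id,
             stringsAsFactors = FALSE)
}
demo <- query_per_gene(c("ENSG_A", "ENSG_B", "ENSG_C"), fake_query)
```

This does not make the slow genes succeed; it only isolates them so the rest of the results are kept.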
Does anyone have an idea how to address this problem, or suggestions for alternative approaches?