2.2 years ago
v.johnson
I am trying to download a tabix file to perform an analysis on an eQTL dataset. However, I get the following error for each file I try to fetch from the eQTL Catalogue:
library(dplyr)  # provides %>% and n(), both used below
library(ggplot2)
library(readr)
library(coloc)
library(GenomicRanges)
library(seqminer)
tabix_paths = read.delim("https://raw.githubusercontent.com/eQTL-Catalogue/eQTL-Catalogue-resources/master/tabix/tabix_ftp_paths.tsv",
                         sep = "\t", header = TRUE, stringsAsFactors = FALSE) %>%
  dplyr::as_tibble()
imported_tabix_paths = read.delim("https://raw.githubusercontent.com/eQTL-Catalogue/eQTL-Catalogue-resources/master/tabix/tabix_ftp_paths_imported.tsv",
                                  sep = "\t", header = TRUE, stringsAsFactors = FALSE) %>%
  dplyr::as_tibble()
import_eQTLCatalogue <- function(ftp_path, region, selected_gene_id, column_names, verbose = TRUE){
  if(verbose){
    print(ftp_path)
  }
  #Fetch summary statistics with seqminer
  fetch_table = seqminer::tabix.read.table(tabixFile = ftp_path, tabixRange = region, stringsAsFactors = FALSE) %>%
    dplyr::as_tibble()
  colnames(fetch_table) = column_names
  #Remove rsid duplicates and multi-allelic variants
  summary_stats = dplyr::filter(fetch_table, gene_id == selected_gene_id) %>%
    dplyr::select(-rsid) %>%
    dplyr::distinct() %>% #rsid duplicates
    dplyr::mutate(id = paste(chromosome, position, sep = ":")) %>%
    dplyr::group_by(id) %>%
    dplyr::mutate(row_count = dplyr::n()) %>% dplyr::ungroup() %>%
    dplyr::filter(row_count == 1) #Multiallelics
}
region = "3:56615721-57015721"
platelet_df = dplyr::filter(tabix_paths, study == "CEDAR", tissue_label == "platelet")
#Extract column names from first file
column_names = colnames(readr::read_tsv(platelet_df$ftp_path, n_max = 1))
#Import summary statistics
summary_stats = import_eQTLCatalogue(platelet_df$ftp_path, region, selected_gene_id = "ENSG00000163947", column_names)
[1] "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz"
Cannot open specified tabix file: ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz
Cannot open specified tabix file: ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz
Error in strsplit(body, "\t") : non-character argument
Called from: strsplit(body, "\t")
Browse[1]> Q
> platelet_df$ftp_path
[1] "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz"
I am not sure what is causing the problem. Any advice would be much appreciated!