Deleted: Tabix file download error eQTL analysis
2.3 years ago
v.johnson ▴ 20

I am trying to download a tabix file to run an analysis on an eQTL dataset, but I get the following error for every file I try to fetch from the eQTL Catalogue:

library(ggplot2)
library(readr)
library(coloc)
library(GenomicRanges)
library(seqminer)
library(dplyr) #Provides %>% and the dplyr verbs used below

tabix_paths = read.delim("https://raw.githubusercontent.com/eQTL-Catalogue/eQTL-Catalogue-resources/master/tabix/tabix_ftp_paths.tsv", 
                         sep = "\t", 
                         header = TRUE, stringsAsFactors = FALSE) %>% dplyr::as_tibble()
imported_tabix_paths = read.delim("https://raw.githubusercontent.com/eQTL-Catalogue/eQTL-Catalogue-resources/master/tabix/tabix_ftp_paths_imported.tsv", 
                                  sep = "\t", header = TRUE, stringsAsFactors = FALSE) %>% dplyr::as_tibble()

import_eQTLCatalogue <- function(ftp_path, region, selected_gene_id, column_names, verbose = TRUE){

  if(verbose){
    print(ftp_path)
  }

  #Fetch summary statistics with seqminer
  fetch_table = seqminer::tabix.read.table(tabixFile = ftp_path, tabixRange = region, stringsAsFactors = FALSE) %>%
    dplyr::as_tibble()
  colnames(fetch_table) = column_names

  #Remove rsid duplicates and multi-allelic variants
  summary_stats = dplyr::filter(fetch_table, gene_id == selected_gene_id) %>%
    dplyr::select(-rsid) %>% 
    dplyr::distinct() %>% #rsid duplicates
    dplyr::mutate(id = paste(chromosome, position, sep = ":")) %>% 
    dplyr::group_by(id) %>% 
    dplyr::mutate(row_count = dplyr::n()) %>% dplyr::ungroup() %>% 
    dplyr::filter(row_count == 1) #Multi-allelic variants
}

region = "3:56615721-57015721"
platelet_df = dplyr::filter(tabix_paths, study == "CEDAR", tissue_label == "platelet")

#Extract column names from first file
column_names = colnames(readr::read_tsv(platelet_df$ftp_path, n_max = 1))

#Import summary statistics
summary_stats = import_eQTLCatalogue(platelet_df$ftp_path, region, selected_gene_id = "ENSG00000163947", column_names)

    [1] "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz"

    Cannot open specified tabix file: ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz
    Cannot open specified tabix file: ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz
    Error in strsplit(body, "\t") : non-character argument
    Called from: strsplit(body, "\t")
    Browse[1]> Q
    > platelet_df$ftp_path
    [1] "ftp://ftp.ebi.ac.uk/pub/databases/spot/eQTL/csv/CEDAR/microarray/CEDAR_microarray_platelet.all.tsv.gz"

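I am not sure what is causing this error. The path printed above matches the one in the eQTL Catalogue table, yet the tabix file cannot be opened. Any advice would be amazing!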
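
For reference, the filtering step inside import_eQTLCatalogue does not depend on the download, so it can be checked separately from the tabix error. Below is a minimal base R sketch of that step on a toy table. The table and all its values are made up for illustration, and the base R calls stand in for the dplyr pipeline used in the function above.

```r
# Toy summary statistics; every value here is invented for illustration.
toy = data.frame(
  chromosome = c("3", "3", "3", "3"),
  position   = c(100, 100, 200, 200),
  ref        = c("A", "A", "C", "C"),
  alt        = c("G", "G", "T", "A"),
  rsid       = c("rs1", "rs2", "rs3", "rs4"),
  pvalue     = c(0.01, 0.01, 0.20, 0.30),
  stringsAsFactors = FALSE
)

# Drop the rsid column, then collapse rows that differed only by rsid
no_rsid = unique(toy[, setdiff(colnames(toy), "rsid")])

# Build a chromosome:position id and count how often each id occurs
no_rsid$id = paste(no_rsid$chromosome, no_rsid$position, sep = ":")
id_counts = table(no_rsid$id)

# Keep only positions that occur exactly once (drops multi-allelic sites)
filtered = no_rsid[no_rsid$id %in% names(id_counts)[id_counts == 1], ]
print(filtered)
```

In this toy table the two rows at position 100 differ only by rsid and collapse into one, while the two rows at position 200 carry different alternative alleles and are both removed.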

download R eQTL tabix error • 586 views