Run RnBeads with 'bismarkCytosine' files
1
0
Entering edit mode
4.0 years ago
Assa Yeroslaviz ★ 1.8k

I'm trying to run the RnBeads package with the output files from bismark coverage2cytosine call. I keep getting the following error -

2020-05-13 16:19:00     3.7  8e-07  STATUS STARTED Setting up Multicore
2020-05-13 16:19:00     3.7  8e-07    INFO     Using 2 cores
2020-05-13 16:19:00     3.7  8e-07  STATUS COMPLETED Setting up Multicore    
2020-05-13 16:19:00     3.7  8e-07  STATUS STARTED RnBeads Pipeline
2020-05-13 16:19:00     3.7  8e-07    INFO     Analysis Title: Arno A. Bi-Sulphite Seq data analysis 2nd try - bismarkCytosine files
2020-05-13 16:19:00     3.7  8e-07    INFO     Initialized report index and saved to index.html
2020-05-13 16:19:00     3.7  8e-07  STATUS     STARTED Loading Data
2020-05-13 16:19:00     3.7  8e-07    INFO         Number of cores: 2
2020-05-13 16:19:00     3.7  8e-07    INFO         Loading data of type "bs.bed.dir"
2020-05-13 16:19:00     3.7  8e-07  STATUS         STARTED Performing loading test
2020-05-13 16:19:00     3.7  8e-07    INFO             The first 10000 rows will be read from each data file
2020-05-13 16:19:00     3.7  8e-07    INFO             No column with file names specified: will try to find one
2020-05-13 16:19:00     3.7  8e-07  STATUS             STARTED Loading Data From BED Files
2020-05-13 16:19:01     3.7  8e-07  STATUS                 STARTED Automatically parsing the provided sample annotation file
2020-05-13 16:19:01     3.7  8e-07  STATUS                     Potential file names found in column 3 of the supplied annotation table
2020-05-13 16:19:01     3.7  8e-07  STATUS                 COMPLETED Automatically parsing the provided sample annotation file
2020-05-13 16:19:01     3.7  8e-07    INFO                 Reading BED file: ***/GID4.R1.coverage2cytosine.bed
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
      scan() expected 'an integer', got 'CG'
    Calls: rnb.run.analysis ... read.table.ffdf -> do.call -> read.delim -> read.table -> scan
    Execution halted

The files I'm using looks like that:

head ***/GID4.R1.coverage2cytosine.bed
10      10494   +       0       CG      CG      CGT
10      10495   -       0       CG      CG      CGG
10      10497   +       0       CG      CG      CGC
10      10498   -       0       CG      CG      CGA
10      10508   +       0       CG      CG      CGA
10      10509   -       0       CG      CG      CGG
10      10523   +       0       CG      CG      CGA
10      10524   -       0       CG      CG      CGC
10      10535   +       0       CG      CG      CGC
10      10536   -       0       CG      CG      CGG

I can understand the message, but I can't figure out, where it looks for the integers and find these CG.

To run the script i use the following in an script script:

rnb.options(
        analysis.name        = "bismarkCytosine files",
        email = "******",
        identifiers.column   = "sampleID",
        replicate.id.column = "treatment",
        import.bed.style     = "bismarkCytosine",
        assembly             = "hg38",
        colors.meth        = c("#EDF8B1","#41B6C4","#081D58"),
        colors.category    = c("#1B9E77","#D95F02","#7570B3","#E7298A","#66A61E",
                                   "#E6AB02","#A6761D","#666666","#2166AC","#B2182B",
                                   "#00441B","#40004B","#053061"),
        qc.coverage.plots = TRUE,
        qc.coverage.histograms = TRUE,
        qc.coverage.violins = TRUE,
        inference = TRUE,
        inference.targets.sva = "treatment", # Column names in the sample annotation table for which surrogate variable analysis (SVA) should be conducted.
  inference.sva.num.method = "be", #What function should be used to estimate the number of surrogate variables. 
  differential.comparison.columns = c("WT_GID4",        "WT_MAEA",      "GID4_MAEA"), # Column names in the sample annotation table to be used for group definition in the differential methylation analysis.
  differential.comparison.columns.all.pairwise = c("WT_GID4",   "WT_MAEA",      "GID4_MAEA"), # Column names in the sample annotation table to be used for group definition in the differential methylation analysis in which all pairwise comparisons between groups should be conducted (the default is "one vs all" if multiple groups are specified in a column).
  region.types = c("genes", "promoters"), # Region types to carry out analysis on (this would remove the analysis for sites [done only to shorten the process.])
  differential.site.test.method = "limma", # Method to be used for calculating p-values on the site level. 
  differential.variability = TRUE, # With this analysis, the variance inside each group is sused to detect differences among them.
  differential.variability.method = "diffVar",
  differential.enrichment.go = TRUE, # whether Gene Ontology (GO)-enrichment analysis is to be conducted on the identified differentially methylated regions.
  differential.enrichment.lola = TRUE, # whether LOLA-enrichment analysis is to be conducted on the identified differentially methylated regions.
  differential.enrichment.lola.dbs = c("${LOLACore}"), # database for LOLA enrichment analysis
  export.to.trackhub = c("bigBed","bigWig"), # create tracks and hub structure
  export.to.csv = TRUE,
# several parameters for better memory management and parallel processing
        disk.dump.big.matrices = TRUE,
        disk.dump.bigff = TRUE,
        logging.disk = TRUE,
        enforce.memory.management = TRUE
        )
theme_set(theme_bw())

num.cores = 2
parallel.setup(num.cores)

rnb.run.analysis(dir.reports = report.dir, sample.sheet = sample_annotations,
                 data.dir = bed.dir, data.type = "bs.bed.dir",
                 build.index = TRUE,save.rdata = TRUE, initialize.reports = TRUE)
# shut down parallel processing 
parallel.teardown()

The only two things I have changed from the run with the cov output file from bismark are the line

    import.bed.style     = "bismarkCytosine",

and a different list of input files. In the bismark run my coverage2cytosine files were chormosome-specific, so I concatenate them with cat. Can this cause a problem?

Any ideas?

thanks Assa

rnbeads bismark 'bismarkCytosine' • 1.9k views
ADD COMMENT
0
Entering edit mode

Does it make any difference that in RnBeads' vignette the example input files for bismarkCytosine (and the rest) has chromosomes in the "chr" format instead of just numbers?

ADD REPLY
0
Entering edit mode

I hope not, as these are only the annotations, depending on the source of the genome, e.g. Ensembl vs. UCSC

ADD REPLY
2
Entering edit mode
3.9 years ago
mscherer ▴ 50

Thanks for reporting this. We are currently facing some issues with the bismarkCytosine import (see also the Issue reported here https://github.com/epigen/RnBeads/issues/7), but are working on fixing this.

However, there should be an easy workaround: Instead of using the 'bismarkCytosine' option, use the 'bismarkCov' option as 'import.bed.style' , which can be directly obtained from the bedGraph output of bismark using bismark2bedGraph (see the section "Optional bedGraph output" in the bismark user guide https://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide.pdf). In the output snapshot that you provided, coverage information is not available, which is crucial information to RnBeads, especially to perform QC and site filtering.

Hope that helps.

ADD COMMENT
0
Entering edit mode

In the output snapshot that you provided, coverage information is not available, which is crucial information to RnBeads, especially to perform QC and site filtering.

Yes, but this file doesn't contain any coverage information per default (AFAIK). So RnBeads should expect to find it using this option.

I'll try using the bismarkCov format.

ADD REPLY

Login before adding your answer.

Traffic: 2900 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6