Question

Run RnBeads with 'bismarkCytosine' files

0

Entering edit mode

4.0 years ago

Assa Yeroslaviz ★ 1.8k

I'm trying to run the RnBeads package with the output files from bismark coverage2cytosine call. I keep getting the following error -

2020-05-13 16:19:00     3.7  8e-07  STATUS STARTED Setting up Multicore
2020-05-13 16:19:00     3.7  8e-07    INFO     Using 2 cores
2020-05-13 16:19:00     3.7  8e-07  STATUS COMPLETED Setting up Multicore    
2020-05-13 16:19:00     3.7  8e-07  STATUS STARTED RnBeads Pipeline
2020-05-13 16:19:00     3.7  8e-07    INFO     Analysis Title: Arno A. Bi-Sulphite Seq data analysis 2nd try - bismarkCytosine files
2020-05-13 16:19:00     3.7  8e-07    INFO     Initialized report index and saved to index.html
2020-05-13 16:19:00     3.7  8e-07  STATUS     STARTED Loading Data
2020-05-13 16:19:00     3.7  8e-07    INFO         Number of cores: 2
2020-05-13 16:19:00     3.7  8e-07    INFO         Loading data of type "bs.bed.dir"
2020-05-13 16:19:00     3.7  8e-07  STATUS         STARTED Performing loading test
2020-05-13 16:19:00     3.7  8e-07    INFO             The first 10000 rows will be read from each data file
2020-05-13 16:19:00     3.7  8e-07    INFO             No column with file names specified: will try to find one
2020-05-13 16:19:00     3.7  8e-07  STATUS             STARTED Loading Data From BED Files
2020-05-13 16:19:01     3.7  8e-07  STATUS                 STARTED Automatically parsing the provided sample annotation file
2020-05-13 16:19:01     3.7  8e-07  STATUS                     Potential file names found in column 3 of the supplied annotation table
2020-05-13 16:19:01     3.7  8e-07  STATUS                 COMPLETED Automatically parsing the provided sample annotation file
2020-05-13 16:19:01     3.7  8e-07    INFO                 Reading BED file: ***/GID4.R1.coverage2cytosine.bed
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
      scan() expected 'an integer', got 'CG'
    Calls: rnb.run.analysis ... read.table.ffdf -> do.call -> read.delim -> read.table -> scan
    Execution halted

The files I'm using looks like that:

head ***/GID4.R1.coverage2cytosine.bed
10      10494   +       0       CG      CG      CGT
10      10495   -       0       CG      CG      CGG
10      10497   +       0       CG      CG      CGC
10      10498   -       0       CG      CG      CGA
10      10508   +       0       CG      CG      CGA
10      10509   -       0       CG      CG      CGG
10      10523   +       0       CG      CG      CGA
10      10524   -       0       CG      CG      CGC
10      10535   +       0       CG      CG      CGC
10      10536   -       0       CG      CG      CGG

I can understand the message, but I can't figure out, where it looks for the integers and find these CG.

To run the script i use the following in an script script:

rnb.options(
        analysis.name        = "bismarkCytosine files",
        email = "******",
        identifiers.column   = "sampleID",
        replicate.id.column = "treatment",
        import.bed.style     = "bismarkCytosine",
        assembly             = "hg38",
        colors.meth        = c("#EDF8B1","#41B6C4","#081D58"),
        colors.category    = c("#1B9E77","#D95F02","#7570B3","#E7298A","#66A61E",
                                   "#E6AB02","#A6761D","#666666","#2166AC","#B2182B",
                                   "#00441B","#40004B","#053061"),
        qc.coverage.plots = TRUE,
        qc.coverage.histograms = TRUE,
        qc.coverage.violins = TRUE,
        inference = TRUE,
        inference.targets.sva = "treatment", # Column names in the sample annotation table for which surrogate variable analysis (SVA) should be conducted.
  inference.sva.num.method = "be", #What function should be used to estimate the number of surrogate variables. 
  differential.comparison.columns = c("WT_GID4",        "WT_MAEA",      "GID4_MAEA"), # Column names in the sample annotation table to be used for group definition in the differential methylation analysis.
  differential.comparison.columns.all.pairwise = c("WT_GID4",   "WT_MAEA",      "GID4_MAEA"), # Column names in the sample annotation table to be used for group definition in the differential methylation analysis in which all pairwise comparisons between groups should be conducted (the default is "one vs all" if multiple groups are specified in a column).
  region.types = c("genes", "promoters"), # Region types to carry out analysis on (this would remove the analysis for sites [done only to shorten the process.])
  differential.site.test.method = "limma", # Method to be used for calculating p-values on the site level. 
  differential.variability = TRUE, # With this analysis, the variance inside each group is sused to detect differences among them.
  differential.variability.method = "diffVar",
  differential.enrichment.go = TRUE, # whether Gene Ontology (GO)-enrichment analysis is to be conducted on the identified differentially methylated regions.
  differential.enrichment.lola = TRUE, # whether LOLA-enrichment analysis is to be conducted on the identified differentially methylated regions.
  differential.enrichment.lola.dbs = c("${LOLACore}"), # database for LOLA enrichment analysis
  export.to.trackhub = c("bigBed","bigWig"), # create tracks and hub structure
  export.to.csv = TRUE,
# several parameters for better memory management and parallel processing
        disk.dump.big.matrices = TRUE,
        disk.dump.bigff = TRUE,
        logging.disk = TRUE,
        enforce.memory.management = TRUE
        )
theme_set(theme_bw())

num.cores = 2
parallel.setup(num.cores)

rnb.run.analysis(dir.reports = report.dir, sample.sheet = sample_annotations,
                 data.dir = bed.dir, data.type = "bs.bed.dir",
                 build.index = TRUE,save.rdata = TRUE, initialize.reports = TRUE)
# shut down parallel processing 
parallel.teardown()

The only two things I have changed from the run with the cov output file from bismark are the line

    import.bed.style     = "bismarkCytosine",

and a different list of input files. In the bismark run my coverage2cytosine files were chormosome-specific, so I concatenate them with cat. Can this cause a problem?

Any ideas?

thanks Assa

rnbeads bismark 'bismarkCytosine' • 1.9k views

ADD COMMENT • link updated 3.9 years ago by mscherer ▴ 50 • written 4.0 years ago by Assa Yeroslaviz ★ 1.8k

0

Entering edit mode

Does it make any difference that in RnBeads' vignette the example input files for bismarkCytosine (and the rest) has chromosomes in the "chr" format instead of just numbers?

ADD REPLY • link 4.0 years ago by Papyrus ★ 2.9k

0

Entering edit mode

I hope not, as these are only the annotations, depending on the source of the genome, e.g. Ensembl vs. UCSC

ADD REPLY • link 3.9 years ago by Assa Yeroslaviz ★ 1.8k

score 2 · Answer 1 · 2020-05-25

Thanks for reporting this. We are currently facing some issues with the bismarkCytosine import (see also the Issue reported here https://github.com/epigen/RnBeads/issues/7), but are working on fixing this.

However, there should be an easy workaround: Instead of using the 'bismarkCytosine' option, use the 'bismarkCov' option as 'import.bed.style' , which can be directly obtained from the bedGraph output of bismark using bismark2bedGraph (see the section "Optional bedGraph output" in the bismark user guide https://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide.pdf). In the output snapshot that you provided, coverage information is not available, which is crucial information to RnBeads, especially to perform QC and site filtering.

Hope that helps.