Question

RnBeads Differential Methylation

0

Entering edit mode

7 months ago

aroso491 • 0

Hi,

I am trying to run a Differential methylation analysis with RnBeads as I have done previously on a similar fashion. This time I am just running it on my HPC and it seems to run just fine until it arrives to the differential methylation step. See the error output below:

    Loading required package: BiocParallel   

 opening ff /local/scratch/96814/RtmpmMxVBc/ff/ff1a07b2bb247d.ff
    opening ff /local/scratch/96814/RtmpmMxVBc/ff/ff1a07b6ddc2074.ff
    Coordinate system already present. Adding new coordinate system, which will replace the existing one.
    **Error in rnb.sample.groups(x, columns = pheno.cols, columns.pairs = columns.pairs) :
      invalid value for columns.pairs; not all names are present in the annotation
    Calls: rnb.run.analysis ... rnb.execute.computeDiffMeth -> get.comparison.info -> rnb.sample.groups
    In addition: There were 50 or more warnings (use warnings() to see the first 50)
    Execution halted**

I suspect it is a naming issue so this is the first line of my samplesheet - I tried attaching it as an image but the website broke on me. I have a total of 51 samples and 3 groups: Non inflammed vs Responders, Non inflammed vs Non responders and Responders vs Non responders.

I have used Res, Non_Res and Non to identify the different samples and when I did not want a sample to be used in a comparison I used NA

filename_bed    Sample_ID       Sample_Group    Non_Inflamed_vs_Res     Non_Inflamed_vs_Non_Res Res_vs_Non_Res
PN0272_044_S44_L001_R1_001.sorted.markDups_CpG.bedGraph.output.bedGraph PN0272_044_S44_L001_R1_001_SUDV27_A     Res     Res     NA      Res
PN0272_045_S45_L001_R1_001.sorted.markDups_CpG.bedGraph.output.bedGraph PN0272_045_S45_L001_R1_001_SUDV27_B     Res     Res     NA      Res

All the 51 files as they appear in the folder where the samplesheet.txt is located follow this naming structure:

PN0192_0001_S1_L001_R1_001.sorted.markDups_CpG.bedGraph.output.bedGraph

Again, we know it is ugly and redundant but we were given the files and my colleague did not want to change the names.

And finally, below is the code I am using to run this differential methylation analysis:

library(RnBeads)
library(RnBeads.hg19)
library(dplyr)
library(purrr)
library(grid)
library(readr)
library(sva)

data.dir <- ("/home/sdata/epigastro_alejandra/store2/arodriguez/RnBeads_Lola_Analysis/AllLocation_responders/")
bed.dir<-file.path(data.dir) #just for loading SAMPLES for RnBeads
sample.annotation<-file.path(data.dir, "samplesheet.txt") #specify name comparisons
analysis.dir<-file.path(data.dir, "AllLoc_Analysis") # specify name of analysis directory
report.dir<- file.path(analysis.dir,"AllLoc_Report") # specify name of report directory

####### SETTING PARAMETERS ####################################################
rnb.options(analysis.name = "AllLoc_analysis") #Name this analysis
rnb.options(email="alejandrarodrigu21@rcsi.com") #email contact for this analysis
rnb.options(logging = TRUE) #Default setting TRUE: Creating records of the analysis
rnb.options(identifiers.column="Sample_ID", assembly="hg19", import.bed.style="bismarkCov", import.table.separator="\t")

####### COVARIATE ####################################################

rnb.options(inference=TRUE)
rnb.options(inference.targets.sva="Batch", inference.sva.num.method="be")
rnb.options(inference.age.prediction=FALSE)
rnb.options(covariate.adjustment.columns=c("Age", "Gender"))

####### DIFFENTIAL MET ANALYSIS

#APPLY BATCH CORRECTION
rnb.options(differential.adjustment.sva=TRUE)
#LOOK FOR IMMUNE CELL CONTENT
rnb.options(differential.comparison.columns.all.pairwise = c("Histology","Stage_sum"))
rnb.options(export.types=c("sites"))
rnb.options()
rnb.getOption("assembly")
rnb.options("differential.enrichment.go"=FALSE)
rnb.options("differential.enrichment.lola"=FALSE)
rnb.options(export.to.csv=TRUE)

rnb.options(differential.comparison.columns="Histology", columns.pairing=c("Histology"="Patient_ID"))
gc()

analysis <- rnb.run.analysis(dir.reports=report.dir,initialize.reports=TRUE, sample.sheet=sample.annotation, data.source=bed.dir, data.type="bs.bed.dir")

I thought that maybe you can see something that I am not seeing right now. This is the end bit of the .log file:

2023-09-01 16:44:01    47.0  STATUS         COMPLETED Methylation Value Distributions - Site Categories
2023-09-01 16:44:01    47.0  STATUS         STARTED Sample Clustering
2023-09-01 16:44:01    47.0  STATUS             STARTED Agglomerative Hierarchical Clustering
2023-09-01 16:45:04    49.7  STATUS                 Performed clustering on sites using correlation as a distance metric
2023-09-01 16:46:18    49.7  STATUS                 Performed clustering on sites using manhattan as a distance metric
2023-09-01 16:47:32    49.7  STATUS                 Performed clustering on sites using euclidean as a distance metric
2023-09-01 16:47:36    49.7  STATUS                 Performed clustering on tiling using correlation as a distance metric
2023-09-01 16:47:40    49.7  STATUS                 Performed clustering on tiling using manhattan as a distance metric
2023-09-01 16:47:45    49.7  STATUS                 Performed clustering on tiling using euclidean as a distance metric
2023-09-01 16:47:46    49.7  STATUS                 Performed clustering on genes using correlation as a distance metric
2023-09-01 16:47:46    49.7  STATUS                 Performed clustering on genes using manhattan as a distance metric
2023-09-01 16:47:47    49.7  STATUS                 Performed clustering on genes using euclidean as a distance metric
2023-09-01 16:47:47    49.7  STATUS                 Performed clustering on promoters using correlation as a distance metric
2023-09-01 16:47:48    49.7  STATUS                 Performed clustering on promoters using manhattan as a distance metric
2023-09-01 16:47:48    49.7  STATUS                 Performed clustering on promoters using euclidean as a distance metric
2023-09-01 16:47:49    49.7  STATUS                 Performed clustering on cpgislands using correlation as a distance metric
2023-09-01 16:47:49    49.7  STATUS                 Performed clustering on cpgislands using manhattan as a distance metric
2023-09-01 16:47:50    49.7  STATUS                 Performed clustering on cpgislands using euclidean as a distance metric
2023-09-01 16:47:50    49.7  STATUS             COMPLETED Agglomerative Hierarchical Clustering
2023-09-01 16:47:52    51.6  STATUS             STARTED Clustering Section
2023-09-01 16:47:55    52.7  STATUS                 STARTED Generating Heatmaps
2023-09-01 16:47:56    52.7  STATUS                     STARTED Region type: sites
2023-09-01 16:57:02    50.0  STATUS                     COMPLETED Region type: sites
2023-09-01 16:57:02    50.0  STATUS                     STARTED Region type: tiling
2023-09-01 17:06:06    46.9  STATUS                     COMPLETED Region type: tiling
2023-09-01 17:06:06    46.9  STATUS                     STARTED Region type: genes
2023-09-01 17:15:30    46.9  STATUS                     COMPLETED Region type: genes
2023-09-01 17:15:30    46.9  STATUS                     STARTED Region type: promoters
2023-09-01 17:24:43    46.9  STATUS                     COMPLETED Region type: promoters
2023-09-01 17:24:43    46.9  STATUS                     STARTED Region type: cpgislands
2023-09-01 17:34:01    46.9  STATUS                     COMPLETED Region type: cpgislands
2023-09-01 17:34:01    46.9  STATUS                     Created 540 heatmaps based on the clustering results
2023-09-01 17:34:01    46.9  STATUS                 COMPLETED Generating Heatmaps
2023-09-01 17:34:01    46.9  STATUS                 STARTED Adding Color Legends
2023-09-01 17:38:30    46.9  STATUS                 COMPLETED Adding Color Legends
2023-09-01 17:38:30    46.9  STATUS                 STARTED Estimating Optimal Numbers of Clusters
2023-09-01 17:39:27    46.9  STATUS                     Estimated number of clusters based on mean silhouette value
2023-09-01 17:39:27    46.9  STATUS                 COMPLETED Estimating Optimal Numbers of Clusters
2023-09-01 17:39:27    46.9  STATUS                 STARTED Overlapping Clusters with Sample Traits
2023-09-01 17:39:27    46.9  STATUS                     Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_1.csv
2023-09-01 17:39:27    46.9  STATUS                     Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_2.csv
2023-09-01 17:39:27    46.9  STATUS                     Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_3.csv
2023-09-01 17:39:27    46.9  STATUS                     Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_4.csv
2023-09-01 17:39:27    46.9  STATUS                     Computed adjusted rand indices and saved to exploratory_analysis_data/adjusted_rand_indices_5.csv
2023-09-01 17:40:23    46.9  STATUS                 COMPLETED Overlapping Clusters with Sample Traits
2023-09-01 17:40:23    46.9  STATUS             COMPLETED Clustering Section
2023-09-01 17:40:23    46.9  STATUS         COMPLETED Sample Clustering
2023-09-01 17:40:23    46.9  STATUS     COMPLETED Exploratory Analysis
2023-09-01 17:40:24    46.9    INFO     Initialized report index and saved to index.html
2023-09-01 17:40:24    46.9  STATUS     STARTED Differential Methylation
2023-09-01 17:40:24    46.9    INFO         Number of cores: 1
2023-09-01 17:40:24    46.9  STATUS         STARTED Analysis
2023-09-01 17:40:24    46.9    INFO             Using 0 permutation tests
2023-09-01 17:40:24    46.9    INFO             Using columns: Histology
2023-09-01 17:40:24    46.9    INFO             Using region types: tiling,genes,promoters,cpgislands
2023-09-01 17:40:24    46.9  STATUS             STARTED Retrieving comparison info

I tried attaching the whole file but I cannot - however I'd be happy to copy the contents if someone believes there would be something relevant in there. I have checked all of it though and I had no errors whatsoever and everything seemed to be running smoothly. I have the .html for quality control, exploratory analysis, etc., but when it arrives to differential methylation, it halts.

Since it says "not all names are present in the annotation", this is what makes me think that maybe it is a naming convention issue, but I really cannot see what is the issue with it, so I'd appreciate suggestions on how to figure out this issue!

Error in rnb.sample.groups(x, columns = pheno.cols, columns.pairs = columns.pairs) :
          invalid value for columns.pairs; not all names are present in the annotation

Thanks,
Alejandra

rnbeads RNA-seq methylation • 361 views

ADD COMMENT • link updated 7 months ago by Ram 43k • written 7 months ago by aroso491 • 0