Question

DiffBind: results differ every time I run it


4.1 years ago

francesca3 ▴ 140

Hi everyone. I'm new to DiffBind. I'm trying to analyze my data, but I noticed that if I run the same analysis several times on the same data (restarting from the first line of code), the results change. How is that possible? I clear the environment every time, because initially I thought it could be a cached-memory problem. Any hints?

This is the code:

    samplesdbprova <- read.csv("gruppisenza1421.csv")
    dbObjprova <- dba(sampleSheet=samplesdbprova)
    dbObjprova <- dba.count(dbObjprova, bUseSummarizeOverlaps=TRUE, minOverlap=2)
    contrastprova <- dba.contrast(dbObjprova, dbObjprova$masks$SIRT630W, dbObjprova$masks$W30, "SIRT630w", "wt30")
    bObjprova <- dba.analyze(contrastprova, method=DBA_ALL_METHODS)

In this case I loaded a matrix that has 18 samples divided into 4 conditions. For the differential analysis I selected just two of the four conditions (the analysis was carried out on 6 samples vs 5 samples).

I also tried preparing a separate matrix for each analysis, but the problem persists.

Sometimes another error appears, but even when it does not, the results change from run to run:

    [W::hts_idx_load2] The index file is older than the data file
    [E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes

This is my R session information:

    R version 3.6.1 (2019-07-05)
    Platform: x86_64-w64-mingw32/x64 (64-bit)
    Running under: Windows 10 x64 (build 17763)

    Matrix products: default

    locale:
    [1] LC_COLLATE=Italian_Italy.1252 LC_CTYPE=Italian_Italy.1252 LC_MONETARY=Italian_Italy.1252 LC_NUMERIC=C
    [5] LC_TIME=Italian_Italy.1252

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    loaded via a namespace (and not attached):
    [1] compiler_3.6.1 tools_3.6.1

diffbind software error • 1.2k views

updated 4.0 years ago by Michael 54k • written 4.1 years ago by francesca3 ▴ 140

score 0 · Answer 1 · 2020-05-06

Some algorithms, specifically those that use pseudo-random number generators (e.g. for initialization or sampling), produce different results on each run. I don't see a problem with that per se if the randomness is essential to the algorithm. Setting the random seed can sometimes make the results reproducible; see the function set.seed. For example, if you call set.seed(12345) before each run, the results may be identical across runs, provided the code uses nothing other than R's internal pseudo-random number generator (assuming seeding is properly implemented throughout) and does not call set.seed itself.
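
As a minimal sketch of the seeding idea, the base R snippet below draws the same random sample twice after resetting the seed. It does not call DiffBind at all; the names first_run and second_run are purely illustrative, and whether the DiffBind analysis in the question actually depends on R's random number generator is an assumption that this example does not verify.

```r
# Base R only: illustrates set.seed(), not DiffBind itself.

# Seed the generator, then draw five values from 1..100.
set.seed(12345)
first_run <- sample(1:100, 5)

# Re-seed with the same value before the second draw.
set.seed(12345)
second_run <- sample(1:100, 5)

# Same seed and same call sequence give the same draws.
print(identical(first_run, second_run))

# Without re-seeding, the generator simply continues its stream,
# so this draw is not tied to the two above.
third_run <- sample(1:100, 5)
```

If the same pattern is applied to a full analysis, the set.seed call would go at the very top of the script, before any step that might sample. If the results still differ after that, the variation probably comes from somewhere other than R's own generator.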