Dear all,
I am using the oligo and pdInfoBuilder R packages (on Ubuntu) to analyse some data from a Nimblegen 12x135K CGH microarray.
I am trying to create an "NgsTilingPDInfoPkgSeed" in R in the following manner:
> seed <- new("NgsTilingPDInfoPkgSeed",
+     ndfFile="/data/100718_HG18_WG_CGH_v3.1_HX12.ndf",
+     xysFile="/data/500819A02_2011-07-25_11-39_532.autosomes.xys",
+     posFile="/data/100718_HG18_WG_CGH_v3.1_HX12.pos",
+     author="Boris Johnson", email="bj@brexit.com",
+     biocViews="AnnotationData", genomebuild="HG 18",
+     organism="Human", species="Homo Sapiens",
+     chipName="100718_HG18_WG_CGH_v3.1_HX12")
> makePdInfoPackage(seed, destDir="/data/", quiet=F)
This gives me the following error:
=======================================================================================
Building annotation package for Nimblegen Tiling Array
NDF: 100718_HG18_WG_CGH_v3.1_HX12.ndf
POS: 100718_HG18_WG_CGH_v3.1_HX12.pos
XYS: 500819A02_2011-07-25_11-39_532.autosomes.xys
=======================================================================================
Parsing file: 100718_HG18_WG_CGH_v3.1_HX12.ndf... OK
Parsing file: 100718_HG18_WG_CGH_v3.1_HX12.pos... OK
Merging NDF and POS files... OK
Parsing file: 500819A02_2011-07-25_11-39_532.autosomes.xys... OK
Error in .local(x, ...) : no rows to aggregate
>
I generated the xys file using the script found at https://stat.ethz.ch/pipermail/bioconductor/2013-June/053378.html. I picked the _532 file, as we have both a 532 and a 635 file for each run. A brief sample of each file looks like this.
The .ndf file:
==> 100718_HG18_WG_CGH_v3.1_HX12.ndf <==
PROBE_DESIGN_ID CONTAINER DESIGN_NOTE SELECTION_CRITERIA SEQ_ID PROBE_SEQUENCE MISMATCH MATCH_INDEX FEATURE_ID ROW_NUM COL_NUM PROBE_CLASS PROBE_ID POSITION DESIGN_ID X Y
535113_0001_0001 NGS_CONTROLS crosshhybe bright CROSSHYBE TAGCGTTGCTTAGGCGTACGCAGTCTGATGCGTCGTTAGCATCGGCAANNNNNNNNNN 0 14538087 14538087 1 1 control:crosshybe XENOTRACK48P01 1 535113 1 1
535113_0001_0002 NGS_CONTROLS EMPTY dark EMPTY N 0 14540272 14540272 2 1 control:empty EMPTY 0 535113 1 2
535113_0001_0003 NGS_CONTROLS crosshhybe bright CROSSHYBE CAAGCCGCGAGATAACGGCGATAACCGTATCACAGATCTCGAGTCTGGNNNNNNNN 0 14540271 14540271 3 1 control:crosshybe XENOTRACK48P13 13 535113 1 3
535113_0001_0004 NGS_CONTROLS EMPTY dark EMPTY N 0 14540270 14540270 4 1 control:empty EMPTY 0 535113 1 4
535113_0001_0005 NGS_CONTROLS crosshhybe bright CROSSHYBE GCGCCTAAGCGACGGTATTAGTCATCCATCGTAGTCGATAGGTCGCGANNNNNNNN 0 14540269 14540269 5 1 control:crosshybe XENOTRACK48P12 12 535113 1 5
535113_0001_0006 NGS_CONTROLS EMPTY dark EMPTY N 0 14540268 14540268 6 1 control:empty EMPTY 0 535113 1 6
535113_0001_0007 NGS_CONTROLS crosshhybe bright CROSSHYBE CAAGAGCCGACTATATAAGGCGCGGCGCAGTAGCGTAACCGGTGTATTNNNNNN 0 14540267 14540267 7 1 control:crosshybe XENOTRACK48P11 11 535113 1 7
535113_0001_0008 NGS_CONTROLS EMPTY dark EMPTY N 0 14540266 14540266 8 1 control:empty EMPTY 0 535113 1 8
535113_0001_0009 NGS_CONTROLS crosshhybe bright CROSSHYBE CGCGCGTGAGTATATATGCACTGCGGCCGTATATTATAAGGCGCCACGNNNNNNNNN 0 14540265 14540265 9 1 control:crosshybe XENOTRACK48P10 10 535113 1 9
and the .pos file:
==> 100718_HG18_WG_CGH_v3.1_HX12.pos <==
PROBE_ID SEQ_ID CHROMOSOME POSITION COUNT LENGTH GC
CHR01FS000037196 chr1:1-247249719 chr1 37196 3 60 0.33
CHR01FS000052308 chr1:1-247249719 chr1 52308 3 60 0.38
CHR01FS000357503 chr1:1-247249719 chr1 357503 9 60 0.47
CHR01FS000443361 chr1:1-247249719 chr1 443361 6 60 0.48
CHR01FS000530358 chr1:1-247249719 chr1 530358 6 60 0.43
CHR01FS000547649 chr1:1-247249719 chr1 547649 10 60 0.45
CHR01FS000560348 chr1:1-247249719 chr1 560348 10 60 0.37
CHR01FS000580411 chr1:1-247249719 chr1 580411 9 60 0.37
CHR01FS000614850 chr1:1-247249719 chr1 614850 9 60 0.40
and the (translated) .xys file:
==> 500819A02_2011-07-25_11-39_532.autosomes.xys <==
# software=NimbleScan version=2.5.26 imagefile=E:\Data\NimbleScan\Analysis\01 Sep 2011\500819A02_2011-07-25_11-39_532.tif designfile=E:\Data\NimbleScan\Design files\100718_HG18_WG_CGH_v3.1_HX12_HX12\100718_HG18_WG_CGH_v3.1_HX12.ndf designname=100718_HG18_WG_CGH_v3.1_HX12 designid=535113 date=Thu Sep 01 09:32:37 BST 2011 border=0 ul_x=240.016 ul_y=259.240 ur_x=3351.253 ur_y=272.318 lr_x=3335.857 lr_y=4580.855 ll_x=224.218 ll_y=4567.531 score=0.116 qcscore=1.939 locallyaligned=no correctAstig=no Knots= auto=no
X Y SIGNAL COUNT
135 3 5895.12 1
137 3 9601.78 1
139 3 13201.43 1
141 3 11655.82 1
143 3 13352.55 1
145 3 10366.75 1
147 3 7971.31 1
149 3 6923.84 1
Note that there are overlaps between the SEQ_ID columns of the ndf and pos files. I mention this because traceback() shows the failure inside an aggregate() call that groups by SEQ_ID:
> traceback()
10: stop("no rows to aggregate")
9: .local(x, ...)
8: aggregate(as.data.frame(x), by = by, FUN = FUN, ..., simplify = simplify)
7: aggregate(as.data.frame(x), by = by, FUN = FUN, ..., simplify = simplify)
6: .local(x, ...)
5: aggregate(ndfdata[["POSITION"]], by = list(SEQ_ID = ndfdata[["SEQ_ID"]]),
min)
4: aggregate(ndfdata[["POSITION"]], by = list(SEQ_ID = ndfdata[["SEQ_ID"]]),
min)
3: parseNgsTrio(object@ndfFile, object@posFile, object@xysFile,
verbose = !quiet)
2: makePdInfoPackage(seed, destDir = "/data/",
quiet = F)
1: makePdInfoPackage(seed, destDir = "/data/",
quiet = F)
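To narrow this down, a minimal sketch of the kind of check I have in mind is below. It assumes (I have not confirmed this against the pdInfoBuilder source) that the NDF and POS tables are joined on PROBE_ID before the aggregate() call. The data frames are toy stand-ins built from the first rows of the samples above, and count_shared is a hypothetical helper, not part of any package. Because the toy rows share no keys, the merge is empty and base aggregate() should stop with the same message as in the traceback.

```r
# Toy stand-ins for the real files (values copied from the samples above).
# With the real data one would instead use something like:
#   ndf <- read.delim("/data/100718_HG18_WG_CGH_v3.1_HX12.ndf", stringsAsFactors = FALSE)
#   pos <- read.delim("/data/100718_HG18_WG_CGH_v3.1_HX12.pos", stringsAsFactors = FALSE)
ndf <- data.frame(
  SEQ_ID   = c("CROSSHYBE", "EMPTY", "CROSSHYBE"),
  PROBE_ID = c("XENOTRACK48P01", "EMPTY", "XENOTRACK48P13"),
  POSITION = c(1L, 0L, 13L),
  stringsAsFactors = FALSE
)
pos <- data.frame(
  PROBE_ID = c("CHR01FS000037196", "CHR01FS000052308"),
  SEQ_ID   = c("chr1:1-247249719", "chr1:1-247249719"),
  POSITION = c(37196L, 52308L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of distinct key values present in both tables.
count_shared <- function(a, b, key) {
  length(intersect(unique(a[[key]]), unique(b[[key]])))
}

shared_probe_ids <- count_shared(ndf, pos, "PROBE_ID")
shared_seq_ids   <- count_shared(ndf, pos, "SEQ_ID")

# Assumed join key (PROBE_ID); clashing column names get .ndf/.pos suffixes.
merged <- merge(ndf, pos, by = "PROBE_ID", suffixes = c(".ndf", ".pos"))

# Repeat the aggregate() call from the traceback on the merged rows and
# capture the error message instead of stopping.
msg <- tryCatch({
  aggregate(merged[["POSITION.ndf"]],
            by = list(SEQ_ID = merged[["SEQ_ID.ndf"]]), min)
  "aggregate succeeded"
}, error = function(e) conditionMessage(e))

cat("shared PROBE_IDs:", shared_probe_ids, "\n")
cat("shared SEQ_IDs:  ", shared_seq_ids, "\n")
cat("rows after merge:", nrow(merged), "\n")
cat("aggregate result:", msg, "\n")
```

If the same check on the full files also leaves zero rows after the merge, that would point to a key mismatch between the NDF and POS files rather than a problem with the XYS file.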
Here is my sessionInfo()
> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
[6] LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] pdInfoBuilder_1.34.1 oligo_1.34.2 Biostrings_2.38.4 XVector_0.10.0 IRanges_2.4.8 S4Vectors_0.8.11
[7] oligoClasses_1.32.0 affxparser_1.42.0 RSQLite_1.0.0 DBI_0.4-1 Biobase_2.30.0 BiocGenerics_0.16.1
loaded via a namespace (and not attached):
[1] GenomicRanges_1.22.4 splines_3.2.3 zlibbioc_1.16.0 bit_1.1-12 foreach_1.4.3
[6] GenomeInfoDb_1.6.3 tools_3.2.3 SummarizedExperiment_1.0.2 ff_2.2-13 iterators_1.0.8
[11] preprocessCore_1.32.0 affyio_1.40.0 codetools_0.2-14 BiocInstaller_1.20.3
Any idea what the issue is here? We are interested in CNV analysis (deletions) - does oligo support this?
I understand this is a rather lengthy post, mostly due to the many parts involved. I'd be glad to hear of other people's experience analysing Nimblegen CGH data, or of other possible pipelines. This machine/data is relatively old and the world has moved on, so I am a bit stuck.
Thanks, JP