Question: Analyse data from Nimblegen Microarray 12x135K (CGH)
3.4 years ago by
jp100

Dear all,

I am using the oligo and pdInfoBuilder R packages (on Ubuntu) to analyse data from a Nimblegen 12x135K CGH microarray.

I am trying to create an "NgsTilingPDInfoPkgSeed" object in R as follows:

> seed <- new("NgsTilingPDInfoPkgSeed",
+             ndfFile="/data/100718_HG18_WG_CGH_v3.1_HX12.ndf",
+             xysFile="/data/500819A02_2011-07-25_11-39_532.autosomes.xys",
+             posFile="/data/100718_HG18_WG_CGH_v3.1_HX12.pos",
+             author="Boris Johnson", email="bj@brexit.com",
+             biocViews="AnnotationData", genomebuild="HG 18",
+             organism="Human", species="Homo Sapiens",
+             chipName="100718_HG18_WG_CGH_v3.1_HX12")
> makePdInfoPackage(seed, destDir="/data/", quiet=FALSE)

This gives me the following error:

=======================================================================================
Building annotation package for Nimblegen Tiling Array
NDF: 100718_HG18_WG_CGH_v3.1_HX12.ndf
POS: 100718_HG18_WG_CGH_v3.1_HX12.pos
XYS: 500819A02_2011-07-25_11-39_532.autosomes.xys
=======================================================================================
Parsing file: 100718_HG18_WG_CGH_v3.1_HX12.ndf... OK
Parsing file: 100718_HG18_WG_CGH_v3.1_HX12.pos... OK
Merging NDF and POS files... OK
Parsing file: 500819A02_2011-07-25_11-39_532.autosomes.xys... OK
Error in .local(x, ...) : no rows to aggregate
>

I generate the xys file using the script found at https://stat.ethz.ch/pipermail/bioconductor/2013-June/053378.html. I picked the _532 file, as we have a 532 and a 635 file for each run. A brief sample of each file follows.

The .ndf file:

==> 100718_HG18_WG_CGH_v3.1_HX12.ndf <==
PROBE_DESIGN_ID CONTAINER   DESIGN_NOTE SELECTION_CRITERIA  SEQ_ID  PROBE_SEQUENCE  MISMATCH    MATCH_INDEX FEATURE_ID  ROW_NUM COL_NUM PROBE_CLASS PROBE_ID    POSITION    DESIGN_ID   X   Y
535113_0001_0001    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   TAGCGTTGCTTAGGCGTACGCAGTCTGATGCGTCGTTAGCATCGGCAANNNNNNNNNN  0   14538087    14538087    1   1   control:crosshybe   XENOTRACK48P01  1   535113  1   1
535113_0001_0002    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540272    14540272    2   1   control:empty   EMPTY   0   535113  1   2
535113_0001_0003    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   CAAGCCGCGAGATAACGGCGATAACCGTATCACAGATCTCGAGTCTGGNNNNNNNN    0   14540271    14540271    3   1   control:crosshybe   XENOTRACK48P13  13  535113  1   3
535113_0001_0004    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540270    14540270    4   1   control:empty   EMPTY   0   535113  1   4
535113_0001_0005    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   GCGCCTAAGCGACGGTATTAGTCATCCATCGTAGTCGATAGGTCGCGANNNNNNNN    0   14540269    14540269    5   1   control:crosshybe   XENOTRACK48P12  12  535113  1   5
535113_0001_0006    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540268    14540268    6   1   control:empty   EMPTY   0   535113  1   6
535113_0001_0007    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   CAAGAGCCGACTATATAAGGCGCGGCGCAGTAGCGTAACCGGTGTATTNNNNNN  0   14540267    14540267    7   1   control:crosshybe   XENOTRACK48P11  11  535113  1   7
535113_0001_0008    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540266    14540266    8   1   control:empty   EMPTY   0   535113  1   8
535113_0001_0009    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   CGCGCGTGAGTATATATGCACTGCGGCCGTATATTATAAGGCGCCACGNNNNNNNNN   0   14540265    14540265    9   1   control:crosshybe   XENOTRACK48P10  10  535113  1   9

and the .pos file:

==> 100718_HG18_WG_CGH_v3.1_HX12.pos <==
PROBE_ID    SEQ_ID  CHROMOSOME  POSITION    COUNT   LENGTH  GC
CHR01FS000037196    chr1:1-247249719    chr1    37196   3   60  0.33
CHR01FS000052308    chr1:1-247249719    chr1    52308   3   60  0.38
CHR01FS000357503    chr1:1-247249719    chr1    357503  9   60  0.47
CHR01FS000443361    chr1:1-247249719    chr1    443361  6   60  0.48
CHR01FS000530358    chr1:1-247249719    chr1    530358  6   60  0.43
CHR01FS000547649    chr1:1-247249719    chr1    547649  10  60  0.45
CHR01FS000560348    chr1:1-247249719    chr1    560348  10  60  0.37
CHR01FS000580411    chr1:1-247249719    chr1    580411  9   60  0.37
CHR01FS000614850    chr1:1-247249719    chr1    614850  9   60  0.40

and the (translated) .xys file

==> 500819A02_2011-07-25_11-39_532.autosomes.xys <==
# software=NimbleScan   version=2.5.26  imagefile=E:\Data\NimbleScan\Analysis\01 Sep 2011\500819A02_2011-07-25_11-39_532.tif    designfile=E:\Data\NimbleScan\Design files\100718_HG18_WG_CGH_v3.1_HX12_HX12\100718_HG18_WG_CGH_v3.1_HX12.ndf   designname=100718_HG18_WG_CGH_v3.1_HX12 designid=535113 date=Thu Sep 01 09:32:37 BST 2011   border=0    ul_x=240.016    ul_y=259.240    ur_x=3351.253   ur_y=272.318    lr_x=3335.857   lr_y=4580.855   ll_x=224.218    ll_y=4567.531   score=0.116 qcscore=1.939   locallyaligned=no   correctAstig=no Knots=  auto=no
X   Y   SIGNAL  COUNT
135 3   5895.12 1
137 3   9601.78 1
139 3   13201.43    1
141 3   11655.82    1
143 3   13352.55    1
145 3   10366.75    1
147 3   7971.31 1
149 3   6923.84 1
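
One check that may be worth running is whether the (X, Y) coordinates in the translated xys file actually occur in the ndf file. The following is only a minimal sketch using base R on made-up toy data: the temporary file, the `ndf_xy` table and all values are hypothetical stand-ins, and it assumes the xys file is tab-separated with a single `#` header line, as in the sample above.

```r
# Write a tiny XYS-like file (made-up values) with a '#' header line.
xys_path <- tempfile(fileext = ".xys")
writeLines(c(
  "# software=NimbleScan\tdesignname=toy",
  "X\tY\tSIGNAL\tCOUNT",
  "1\t1\t5895.12\t1",
  "1\t3\t9601.78\t1",
  "9\t9\t13201.43\t1"
), xys_path)

# Skip the '#' line and read the remaining tab-separated table.
xys <- read.delim(xys_path, comment.char = "#", stringsAsFactors = FALSE)

# Toy stand-in for the X and Y columns of the ndf file.
ndf_xy <- data.frame(X = c(1L, 1L, 1L), Y = c(1L, 2L, 3L))

# Build "X:Y" keys and count how many xys features exist in the design.
xys_key <- paste(xys$X, xys$Y, sep = ":")
ndf_key <- paste(ndf_xy$X, ndf_xy$Y, sep = ":")
n_matched <- sum(xys_key %in% ndf_key)
n_missing <- sum(!xys_key %in% ndf_key)
```

On the real files, `ndf_xy` would be replaced by the X and Y columns read from the .ndf file. A large `n_missing` would suggest the translated xys file does not line up with the design.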

Note that the SEQ_ID columns of the ndf and pos files do share values. I mention this because traceback() shows that the error is raised while aggregating POSITION by SEQ_ID:

> traceback()
10: stop("no rows to aggregate")
9: .local(x, ...)
8: aggregate(as.data.frame(x), by = by, FUN = FUN, ..., simplify = simplify)
7: aggregate(as.data.frame(x), by = by, FUN = FUN, ..., simplify = simplify)
6: .local(x, ...)
5: aggregate(ndfdata[["POSITION"]], by = list(SEQ_ID = ndfdata[["SEQ_ID"]]), 
       min)
4: aggregate(ndfdata[["POSITION"]], by = list(SEQ_ID = ndfdata[["SEQ_ID"]]), 
       min)
3: parseNgsTrio(object@ndfFile, object@posFile, object@xysFile, 
       verbose = !quiet)
2: makePdInfoPackage(seed, destDir = "/data/", 
       quiet = F)
1: makePdInfoPackage(seed, destDir = "/data/", 
       quiet = F)
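
The traceback suggests that `ndfdata` has no rows by the time `aggregate()` is called, which could happen if an earlier merge matched nothing. The sketch below, in base R on toy data, shows how one might count the rows surviving an inner join and how an empty table triggers an error from `aggregate()`. The helper `count_shared_rows`, the toy tables and the choice of PROBE_ID and SEQ_ID as join keys are assumptions for illustration, not taken from the pdInfoBuilder source.

```r
# Toy stand-ins for the NDF and POS tables (hypothetical values).
ndf <- data.frame(
  PROBE_ID = c("XENOTRACK48P01", "EMPTY", "CHR01FS000037196"),
  SEQ_ID   = c("CROSSHYBE", "EMPTY", "chr1:1-247249719"),
  POSITION = c(1L, 0L, 37196L),
  stringsAsFactors = FALSE
)
pos <- data.frame(
  PROBE_ID   = c("CHR01FS000037196", "CHR01FS000052308"),
  SEQ_ID     = c("chr1:1-247249719", "chr1:1-247249719"),
  CHROMOSOME = c("chr1", "chr1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows that survive an inner join on the given keys.
count_shared_rows <- function(a, b, keys) {
  nrow(merge(a[, keys, drop = FALSE], b[, keys, drop = FALSE], by = keys))
}

# Assumed join keys; the real package may merge on different columns.
shared <- count_shared_rows(ndf, pos, c("PROBE_ID", "SEQ_ID"))

# A zero-row table makes aggregate() fail; capture the message instead of stopping.
empty <- ndf[0, ]
result <- tryCatch(
  aggregate(empty[["POSITION"]], by = list(SEQ_ID = empty[["SEQ_ID"]]), min),
  error = function(e) conditionMessage(e)
)
```

If `shared` came out as zero on the real ndf and pos tables, that would be consistent with the error above, although the cause could equally lie in the later merge with the xys data.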

Here is my sessionInfo()

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pdInfoBuilder_1.34.1 oligo_1.34.2         Biostrings_2.38.4    XVector_0.10.0       IRanges_2.4.8        S4Vectors_0.8.11    
 [7] oligoClasses_1.32.0  affxparser_1.42.0    RSQLite_1.0.0        DBI_0.4-1            Biobase_2.30.0       BiocGenerics_0.16.1 

loaded via a namespace (and not attached):
 [1] GenomicRanges_1.22.4       splines_3.2.3              zlibbioc_1.16.0            bit_1.1-12                 foreach_1.4.3             
 [6] GenomeInfoDb_1.6.3         tools_3.2.3                SummarizedExperiment_1.0.2 ff_2.2-13                  iterators_1.0.8           
[11] preprocessCore_1.32.0      affyio_1.40.0              codetools_0.2-14           BiocInstaller_1.20.3

Any idea what the issue is here? We are interested in CNV analysis (deletions). Does oligo support this?

I understand this is a rather lengthy post, mostly due to the many parts involved. I'd be glad to hear of other people's experience analysing Nimblegen CGH data, or of other possible pipelines. This machine and its data are relatively old and the world has moved on, so I am a bit stuck.

Thanks, JP
