Analyse data from Nimblegen Microarray 12x135K (CGH)

9.0 years ago

jp ▴ 120

Dear all,

I am using the oligo and pdInfoBuilder R packages (on Ubuntu) to analyse data from a Nimblegen 12x135K (CGH) microarray.

I am trying to create an "NgsTilingPDInfoPkgSeed" object in R and build the annotation package as follows:

> seed <- new("NgsTilingPDInfoPkgSeed", ndfFile="/data/100718_HG18_WG_CGH_v3.1_HX12.ndf", xysFile="/data/500819A02_2011-07-25_11-39_532.autosomes.xys", posFile="/data/100718_HG18_WG_CGH_v3.1_HX12.pos", author="Boris Johnson", email="bj@brexit.com", biocViews="AnnotationData", genomebuild="HG 18", organism="Human", species="Homo Sapiens", chipName="100718_HG18_WG_CGH_v3.1_HX12")
> makePdInfoPackage(seed, destDir="/data/", quiet=F)

This gives me the following error:

=======================================================================================
Building annotation package for Nimblegen Tiling Array
NDF: 100718_HG18_WG_CGH_v3.1_HX12.ndf
POS: 100718_HG18_WG_CGH_v3.1_HX12.pos
XYS: 500819A02_2011-07-25_11-39_532.autosomes.xys
=======================================================================================
Parsing file: 100718_HG18_WG_CGH_v3.1_HX12.ndf... OK
Parsing file: 100718_HG18_WG_CGH_v3.1_HX12.pos... OK
Merging NDF and POS files... OK
Parsing file: 500819A02_2011-07-25_11-39_532.autosomes.xys... OK
Error in .local(x, ...) : no rows to aggregate
>

I generated the .xys file with the script at https://stat.ethz.ch/pipermail/bioconductor/2013-June/053378.html. Each run produces both a 532 and a 635 file, and I picked the _532 one. Brief samples of each file are shown below.

The .ndf file

==> 100718_HG18_WG_CGH_v3.1_HX12.ndf <==
PROBE_DESIGN_ID CONTAINER   DESIGN_NOTE SELECTION_CRITERIA  SEQ_ID  PROBE_SEQUENCE  MISMATCH    MATCH_INDEX FEATURE_ID  ROW_NUM COL_NUM PROBE_CLASS PROBE_ID    POSITION    DESIGN_ID   X   Y
535113_0001_0001    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   TAGCGTTGCTTAGGCGTACGCAGTCTGATGCGTCGTTAGCATCGGCAANNNNNNNNNN  0   14538087    14538087    1   1   control:crosshybe   XENOTRACK48P01  1   535113  1   1
535113_0001_0002    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540272    14540272    2   1   control:empty   EMPTY   0   535113  1   2
535113_0001_0003    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   CAAGCCGCGAGATAACGGCGATAACCGTATCACAGATCTCGAGTCTGGNNNNNNNN    0   14540271    14540271    3   1   control:crosshybe   XENOTRACK48P13  13  535113  1   3
535113_0001_0004    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540270    14540270    4   1   control:empty   EMPTY   0   535113  1   4
535113_0001_0005    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   GCGCCTAAGCGACGGTATTAGTCATCCATCGTAGTCGATAGGTCGCGANNNNNNNN    0   14540269    14540269    5   1   control:crosshybe   XENOTRACK48P12  12  535113  1   5
535113_0001_0006    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540268    14540268    6   1   control:empty   EMPTY   0   535113  1   6
535113_0001_0007    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   CAAGAGCCGACTATATAAGGCGCGGCGCAGTAGCGTAACCGGTGTATTNNNNNN  0   14540267    14540267    7   1   control:crosshybe   XENOTRACK48P11  11  535113  1   7
535113_0001_0008    NGS_CONTROLS    EMPTY   dark    EMPTY   N   0   14540266    14540266    8   1   control:empty   EMPTY   0   535113  1   8
535113_0001_0009    NGS_CONTROLS    crosshhybe  bright  CROSSHYBE   CGCGCGTGAGTATATATGCACTGCGGCCGTATATTATAAGGCGCCACGNNNNNNNNN   0   14540265    14540265    9   1   control:crosshybe   XENOTRACK48P10  10  535113  1   9

and the .pos file:

==> 100718_HG18_WG_CGH_v3.1_HX12.pos <==
PROBE_ID    SEQ_ID  CHROMOSOME  POSITION    COUNT   LENGTH  GC
CHR01FS000037196    chr1:1-247249719    chr1    37196   3   60  0.33
CHR01FS000052308    chr1:1-247249719    chr1    52308   3   60  0.38
CHR01FS000357503    chr1:1-247249719    chr1    357503  9   60  0.47
CHR01FS000443361    chr1:1-247249719    chr1    443361  6   60  0.48
CHR01FS000530358    chr1:1-247249719    chr1    530358  6   60  0.43
CHR01FS000547649    chr1:1-247249719    chr1    547649  10  60  0.45
CHR01FS000560348    chr1:1-247249719    chr1    560348  10  60  0.37
CHR01FS000580411    chr1:1-247249719    chr1    580411  9   60  0.37
CHR01FS000614850    chr1:1-247249719    chr1    614850  9   60  0.40

and the (translated) .xys file

==> 500819A02_2011-07-25_11-39_532.autosomes.xys <==
# software=NimbleScan   version=2.5.26  imagefile=E:\Data\NimbleScan\Analysis\01 Sep 2011\500819A02_2011-07-25_11-39_532.tif    designfile=E:\Data\NimbleScan\Design files\100718_HG18_WG_CGH_v3.1_HX12_HX12\100718_HG18_WG_CGH_v3.1_HX12.ndf   designname=100718_HG18_WG_CGH_v3.1_HX12 designid=535113 date=Thu Sep 01 09:32:37 BST 2011   border=0    ul_x=240.016    ul_y=259.240    ur_x=3351.253   ur_y=272.318    lr_x=3335.857   lr_y=4580.855   ll_x=224.218    ll_y=4567.531   score=0.116 qcscore=1.939   locallyaligned=no   correctAstig=no Knots=  auto=no
X   Y   SIGNAL  COUNT
135 3   5895.12 1
137 3   9601.78 1
139 3   13201.43    1
141 3   11655.82    1
143 3   13352.55    1
145 3   10366.75    1
147 3   7971.31 1
149 3   6923.84 1
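The excerpts above are only the first few lines of each file, but their coordinates do not obviously line up: the .ndf rows start at X = 1, Y = 1 to 9, while the .xys rows start at X = 135, Y = 3. A quick base-R count of how many .xys coordinates actually occur in the .ndf can rule a coordinate mismatch in or out. This is a minimal sketch assuming both files are tab-delimited with a header row; `read_xys` and `xy_matches` are hypothetical helpers, not pdInfoBuilder functions.

```r
# Hypothetical helper: read a NimbleScan-style .xys file (assumed tab-delimited).
# The leading "# software=..." line is treated as a comment; the next line
# supplies the column names (X, Y, SIGNAL, COUNT).
read_xys <- function(path) {
  read.delim(path, comment.char = "#", stringsAsFactors = FALSE)
}

# Hypothetical helper: number of .xys rows whose (X, Y) pair also occurs in the .ndf.
xy_matches <- function(ndf, xys) {
  ndf_keys <- paste(ndf$X, ndf$Y, sep = ":")
  xys_keys <- paste(xys$X, xys$Y, sep = ":")
  sum(xys_keys %in% ndf_keys)
}

# Usage with the paths from the question (assumes tab-delimited files):
# ndf <- read.delim("/data/100718_HG18_WG_CGH_v3.1_HX12.ndf", stringsAsFactors = FALSE)
# xys <- read_xys("/data/500819A02_2011-07-25_11-39_532.autosomes.xys")
# xy_matches(ndf, xys)  # compare with nrow(xys); 0 would mean no coordinate matches
```

A result of zero, or far below `nrow(xys)`, would suggest the .xys coordinates are on a different grid from the .ndf design, which is worth ruling out before looking elsewhere.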

Note that the SEQ_ID column of the .ndf file overlaps with the SEQ_ID column of the .pos file. I mention this because traceback() shows that the error is raised while aggregating POSITION by SEQ_ID:

> traceback()
10: stop("no rows to aggregate")
9: .local(x, ...)
8: aggregate(as.data.frame(x), by = by, FUN = FUN, ..., simplify = simplify)
7: aggregate(as.data.frame(x), by = by, FUN = FUN, ..., simplify = simplify)
6: .local(x, ...)
5: aggregate(ndfdata[["POSITION"]], by = list(SEQ_ID = ndfdata[["SEQ_ID"]]), 
       min)
4: aggregate(ndfdata[["POSITION"]], by = list(SEQ_ID = ndfdata[["SEQ_ID"]]), 
       min)
3: parseNgsTrio(object@ndfFile, object@posFile, object@xysFile, 
       verbose = !quiet)
2: makePdInfoPackage(seed, destDir = "/data/", 
       quiet = F)
1: makePdInfoPackage(seed, destDir = "/data/", 
       quiet = F)
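The bottom frames show the message comes from base R's `aggregate()`, which stops whenever the data it receives has zero rows. The sketch below reproduces that message with an empty data frame. By inference from the traceback (not a confirmed diagnosis), the POSITION vector passed to `aggregate(ndfdata[["POSITION"]], by = list(SEQ_ID = ndfdata[["SEQ_ID"]]), min)` was probably already empty, i.e. some earlier merge or filter inside `parseNgsTrio` kept no probes. The hypothetical helper `key_overlap` counts shared identifiers between two tables, so the .ndf/.pos keys can be checked directly; reading the files with `read.delim` assumes they are tab-delimited.

```r
# Base R's aggregate() on a data frame stops when its input has zero rows.
empty <- data.frame(POSITION = numeric(0))
msg <- tryCatch(
  aggregate(empty, by = list(SEQ_ID = character(0)), FUN = min),
  error = function(e) conditionMessage(e)
)
print(msg)  # in an English locale, the same "no rows to aggregate" message

# Hypothetical helper: count identifiers shared by one column of two tables.
key_overlap <- function(a, b, col) {
  ua <- unique(a[[col]])
  ub <- unique(b[[col]])
  c(first = length(ua), second = length(ub), shared = length(intersect(ua, ub)))
}

# Usage with the paths from the question (assumes tab-delimited files):
# ndf <- read.delim("/data/100718_HG18_WG_CGH_v3.1_HX12.ndf", stringsAsFactors = FALSE)
# pos <- read.delim("/data/100718_HG18_WG_CGH_v3.1_HX12.pos", stringsAsFactors = FALSE)
# key_overlap(ndf, pos, "PROBE_ID")
# key_overlap(ndf, pos, "SEQ_ID")
```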

Here is my sessionInfo():

> sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pdInfoBuilder_1.34.1 oligo_1.34.2         Biostrings_2.38.4    XVector_0.10.0       IRanges_2.4.8        S4Vectors_0.8.11    
 [7] oligoClasses_1.32.0  affxparser_1.42.0    RSQLite_1.0.0        DBI_0.4-1            Biobase_2.30.0       BiocGenerics_0.16.1 

loaded via a namespace (and not attached):
 [1] GenomicRanges_1.22.4       splines_3.2.3              zlibbioc_1.16.0            bit_1.1-12                 foreach_1.4.3             
 [6] GenomeInfoDb_1.6.3         tools_3.2.3                SummarizedExperiment_1.0.2 ff_2.2-13                  iterators_1.0.8           
[11] preprocessCore_1.32.0      affyio_1.40.0              codetools_0.2-14           BiocInstaller_1.20.3

Any idea what the issue is here? We are interested in CNV analysis (deletions): does oligo support this?

I understand this is a rather long post, mostly because of the many parts involved. I would be glad to hear about other people's experience analysing Nimblegen CGH data, or about alternative pipelines. This machine and its data are relatively old and the field has moved on, so I am a bit stuck.

Thanks, JP

CGH Nimblegen oligo pdinfobuilder microarray
