Question: Reading Genotyping Data From Illumina Genomestudio Into R
Farrel (Pittsburgh, PA, USA) wrote 10.2 years ago:

We have recently conducted 1.1 million SNP/CNV genotyping on a sample of subjects using the Infinium assay. The data is currently a project within Illumina GenomeStudio. I have imported some columns containing pedigree, affected status, race and ethnicity into the project, but I also have that data in a separate table.

How do I read the data from GenomeStudio into R? Are you aware of any published examples or case vignettes?

Is beadarraySNP the package to use?

The data seems to be stored in a directory with the following files.

tabledat.bin, pairtable.bin, seqdata.bin, sd.bin, heredity.bin, Duplicates.bin, projdat.bin, PairedData.bin, ad.bin, ld.bin

How does one go from those files to reading the data into R?

I want to end up with a data frame that has as many rows as I have subjects and as many columns as I have SNP markers + CNV markers + pedigree fields + phenotype fields.

Jan Oosting (Leiden, NL) wrote 10.2 years ago:

To import your data into R with beadarraySNP, you first have to create a report from GenomeStudio with the report wizard:

  • From the Analysis menu choose Reports > Report wizard...
  • Now choose Final report
  • Select the samples you want included, click next
  • Choose the Standard radio-button on top, and the Tab radio-button in General options
  • Now you can select the fields you want in your report. At the very least beadarraySNP requires the SNP Name and Sample ID fields. Read the BeadStudio Data section of the read.SnpSetIllumina() man page to get all options.
  • Check the Create MAP files option to get a head start on creating a sample sheet
  • Click Next and Finish to create the report files

The data can now be read into R with commands like:

library(beadarraySNP)
myData <- read.SnpSetIllumina(Sample_Map2Samplesheet("Sample_Map.txt"), reportfile = "myData_FinalReport.txt")

Remember to add nochecks=TRUE if you did not include all the required fields in your report.

Data columns are put in matrices in the assayData slot of the resulting object, while annotation fields are put in the featureData slot of the object.
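
For the layout asked for in the question (one row per subject, one column per marker, plus phenotype fields), the tab-delimited Final Report can also be reshaped with base R alone. The following is only a minimal sketch, not beadarraySNP's method: it assumes the report has a [Header] block followed by a [Data] line and a tab-separated column header, that the AB allele columns were exported, and it uses made-up sample IDs, SNP names, and a hypothetical phenotype table.

```r
# Toy Final Report written to a temp file. The [Header]/[Data] layout and
# all sample/SNP names here are assumptions made for illustration.
report_lines <- c(
  "[Header]",
  "Num SNPs\t2",
  "Num Samples\t2",
  "[Data]",
  "SNP Name\tSample ID\tAllele1 - AB\tAllele2 - AB\tGC Score",
  "rs1\tS1\tA\tB\t0.91",
  "rs2\tS1\tB\tB\t0.85",
  "rs1\tS2\tA\tA\t0.95",
  "rs2\tS2\tA\tB\t0.78"
)
report_file <- tempfile(fileext = ".txt")
writeLines(report_lines, report_file)

# Skip everything up to and including the [Data] marker line
data_start <- grep("^\\[Data\\]", readLines(report_file))[1]
long <- read.table(report_file, header = TRUE, sep = "\t", skip = data_start,
                   quote = "", comment.char = "", check.names = FALSE,
                   stringsAsFactors = FALSE)

# Combine the two AB alleles into one genotype string per SNP and sample
long$genotype <- paste0(long[["Allele1 - AB"]], long[["Allele2 - AB"]])

# Fill a samples x SNPs matrix by (row name, column name) pairs
samples <- unique(long[["Sample ID"]])
snps <- unique(long[["SNP Name"]])
geno <- matrix(NA_character_, nrow = length(samples), ncol = length(snps),
               dimnames = list(samples, snps))
geno[cbind(long[["Sample ID"]], long[["SNP Name"]])] <- long$genotype

# Hypothetical phenotype table kept outside GenomeStudio
pheno <- data.frame(sample_id = c("S1", "S2"), affected = c(1, 0),
                    stringsAsFactors = FALSE)

# One row per subject: phenotype fields followed by one column per marker
wide <- data.frame(sample_id = rownames(geno), geno,
                   stringsAsFactors = FALSE, check.names = FALSE)
subjects <- merge(pheno, wide, by = "sample_id")
print(subjects)
```

With around a million markers a wide data frame gets very large; keeping the genotypes in the geno matrix and merging only the phenotype fields may be more practical.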

(modified 17 months ago by RamRS)
Neilfws (Sydney, Australia) wrote 10.2 years ago:

As Jan says, R/Bioconductor works best with the reports exported from Illumina's proprietary "Studio" software. There are very few (if any) options for processing raw, binary data files directly using R.

I recently made some notes about Illumina and Bioconductor packages on our (internal) wiki. I've pasted them below, almost "as is" - maybe you can glean something from them. In summary: the best approach is to export from Illumina software to text files and import to R using read.table().


beadarray

  • reads bead-level or bead-summary data
    • bead-summary requires at minimum the file SampleProbeProfile.txt
    • data files are generated by Illumina BeadStudio software (gene expression module)
    • method readBeadSummaryData() creates ExpressionSetIllumina object
    • bead-level requires txt/csv files and optionally, TIFFs, targets.txt, annotation and metrics files
    • these are generated by Illumina BeadScan software
    • method readIllumina() creates BeadLevelList object


crlmm

  • reads binary idat files from the Illumina scanner (+ a CSV description file)
  • method readIdatFiles() creates NChannelSet object


lumi

  • reads "the Illumina raw data output of the Illumina Bead Studio toolkit from version 1 to version 3"
  • the "probe profile" output is preferred
  • method lumiR() creates a LumiBatch object


beadarraySNP

read.SnpSetIllumina() method notes:

BeadStudio Data

  • To process experiments that were processed with BeadStudio, only two files are needed; the sample sheet and the Final Report file
  • The sample sheet must contain the same columns as for GenCall, the report file should contain the following columns: ‘SNP Name’, ‘Sample ID’, ‘GC Score’, ‘Allele1 - AB’, ‘Allele2 - AB’, ‘GT Score’, ‘X Raw’, and ‘Y Raw’
  • ‘SNP Name’ and ‘Sample ID’ are used to form rows and columns in the experimental data, ‘GC Score’ is put in the callProbability matrix, ‘Allele1 - AB’ and ‘Allele2 - AB’ are combined into the call matrix, ‘GT Score’ is added to the featureData slot, ‘X Raw’ is put in the R matrix and ‘Y Raw’ in the G matrix.
  • Other columns in the report file are added as matrices in the assayData slot, or columns in the featureData slot if values are identical for all samples in the reportfile
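
To make the column-to-matrix mapping above concrete, here is a small base-R illustration. It is not beadarraySNP's implementation, only a sketch of the resulting layout (SNPs as rows, samples as columns); the toy values, the column names of the data frame, and the helper to_matrix() are all made up for the example.

```r
# Toy long-format report rows: one row per SNP/sample combination
long <- data.frame(
  snp      = c("rs1", "rs2", "rs1", "rs2"),
  sample   = c("S1", "S1", "S2", "S2"),
  allele1  = c("A", "B", "A", "A"),
  allele2  = c("B", "B", "A", "B"),
  gc_score = c(0.91, 0.85, 0.95, 0.78),
  stringsAsFactors = FALSE
)

snps <- unique(long$snp)
samples <- unique(long$sample)

# Hypothetical helper: spread one report column into a SNP x sample matrix
to_matrix <- function(values) {
  m <- matrix(NA, nrow = length(snps), ncol = length(samples),
              dimnames = list(snps, samples))
  m[cbind(long$snp, long$sample)] <- values
  m
}

# 'Allele1 - AB' and 'Allele2 - AB' combined, analogous to the call matrix
call_matrix <- to_matrix(paste0(long$allele1, long$allele2))
# 'GC Score', analogous to the callProbability matrix
call_probability <- to_matrix(long$gc_score)

print(call_matrix)
print(call_probability)
```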

Or convert it to PLINK format and then handle it with GenABEL, etc. See: Converting illumina raw genotype data into PLINK PED format

Reply written 7.5 years ago by gaow (modified 7.5 years ago)
Daniel Swan (Aberdeen, UK) wrote 10.2 years ago:


(modified 17 months ago by RamRS)

This works with data that was created by scan studio, not GenomeStudio.

Reply written 10.2 years ago by Jan Oosting

Didn't know that Jan, cheers.

Reply written 10.2 years ago by Daniel Swan
Abc wrote 10.0 years ago:

The Bonsai report plug-in allows GenomeStudio to export data directly as Rdata suitable for the Bioconductor package snpMatrix. There are other goodies on the SourceForge web site as well. The author has apparently been working on CNV analysis lately and has managed to run GenomeStudio on Linux. I don't know how that is done, though.
