Question: Reading Genotyping Data From Illumina GenomeStudio Into R
8.6 years ago by
Farrel170
Pittsburgh, PA, USA
Farrel170 wrote:

We have recently conducted 1.1 million SNP/CNV genotyping on a sample of subjects using the Infinium assay. The data is currently a project within Illumina GenomeStudio. I have imported some columns containing pedigree, affected status, race and ethnicity into the project, but I also have that data in a separate table.

How do I read the data from GenomeStudio into R? Are you aware of any published examples or case vignettes?

Is beadarraySNP the package to use?

The data seems to be stored in a directory with the following files.

tabledat.bin, pairtable.bin, seqdata.bin, sd.bin, heredity.bin, Duplicates.bin, projdat.bin, PairedData.bin, ad.bin, ld.bin

How does one go from those files to reading the data into R?

I want to end up with a data frame that has as many rows as I have subjects and as many columns as I have SNP markers + CNV markers + pedigree fields + phenotype fields.

8.6 years ago by
Jan Oosting860
Leiden, NL
Jan Oosting860 wrote:

To import your data into R with beadarraySNP, you'll have to create a report from GenomeStudio through the report wizard:

  • From the Analysis menu choose Reports > Report wizard...
  • Now choose Final report
  • Select the samples you want included, click Next
  • Choose the 'Standard' radio button on top, and the 'Tab' radio button in General options
  • Now you can select the fields you want in your report. At the very least beadarraySNP requires the 'SNP Name' and 'Sample ID' fields. Read the 'BeadStudio Data' section of the read.SnpSetIllumina() man page to get all options.
  • Check 'Create MAP files' to get a head start on creating a sample sheet
  • Click Next and Finish to create the report files

The data can now be read into R with a command like:

library(beadarraySNP)
myData <- read.SnpSetIllumina(Sample_Map2Samplesheet("Sample_Map.txt"), reportfile = "myData_FinalReport.txt")

Remember to add nochecks=TRUE if you did not include all the required fields in your report.

Data columns are put in matrices in the assayData slot of the resulting object, while annotation fields are put in the featureData slot of the object.
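
As a rough sketch of how to get from there to the one-row-per-subject table asked for in the question: the genotype matrix has SNPs in rows and samples in columns, so it has to be transposed before it is merged with the phenotype table. The example uses a small made-up matrix in place of the real one (with beadarraySNP it would presumably be pulled out of the assayData slot, e.g. something like assayData(myData)$call), and the phenotype column names are invented for illustration.

```r
# Stand-in for the SNP x sample genotype call matrix.
# Assumption: the real matrix would come from the assayData slot of myData.
calls <- matrix(c("AA", "AB", "BB",
                  "AB", "AB", "AA"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("rs0001", "rs0002"),   # SNPs in rows
                                c("S1", "S2", "S3")))    # samples in columns

# Hypothetical phenotype/pedigree table kept outside GenomeStudio.
pheno <- data.frame(SampleID = c("S1", "S2", "S3"),
                    Affected = c(1, 0, 1),
                    stringsAsFactors = FALSE)

# Transpose so that each subject becomes a row and each marker a column.
geno <- data.frame(SampleID = colnames(calls), t(calls),
                   stringsAsFactors = FALSE)

# One row per subject: phenotype fields followed by the marker columns.
subjects <- merge(pheno, geno, by = "SampleID")
print(subjects)
```

With around a million markers, a data frame this wide may be unwieldy; keeping the genotypes as a matrix and joining only the phenotype fields by sample ID when needed is probably more practical.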

8.6 years ago by
Neilfws48k
Sydney, Australia
Neilfws48k wrote:

As Jan says, R/Bioconductor works best with the reports exported from Illumina's proprietary "Studio" software. There are very few (if any) options for processing raw, binary data files directly using R.

I recently made some notes about Illumina and Bioconductor packages on our (internal) wiki. I've pasted them below, almost "as is" - maybe you can glean something from them. In summary: the best approach is to export from Illumina software to text files and import to R using read.table().

beadarray

  • reads bead-level or bead-summary data
    • bead-summary requires at minimum the file SampleProbeProfile.txt
    • data files are generated by Illumina BeadStudio software (gene expression module)
    • method readBeadSummaryData() creates ExpressionSetIllumina object
    • bead-level requires txt/csv files and optionally, TIFFs, targets.txt, annotation and metrics files
    • these are generated by Illumina BeadScan software
    • method readIllumina() creates BeadLevelList object

crlmm

  • reads binary idat files from the Illumina scanner (+ a CSV description file)
  • method readIdatFiles() creates NChannelSet object

lumi

  • reads "the Illumina raw data output of the Illumina Bead Studio toolkit from version 1 to version 3"
  • the "probe profile" output is preferred
  • method lumiR() creates a LumiBatch object

beadarraySNP

read.SnpSetIllumina() method notes:

BeadStudio Data

  • To process experiments that were processed with BeadStudio, only two files are needed; the sample sheet and the Final Report file
  • The sample sheet must contain the same columns as for GenCall, the report file should contain the following columns: ‘SNP Name’, ‘Sample ID’, ‘GC Score’, ‘Allele1 - AB’, ‘Allele2 - AB’, ‘GT Score’, ‘X Raw’, and ‘Y Raw’
  • ‘SNP Name’ and ‘Sample ID’ are used to form rows and columns in the experimental data, ‘GC Score’ is put in the callProbability matrix, ‘Allele1 - AB’ and ‘Allele2 - AB’ are combined into the call matrix, ‘GT Score’ is added to the featureData slot, ‘X Raw’ is put in the R matrix and ‘Y Raw’ in the G matrix.
  • Other columns in the report file are added as matrices in the assayData slot, or columns in the featureData slot if values are identical for all samples in the reportfile
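
If you would rather go the plain read.table() route, the column handling described above can be imitated in base R. This is only a sketch under assumptions: it takes the Final Report to be a tab-separated text file with a [Header] block, a [Data] line, and then one row per SNP/sample pair, and it writes a tiny made-up report to a temporary file so that it runs on its own. The variable names are mine, not those used by beadarraySNP.

```r
# Write a tiny made-up Final Report so the example is self-contained.
report <- tempfile(fileext = ".txt")
writeLines(c(
  "[Header]",
  "Num SNPs\t2",
  "Num Samples\t2",
  "[Data]",
  "SNP Name\tSample ID\tAllele1 - AB\tAllele2 - AB\tGC Score",
  "rs0001\tS1\tA\tA\t0.95",
  "rs0002\tS1\tA\tB\t0.90",
  "rs0001\tS2\tB\tB\t0.88",
  "rs0002\tS2\tA\tB\t0.15"
), report)

# Locate the [Data] marker and read the table that follows it.
# (For a very large report, look only at the first few dozen lines.)
data_line <- grep("^\\[Data\\]", readLines(report))[1]
long <- read.table(report, skip = data_line, header = TRUE, sep = "\t",
                   quote = "", comment.char = "", check.names = FALSE,
                   colClasses = "character")  # keep IDs and alleles as text

snps    <- unique(long[["SNP Name"]])
samples <- unique(long[["Sample ID"]])

# SNP x sample matrices, filled through (row name, column name) pairs.
idx   <- cbind(long[["SNP Name"]], long[["Sample ID"]])
empty <- matrix(NA, nrow = length(snps), ncol = length(samples),
                dimnames = list(snps, samples))

calls <- empty
calls[idx] <- paste0(long[["Allele1 - AB"]], long[["Allele2 - AB"]])

call_prob <- empty
call_prob[idx] <- as.numeric(long[["GC Score"]])

print(calls)
print(call_prob)
```

Reading every column as character avoids read.table() guessing types, for example turning numeric-looking sample IDs into integers, which would break the name-based indexing.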

Or convert it to PLINK format and then handle it with GenABEL, etc.; see "Converting illumina raw genotype data into PLINK PED format".

written 5.9 years ago by gaow0
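
For what it is worth, here is a minimal sketch of what a PLINK conversion could look like once the genotype calls are in R as a sample x SNP matrix. It is not taken from the linked thread. The assumptions: calls are two-character strings, "--" marks a no-call, family/parent/sex/phenotype fields are unknown, and chromosome and position are left at 0 because they are not in the example data. Note that A/B are Illumina's AB coding rather than nucleotides; for real nucleotide alleles you would export the corresponding allele columns from GenomeStudio instead.

```r
# Made-up sample x SNP genotype calls ("--" marks a no-call here).
geno <- matrix(c("AA", "AB",
                 "BB", "--"),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("S1", "S2"), c("rs0001", "rs0002")))

# Split a two-character call into its two alleles; 0 is PLINK's default
# missing-allele code.
split_call <- function(x) {
  if (is.na(x) || x == "--") c("0", "0") else substring(x, 1:2, 1:2)
}

# One row per sample, two allele columns per SNP.
alleles <- t(apply(geno, 1, function(row) {
  unlist(lapply(row, split_call), use.names = FALSE)
}))

# PED: family ID, individual ID, father, mother, sex, phenotype, alleles.
ped <- data.frame(FID = rownames(geno), IID = rownames(geno),
                  PID = 0, MID = 0, SEX = 0, PHENO = -9,
                  alleles, stringsAsFactors = FALSE)

# MAP: chromosome, SNP ID, genetic distance, base-pair position.
map <- data.frame(CHR = 0, SNP = colnames(geno), CM = 0, BP = 0,
                  stringsAsFactors = FALSE)

ped_file <- file.path(tempdir(), "example.ped")
map_file <- file.path(tempdir(), "example.map")
write.table(ped, ped_file, quote = FALSE, sep = " ",
            row.names = FALSE, col.names = FALSE)
write.table(map, map_file, quote = FALSE, sep = " ",
            row.names = FALSE, col.names = FALSE)

cat(readLines(ped_file), sep = "\n")
```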
8.6 years ago by
Daniel Swan13k
Aberdeen, UK
Daniel Swan13k wrote:

crlmm? http://ukpmc.ac.uk/articles/PMC2752620


This works with the data that was created by scan studio, not GenomeStudio.

written 8.6 years ago by Jan Oosting860

Didn't know that Jan, cheers.

written 8.6 years ago by Daniel Swan13k
8.4 years ago by
Abc10
Abc10 wrote:

The Bonsai report plug-in allows GenomeStudio to export data directly as Rdata suitable for the Bioconductor package snpMatrix. There are other goodies on the SourceForge web site http://outmodedbonsai.sourceforge.net/ as well. The author has apparently been working on CNV analysis lately, and has managed to run GenomeStudio on Linux. I don't know how that is done, though.
