Question: From genotype raw data .idat to PLINK files
6.3 years ago, Armand (Spain) wrote:

Dear all,

I have several raw data files (exome genotyping): *_Red.idat and *_Grn.idat

... and also the Illumina data mapping, a file with these columns:

"Family ID","Individual ID","Sample ID","Genotyping Chip Barcode","Genotyping Chip Type","Final Report Name","Sex","Study Role","Birth Year Month"

.... 

(where Genotyping Chip Barcode is something like "4252475888_A" and Genotyping Chip Type like "1M-Duov3")

I have data from different platforms, but for now I am focusing on the data from 1M-Duov3.

I would like to generate the PLINK files. I am using the crlmm R package to try to get, at least, the .ped PLINK genotype file. I am trying to figure out how to run the genotype.Illumina function successfully.

I am following: http://master.bioconductor.org/packages/release/bioc/manuals/crlmm/man/crlmm.pdf

cnSet <- genotype.Illumina(sampleSheet=samplesheet_subset,
                             arrayNames=samplesheet_subset$Sample.ID,
                             path=datadir,
                             arrayInfoColNames=samplesheet[wh_array_name_pos,"Genotyping.Chip.Barcode"],
                             cdfName="human1mduov3b",
                             batch=rep("1", nrow(samplesheet_subset)))

     It seems that the cdfName for 1M-Duov3 should be "human1mduov3b".

     samplesheet_subset is a data.frame holding a subset of the Illumina data mapping file, matching a subset of the .idat files. (I am using 38 samples: parents, probands, siblings, ...)

     arrayNames: I don't know what it refers to... (I tried passing the sample IDs, samplesheet_subset$Sample.ID)

     batch: following the example... (one value per row of samplesheet_subset)

When I launch it, I get this error:

"

Instantiate CNSet container.
Error in constructInf(sampleSheet = sampleSheet, arrayNames = arrayNames,  : 
  Missing some of the *Grn.idat files

"

But I think that all the *.idat files are there... (*_R01C01_Grn.idat, *_R01C02_Grn.idat, *_R01C01_Red.idat, *_R01C02_Red.idat)

[... and I suppose that every .idat file contains several samples ...]
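
One way to test whether the files really are where the error message expects them is to rebuild the expected file names and list the ones that do not exist. The sketch below uses base R only. It assumes that the expected files are named `<arrayName>_Grn.idat` and `<arrayName>_Red.idat` inside `path`, which matches the file names listed above but is an assumption about crlmm, not something taken from its source. `missing_idats` is a hypothetical helper and is not part of crlmm.

```r
# Hypothetical helper (not part of crlmm): report which expected IDAT files
# are absent from a directory.
# Assumption: files are named <arrayName>_Grn.idat and <arrayName>_Red.idat.
missing_idats <- function(path, arrayNames) {
  # Build the green- and red-channel file names for every array name
  expected <- c(paste0(arrayNames, "_Grn.idat"),
                paste0(arrayNames, "_Red.idat"))
  # Keep only the names that do not exist under 'path'
  expected[!file.exists(file.path(path, expected))]
}

# Possible use with the objects from the call above (not run here):
# missing_idats(datadir, samplesheet_subset$Sample.ID)
```

If the vector passed as the second argument holds sample IDs instead of the prefixes of the .idat files, this helper reports every file as missing, which would point to a naming mismatch and not to absent files.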

Thanks for your help,

Cheers,


I encountered the same error.

Have you managed to solve it?

Comment written 4.5 years ago by nadne
15 months ago, eva.gradovich wrote:

I had the same issue. Make sure your files have a .idat extension and are not gzip-compressed, and that the directory path begins with a / and is passed as a string (in double quotes). That worked for me.
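
A rough way to run these checks from R before calling genotype.Illumina is sketched below, using base R only. `check_idat_dir` is a hypothetical helper and is not part of crlmm. The test for a leading `/` assumes a Unix-like system.

```r
# Hypothetical helper (not part of crlmm): inspect a directory for the
# problems described above.
check_idat_dir <- function(path) {
  # The path must be a single character string
  stopifnot(is.character(path), length(path) == 1)
  files <- list.files(path)
  list(
    # TRUE if the path is absolute (Unix-like systems only)
    absolute_path = startsWith(path, "/"),
    # Files that already carry the plain .idat extension
    idat_files = grep("\\.idat$", files, value = TRUE),
    # Files that are still gzip-compressed and need decompressing first
    gzipped_files = grep("\\.idat\\.gz$", files, value = TRUE)
  )
}

# Possible use (not run here): check_idat_dir("/path/to/idat/files")
```

Any names returned in `gzipped_files` would need to be decompressed, for example with `gunzip` on the command line, before the directory is passed as `path`.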

7 months ago, freeseek wrote:

If, instead of using CRLMM, you are okay with using Illumina's proprietary GenCall algorithm to generate GTC files from the IDAT files, there are now two approaches:

(i) using the Illumina Array Analysis Platform

(ii) using the Illumina Beeline/AutoConvert software

I describe how to use either approach on Linux here

You can use my own bcftools plugin gtc2vcf to convert GTC files to VCF

Then it is easy to convert a VCF file to PLINK format using best practices
