Question: Converting Illumina Raw Genotype Data Into Plink Ped Format
P.NJ wrote:

Hello,

I have the "FinalReport.txt" file of raw Illumina genotype data, generated from Genome Bead Studio for a 2.5M array (GSGT Version 1.8.4).

For my further analysis, I would like to convert it, preferably into PLINK format.

Is there any way of doing this? I would appreciate any suggestions.

Thank you.

modified 4.0 years ago by forever • written 9.0 years ago by P.NJ
Wen.Huang wrote:

You may also consider the .lgen format, which essentially just takes the first few columns of the FinalReport. PLINK has an option to read the .lgen format and convert it to a PED file.

written 9.0 years ago by Wen.Huang
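To make the .lgen suggestion concrete, here is a minimal Python sketch that copies the genotype rows of a FinalReport into .lgen rows (family ID, sample ID, SNP name, allele 1, allele 2). It rests on several assumptions that are not confirmed in this thread: the report is tab-delimited, the column names are "SNP Name", "Sample ID", "Allele1 - Top" and "Allele2 - Top" (taken from the report header quoted further down; other exports use different allele columns), and no-calls appear as "-". The function name `final_report_to_lgen` and the default family ID of 0 are illustrative choices only.

```python
import csv
import io


def final_report_to_lgen(report, lgen, family_id="0"):
    """Copy genotype rows from a FinalReport into PLINK .lgen rows.

    `report` and `lgen` are open text file objects. The column names used
    below are assumptions; edit them to match the line after [Data].
    Returns the number of rows written.
    """
    # Skip the [Header] block; the column names follow the [Data] marker.
    for line in report:
        if line.strip() == "[Data]":
            break
    n_written = 0
    for row in csv.DictReader(report, delimiter="\t"):
        # Assumed: "-" marks a no-call; PLINK uses "0" for a missing allele.
        a1 = row["Allele1 - Top"].replace("-", "0")
        a2 = row["Allele2 - Top"].replace("-", "0")
        lgen.write(f"{family_id} {row['Sample ID']} {row['SNP Name']} {a1} {a2}\n")
        n_written += 1
    return n_written


if __name__ == "__main__":
    # Tiny made-up report, only to show the call.
    demo = (
        "[Header]\nNum Samples\t1\n[Data]\n"
        "SNP Name\tSample ID\tAllele1 - Top\tAllele2 - Top\n"
        "GA008510\t5528_C01\tC\tC\n"
        "GA008524\t5528_C01\t-\t-\n"
    )
    out = io.StringIO()
    final_report_to_lgen(io.StringIO(demo), out)
    print(out.getvalue(), end="")
```

With real files you would pass `open("FinalReport.txt")` and `open("test.lgen", "w")` instead of the in-memory strings. The matching .fam and .map files still have to be written separately.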
P.NJ wrote:

Thank you for your suggestions. I tried the .lgen format, but it did not work for me. I created .map, .fam and .lgen files, but when I run PLINK, the resulting .ped file contains

0 sample_name 0 0 2 1 0 0 0 0 0 ......

I have pasted some of the info from my file formats, maybe you could tell me if I am going wrong somewhere.

.map

chr# SNPName GeneticDistance bp units

24 GA008510 0 11771305

24 GA008524 0 19612089

.fam

Family_ID Individual_Name Paternal_ID Maternal_ID Sex Phenotype

0 sample_name 0 0 2 1

.lgen

Sample_Index Sample_ID SNP_Name Allele1Fwd Allele2Fwd

0 5528_C01 GA008510 C C

0 5528_C01 GA008524 T T

and then I try to run

plink --lfile test --recode

Any clue as to where I am going wrong?

modified 9.0 years ago • written 9.0 years ago by P.NJ

In your .fam file, did you actually put "5528_C01" as your sample name, or did you just put "sample_name"? The IDs in the .fam file have to match those in the .lgen file. I just ran a test and it worked for me.

reply written 9.0 years ago by Wen.Huang

Ahh, okay, I had not changed that. Thank you very much, it worked for me as well.

reply written 9.0 years ago by P.NJ
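The mismatch found above (a placeholder "sample_name" in the .fam file versus "5528_C01" in the .lgen file) is easy to check for before running PLINK. A minimal Python sketch, assuming both files are whitespace-delimited with family ID and individual ID in the first two columns; the helper name `ids_missing_from_fam` is made up for this example:

```python
def ids_missing_from_fam(fam_lines, lgen_lines):
    """Return (family ID, individual ID) pairs used in .lgen but absent from .fam."""
    fam_ids = {tuple(line.split()[:2]) for line in fam_lines if line.strip()}
    lgen_ids = {tuple(line.split()[:2]) for line in lgen_lines if line.strip()}
    return sorted(lgen_ids - fam_ids)


if __name__ == "__main__":
    # The rows quoted in the post above.
    fam = ["0 sample_name 0 0 2 1"]
    lgen = ["0 5528_C01 GA008510 C C", "0 5528_C01 GA008524 T T"]
    print(ids_missing_from_fam(fam, lgen))
```

An empty list means every genotype row has a matching sample in the .fam file. For real files, pass the open file objects, e.g. `ids_missing_from_fam(open("test.fam"), open("test.lgen"))`.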

I have the same issue: I am trying to get my .ped file from an .lgen file, but I get the error below:

Error: Duplicate ID '100 100'.

I appreciate any help.

C:\Users\fadhl\Dropbox\plink_win64>plink --lfile Plate3_final_report --recode
PLINK v1.90b3.41 64-bit (10 Sep 2016)    https://www.cog-genomics.org/plink2
(C) 2005-2016 Shaun Purcell, Christopher Chang    GNU General Public License v3
Logging to plink.log.
Options in effect:
  --lfile Plate3_final_report
  --recode

16322 MB RAM detected; reserving 8161 MB for main workspace.
Error: Duplicate ID '100 100'.

reply written 4.0 years ago by forever
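The thread does not resolve this error. One plausible cause, offered only as a guess, is a .fam file in which the same family ID and individual ID pair appears on more than one line, for example when the .fam was produced with one row per genotype instead of one row per sample. The Python sketch below builds .fam rows with each pair listed once, taking the IDs from the first two columns of the .lgen rows. The function name `fam_lines_from_lgen` is hypothetical; parents are set to 0, and sex 0 and phenotype -9 are PLINK's codes for unknown and missing.

```python
def fam_lines_from_lgen(lgen_lines, sex="0", phenotype="-9"):
    """Build one .fam row per distinct (family ID, individual ID) pair."""
    seen = set()
    fam = []
    for line in lgen_lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed rows
        key = (fields[0], fields[1])
        if key in seen:
            continue  # this sample already has a .fam row
        seen.add(key)
        # Columns: FID IID father mother sex phenotype
        fam.append(f"{key[0]} {key[1]} 0 0 {sex} {phenotype}")
    return fam


if __name__ == "__main__":
    # Made-up rows, only to show the call.
    lgen = ["100 100 rs1 A A", "100 100 rs2 A G", "101 101 rs1 G G"]
    print("\n".join(fam_lines_from_lgen(lgen)))
```

If the same sample genuinely occurs twice in the report (for instance a replicate), the duplicate would need a distinct ID instead.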
Biomed wrote:

Genome Studio has an export module that creates Plink input files from your SNP data. http://www.illumina.com/Documents/products/technotes/technote_cnv_algorithms.pdf

written 9.0 years ago by Biomed

You can find the module here and it's super easy to export to plink format.

reply written 4.1 years ago by nadne
P.NJ wrote:

When converting an Illumina 2.5M array into PLINK format, should the output not contain approximately 2-2.5M SNPs? I am getting approximately 1M SNPs. Is this expected?

written 9.0 years ago by P.NJ
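This question is not answered in the thread. A first diagnostic step could be to count SNPs at each stage and see where the number drops. The Python sketch below compares the SNP names in a .map file with those in an .lgen file, assuming the usual whitespace-delimited layouts (SNP name in column 2 of .map and column 3 of .lgen). The helper name `snp_counts` is made up, and the sketch only reports counts; it does not explain a discrepancy.

```python
def snp_counts(map_lines, lgen_lines):
    """Return (SNPs in .map, distinct SNPs in .lgen, SNPs present in both)."""
    map_snps = {f[1] for f in (line.split() for line in map_lines) if len(f) >= 4}
    lgen_snps = {f[2] for f in (line.split() for line in lgen_lines) if len(f) >= 5}
    return len(map_snps), len(lgen_snps), len(map_snps & lgen_snps)


if __name__ == "__main__":
    # Made-up rows, only to show the call.
    map_rows = ["24 GA008510 0 11771305", "24 GA008524 0 19612089"]
    lgen_rows = ["0 5528_C01 GA008510 C C"]
    print(snp_counts(map_rows, lgen_rows))
```

Comparing these numbers with the "Num SNPs" value in the FinalReport header shows whether SNPs were already absent from the export or were lost during conversion.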
forever wrote:

My final report header looks like:

[Header]
GSGT Version      1.9.4
Processing Date   8/1/2012 2:35 PM
Content           Cardio-Metabo_Chip_11395247_A.bpm
Num SNPs          196725
Total SNPs        196725
Num Samples       60
Total Samples     60
[Data]
SNP Name          Sample ID   Allele1 - Top   Allele2 - Top   GC Score   SNP
chr1:109457160    2           C               C               0.8609     [T/G]

I do not have a SNP_map file. How did you do it?

written 4.0 years ago by forever
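Nobody answers this in the thread. For SNP names that encode a position, like the "chr1:109457160" shown in the report above, one possible workaround is to derive the .map row from the name itself. The Python sketch below does that under clearly stated assumptions: it only handles names of the form chr<chromosome>:<position>, it sets the genetic distance to 0 as in the .map example earlier in the thread, and it cannot tell which genome build the position refers to. The function name `map_line_from_name` is hypothetical. Names in any other style (for example rs IDs) return None and still need a proper SNP map or manifest.

```python
import re

# Assumed naming pattern: "chr" + chromosome label + ":" + base-pair position.
_POSITIONAL_NAME = re.compile(r"chr(\w+):(\d+)")


def map_line_from_name(snp_name):
    """Return a .map row 'chr name 0 bp' for a positional SNP name, else None."""
    match = _POSITIONAL_NAME.fullmatch(snp_name)
    if match is None:
        return None
    chrom, bp = match.groups()
    # Genetic distance is unknown here, so it is written as 0.
    return f"{chrom} {snp_name} 0 {bp}"


if __name__ == "__main__":
    for name in ["chr1:109457160", "rs12345"]:
        print(name, "->", map_line_from_name(name))
```

The .map rows must be in the same order as the SNPs appear in the genotype data, so in practice the names should be collected from the report in order and written once each.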