Question: Formatting Beadstudio Final Report Into Plink
romsen wrote:

Hello again

I have to convert Illumina HumanHap chip data into PLINK format (a PED file). I proceeded as described here, but my generated ped file shows only 0 for each genotype. PLINK prints these warnings during the process:

[...] 50 males, 50 females, and 0 of unspecified sex

Before frequency and genotyping pruning, there are 1000000 SNPs

100 founders and 0 non-founders found

1000000 SNPs with no founder genotypes observed

Warning, MAF set to 0 for these SNPs (see --nonfounders)

Writing list of these SNPs to [ plink.nof ]

Total genotyping rate in remaining individuals is 0 [...]

fam-file:

    1    192    0    0    1    0
    2    193    0    0    2    0
    3    213    0    0    1    0
    4    214    0    0    1    0

map-file:

1    rs3934834    0    995669
1    rs3737728    0    1011278
1    rs6687776    0    1020428
1    rs9651273    0    1021403

lgen-file:

[Header]                
BSGT Version    3.0.27            
Processing Date        
Content        
Num SNPs    1000000            
Total SNPs    1000000            
Num Samples    100        
Total Samples    100            
[Data]                
Sample Index    Sample Name    SNP Name    Allele1     Allele2
1    192    rs10000010    A    G
2    193    rs10000010    A    G
3    213    rs10000010    A    G

My lgen file has a 10-row header, followed by the data rows. The genotypes are the forward alleles exported via BeadStudio (with Top alleles I get the same sobering result).

After running PLINK to reconstruct the ped file, I get this ped file with missing genotypes:

1 192 0 0 1 -9 0 0 0 0 0 0 0 0 [...]
2 193 0 0 2 -9 0 0 0 0 0 0 0 0 [...]
3 213 0 0 1 -9 0 0 0 0 0 0 0 0 [...]

Perhaps one of you can spot the mistake or has an idea how to solve the problem. Do I need a reference file, or is the header in the lgen file the problem? Thank you very much.

Tags: illumina, plink
modified 5.2 years ago by lhvkl • written 7.0 years ago by romsen
Matt Shirley wrote:

I would try removing the header from your lgen file. The PLINK documentation gives an example without the header. If you are using Linux, try:

grep -E '^[0-9]+' lgen-file > lgen-file.noheader
sed -E 's/-[[:space:]]+-[[:space:]]*$/0 0/' lgen-file.noheader > lgen-file.noheader.missingalleles

This will remove the header, and then should replace all occurrences of "- -", which seems to be Illumina's notation for missing alleles, with "0 0", which seems to be PLINK's notation for missing alleles.

modified 7.0 years ago • written 7.0 years ago by Matt Shirley
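
For readers without grep, the header-stripping step can also be sketched in Python. This is only a minimal illustration, not part of the original answer: the function name `strip_final_report_header` is hypothetical, and it assumes the genotype rows follow a "[Data]" marker plus one row of column titles, as in the excerpt quoted in the question.

```python
import io


def strip_final_report_header(lines):
    """Yield only the genotype rows of a BeadStudio Final Report.

    Assumption: everything up to and including the "[Data]" line,
    plus the single column-title row after it, is header material.
    """
    in_data = False
    skipped_titles = False
    for line in lines:
        if not in_data:
            # Still inside the header block; wait for the marker.
            if line.strip() == "[Data]":
                in_data = True
            continue
        if not skipped_titles:
            # First line after "[Data]" holds the column titles.
            skipped_titles = True
            continue
        if line.strip():
            # Normalise line endings so Windows CRLF files work too.
            yield line.rstrip("\r\n") + "\n"


# Tiny in-memory example shaped like the report in the question.
report = io.StringIO(
    "[Header]\n"
    "Num SNPs\t1000000\n"
    "[Data]\n"
    "Sample Index\tSample Name\tSNP Name\tAllele1\tAllele2\n"
    "1\t192\trs10000010\tA\tG\n"
    "2\t193\trs10000010\tA\tG\n"
)
rows = list(strip_final_report_header(report))
```

For a real file, the same generator can be fed an open file handle and its output written line by line to a new .lgen file, so the whole report never has to fit in memory.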
romsen wrote:

Perfect, thank you. PLINK starts to work now, but there is a new error: my file has too many alleles.

ERROR: Locus rs10000023 has >2 alleles:
       individual 12 070 has genotype [ - - ]
       but we've already seen [ T ] and [ G ]
written 7.0 years ago by romsen
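
To see which allele codes trigger an error like this before running PLINK, one could scan the lgen rows and list every SNP with more than two distinct non-missing codes. The sketch below is an illustration only: the function names are hypothetical, it assumes five whitespace-separated fields per row (family ID, individual ID, SNP, allele 1, allele 2), and it treats "0" as the missing code, which is PLINK's default.

```python
from collections import defaultdict


def alleles_per_snp(rows, missing=("0",)):
    """Collect the distinct non-missing allele codes seen for each SNP."""
    seen = defaultdict(set)
    for row in rows:
        fields = row.split()
        if len(fields) != 5:
            # Skip blank, header, or otherwise malformed lines.
            continue
        snp = fields[2]
        seen[snp].update(a for a in fields[3:] if a not in missing)
    return seen


def snps_with_extra_alleles(rows):
    """Return {snp: sorted allele codes} for SNPs with more than two codes."""
    return {
        snp: sorted(codes)
        for snp, codes in alleles_per_snp(rows).items()
        if len(codes) > 2
    }


# Hypothetical rows; only individual "12 070" comes from the error above.
rows = [
    "11 069 rs10000023 T G",
    "12 070 rs10000023 - -",
    "13 071 rs10000023 G G",
]
problems = snps_with_extra_alleles(rows)
```

Here `problems` flags rs10000023 because "-" is counted as a third allele alongside T and G, which is the situation the error message describes.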

I've edited my answer to address the issue.

written 7.0 years ago by Matt Shirley

Nice, I had the same idea. I use Windows, though; do you have a script for PLINK or Perl?

written 7.0 years ago by romsen

If you can open the file in a text editor and use "find and replace" to change every "-" to "0", I think that should work. Otherwise, if you are going to do much bioinformatics work on Windows, I would suggest installing and becoming familiar with Cygwin.

modified 7.0 years ago • written 7.0 years ago by Matt Shirley

Unfortunately it's too big. I can't open it in Notepad.

written 7.0 years ago by romsen

As a slightly less intimidating alternative to installing Cygwin for sed functionality, you can probably use this blog post about PowerShell.

modified 7.0 years ago • written 7.0 years ago by Matt Shirley

Hehe, thanks, I'll check this. I've now got it working with Perl:

perl -p -i.bak -e "~s|-|0|" file.lgen

modified 7.0 years ago • written 7.0 years ago by romsen
lhvkl wrote:

Hi. I'm facing a similar issue to the above. I have made a .ped file from a BeadStudio report, but my missing values are specified as "-" rather than 0. The file is too big to find and replace using Nano, and the above perl command replaces only the first occurrence on each line (in this case changing the phenotype specification "-9" to "09"). I'm not familiar with Perl or command-line operations and wondered if anyone could help?

written 5.2 years ago by lhvkl

I've managed to get around the phenotype issue by using:

perl -p -i.bak -e "~s|- |0 |" file.lgen

But this still only deals with the first occurrence on each line of the ped file.

written 5.2 years ago by lhvkl

perl -p -i.bak -e "~s|- |0 |g" file.lgen

Fixes this for anyone encountering a similar problem.

written 5.2 years ago by lhvkl
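
One caveat with matching "- " (dash plus space): a "-" that is the very last field on a line is followed by a newline rather than a space, so it would be left unchanged, and tab-delimited files would not match at all. A possible alternative, sketched here in Python under the assumption that a missing allele is always a "-" standing alone as a whole whitespace-delimited field, is to replace only such standalone tokens. The names `recode_missing` and `recode_file` are hypothetical.

```python
import re

# A "-" counts as a missing allele only when it is a whole field, i.e. it is
# neither preceded nor followed by a non-whitespace character. This leaves
# values such as the "-9" phenotype code untouched.
MISSING = re.compile(r"(?<!\S)-(?!\S)")


def recode_missing(line):
    """Replace every standalone "-" field in one line with "0"."""
    return MISSING.sub("0", line)


def recode_file(src_path, dst_path):
    """Stream src_path to dst_path line by line, recoding missing alleles.

    Reading one line at a time keeps memory use small for very large
    files; newline="" preserves the original line endings.
    """
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        for line in src:
            dst.write(recode_missing(line))
```

For example, `recode_file("file.ped", "file.recoded.ped")` would write a recoded copy and leave the original file in place, much like the `.bak` backup made by the Perl one-liners above. Note that a single ped line holds all genotypes for one individual, so individual lines can still be long.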