Question: GWAS analysis using PLINK
ngsgene (United States) wrote, 4.3 years ago:

I have a GenomeStudio genotype file with missing genotypes denoted by "-".

From this file I generated the map, fam and lgen files for each chromosome and converted them to ped format using PLINK's --recode option. To get around the PLINK error "Error: Locus has >2 alleles" I passed "-" to the --missing-genotype option.

After the ped files for each chromosome were successfully generated, I ran into a couple of issues:

My lgen file corresponds to the map file, but after the recode the ped file has far more columns than the map file has rows. I expect the number of columns to be twice the number of rows in the map file (two alleles per SNP).

When I try to merge all the chromosomes to evaluate summary statistics, the "-" in the data doesn't seem to be excluded and continues to cause errors.

Would converting all the "-" to 0 be the solution here? I am trying to understand how to exclude such data, and what the best practices are.

Thanks for any suggestions/feedback.

 

missing-genotype plink merge gwas • 3.1k views
chrchang523 (United States) wrote, 4.3 years ago:

1. You probably want to use both "--missing-genotype -" and "--output-missing-genotype 0" during your conversion; this tells PLINK that the input fileset uses -, but you want the output fileset to use 0 so you don't have more headaches down the line.

2. Can you explain what you mean by the "ped file has way more columns than [you expected]"?  How many columns does it have?  How many rows does the map file have?

3. Is there any particular reason you are converting to .ped/.map instead of PLINK's preferred .bed/.bim/.fam format?
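
The conversion described in point 1 can be sketched as a small Python helper that only assembles the PLINK argument list. The flags --missing-genotype and --output-missing-genotype are the ones named above; --lfile, --recode, --make-bed and --out are standard PLINK 1.x options. The prefixes "chr1" and "chr1_clean" are placeholder names, not taken from the original post, and the function name is hypothetical.

```python
def plink_convert_args(in_prefix, out_prefix, binary=False):
    """Build PLINK arguments to convert an LGEN fileset whose missing code is '-'.

    in_prefix and out_prefix are placeholder fileset prefixes (assumptions).
    """
    # --lfile reads in_prefix.lgen, in_prefix.map and in_prefix.fam.
    args = ["plink", "--lfile", in_prefix, "--missing-genotype", "-"]
    if binary:
        # Binary .bed/.bim/.fam output; no text missing code is needed.
        args.append("--make-bed")
    else:
        # Text .ped/.map output, with missing calls written as '0'.
        args += ["--recode", "--output-missing-genotype", "0"]
    args += ["--out", out_prefix]
    return args


if __name__ == "__main__":
    # Print the command line instead of running it.
    print(" ".join(plink_convert_args("chr1", "chr1_clean")))
```

The returned list can be passed to subprocess.run(..., check=True) if a plink executable is on the PATH; setting binary=True goes straight to the .bed/.bim/.fam format mentioned in point 3.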


Thanks for your response chrchang523, I will give "--output-missing-genotype 0" a try to get the format working.

The map files have varying numbers of rows, matching the number of SNPs on each chromosome. For example, I have ~180000 for chr1, so I expect the ped file to have 180000 * 2 columns.

The only reason for using .ped is to be able to see what data I am generating; the aim is to work with the .bed/.bim format once the file formatting is taken care of.

 

Reply written 4.3 years ago by ngsgene

How many columns does the .ped actually have?
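
One way to answer that question is to count fields directly. The following is a minimal Python sketch, assuming whitespace-delimited .ped/.map files without comment lines; the function names and the "chr1.ped"/"chr1.map" paths are illustrative. A .ped line carries 6 leading fields (family ID, individual ID, father, mother, sex, phenotype) followed by two alleles per SNP, so the expected count is 6 + 2 * (number of .map rows).

```python
def count_map_snps(map_path):
    """Number of non-empty lines (one per variant) in a .map file."""
    with open(map_path) as fh:
        return sum(1 for line in fh if line.strip())


def ped_column_counts(ped_path):
    """Set of distinct whitespace-separated field counts over all .ped lines."""
    counts = set()
    with open(ped_path) as fh:
        for line in fh:
            if line.strip():
                counts.add(len(line.split()))
    return counts


def check_ped_against_map(ped_path, map_path):
    """Return (expected, observed) column counts for a .ped/.map pair."""
    expected = 6 + 2 * count_map_snps(map_path)
    return expected, ped_column_counts(ped_path)
```

For example, `expected, observed = check_ped_against_map("chr1.ped", "chr1.map")`; the pair is consistent when `observed == {expected}`. More than one value in `observed` means some sample lines have a different length from the others.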

You might want to try converting to .tped/.tfam ("--recode --transpose") instead; that text format might be easier to read (and it's definitely more convenient for PLINK to work with).

Reply written 4.3 years ago by chrchang523

The --output-missing-genotype 0 option has helped replace all "-" with "0". But in either case the --merge option (which I am using to merge the data from all chromosomes) still reports "ERROR: Problem with MAP file line:". There doesn't seem to be a way for me to track down which SNP in particular is causing the issue, as the message shows the first 6 columns of sample identifier information followed by the genotype info from the lgen file.

The .ped file now has ~180000 * 2 + 6 columns, so that seems to have been generated correctly. Thanks for the tip on transposing. Are there other advantages to transposing the data, or is this a preferred file format? I plan to impute this using 1000 Genomes; none of the information on Shapeit/Impute2 has suggested a .tped file yet, but please let me know if you have experience with that.

ERROR: Problem with MAP file line:

0 ###-# 0 0 1 -9 G G A A A A C C C C A G A A C C C C G G A G C T T C A A C C G G A A T T A A C T C C A G G G C C C T T C T T T T T T A A C T C C C C G G G G G A T C C C C T A G G G C C A G G G A A A A G G A A T T T T T T G G A A C C C C C C G G G G A A C

Reply written 4.3 years ago by ngsgene (modified 4.3 years ago)

The "problematic MAP file line" is a properly formatted .ped file line.  Try swapping the order of the arguments you're passing to --merge.

.tped files have fewer columns than .ped files, so I find them easier to work with in a text editor.  If you're using --merge, though, .ped/.map lets you avoid an extra conversion step.

Reply written 4.3 years ago by chrchang523

Thanks chrchang523! I am able to merge the files successfully; it seems the order of .map .ped in the file list was causing the issue. Take-home message: each entry in the file list to be merged should be ordered .ped .map, or .bed .bim .fam.
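
The ordering rule above can be enforced when the file list is generated. The following is a minimal Python sketch, assuming the list is a plain text file with one fileset per line (the layout used with PLINK's --merge-list) and that file paths contain no spaces; the function names and the chr*.ped/chr*.map paths are hypothetical.

```python
import os

# File order on each line of the list, per the conclusion of this thread.
TEXT_ORDER = (".ped", ".map")
BINARY_ORDER = (".bed", ".bim", ".fam")


def merge_list_line(paths):
    """Return one fileset's files ordered as .ped .map or .bed .bim .fam."""
    by_ext = {os.path.splitext(p)[1].lower(): p for p in paths}
    for order in (TEXT_ORDER, BINARY_ORDER):
        if len(paths) == len(order) and set(by_ext) == set(order):
            return " ".join(by_ext[ext] for ext in order)
    raise ValueError(f"not a complete .ped/.map or .bed/.bim/.fam fileset: {paths}")


def write_merge_list(filesets, out_path):
    """Write one correctly ordered fileset per line to out_path."""
    with open(out_path, "w") as fh:
        for paths in filesets:
            fh.write(merge_list_line(paths) + "\n")
```

For instance, `merge_list_line(["chr2.map", "chr2.ped"])` gives "chr2.ped chr2.map" regardless of the order in which the two paths were supplied, and an incomplete fileset raises an error instead of producing a malformed line.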

Reply written 4.3 years ago by ngsgene