Question: Association Tests Using Impute2 Output
2
Khader Shameer (Manhattan, NY) wrote 6.3 years ago:

I have .gprobs, .metrics and .sample output files from IMPUTE2 and am trying to run an association test using PLINK. I have uploaded the first 5 lines of chromosome 12 here:

I have tried to run a --dosage analysis using the .gprobs and .sample files to do the association test at the chromosome level.

But I am getting several warnings for:

- "Duplicate individual found"

and error:

- ERROR: Badly aligned columns for: SNP A1 A2

I have also tried converting the .gprobs and .sample files to native ped and fam files using GTOOL and running the association test in PLINK, but those output files did not work with the --assoc command either. Is any file conversion required before taking IMPUTE2 output to PLINK, or would you recommend another tool for association testing with IMPUTE2 output?

PS: I have tried to post this question to the IMPUTE2 mailing list, but my membership has not been approved even 24 hours after I confirmed my email.

gwas imputation plink • 8.2k views
modified 5.6 years ago by Kantale • written 6.3 years ago by Khader Shameer
4
zx8754 (London) wrote 6.3 years ago:

The following usually works for me:

Make a map file from the Chr12_head_5.gprobs file (note: the 12 in this case represents chr12):

awk '{print 12,$2,0,$3}' Chr12_head_5.gprobs > Chr12_head_5.map

Make a fam file from the Chr12_head_5.sample file: remove the top 2 rows and add the fam file columns.

awk 'NR>2 {print $1,$2,0,0,$4,$5}' Chr12_head_5.sample > Chr12_head_5.fam

As there are 3 samples, I cut the gprobs file down to the columns for those samples (columns: chr, snp, bp, a1, a2, then 3 columns per individual giving the probabilities of AA, AB and BB for each SNP):

--- 12-60076 60076 A C 0.603 0.346 0.050 0.171 0.506 0.323 0.248 0.659 0.094
--- 12-60252 60252 A G 0.989 0.011 0 0.935 0.065 0 0.898 0.101 0
--- 12-60317 60317 C T 0.998 0.002 0 1 0 0 0.991 0.009 0
--- 12-60474 60474 G A 0.987 0.013 0 0.923 0.076 0 0.848 0.149 0.003
--- 12-60628 60628 T C 0.996 0.004 0 1 0 0 0.985 0.015 0
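A quick way to confirm that the two files agree before running PLINK is to check that every gprobs line has 5 leading columns plus 3 probability columns per sample. The sketch below uses tiny stand-in files with made-up IDs (`demo.sample`, `demo.gprobs`, `F1`/`I1` and so on are hypothetical, not from the original data); swap in the real file names.

```shell
# Sketch: check that the .gprobs width matches the number of samples.
# demo.sample / demo.gprobs are hypothetical stand-ins for the real files.
tmp=$(mktemp -d)
cat > "$tmp/demo.sample" <<'EOF'
ID_1 ID_2 missing sex pheno
0 0 0 D B
F1 I1 0 1 1
F2 I2 0 2 0
F3 I3 0 1 1
EOF
cat > "$tmp/demo.gprobs" <<'EOF'
--- 12-60076 60076 A C 0.603 0.346 0.050 0.171 0.506 0.323 0.248 0.659 0.094
--- 12-60252 60252 A G 0.989 0.011 0 0.935 0.065 0 0.898 0.101 0
EOF

# Number of samples = lines in the .sample file minus its two header rows.
n_samples=$(awk 'NR>2' "$tmp/demo.sample" | wc -l | tr -d ' ')
# Each line should have 5 leading columns plus 3 probabilities per sample.
expected=$((5 + 3 * n_samples))
# Count the lines whose field count differs from the expected width.
bad_lines=$(awk -v want="$expected" 'NF != want {n++} END {print n+0}' "$tmp/demo.gprobs")
echo "samples=$n_samples expected_columns=$expected bad_lines=$bad_lines"
```

If `bad_lines` is not 0, the gprobs file and the sample file do not describe the same set of individuals, and the fam file built from the sample file will not line up with the probability columns.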

Then run the plink command:

plink \
--noweb \
--dosage Chr12_head_5.gprobs \
format=3 skip0=1 skip1=1 noheader \
--map Chr12_head_5.map \
--fam Chr12_head_5.fam \
--out Chr12_head_5

I run this command to get "rough" associations, as --dosage doesn't accept the --covar and --within options to correct for covariates and strata. I then convert the data to MACH format (see Conversion of ped/map or bim/bim/ fam files to dosage for GWAs mit Probable and comparison with imputated genotypes) and run the associations using R.

Regarding the errors: the first one means that the FID and IID combination is not unique, and the log file should list the duplicated individuals; the second one is probably caused by badly formatted headers in the map file.

SNPTEST is supposed to work with IMPUTE output "seamlessly", but in my experience it doesn't, and I avoid it.
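For anyone doing the dosage step by hand, here is a minimal sketch of collapsing the three genotype probabilities per sample into one expected allele dosage. It assumes the dosage counts the second allele (column 5), i.e. P(AB) + 2*P(BB); tools differ on which allele they count, so check the convention of whatever you feed it to. The output layout (SNP, a1, a2, one dosage per sample) is only illustrative and is not the exact MACH format.

```shell
# Sketch: expected dosage of the second allele from AA/AB/BB probabilities.
# Assumption: dosage = P(AB) + 2*P(BB); the allele counted is a convention choice.
tmp=$(mktemp -d)
cat > "$tmp/demo.gprobs" <<'EOF'
--- 12-60076 60076 A C 0.603 0.346 0.050 0.171 0.506 0.323 0.248 0.659 0.094
--- 12-60252 60252 A G 0.989 0.011 0 0.935 0.065 0 0.898 0.101 0
EOF

dosages=$(awk '{
  # Print the SNP id and the two alleles first.
  printf "%s %s %s", $2, $4, $5
  # Walk over the probability triplets (AA, AB, BB), one triplet per sample.
  for (i = 6; i + 2 <= NF; i += 3)
    printf " %.3f", $(i + 1) + 2 * $(i + 2)
  printf "\n"
}' "$tmp/demo.gprobs")
echo "$dosages"
```

Each value lies between 0 and 2, and a sample whose three probabilities are all 0 (no call) would come out as 0, so such samples need separate handling.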
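To track down the "Duplicate individual found" warnings without digging through the log, you can look for repeated FID/IID pairs directly in the sample file. A minimal sketch, assuming the first two columns hold the IDs and the first two rows are headers; `dup_demo.sample` and its IDs are made up for illustration:

```shell
# Sketch: list FID/IID pairs that occur more than once in a .sample file.
# dup_demo.sample is a hypothetical stand-in; the first two rows are headers.
tmp=$(mktemp -d)
cat > "$tmp/dup_demo.sample" <<'EOF'
ID_1 ID_2 missing sex pheno
0 0 0 D B
F1 I1 0 1 1
F2 I2 0 2 0
F1 I1 0 1 1
EOF

# Count each ID pair after the header rows and keep the ones seen twice or more.
dups=$(awk 'NR>2 {seen[$1 " " $2]++}
            END {for (k in seen) if (seen[k] > 1) print k, seen[k]}' \
       "$tmp/dup_demo.sample" | sort)
echo "duplicated FID/IID pairs (count in last column):"
echo "$dups"
```

An empty result means the IDs in the sample file are unique, so the duplicates would have to come from how the fam file was built.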

modified 3.1 years ago • written 6.3 years ago by zx8754

Thanks, zx8754, for a great answer. Do you know whether the .gen format that you mentioned and the .gprobs format that I have are the same?

written 6.3 years ago by Khader Shameer
1

Not sure what a .gprobs file looks like, but I have added what my .gen files from IMPUTE2 output look like.

modified 6.3 years ago • written 6.3 years ago by zx8754
1

This is the first line from the file. I think it's the same format. --- 16-60180 60180 G C 1 0 0 0.894 0.106 0 0.993 0.007 0 1 0 0 1 0 0 1 ...

written 6.3 years ago by Khader Shameer
1

Google tells me that .gprobs is a BEAGLE output?

written 6.3 years ago by zx8754

True, it's a native BEAGLE format. I downloaded this dataset from dbGaP; according to the phenotype description the data were imputed using IMPUTE2, but the output file extension is described as "chromosome-specific genotype probabilities files".

written 6.3 years ago by Khader Shameer
1

From your Dropbox files, I created the map and fam files and cut the gprobs file down to 3 samples (as there were 3 samples in the .sample file), and --dosage did work.

modified 6.3 years ago • written 6.3 years ago by zx8754

That's great! Can you please add that part to your answer as well?

modified 6.3 years ago • written 6.3 years ago by Khader Shameer
1

The answer is updated according to the data provided.

written 6.3 years ago by zx8754
1

That's great! In the meantime, I was able to run SNPTEST2 on my datasets seamlessly. I will post a detailed reply here so that Biostars users with IMPUTE2 data can try both ways.

written 6.3 years ago by Khader Shameer
1
Kantale (Groningen, Netherlands) wrote 5.6 years ago:

I know you said that you want to perform the association analysis with PLINK, but I would recommend trying SNPTEST. Since you did the imputation with IMPUTE2, SNPTEST can process the IMPUTE2 output files (relatively) nicely. You can use QCTOOL to convert IMPUTE2 output to SNPTEST input. See: https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html#input_file_formats

written 5.6 years ago by Kantale