Question

michigan imputation server

2

Entering edit mode

6.6 years ago

Teresa ▴ 20

Hi, I performed imputation on my GWAS data using Michigan imputation server. Now I have two output files: 1).dose.vcf.gz and 2).info.gz

Michigan imputation server use mimimac3 (--format GT,DS,GP) and in the output file ".dose.vcf.gz" are present all the three formats. I'm new on this kind of analysis, so I need some advice to manage and anayze this data.

I used DosageConvertor to convert them into PLINK dosage format using this command:

$DosageConvertor --vcfDose myfile.vcf.gz --info myfile.info.gz --prefix out_myfile --type plink --format 1

output files: 1) .dosage and 2) .fam.

To perform quality control steps on the imputed data, is this command right?

fcgene --dosage out_myfile.dosage --fam out_myfile.fam --filter-snp crate=0.9,hwe=1e-6,maf=0.05 --filter-indiv crate=0.9 --rsq 0.3 --oformat plink-dosage --out myfile_QC

conversion michigan imputation server • 9.9k views

ADD COMMENT • link updated 2.9 years ago by Fazil • 0 • written 6.6 years ago by Teresa ▴ 20

0

Entering edit mode

Have you looked through the manual under 'Chapter 4 - Quality control and format conversion of plink-formatted data'? - see page 20 of http://www.bx.psu.edu/~giardine/tests/tmp/fcgene-1.0.7.pdf

Your filtering criteria per SNP are

Call-rate, 90%
HWE p value, 0.000001
MAF, 5%

Your filtering criteria per sample is:

Call-rate, 90%

For the --rsq parameter, you have to take note as it's not what would be typically expected for such a study. The manual states:

Option - -rsq 0.3 will filter those SNPs which have Rsq value smaller or equal to 0.3 and change the format of only those SNPs which have Rsq ≥ 0.3. Rsq is a measure which estimates the squared correlation between imputed and true genotypes

ADD REPLY • link 6.6 years ago by Kevin Blighe 88k

0

Entering edit mode

I would like to work with dosage genotypes rather than hard call genotypes. It is really difficult for me to find a way to obtain a final data with dosage genotypes (calculated in the imputation process performed by Michigan imputation server) and at the same time taking into account QC steps pe- marker and per- individual and the quality of imputation.

It seems that fcgene is able to perform QC steps on dosage data but the option ---rsq does not work on my input file. Rsq is an indicator of the quality of imputation so I think is really important to consider this parameter. But I dont't know how.

Nevertheless, I found a way to work with hard call genotype (even if it is not what I wanted in the first place). Below the code I used:

gunzip myfile.info.gz
plink --vcf myfile.dose.vcf.gz --make-bed --double-id --out s1

QC steps using PLINK

plink --bfile file_finale --maf 0.01 --hwe 0.000001 --snps-only --make-bed --out qc1
plink --bfile s1 --bmerge s1 --merge-mode 6
plink --bfile s1 --exclude plink.missnp --make-bed --out s2
plink --bfile s2 --list-duplicate-vars
plink --bfile s2 --exclude plink.dupvar --make-bed --out s3
plink --bfile s3 --qual-scores myfile.info 7 1 1 --qual-threshold 0.3 --make-bed --out s4
plink --bfile s4 --maf 0.01 --hwe 0.000001 --snps-only --make-bed --out s5
plink --bfile s5 --mind 0.10 --geno 0.10 --make-bed --out FINAL_DATA_QC

Is there a way to perform this steps on the dosage data?

ADD REPLY • link 6.6 years ago by Teresa ▴ 20

0

Entering edit mode

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @Kevin's answer.

ADD REPLY • link 6.6 years ago by GenoMax 146k

0

Entering edit mode

Hi Teresa,

I was wondering if you have noticed that there is a discrepancy in the number of variants in the s4.bim file and when it is loaded for the next step to generate s5?

Here is the example:

plink --bfile s3 --qual-scores myfile.info 7 1 1 --qual-threshold 0.3 --make-bed --out s4

..................................................................
1238706 variants and 7326 people pass filters and QC.
Note: No phenotypes present.
--make-bed to s4.bed + s4.bim + s4.fam ... done.

plink --bfile s4 --maf 0.01 --hwe 0.000001 --snps-only --make-bed --out s5
...................................................
1133252 variants loaded from .bim file.

I couldn't figure out where are the remaining variants (a total of 105,454). I have tried in 2 other different chromosomes and I noticed the same issue.

Was wondering if anyone had similar issue?

ADD REPLY • link updated 6.5 years ago by GenoMax 146k • written 6.5 years ago by lw537 ▴ 10

0

Entering edit mode

Hi Teresa, I am interested in if you have found a way to solve the problem with the dosage data.

I wonder what did you do with your DosageConvertor output file? How did you QC on that? Thanks!

ADD REPLY • link 5.9 years ago by Molly_K ▴ 60

0

Entering edit mode

Hello Molly and Teresa, I am wondering whether you have figured out how to perform QC steps on the dosage data received after using the DosageConvertor tool? In case you have succeeded please do let me know. Thank you !!

ADD REPLY • link 5.4 years ago by dnyanada.gokhale ▴ 10

0

Entering edit mode

Can you confirm that you have first contacted Michigan Imputation Server for help, please?

ADD REPLY • link 4.2 years ago by Kevin Blighe 88k

0

Entering edit mode

Hi,

I have imputed my data using minimac4 on the Michigan imputation server. Is Dosage convertor the only way to produce plink files? I have tried doing this using plink --vcf but it provides an odd output like this (where 2nd column has chr:pos:A1:A2;snp) - basically an invalid bim file.

22      22:16053843:G:A;rs181029838     0       16053843        A       G
22      22:16054839:G:A;rs73877820      0       16054839        A       G
22      22:16055207:C:T;rs7291810       0       16055207        T       C
22      22:16055230:T:C;rs140593956     0       16055230        C       T
22      22:16055965:G:T;rs587706951     0       16055965        T       G

I would really appreciate any advice.

ADD REPLY • link updated 4.2 years ago by GenoMax 146k • written 4.2 years ago by kl ▴ 10

0

Entering edit mode

Please show all commands used, please. Also confirm that you have first contacted Michigan Imputation Server.

ADD REPLY • link 4.2 years ago by Kevin Blighe 88k

0

Entering edit mode

DosageConverter says nothing about producing a bim file - where did you read that? https://genome.sph.umich.edu/wiki/DosageConvertor#Convert_to_PLINK_Files

ADD REPLY • link 4.2 years ago by Kevin Blighe 88k

0

Entering edit mode

hi! Teresa, i imputed my data with michigan server imputation, and i would like to ask that if the fcgene could work to convert the plink dosage data to plink bim bed data, if not, could you please give me some advice on it. great thanks!

ADD REPLY • link 4.2 years ago by qiong1012 • 0

0

Entering edit mode

Hi, plink --vcf looks like normally working, but in case if you have bim file with markers like 22:16053843:G:A;rs181029838 you can use unix one liner: The command awk -F " " '{print $1,$3,$4,$5,$6,$2}' your_bim_file.bim | awk -F ";" '{print $1,$2}' | awk -F " " '{print $1,$7,$2,$3,$4,$5}' > new_bim_file.bim will keep only rs number and
awk -F " " '{print $1,$3,$4,$5,$6,$2}' your_bim_file.bim | awk -F ";" '{print $1,$2}' | awk -F " " '{print $1,$6,$2,$3,$4,$5}' > new_bim_file.bim will keep only 22:16053843:G:A parts as markers. You can replaces your .bim file with the new file.

ADD REPLY • link 2.9 years ago by Fazil • 0