Question: michigan imputation server
2
gravatar for Teresa
2.5 years ago by
Teresa20
United States
Teresa20 wrote:

Hi, I performed imputation on my GWAS data using Michigan imputation server. Now I have two output files: 1).dose.vcf.gz and 2).info.gz

Michigan imputation server use mimimac3 (--format GT,DS,GP) and in the output file ".dose.vcf.gz" are present all the three formats. I'm new on this kind of analysis, so I need some advice to manage and anayze this data.

I used DosageConvertor to convert them into PLINK dosage format using this command:

$DosageConvertor --vcfDose myfile.vcf.gz --info myfile.info.gz --prefix out_myfile --type plink --format 1

output files: 1) .dosage and 2) .fam.

To perform quality control steps on the imputed data, is this command right?

fcgene --dosage out_myfile.dosage --fam out_myfile.fam --filter-snp crate=0.9,hwe=1e-6,maf=0.05 --filter-indiv crate=0.9 --rsq 0.3 --oformat plink-dosage --out myfile_QC
ADD COMMENTlink modified 24 days ago by qiong10120 • written 2.5 years ago by Teresa20

Have you looked through the manual under 'Chapter 4 - Quality control and format conversion of plink-formatted data'? - see page 20 of http://www.bx.psu.edu/~giardine/tests/tmp/fcgene-1.0.7.pdf

Your filtering criteria per SNP are

  • Call-rate, 90%
  • HWE p value, 0.000001
  • MAF, 5%

Your filtering criteria per sample is:

  • Call-rate, 90%

For the --rsq parameter, you have to take note as it's not what would be typically expected for such a study. The manual states:

Option - -rsq 0.3 will filter those SNPs which have Rsq value smaller or equal to 0.3 and change the format of only those SNPs which have Rsq ≥ 0.3. Rsq is a measure which estimates the squared correlation between imputed and true genotypes

ADD REPLYlink written 2.5 years ago by Kevin Blighe63k

I would like to work with dosage genotypes rather than hard call genotypes. It is really difficult for me to find a way to obtain a final data with dosage genotypes (calculated in the imputation process performed by Michigan imputation server) and at the same time taking into account QC steps pe- marker and per- individual and the quality of imputation.

It seems that fcgene is able to perform QC steps on dosage data but the option ---rsq does not work on my input file. Rsq is an indicator of the quality of imputation so I think is really important to consider this parameter. But I dont't know how.

Nevertheless, I found a way to work with hard call genotype (even if it is not what I wanted in the first place). Below the code I used:

gunzip myfile.info.gz
plink --vcf myfile.dose.vcf.gz --make-bed --double-id --out s1

QC steps using PLINK

plink --bfile file_finale --maf 0.01 --hwe 0.000001 --snps-only --make-bed --out qc1
plink --bfile s1 --bmerge s1 --merge-mode 6
plink --bfile s1 --exclude plink.missnp --make-bed --out s2
plink --bfile s2 --list-duplicate-vars
plink --bfile s2 --exclude plink.dupvar --make-bed --out s3
plink --bfile s3 --qual-scores myfile.info 7 1 1 --qual-threshold 0.3 --make-bed --out s4
plink --bfile s4 --maf 0.01 --hwe 0.000001 --snps-only --make-bed --out s5
plink --bfile s5 --mind 0.10 --geno 0.10 --make-bed --out FINAL_DATA_QC

Is there a way to perform this steps on the dosage data?

ADD REPLYlink written 2.5 years ago by Teresa20

Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.

This comment belongs under @Kevin's answer.

ADD REPLYlink written 2.5 years ago by genomax87k

Hi Teresa,

I was wondering if you have noticed that there is a discrepancy in the number of variants in the s4.bim file and when it is loaded for the next step to generate s5?

Here is the example:

plink --bfile s3 --qual-scores myfile.info 7 1 1 --qual-threshold 0.3 --make-bed --out s4

..................................................................
1238706 variants and 7326 people pass filters and QC.
Note: No phenotypes present.
--make-bed to s4.bed + s4.bim + s4.fam ... done.

plink --bfile s4 --maf 0.01 --hwe 0.000001 --snps-only --make-bed --out s5
...................................................
1133252 variants loaded from .bim file.

I couldn't figure out where are the remaining variants (a total of 105,454). I have tried in 2 other different chromosomes and I noticed the same issue.

Was wondering if anyone had similar issue?

ADD REPLYlink modified 2.4 years ago by genomax87k • written 2.4 years ago by lw53710

Hi Teresa, I am interested in if you have found a way to solve the problem with the dosage data.

I wonder what did you do with your DosageConvertor output file? How did you QC on that? Thanks!

ADD REPLYlink written 21 months ago by Molly_K60

Hello Molly and Teresa, I am wondering whether you have figured out how to perform QC steps on the dosage data received after using the DosageConvertor tool? In case you have succeeded please do let me know. Thank you !!

ADD REPLYlink written 15 months ago by dnyanada.gokhale10

Can you confirm that you have first contacted Michigan Imputation Server for help, please?

ADD REPLYlink written 4 weeks ago by Kevin Blighe63k

Hi,

I have imputed my data using minimac4 on the Michigan imputation server. Is Dosage convertor the only way to produce plink files? I have tried doing this using plink --vcf but it provides an odd output like this (where 2nd column has chr:pos:A1:A2;snp) - basically an invalid bim file.

22      22:16053843:G:A;rs181029838     0       16053843        A       G
22      22:16054839:G:A;rs73877820      0       16054839        A       G
22      22:16055207:C:T;rs7291810       0       16055207        T       C
22      22:16055230:T:C;rs140593956     0       16055230        C       T
22      22:16055965:G:T;rs587706951     0       16055965        T       G

I would really appreciate any advice.

ADD REPLYlink modified 4 weeks ago by genomax87k • written 4 weeks ago by kl0

Please show all commands used, please. Also confirm that you have first contacted Michigan Imputation Server.

ADD REPLYlink written 4 weeks ago by Kevin Blighe63k

DosageConverter says nothing about producing a bim file - where did you read that? https://genome.sph.umich.edu/wiki/DosageConvertor#Convert_to_PLINK_Files

ADD REPLYlink written 4 weeks ago by Kevin Blighe63k

hi! Teresa, i imputed my data with michigan server imputation, and i would like to ask that if the fcgene could work to convert the plink dosage data to plink bim bed data, if not, could you please give me some advice on it. great thanks!

ADD REPLYlink written 24 days ago by qiong10120
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1369 users visited in the last hour