How to analyze genotyped and imputed data in a GWAS
2.4 years ago
Đạt • 0

Dear users, I am new to GWAS and have a question about the genetic association workflow. After obtaining genotype data from the experiment (we use an array for genotyping), we need to impute the genotypes. However, I am unsure how to run the association analysis on the imputed data. I read a paper in which the authors used different R packages to analyze the directly genotyped data and the imputed data. Other people say they combine the genotyped and imputed data and filter out low-quality imputed SNPs based on some criterion (such as the info score). Could anyone share their experience with this? What do you usually do in this case? Thank you very much.

data analysis association GWAS imputed • 1.1k views
2.4 years ago
Sam ★ 4.7k

Assuming your imputed data are in BGEN format, you can use software such as REGENIE, SAIGE, BOLT-LMM, or even PLINK 2 directly to get the association results (after performing QC steps such as filtering by info score).
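To make the info-score QC step concrete, here is a minimal Python sketch of filtering imputed variants before association testing. It assumes you already have per-variant summary statistics (for example, exported from your imputation software). The `VariantStats` record, the variant IDs, and the thresholds (`min_info=0.8`, `min_maf=0.01`) are illustrative assumptions rather than values from this thread, so adjust them to your study.

```python
from typing import List, NamedTuple


class VariantStats(NamedTuple):
    """Per-variant summary (hypothetical record layout for illustration)."""
    snp_id: str
    info: float  # imputation quality score, between 0 and 1
    maf: float   # minor allele frequency


def passes_qc(v: VariantStats, min_info: float = 0.8, min_maf: float = 0.01) -> bool:
    # Thresholds are assumptions; published studies use a range of cutoffs.
    return v.info >= min_info and v.maf >= min_maf


def filter_variants(variants: List[VariantStats],
                    min_info: float = 0.8,
                    min_maf: float = 0.01) -> List[VariantStats]:
    """Keep only variants that are well imputed and not too rare."""
    return [v for v in variants if passes_qc(v, min_info, min_maf)]


# Made-up example values, not real data.
variants = [
    VariantStats("rs_a", info=0.98, maf=0.21),   # well imputed, common
    VariantStats("rs_b", info=0.42, maf=0.30),   # poorly imputed
    VariantStats("rs_c", info=0.91, maf=0.004),  # well imputed but very rare
]

kept = filter_variants(variants)
print([v.snp_id for v in kept])
```

In practice the association tools listed above can usually apply such filters themselves, or take a list of variant IDs to keep, so a script like this is mainly useful for inspecting how many variants a given cutoff would remove.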

2.4 years ago
Olifabu • 0

You first have to know why you need to impute your data. After computing the rate of missing data, many scientists, especially for GWAS, filter their genotype data by removing SNPs with a minor allele frequency (MAF) below 5% or with more than 20% missing data. These thresholds can differ depending on the data and on what you want to achieve. Then, especially for GWAS, you need to impute the remaining missing data so that the models you use for the association analyses perform well and your data are homogeneous.

If you are working with a species that has a HapMap reference available, it will be easy to impute your data using Beagle or related tools. If you can create your own reference, even better. There are pretty good tutorials available here on how to do this. TASSEL can also help you impute your data if you don't have a reference.

How to mix imputed and unimputed data seems to be more a case-by-case choice than a standard procedure; it probably depends on the data you have. In essence, imputing your data means guessing the missing genotypes using a certain model. Having imputed data doesn't mean you have the real data, though you can test the imputation accuracy. For this reason, you will need to use rigorous thresholds to remove spurious associations during your post-GWAS analysis. Good luck!
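The pre-imputation filtering described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: genotypes are coded as 0, 1, or 2 copies of the alternate allele, `None` marks a missing call, and the SNP names and genotype values are made up. The 5% MAF and 20% missingness cutoffs are the ones mentioned in this answer.

```python
from typing import Dict, List, Optional, Sequence

Genotype = Optional[int]  # 0, 1, or 2 alternate alleles; None = missing call


def missing_rate(calls: Sequence[Genotype]) -> float:
    """Fraction of samples with no genotype call at this SNP."""
    return sum(g is None for g in calls) / len(calls)


def minor_allele_freq(calls: Sequence[Genotype]) -> float:
    """MAF computed from the non-missing calls of a diploid, biallelic SNP."""
    observed = [g for g in calls if g is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def keep_snp(calls: Sequence[Genotype],
             min_maf: float = 0.05,
             max_missing: float = 0.20) -> bool:
    """Apply the missingness filter and the MAF filter together."""
    return missing_rate(calls) <= max_missing and minor_allele_freq(calls) >= min_maf


# Made-up genotypes for ten samples at three hypothetical SNPs.
snps: Dict[str, List[Genotype]] = {
    "snp1": [0, 1, 2, 1, None, 0, 1, 2, 0, 1],           # 10% missing, common
    "snp2": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],              # monomorphic
    "snp3": [None, None, None, 1, 0, 2, 1, 0, 1, 1],     # 30% missing
}

kept = [name for name, calls in snps.items() if keep_snp(calls)]
print(kept)
```

Only the SNPs that survive this step would then go on to imputation of the remaining missing calls.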
