Calculate disease risk based on the genotype of some SNPs?
0
3
4.6 years ago
Miguel ▴ 30

Hi everybody,

I am new to bioinformatics, and I don't understand many concepts and routines. I'm learning by myself by looking for information, and I found this community... I would like to know how I could calculate the risk of a disease in a person knowing the genotype of several SNPs related to this disease. For example, I know the genotype of a person at 22 SNPs related to thrombophilia. How can I determine whether this person is at low, moderate or high risk of suffering thrombotic events? I read about PRS and GWAS, but I am not sure that those tools can help me with the question above. And I haven't found a good tutorial that explains the PRS and GWAS procedures step by step. Nothing I have read is for beginners...

Thank you for your help.

SNP prs gwas PRSice Plink • 3.1k views
3

Imputing the combined risk from multiple genotypes based on published data on individual genotype risks is a bit of a holy grail but kudos to your ambition. Consider a systems biology approach and see a recent Nature article

0

I thought that's what a polygenic risk score or GWAS calculates... And if the risk of a SNP for a disease can be estimated mathematically, why can't the risk of several SNPs be calculated as well? I have several SNPs related to a disease, and I would like to know the risk score for that person to have events related to that disease (low, moderate, high) based on those SNPs. An example of data:

SNP     CHR  Allele1  Allele2
rs1234  2    A        A
rs5678  19   G        C
rs9012  X    T        A
rs3456  6    C        C

Thank you for your help.

0

'Polygenic risk score' is actually just a generic term, and there are many ways to generate a 'risk score' from multiple genotypes. The algorithms that I have seen (including my own) start with the beta coefficients, which are obtained once you fit a regression model to your data. As you are a beginner, you may try the PRS that is built into PLINK - I believe there is one, no?

0

I don't know whether PLINK has a built-in PRS. That's what I'm trying to find out, but I can't find a tutorial for beginners...

Thank you for your help.

0

Take a look at PRSice: http://www.prsice.info/ The developers of both PRSice and PLINK are active on Biostars.

Perhaps consider adding both of these as tags to your question.

0

I took a look at PRSice, but I didn't understand much. It's not very helpful for beginners, in my opinion... I will add some more tags to my question. Let's see if someone can help me to better understand the basic concepts of PRS.

Thank you for your help.

2

Hi, I'd suggest you first read our guide paper, which lays out some of the challenges and problems of PRS. As for the PRSice tutorial, have you tried following our step-by-step tutorial? We are also trying to construct an independent tutorial for the guide paper, which you can find here. However, please note that it is still under construction. Do feel free to let us know if you find anything unclear or problematic. Good luck

0

Thanks Sam! Was waiting for you to arrive :)

1

Would you mind letting us know which part of the document you find too complex? One of the main goals of PRSice and our paper is to explain the basics of PRS to people who would like to perform PRS analysis. While we try to make the instructions as simple as possible, our background might sometimes make us blind to problems that new users find difficult. It would therefore be great to know which parts of the tutorial or the documentation are too difficult or unclear so that we can improve them. Thanks

0

Thank you for your patience. Regards.

1

In that case, at least for the topic of PRS, I guess the closest you will get is this tutorial we made for the guide paper. The problem with PRS analysis is that it is a slightly advanced additional analysis based on GWAS data. Without knowledge of the basics of GWAS, it will be very difficult to understand the ideas behind PRS. And as GWAS itself is a rather complicated topic, it'd be a bit too much to include in a single guide. A good starting point might be this paper.

Given the background, it might be best for you to study how GWAS are performed, the statistics behind the GWAS analysis and its assumptions, before you jump into PRS analysis. Good luck!

0

Thank you again for your answer. I only wanted to calculate a genetic risk score based on the genotype of several SNPs related to a disease. I agree that it would be much easier to perform a PRS analysis if you know GWAS, but going on with my previous example, I am able to change the air, fuel and oil filters of my car without any knowledge of mechanics, because someone taught me how to change them. And by the way, you didn't answer the questions above, especially the one about the file types needed by PRSice (only GWAS? what other formats?). I don't need luck, I only need a good tool to calculate the risk and someone willing to help me in an easy way... Regards.

1

The summary statistics files are generated from GWAS studies. Without performing a GWAS, you won't have an estimate of each SNP's effect size. GWAS files come in many formats (there isn't any standard), but to perform PRS analysis, you will need at least the following columns: SNP_ID, P-value, Effect size, Effective allele
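As a rough illustration of the four columns listed above, here is a minimal Python sketch that reads a tab-separated summary statistics file and fails early if a column is missing. The column names (`SNP`, `A1`, `BETA`, `P`) and the toy values are assumptions made up for this example; real GWAS files use many different headers, and this is not the parser that PRSice itself uses.

```python
import csv
import io

# Hypothetical column names: SNP ID, effective allele, effect size, P-value.
REQUIRED = ("SNP", "A1", "BETA", "P")

def read_sumstats(handle):
    """Return a list of row dicts; raise ValueError if a required column is missing."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        rows.append({
            "SNP": row["SNP"],
            "A1": row["A1"].upper(),     # effective allele
            "BETA": float(row["BETA"]),  # effect size
            "P": float(row["P"]),        # association P-value
        })
    return rows

# Invented toy values, for illustration only.
toy = "SNP\tA1\tBETA\tP\nrs1234\tA\t0.12\t0.001\nrs5678\tG\t-0.05\t0.2\n"
sumstats = read_sumstats(io.StringIO(toy))
```

For a file on disk, the same function could be called with `open(path, newline="")` instead of the in-memory string.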

The thing is, as with most bioinformatic analyses, it is usually a bad idea to simply follow a tutorial and run the analysis without the background knowledge. A lot of things in bioinformatics are works in progress; for example, models have their own hypotheses and assumptions. Without understanding the problem, and without acquiring the background knowledge, you will very likely misinterpret the findings. For example, if you don't know what an effect size from a GWAS is, how can you understand the polygenic score model, which is the weighted sum of effect sizes? And if you don't know what a polygenic score is, how can you interpret the result? So while it might be good to have a fully detailed guide for a program that teaches you how to perform an analysis, it is vital for you to understand the background.
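To make the "weighted sum" idea concrete, here is a minimal Python sketch: for each SNP, count how many copies of the effect allele the person carries (0, 1 or 2) and multiply by that SNP's effect size, then add everything up. The genotypes are the example ones posted earlier in this thread; the effect alleles and effect sizes are invented placeholders, since real ones would have to come from a GWAS. The function name `polygenic_score` is hypothetical, and tools such as PRSice do considerably more (clumping, P-value thresholding, handling of missing data).

```python
def polygenic_score(genotypes, weights):
    """Weighted sum: effect size times number of effect alleles, over shared SNPs.

    genotypes: {snp_id: (allele1, allele2)}
    weights:   {snp_id: (effect_allele, effect_size)}
    Returns (score, number_of_snps_used).
    """
    score = 0.0
    used = 0
    for snp, (effect_allele, beta) in weights.items():
        if snp not in genotypes:
            continue  # SNP not genotyped in this person: skipped, not imputed
        dosage = sum(1 for allele in genotypes[snp] if allele == effect_allele)
        score += beta * dosage
        used += 1
    return score, used

# Genotypes from the example data in this thread.
person = {
    "rs1234": ("A", "A"),
    "rs5678": ("G", "C"),
    "rs9012": ("T", "A"),
    "rs3456": ("C", "C"),
}
# Effect alleles and effect sizes are invented placeholders, NOT real estimates.
weights = {
    "rs1234": ("A", 0.10),
    "rs5678": ("G", 0.20),
    "rs9012": ("C", 0.30),
    "rs3456": ("C", -0.05),
}
score, used = polygenic_score(person, weights)
```

Note that the resulting number has no "low / moderate / high" meaning on its own; it would only become interpretable when compared against scores from a suitable reference group, which is part of the background discussed above.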

Finally, just so you know, the PRSice release comes with toy data, which allows you to follow the PRSice tutorial. Our tutorial also provides a detailed description of the expected file formats, though we don't go into how you obtain those files (performing GWAS).

0

Hi Sam, and thanks again both for your help and explanations. The problem with GWAS is that I have samples but I don't have any controls to compare with. I would need a tutorial or a step-by-step guide related to GWAS. I would like to learn, but I can't find a good tutorial...

1

For binary traits, you cannot perform a GWAS without the controls.

In a GWAS, you are trying to find out whether a SNP is more likely to be observed in the cases than in the controls. Without the controls, you cannot perform a GWAS.
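A minimal sketch of that case/control comparison for a single SNP, in Python: count the alleles in cases and in controls, then compute an odds ratio and a basic 1-degree-of-freedom chi-square test on the 2x2 table. The allele counts are invented, the function name `allelic_test` is hypothetical, and a real GWAS would normally fit a regression with covariates and correct for testing many SNPs; this only illustrates why the controls are needed.

```python
import math

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Odds ratio, chi-square statistic and P-value for a 2x2 table of allele counts.

    Assumes all four counts are greater than zero.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    # Pearson chi-square for a 2x2 table, without continuity correction.
    numerator = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2
    denominator = ((case_alt + case_ref) * (ctrl_alt + ctrl_ref)
                   * (case_alt + ctrl_alt) * (case_ref + ctrl_ref))
    chi2 = numerator / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value

# Invented counts: 100 cases and 100 controls, i.e. 200 alleles per group.
odds_ratio, chi2, p_value = allelic_test(case_alt=60, case_ref=140,
                                         ctrl_alt=40, ctrl_ref=160)
log_odds = math.log(odds_ratio)  # one common form of per-SNP effect size
```

Without the `ctrl_alt` and `ctrl_ref` counts there is nothing to compare the cases against, so neither the odds ratio nor the test can be computed.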

In this case, even if you could perform the GWAS with your samples, unless you have additional independent samples, you cannot perform PRS, as PRS requires the genotyped samples to be independent of the GWAS samples; otherwise it will lead to invalid results.

If you really want to learn, I'd suggest you google "GWAS tutorial". There are a lot of tutorials and even videos available online.

0

Ok, thank you again... Do you know any tool that could calculate a genetic risk score in an easier way? I will take a look at GWAS tutorials on the internet or in videos.

0

The easiest tool is PRSice. Then there are lassosum, LDpred and PRS-CS, and if you want, you can use PLINK. The tutorial I posted should contain most of the info.

0

Thank you again for all your help and your patience. Regards.

0

You clearly state that you are a beginner, but the project that you are aiming to do is somewhat advanced, at least from my perspective. Are you at least familiar with regression analysis, and do you know how to conduct it in the context of genetic variants? What data do you currently have (paste an example here)?

0

I'm a beginner, but I have tried PLINK. The files I have are those generated with PLINK, both the 2-file and the 3-file sets. I'm familiar neither with regression analysis nor with how to conduct it in the context of genetic variants.

An example of data:

SNP     CHR  Allele1  Allele2
rs1234  2    A        A
rs5678  19   G        C
rs9012  X    T        A
rs3456  6    C        C

Those SNPs are related to a disease. I would like to calculate the risk of that person suffering that disease (a risk score).

Thank you for your help.