Question: Alzheimer's disease and its classification
8 months ago, Uday Rangaswamy (Indian Institute of Technology, Madras, India) wrote:

I'm interning on an Alzheimer's disease project where I've been asked to build a classification model for the disease. As of now, I have a dataset in which it's proven that rs IDs with a p-value less than 0.01 are sure to affect gene expression for the disease, and rs IDs with a p-value greater than 0.8 are considered healthy. So my question is: where can I find a dataset from which I can extract features such as eQTLs, DNA stability and propensity values, and build a classification model with them? Any suggestions will be much appreciated. Thanks.

Tags: snp, machine learning, R

Have you tried GEO or dbGaP? Also, why are you using only a subset of IGAP? All of the IGAP summary stats are here: IGAP

modified 8 months ago • written 8 months ago by Nandini

Thank you. Could you please elaborate on this data? Are these the SNPs that influence Alzheimer's? If so, where can I find their sequences to extract features from them? Please help.

written 8 months ago by Uday Rangaswamy

SNPs that influence Alzheimer's

That kind of causal link is difficult to prove. The best you can hope for is a correlation between samples that carry that variant and a particular diagnosis/marker for Alzheimer's.

written 8 months ago by RamRS

Got you sir. Thanks a ton for your insight.

written 8 months ago by Uday Rangaswamy

All SNPs identified in IGAP to date are "susceptibility loci", meaning they are most likely associated with a RISK of developing late-onset Alzheimer's. The link in my comment above gives you the summary statistics for IGAP. IGAP was a big study in which many groups across Europe and the USA collaborated and shared summary stats to perform a meta-analysis. ADGC is one of those groups. The link you provided covers only a subset of IGAP. I would suggest reading the IGAP paper (Ram links to it in his comments below) to understand how the analysis was done and what conclusions the authors drew from it.
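For anyone new to summary statistics, here is a minimal Python sketch of pulling out the markers that pass the conventional genome-wide significance threshold (5e-8). The rows and rs IDs below are made up for illustration, and the column names (`MarkerName`, `Chromosome`, `Position`, `Pvalue`) are an assumption about the file layout, so check the header of the file you actually download.

```python
import csv
import io

# Toy stand-in for a tab-separated summary-statistics file.
# The column names and all values are assumptions made for illustration.
TOY_SUMMARY_STATS = (
    "MarkerName\tChromosome\tPosition\tPvalue\n"
    "rs_toy_1\t1\t1000\t1e-12\n"
    "rs_toy_2\t2\t2000\t0.62\n"
    "rs_toy_3\t3\t3000\t3e-9\n"
    "rs_toy_4\t4\t4000\t0.04\n"
)

GENOME_WIDE_SIGNIFICANCE = 5e-8  # conventional GWAS threshold


def significant_markers(handle, threshold=GENOME_WIDE_SIGNIFICANCE):
    """Return (marker, p-value) pairs with p below threshold, smallest p first."""
    reader = csv.DictReader(handle, delimiter="\t")
    hits = []
    for row in reader:
        p_value = float(row["Pvalue"])
        if p_value < threshold:
            hits.append((row["MarkerName"], p_value))
    return sorted(hits, key=lambda pair: pair[1])


hits = significant_markers(io.StringIO(TOY_SUMMARY_STATS))
for marker, p_value in hits:
    print(marker, p_value)
```

With a real file you would pass an open file handle instead of the `io.StringIO` wrapper. Note that passing the threshold only flags an association with risk, not a causal effect.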

written 8 months ago by Nandini

Try ADNI and AMP-AD.

written 8 months ago by cpad0112

ADNI requires registration, and I can't seem to find any SNP-related datasets in AMP-AD. Thanks for your time.

written 8 months ago by Uday Rangaswamy

rs IDs with a p-value

Sorry, what? How did you obtain these p-values? And how can a p-value > 0.8 mean anything in any statistical test?

written 8 months ago by RamRS

The dataset is at the following link: https://www.niagads.org/igap-summary-statistics-adgc-only

It is the result of a certain experiment, which is why they're able to say so. So, is there any dataset you're aware of that could help me?

written 8 months ago by Uday Rangaswamy

Can you please show me where your resource says that a p-value above a threshold signifies anything? A large p-value means only one thing in statistics: "a result at least this extreme would be quite likely by chance alone if there were no real effect", which means "your results are not statistically significant". No inference can be made from such a p-value.
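To make the point about large p-values concrete, here is a small simulation sketch in Python. It assumes a two-sided z-test and draws test statistics for SNPs that have no effect at all; the number of tests and the seed are arbitrary choices. When the null hypothesis is true, p-values are uniformly distributed, so about one test in five lands above 0.8 purely by chance.

```python
import math
import random


def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def null_p_values(n_tests, seed=0):
    """Simulate p-values for tests in which the null hypothesis is true."""
    rng = random.Random(seed)
    return [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_tests)]


p_values = null_p_values(20000)
share_above_08 = sum(p > 0.8 for p in p_values) / len(p_values)
share_below_001 = sum(p < 0.01 for p in p_values) / len(p_values)

print(f"share of null tests with p > 0.8:  {share_above_08:.3f}")
print(f"share of null tests with p < 0.01: {share_below_001:.3f}")
```

The first share comes out near 0.2 and the second near 0.01. So p > 0.8 does not mark a SNP as "healthy", and with millions of SNPs tested, p < 0.01 alone will also pick up many SNPs with no effect.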

EDIT: The only mention I see is in the IGAP paper:

The results from stages 1 and 2 and from the combined stage 1 and stage 2 data sets, which represent a secondary discovery effort, are shown in Table 2. With the exception of CD33 and DSG2, we nominally replicated all loci that surpassed the genome-wide significance level in stage 1. Inability to replicate DSG2 is not surprising, as evidence of association for this locus was based on data for a single SNP and was not supported by data from surrounding SNPs in linkage disequilibrium (LD, r2 > 0.8; Supplementary Fig. 7b)

Is this what you're referring to? If it is, I can't see the connection between an r2 value, which is a measure of correlation, and a p-value threshold.
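For reference, the r2 in the quoted passage measures how strongly two SNPs are correlated with each other across samples (linkage disequilibrium), not how strongly a SNP is associated with disease. A rough Python sketch, assuming genotypes coded as allele counts (0, 1 or 2) and using the squared Pearson correlation of those counts as an approximation to LD r2 (the strict definition uses haplotype frequencies). The genotype vectors are made up.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical allele counts for eight samples at three SNPs.
snp_a = [0, 1, 2, 1, 0, 2, 1, 0]
snp_b = [0, 1, 2, 1, 0, 2, 1, 1]  # differs from snp_a in one sample
snp_c = [2, 0, 1, 1, 0, 1, 2, 0]  # largely unrelated to snp_a

r2_ab = pearson_r(snp_a, snp_b) ** 2
r2_ac = pearson_r(snp_a, snp_c) ** 2

print(f"r2(a, b) = {r2_ab:.3f}  in LD at r2 > 0.8: {r2_ab > 0.8}")
print(f"r2(a, c) = {r2_ac:.3f}  in LD at r2 > 0.8: {r2_ac > 0.8}")
```

Nothing in this calculation involves case/control status or a hypothesis test, which is why an r2 cutoff of 0.8 cannot be read as a p-value cutoff of 0.8.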

modified 8 months ago • written 8 months ago by RamRS

I'll get back to you in a couple of days about this, because this is what my mentor told me. Meanwhile, as I understand it, I have a list of rs IDs that influence Alzheimer's. I need to extract certain features from them and build a classification model that decides whether a given rs ID belongs to that class or not. How can I go ahead with this? Please help.

written 8 months ago by Uday Rangaswamy

The terms you use and the approach you describe look a lot like machine learning. For classification, you'd need a well-annotated truth set for training. I'm not a machine learning expert; maybe someone else can help you with that.
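As a rough illustration of what a "truth set" means in practice, here is a minimal Python sketch: each item has feature values plus a trusted label, part of the set is used for training and the rest is held out to check the model. Everything here is hypothetical. The two features and the labels are synthetic, and the hand-rolled logistic regression is just one simple classifier, not a recommendation for SNP data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, x):
    """Linear score: feature weights first, bias stored as the last weight."""
    return sum(w * v for w, v in zip(weights, x)) + weights[-1]


def train_logistic(features, labels, lr=0.1, epochs=500):
    """Fit logistic-regression weights by batch gradient descent."""
    weights = [0.0] * (len(features[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(features, labels):
            err = sigmoid(score(weights, x)) - y
            for j, v in enumerate(x):
                grad[j] += err * v
            grad[-1] += err
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad)]
    return weights


def predict(weights, x):
    return 1 if score(weights, x) >= 0.0 else 0


# Synthetic truth set: two made-up features per item, label 1 or 0.
rng = random.Random(0)
data = [([rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)], 1) for _ in range(100)]
data += [([rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)], 0) for _ in range(100)]
rng.shuffle(data)

train, test = data[:160], data[160:]  # hold out 40 items for evaluation
weights = train_logistic([x for x, _ in train], [y for _, y in train])

correct = sum(predict(weights, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

The held-out accuracy is only meaningful if the labels themselves are trustworthy, which is the point above: labels derived from an arbitrary p-value cutoff would not give a reliable truth set.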

written 8 months ago by RamRS

Kindly refer to the paper at the above-mentioned link.

written 8 months ago by Uday Rangaswamy

I think I can retrieve the data I need with the R package "rsnps". Could anybody tell me how to select features for the classification, please? Thanks for your time, all of you :).

written 8 months ago by Uday Rangaswamy