Question

Alzheimer's disease and its classification

0

5.5 years ago

bioinfo456 ▴ 150

I'm interning on an Alzheimer's disease project where I've been asked to build a classification model for the disease. So far I have a dataset in which it's proven that rs IDs with a p-value below 0.01 are sure to affect gene expression for the disease, while rs IDs with a p-value above 0.8 are considered healthy. My question: where can I find a dataset from which I can extract features such as eQTLs, DNA stability, and propensity values, and then build a classification model from them? Any suggestions would be much appreciated. Thanks.

SNP R machine learning • 1.6k views

ADD COMMENT • link 5.5 years ago by bioinfo456 ▴ 150

2

Have you tried GEO or dbGaP? Also, why are you using only a subset of IGAP? All of the IGAP summary stats are here: IGAP

ADD REPLY • link 5.5 years ago by NB ▴ 960

0

Thank you. Could you please elaborate on this data? Are these the SNPs that influence Alzheimer's? If so, where can I find their sequences so that I can extract features from them? Please help.

ADD REPLY • link 5.5 years ago by bioinfo456 ▴ 150

1

SNPs that influence Alzheimer's

That kind of causal link is difficult to prove. The best you can hope for is a correlation between samples that carry the variant and a particular diagnosis or marker for Alzheimer's.
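As a rough illustration of what such a correlation looks like in practice, here is a minimal sketch that computes an odds ratio from a 2x2 table of variant carriers versus diagnosis. The counts are made up for the example, and `odds_ratio` is a hypothetical helper, not a function from any library.

```python
import math

def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Odds ratio with an approximate 95% CI (Woolf's log method) for a 2x2 table."""
    estimate = (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log = math.sqrt(
        1 / case_carriers + 1 / case_noncarriers
        + 1 / control_carriers + 1 / control_noncarriers
    )
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper

# Made-up counts, for illustration only.
estimate, lower, upper = odds_ratio(120, 380, 80, 420)
print(f"OR = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

An odds ratio above 1 only says the variant is seen more often among cases than controls; on its own it says nothing about whether the variant causes the disease.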

ADD REPLY • link 5.5 years ago by Ram 43k

0

Got you sir. Thanks a ton for your insight.

ADD REPLY • link 5.5 years ago by bioinfo456 ▴ 150

1

All SNPs identified in IGAP to date are "susceptibility loci", meaning they are most likely to be associated with a RISK of developing late-onset Alzheimer's. The link in my comment above gives you the data, i.e. the summary statistics, for IGAP. IGAP was a big study in which many groups across Europe and the USA collaborated and shared summary stats in order to perform a meta-analysis. ADGC is one of those groups. The link you provided is only a subset of IGAP. I would suggest reading the IGAP paper (link is provided by Ram below in his comments) to understand how the analysis was done and what conclusions the authors drew from it.
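To give a feel for how per-group summary stats can be pooled, here is a minimal sketch of a fixed-effect, inverse-variance-weighted meta-analysis for a single SNP. This is one standard way of combining effect sizes and I am not claiming it reproduces the IGAP pipeline; the betas and standard errors are made up, and `inverse_variance_meta` is a hypothetical helper.

```python
import math

def inverse_variance_meta(betas, ses):
    """Pool per-cohort effect sizes with weights 1 / SE^2 (fixed-effect model)."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value

# Made-up effect sizes for one SNP in three cohorts.
betas = [0.10, 0.14, 0.08]
ses = [0.05, 0.07, 0.04]
pooled_beta, pooled_se, p_value = inverse_variance_meta(betas, ses)
print(f"beta = {pooled_beta:.4f}, SE = {pooled_se:.4f}, p = {p_value:.2e}")
```

The pooled standard error is smaller than that of any single cohort, which is the whole point of sharing summary stats across groups.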

ADD REPLY • link 5.5 years ago by NB ▴ 960

1

Try ADNI and AMP-AD.

ADD REPLY • link 5.5 years ago by cpad0112 21k

0

ADNI requires registration, and I can't seem to find any SNP-related datasets in AMP-AD. Thanks for your time.

ADD REPLY • link 5.5 years ago by bioinfo456 ▴ 150

0

rs IDs with a p-value

Sorry, what? How did you obtain these p-values? And how can a p-value > 0.8 mean anything in any statistical test?

ADD REPLY • link 5.5 years ago by Ram 43k

0

The dataset is at the following link: https://www.niagads.org/igap-summary-statistics-adgc-only

It is the result of a particular experiment, which is why they're able to say so. So, is there any dataset you're aware of that could be of help to me?
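For reference, a summary-statistics table of this kind can be loaded and filtered roughly as follows. This is only a sketch: the column names (`MarkerName`, `Beta`, `SE`, `Pvalue`) are an assumption about the file layout and should be checked against the real header, the rows are made-up placeholders, and the cutoff used is the conventional genome-wide significance threshold of 5e-8.

```python
import io

# Made-up rows standing in for a whitespace-delimited summary-statistics file.
# The column names are assumed; check them against the actual download.
SAMPLE = """MarkerName Effect_allele Non_Effect_allele Beta SE Pvalue
rs_demo1 A G 0.21 0.03 2.6e-12
rs_demo2 T C -0.02 0.04 0.62
rs_demo3 G A 0.09 0.03 0.0027
rs_demo4 C T 0.00 0.05 0.93
"""

def read_summary_stats(handle):
    """Parse a whitespace-delimited table into a list of dicts with numeric stats."""
    header = handle.readline().split()
    records = []
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        record = dict(zip(header, fields))
        for key in ("Beta", "SE", "Pvalue"):
            record[key] = float(record[key])
        records.append(record)
    return records

GENOME_WIDE = 5e-8  # conventional genome-wide significance threshold

records = read_summary_stats(io.StringIO(SAMPLE))
hits = [r["MarkerName"] for r in records if r["Pvalue"] < GENOME_WIDE]
print(hits)
```

For a real file, pass `open(path)` instead of the `io.StringIO` wrapper.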

ADD REPLY • link 5.5 years ago by bioinfo456 ▴ 150

0

Can you please show me where your resource says that a p-value above a threshold signifies anything? A large p-value means only one thing in statistics: "results at least this extreme would turn up by chance fairly often even if there were no real effect", which means "your results are not statistically significant". No inference can be made from such a p-value.
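To make that concrete, here is a small simulation sketch (my own illustration, not something taken from the IGAP paper). It draws test statistics for SNPs with no effect at all, assuming each statistic is a standard normal z-score, and counts how many p-values land above 0.8.

```python
import math
import random

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

random.seed(0)
N = 20000

# Every simulated SNP here has NO true effect (the null hypothesis holds).
null_p_values = [two_sided_p(random.gauss(0.0, 1.0)) for _ in range(N)]

frac_high = sum(p > 0.8 for p in null_p_values) / N
frac_low = sum(p < 0.01 for p in null_p_values) / N
print(f"fraction with p > 0.8:  {frac_high:.3f}")
print(f"fraction with p < 0.01: {frac_low:.3f}")
```

Under the null, p-values are spread roughly uniformly between 0 and 1, so about a fifth of them exceed 0.8 purely by chance. A p-value above 0.8 therefore marks a SNP as "no evidence either way", not as "healthy".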

EDIT: The only mention I see is in the IGAP paper:

The results from stages 1 and 2 and from the combined stage 1 and stage 2 data sets, which represent a secondary discovery effort, are shown in Table 2. With the exception of CD33 and DSG2, we nominally replicated all loci that surpassed the genome-wide significance level in stage 1. Inability to replicate DSG2 is not surprising, as evidence of association for this locus was based on data for a single SNP and was not supported by data from surrounding SNPs in linkage disequilibrium (LD, r² > 0.8; Supplementary Fig. 7b)

Is this what you're referring to? If it is, I can't see the connection between an r² value, which is a measure of correlation, and a p-value threshold.
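For anyone unsure what the r² in that passage measures, here is a minimal sketch of the usual LD r² between two biallelic SNPs, computed from phased haplotypes. The haplotypes are a toy example and `ld_r2` is a hypothetical helper; note that no p-value appears anywhere in the calculation.

```python
def ld_r2(haplotypes):
    """LD r^2 between two SNPs.

    haplotypes: list of (a, b) pairs, where 1 means the alternate allele
    is present at SNP A / SNP B on that haplotype and 0 means it is absent.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom

# Toy haplotypes: the two alternate alleles mostly travel together.
toy = [(1, 1)] * 4 + [(0, 0)] * 5 + [(1, 0)] * 1
r2_toy = ld_r2(toy)

# Perfect LD: the alleles always occur together.
r2_perfect = ld_r2([(1, 1)] * 3 + [(0, 0)] * 7)
print(round(r2_toy, 3), round(r2_perfect, 3))
```

So r² > 0.8 describes how strongly two SNPs are correlated with each other in the population, which is a different quantity from the p-value of either SNP's association with disease.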

ADD REPLY • link 5.5 years ago by Ram 43k

0

I'll get back to you in a couple of days regarding this, because this is what I was told by my mentor. Meanwhile, as I understand it, I have a list of rs IDs that influence Alzheimer's. I need to extract certain features from them and build a classification model that decides whether a given rs ID falls within that class or not. How can I go ahead with this? Please help.

ADD REPLY • link 5.5 years ago by bioinfo456 ▴ 150

0

The terms you use and the approach you describe look a lot like machine learning. For classification, you'd need a well-annotated truth set for training. I'm not a machine learning expert; maybe someone else can help you with that.

ADD REPLY • link 5.5 years ago by Ram 43k

0

Kindly refer to the paper associated with the link mentioned above.

ADD REPLY • link 5.5 years ago by bioinfo456 ▴ 150

0

I think I can retrieve the data I need with the R package "rsnps". Could anybody tell me how I could select features for the classification, please? Thanks for your time, all of you :).
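One simple starting point for feature selection, sketched below, is to rank each feature by how well it separates the two classes on its own, here using the absolute Welch t-statistic. Everything in the example is an assumption for illustration: the feature names (`eqtl_score`, `dna_stability`, `propensity`), the values, and the labels are made up, and the sketch presumes a trusted class label already exists for every SNP.

```python
import math
from statistics import mean, variance

def welch_t(xs, ys):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    return (mean(xs) - mean(ys)) / math.sqrt(
        variance(xs) / len(xs) + variance(ys) / len(ys)
    )

def rank_features(rows, labels, feature_names):
    """Rank features by |t| between class 1 and class 0, largest first."""
    scores = {}
    for j, name in enumerate(feature_names):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        scores[name] = abs(welch_t(pos, neg))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical feature table: one row per SNP, made-up values.
feature_names = ["eqtl_score", "dna_stability", "propensity"]
rows = [
    (0.90, 0.50, 0.30), (0.80, 0.42, 0.35), (0.85, 0.55, 0.28), (0.95, 0.47, 0.33),
    (0.20, 0.49, 0.31), (0.10, 0.53, 0.29), (0.15, 0.44, 0.36), (0.25, 0.51, 0.32),
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = class of interest, 0 = other

ranked = rank_features(rows, labels, feature_names)
for name, score in ranked:
    print(f"{name}: |t| = {score:.2f}")
```

If the selected features are later used in a cross-validated model, the ranking should be recomputed inside each training fold; ranking on the full dataset first leaks information from the held-out samples.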

ADD REPLY • link 5.5 years ago by bioinfo456 ▴ 150