Help a beginner: results from PLINK logistic regression don't seem to make sense
2.2 years ago
Frédéric • 0

Hi everyone,

I'm an MSc student in epidemiology with absolutely no background in genetics or PLINK, and I'm in over my head in my genetic epidemiology class, so I'm sorry in advance if my question seems trivial or isn't explained well.

I have a homework assignment in two parts: the first was quality control (cleaning a dataset), and this second part is a statistical analysis of the data we cleaned earlier. The first part went well: I have the same number of SNPs and participants remaining as in the teacher's solution.

The final dataset contains 500 SNPs and around 15,000 participants, and all the SNPs lie within a 300 kb region around the PITX2 gene. This is a case-control study in which we're testing the association between the SNPs and one specific phenotype using logistic regression. It isn't stated explicitly, but from the way the assignment is worded, I have a feeling I'm supposed to find one SNP associated with the phenotype and then discuss it.

We are using PLINK, and the teacher gives us the command for building the model; we basically just have to choose which covariates to include. I did exactly that, then went into R to make a Manhattan plot and look at my results. I created a column with the -log10 of my p-values and realized that most of my 500 SNPs are statistically significant. About one-fifth of them have a -log10 p-value of around 40, and not a single one stands out from the rest. This is the command I used for my regression:

plink \
  --bfile cohorte3rs \
  --logistic sex \
  --ci 0.95 \
  --covar covariables.txt \
  --covar-name AGE SEX BMI EDUYRS C1 C2 C3 C4 C5 \
  --hide-covar \
  --out model1
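The same check the poster did in R can be done straight from the PLINK output. Below is a minimal sketch, assuming the results file is `model1.assoc.logistic` (what PLINK 1.9 writes for `--logistic` with `--out model1`) with a header line, the SNP ID in column 2, the TEST label in column 5 and the P-value in the last column. The function name `summarise_logistic` is hypothetical. It prints -log10(P) per SNP and flags P below a Bonferroni threshold of alpha / number of tests: with 500 SNPs that is 0.05 / 500 = 1e-4, i.e. a -log10 P of 4, so a -log10 P of around 40 (P ≈ 1e-40) is far beyond it. Bonferroni is conservative when SNPs are correlated, as neighbouring SNPs in a 300 kb window usually are (linkage disequilibrium), which is also a common reason a whole block of SNPs shows similar P-values.

```shell
# Summarise a PLINK logistic-regression results file (e.g. model1.assoc.logistic):
# one line per SNP with its P-value, -log10(P) and a Bonferroni flag.
# ASSUMPTIONS: header line, SNP ID in column 2, TEST in column 5,
# P-value in the last column (PLINK 1.9 layout, with or without --ci).
summarise_logistic() {
  local assoc="$1" alpha="${2:-0.05}"
  awk -v alpha="$alpha" '
    # Pass 1: count additive-model (ADD) tests that have a usable P-value.
    NR == FNR { if (FNR > 1 && $5 == "ADD" && $NF != "NA") n++; next }
    # Pass 2, header line: set the Bonferroni threshold alpha / n.
    FNR == 1 {
      if (n == 0) { print "no ADD rows found" > "/dev/stderr"; exit 1 }
      thr = alpha / n
      print "SNP", "P", "NEG_LOG10_P", "BONFERRONI"
      next
    }
    # Pass 2, data lines: print -log10(P) and whether P < alpha / n.
    $5 == "ADD" && $NF != "NA" {
      p = $NF + 0
      nl = (p > 0) ? sprintf("%.2f", -log(p) / log(10)) : "Inf"
      print $2, $NF, nl, (p < thr ? "yes" : "no")
    }
  ' "$assoc" "$assoc"
}
```

For example, `summarise_logistic model1.assoc.logistic | awk '$4 == "yes"' | wc -l` counts the SNPs that pass the Bonferroni threshold.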

Now, I'm a total beginner, but these results don't seem to make sense, so I'm wondering what could have gone wrong.

I'm pretty sure about the covariates I'm using, and even when I change them a bit, the results barely change.
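To put a number on "the results barely change", two runs with different covariate sets can be compared SNP by SNP. This is a minimal sketch, assuming both files have the PLINK 1.9 `.assoc.logistic` layout (SNP in column 2, TEST in column 5, P in the last column); the function name `compare_runs` and any second output file such as `model2.assoc.logistic` are hypothetical.

```shell
# Compare the additive-model P-values of two PLINK logistic runs that differ
# in their covariates. Prints each shared SNP's -log10(P) in run A and run B
# plus the difference B - A, sorted by the size of the change, largest first.
# ASSUMPTIONS: SNP in column 2, TEST in column 5, P in the last column.
compare_runs() {
  local run_a="$1" run_b="$2"
  echo "SNP NEG_LOG10_P_A NEG_LOG10_P_B DIFF"
  awk '
    # Pass 1: store -log10(P) of every ADD row in run A, keyed by SNP ID.
    NR == FNR {
      if (FNR > 1 && $5 == "ADD" && $NF != "NA" && $NF > 0) a[$2] = -log($NF) / log(10)
      next
    }
    # Pass 2: for SNPs present in both runs, print both values, the
    # difference, and (temporarily) its absolute value for sorting.
    FNR > 1 && $5 == "ADD" && ($2 in a) && $NF != "NA" && $NF > 0 {
      b = -log($NF) / log(10)
      d = b - a[$2]
      printf "%s %.2f %.2f %.2f %.6f\n", $2, a[$2], b, d, (d < 0 ? -d : d)
    }
  ' "$run_a" "$run_b" | LC_ALL=C sort -k5,5gr | cut -d' ' -f1-4
}
```

Usage would look like `compare_runs model1.assoc.logistic model2.assoc.logistic | head`; if even the top rows show only small differences, the covariate change genuinely has little effect on the SNP associations.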

Is it possible that I screwed up somewhere in the data cleaning part but was still able to get exactly the same number of remaining SNPs and participants as intended?

Sorry for the long post. If any of you can give me some assistance, it would be greatly appreciated. Thanks!

plink logistic-regression homework
2.2 years ago
Sam ★ 4.7k

This is possible. When we generate our teaching materials, we select significant regions, because it would be rather boring to ask students to find null results (according to my supervisor, anyway). Your model seems alright, though you can drop SEX from --covar-name unless the sex information in your fam file differs from that in your covariate file: --logistic sex already adds the sex information from the fam file to your regression model as a covariate (you can check this by removing the hide-covar option, so that the covariate terms are reported too). For example:

plink \
  --bfile cohorte3rs \
  --logistic hide-covar sex \
  --ci 0.95 \
  --covar covariables.txt \
  --covar-name AGE BMI EDUYRS C1 C2 C3 C4 C5 \
  --out model1

(I think --logistic hide-covar is the same as --logistic --hide-covar, but the former is what I usually use.)
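Whether the fam file and the covariate file agree on sex, the condition Sam mentions, can be checked directly. A minimal sketch, assuming the covariate file has a header line starting with `FID IID` and a column named `SEX`, and that it uses the fam-file coding (column 5 of a `.fam` file: 1 = male, 2 = female, 0 = unknown). If `covariables.txt` codes sex differently (e.g. 0/1), recode it first, otherwise every row will be reported as a mismatch. The function name `check_sex` is hypothetical.

```shell
# Cross-check the sex in a PLINK .fam file (column 5) against the SEX column
# of a covariate file with a "FID IID ..." header.
# ASSUMPTION: both files use the same coding (1 = male, 2 = female).
check_sex() {
  local fam="$1" covar="$2"
  awk '
    # Pass 1: remember the fam-file sex for each "FID IID".
    NR == FNR { fam_sex[$1 " " $2] = $5; next }
    # Covariate header: find the SEX column by name.
    FNR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "SEX") col = i
      if (!col) { print "no SEX column in covariate file" > "/dev/stderr"; exit 2 }
      next
    }
    # Covariate rows: compare only individuals present in both files.
    ($1 " " $2) in fam_sex {
      checked++
      if ($col != fam_sex[$1 " " $2]) {
        mismatched++
        print "mismatch:", $1, $2, "fam=" fam_sex[$1 " " $2], "covar=" $col
      }
    }
    END { if (col) printf "checked %d, mismatched %d\n", checked, mismatched }
  ' "$fam" "$covar"
}
```

Running `check_sex cohorte3rs.fam covariables.txt` and seeing zero mismatches would mean the SEX covariate duplicates what the `sex` modifier already adds.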


Thank you for taking the time to answer!

I copy/pasted the version in which I forgot to remove SEX from my covariates. I didn't expect this kind of result, since it's not what we're used to seeing in the articles we usually read. I looked up a few of the SNPs with my highest -log10 p-values on https://genetics.opentargets.org/ and most seem to be strongly associated with my phenotype, so it actually makes sense.
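A lookup like this starts from a short list of the strongest hits. A minimal sketch for pulling them out of the PLINK output, assuming the same `.assoc.logistic` layout as above (SNP in column 2, TEST in column 5, P in the last column); the function name `top_snps` is hypothetical, and the resulting IDs still have to be searched on the website by hand.

```shell
# List the n SNPs (default 10) with the smallest additive-model P-values
# from a PLINK .assoc.logistic file.
# ASSUMPTIONS: SNP in column 2, TEST in column 5, P in the last column.
top_snps() {
  local assoc="$1" n="${2:-10}"
  # Keep ADD rows with a usable P, sort by P numerically (handles 1e-40), keep n.
  awk 'FNR > 1 && $5 == "ADD" && $NF != "NA" { print $2, $NF }' "$assoc" \
    | LC_ALL=C sort -k2,2g \
    | head -n "$n"
}
```

`sort -g` is used rather than `sort -n` because PLINK writes very small P-values in scientific notation, which `-n` does not order correctly.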

Thanks again!
