Hi, I encountered an issue while performing SNP analysis using PLINK. Before proceeding with PRS calculation using LDpred2 and PRSice-2, I conducted a phenotype-based SNP analysis. Here's the situation:
I used a snp_filtered_binary file as input and two different phenotype .txt files for analysis. All NA (missing) values were already excluded from the phenotype files beforehand. However, when running the analysis, I noticed discrepancies in the number of phenotype values reported in the output:
For the first phenotype file, the output states:
6441 phenotype values present after --pheno.
The actual number of individuals without missing data should be 7246, meaning 805 phenotype data points are unexpectedly missing.
For the second phenotype file, the output states:
3201 phenotype values present after --pheno.
The actual number of individuals without missing data should be 3436, meaning 235 phenotype data points are unexpectedly missing.
Here are the options I used for this analysis:
--bfile /serotonin_snps_filtered_binary \
--autosome \
--linear \
--beta \
--ci 0.95 \
--pheno /Continuous_Clean.txt \
--out /Continuous_Clean_NoX
Could anyone help me understand why these individuals are being excluded? Thank you for your time!
Why do you think the .fam file still contains every sample in your phenotype files?
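One way to check this is to compare the FID/IID pairs in each phenotype file against the .fam file, since PLINK 1.9 matches --pheno rows to samples on the first two columns (FID and IID). Below is a minimal sketch in Python. It assumes both files are whitespace-delimited with FID and IID in the first two columns. The function names `read_ids` and `unmatched_pheno_ids` are made up for illustration, and you would need to substitute your own paths.

```python
def read_ids(lines, skip_header=False):
    """Collect (FID, IID) pairs from the first two columns of whitespace-delimited lines."""
    ids = set()
    header_pending = skip_header
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if header_pending:
            header_pending = False  # drop the first usable line as a header
            continue
        ids.add((fields[0], fields[1]))
    return ids


def unmatched_pheno_ids(fam_path, pheno_path, pheno_has_header=False):
    """Return the phenotype-file samples whose (FID, IID) pair is absent from the .fam file."""
    with open(fam_path) as fam, open(pheno_path) as pheno:
        fam_ids = read_ids(fam)  # .fam files have no header line
        pheno_ids = read_ids(pheno, skip_header=pheno_has_header)
    return sorted(pheno_ids - fam_ids)


# Example call (hypothetical paths, replace with your own):
# missing = unmatched_pheno_ids("my_data.fam", "my_pheno.txt", pheno_has_header=True)
# print(len(missing), "phenotype rows have no matching sample in the .fam file")
```

If the returned list is about as long as the gap you are seeing (805 and 235), the "missing" individuals were probably removed from the genotype data during an earlier filtering step, or their IDs are written differently in the two files. If the list is empty, other things worth checking are duplicated FID/IID pairs in the phenotype file and phenotype values equal to -9, which PLINK 1.9 treats as missing by default.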