Hi, all. I am learning to construct a polygenic score in UKBB using published GWAS summary statistics, so I was trying to apply for certain SNPs in the AMS. When applying for SNPs one by one, I was asked to enter SNP (Affy) IDs. However, most of my SNP rs IDs had no match when I used the search function on the UKBB website (only 40 out of 400 matched). The result was the same when I used the annotation files "Axiom_UKB_WCSG.na34.annot" and "Axiom_UKBiLEVE.na34.annot". Does anyone know what is going on? Could anyone tell me which file I should use to match SNP IDs? And when chromosome:position is given instead of rs IDs, can I match on that as well?
Are you using the PLINK genotyped files, the bgen imputed genotype files, the exome sequencing files, or the whole genome sequencing files?
I haven't applied for any genomics data before. However, in my new project I'd like to create a polygenic score at the individual level, so I am looking to apply additionally for the specific SNPs required from UKBB.
The way SNP information is stored differs between the data types I mentioned. For example, in the bgen imputed data the SNP IDs and secondary IDs are stored compressed, whereas the sequencing files might contain a large portion of SNPs without a dedicated rs ID.
You might check out existing PRS software to see how it handles this (though most only work on genotyped data).
Thanks for your guidance, Sam! The bgen imputed genotype files may be used in the study. But the overlap between the GWAS SNP IDs and "Axiom_UKB_WCSG.na34.annot.csv" or "Axiom_UKBiLEVE.na34.annot", downloaded directly from the UK Biobank home page, was quite limited. Do you know more about the issue?
I guess it might be easier for you to extract the SNP IDs directly from the bgen files. Use PLINK2 to convert the bgen file to a pgen file; this generates a pvar file containing all the SNP IDs. You might also want to do some light QC on that.
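Once you have the pvar file, the matching itself could look roughly like the sketch below. It tries the rs ID first and falls back to chromosome:position. Everything in it is an illustration rather than a tested pipeline: the toy data, the GWAS column names (SNP, CHR, BP) and the helper names read_pvar and match_snps are made up, and it assumes both files use the same genome build and write chromosome codes the same way. It also does not compare alleles, which you would want to check before scoring.

```python
import csv
import io

# Toy stand-in for a PLINK2 .pvar file (assumed tab-separated, with a
# '#CHROM' header line; real files can carry extra '##' lines and columns).
PVAR_TEXT = """\
##source=toy_example
#CHROM\tPOS\tID\tREF\tALT
1\t1000\trs111\tA\tG
1\t2000\t1:2000_C_T\tC\tT
2\t3000\trs333\tG\tA
"""

# Toy GWAS summary statistics; the column names are hypothetical.
GWAS_TEXT = """\
SNP\tCHR\tBP
rs111\t1\t1000
rs222\t1\t2000
rs999\t5\t5000
"""


def read_pvar(handle):
    """Read a .pvar file into a list of dicts, skipping '##' meta lines."""
    lines = (line for line in handle if not line.startswith("##"))
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        {"chrom": row["#CHROM"], "pos": int(row["POS"]), "id": row["ID"]}
        for row in reader
    ]


def match_snps(gwas_rows, pvar_rows):
    """Map each GWAS SNP to (how_matched, pvar_id); (None, None) if absent."""
    by_id = {v["id"]: v for v in pvar_rows}
    by_pos = {}
    for v in pvar_rows:
        # Keep the first variant seen at a position (e.g. multi-allelic sites).
        by_pos.setdefault((v["chrom"], v["pos"]), v)

    matches = {}
    for g in gwas_rows:
        key = (g["CHR"], int(g["BP"]))
        if g["SNP"] in by_id:
            matches[g["SNP"]] = ("rsid", by_id[g["SNP"]]["id"])
        elif key in by_pos:
            matches[g["SNP"]] = ("chr:pos", by_pos[key]["id"])
        else:
            matches[g["SNP"]] = (None, None)
    return matches


pvar_rows = read_pvar(io.StringIO(PVAR_TEXT))
gwas_rows = list(csv.DictReader(io.StringIO(GWAS_TEXT), delimiter="\t"))
matches = match_snps(gwas_rows, pvar_rows)

n_matched = sum(1 for how, _ in matches.values() if how)
print(f"{n_matched} of {len(matches)} GWAS SNPs found in the pvar file")
```

For real data you would replace the two io.StringIO objects with open file handles. The position fallback is what picks up variants whose ID in the imputed data is not an rs ID.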
Depending on the source of your GWAS, it is completely normal for the overlap to be low. If I remember correctly, the overlap between the GIANT GWAS summary statistics and the UKB genotyped (not imputed) data is less than 30%.
Thanks a lot, Sam! I'll have a try! Thank you!