Genetic data QC prior to imputation
3 months ago
kl ▴ 10

Hi,

I have data genotyped on the GSA array. I have a few questions and I would appreciate advice from someone with experience.

Should SNPs with names like 'exm_...' be removed from the genetic data at the QC stage? Should SNPs with allele codes such as 0 and A be removed, given that one allele is missing? Should SNPs that have a kgp or JHU prefix be removed? What are they? Should chromosome 26 be removed? What about SNPs that have a name such as 1 seq 0 12002028. A. G?

Thanks!

imputation • 465 views
3 months ago

Hi there,

Should SNPs with names like 'exm_...' be removed from the genetic data at the QC stage?

Not necessarily. They used this ID because the variant was part of their ExomeSNP array, probably because there was no rsID at the time. For example, this one:

https://www.ncbi.nlm.nih.gov/projects/SNP/snp_ss.cgi?subsnp_id=ss1958317049

Should SNPs with allele codes such as 0 and A be removed, given that one allele is missing?

Yes. An allele code of 0 means the allele is missing, so these variants are either monomorphic in your data or probably not SNPs; it should be safe to remove them.
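To make the removal concrete, here is a minimal Python sketch that lists the variants to drop. It assumes your data is in PLINK format, where the .bim file has six whitespace-separated columns (chromosome, variant ID, cM, position, allele 1, allele 2). The function name and the example rows are made up for illustration.

```python
def variants_with_missing_allele(bim_lines):
    """Return IDs of variants whose allele 1 or allele 2 is coded '0' (missing)."""
    flagged = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        snp_id, a1, a2 = fields[1], fields[4], fields[5]
        if a1 == "0" or a2 == "0":
            flagged.append(snp_id)
    return flagged


# Hypothetical .bim rows, for illustration only.
example_bim = [
    "1\texm123\t0\t12002028\t0\tA",
    "1\trs456\t0\t12003000\tG\tA",
]
print(variants_with_missing_allele(example_bim))
```

If you write the returned IDs to a text file, one per line, you can pass that file to PLINK's --exclude.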

Should SNPs that have a kgp or JHU prefix be removed? What are they?

Here is how you can convert them to rsIDs:

https://github.com/nhettige/Updating-kgp-IDs-to-rs-IDs-for-SNPs-on-Illumina-HumanOmni2.5M-array

There should be other methods as well, such as annotating with dbSNP. I wouldn't worry too much about the IDs as long as the quality of the data looks OK.
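If you do want to rename, one option is PLINK's --update-name, which by default reads a two-column file (old ID, new ID). The Python sketch below builds those rows. It assumes you already have an ID-to-rsID lookup from some source (the repository above, or a dbSNP annotation); the function name, the mapping, and the IDs shown are placeholders, not real variants.

```python
def build_update_name_rows(variant_ids, id_to_rsid):
    """Return 'old_id<TAB>new_id' rows for variants that have a known rsID."""
    rows = []
    for old_id in variant_ids:
        new_id = id_to_rsid.get(old_id)
        if new_id and new_id != old_id:
            rows.append(f"{old_id}\t{new_id}")
    return rows


# Placeholder IDs and mapping, for illustration only.
example_ids = ["kgp0000001", "JHU_1.000001", "rs0000003"]
example_map = {"kgp0000001": "rs0000001"}
print(build_update_name_rows(example_ids, example_map))
```

Variants without an entry in the lookup are left out of the file, so they simply keep their current IDs.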

Should chromosome 26 be removed?

Yes, you can remove it (in PLINK's numeric coding, chromosome 26 is the mitochondrial genome). Here is another post about it:

QC of genetic data
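Before dropping anything, it can help to see how many variants sit on each chromosome code. This is a rough Python sketch, again assuming a PLINK .bim layout (chromosome in column 1, variant ID in column 2); the function names and example rows are hypothetical.

```python
from collections import Counter


def count_by_chromosome(bim_lines):
    """Count variants per chromosome code (column 1 of a .bim file)."""
    counts = Counter()
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 6:
            counts[fields[0]] += 1
    return counts


def ids_on_chromosomes(bim_lines, codes=("26", "MT")):
    """Return IDs of variants whose chromosome code is in `codes`."""
    ids = []
    for line in bim_lines:
        fields = line.split()
        if len(fields) >= 6 and fields[0] in codes:
            ids.append(fields[1])
    return ids


# Hypothetical .bim rows, for illustration only.
example_bim = [
    "1 rs1 0 1000 A G",
    "23 rs2 0 2000 C T",
    "26 mt1 0 300 A G",
]
print(count_by_chromosome(example_bim))
print(ids_on_chromosomes(example_bim))
```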

What about SNPs that have a name such as 1 seq 0 12002028. A. G?

These are probably SNPs that still don't have an rsID, as in the first example.


Hi Raony,

Thank you for your response. I am doing the sex check part of the QC but I only have 19 variants with MAF>0.05 and the sex check is messed up. Do you have any experience with this/advice on how to proceed? I would really appreciate it. PS: My sample is small (850 people)


Check how many variants you have without filtering for MAF; maybe use a MAF of 0.01? You could try --impute-sex with the ycount or y-only modifier. Do you have variants in the non-PAR region of chrX? What is the heterozygosity for these 19 variants? What is the heterozygosity across all variants on chromosome X? Normally chrY has very few variants on SNP arrays, but that's usually enough to determine sex. You could also try other tools such as peddy [1] or somalier [2].

[1] https://github.com/brentp/peddy [2] https://github.com/brentp/somalier
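For the heterozygosity question, here is a minimal Python sketch of the idea: for one sample, take its genotype calls at non-PAR chrX variants and compute the fraction that are heterozygous. Males should be close to 0, since they carry a single X. The input format (a list of allele pairs, with '0' for a missing call) and the function name are assumptions for illustration, not the output of any particular tool.

```python
def x_het_rate(genotypes, missing="0"):
    """Fraction of called genotypes that are heterozygous; None if nothing is called."""
    called = [(a, b) for a, b in genotypes if a != missing and b != missing]
    if not called:
        return None
    het = sum(1 for a, b in called if a != b)
    return het / len(called)


# Made-up genotype calls at four chrX variants, for illustration only.
male_like = [("A", "A"), ("G", "G"), ("0", "0"), ("C", "C")]
female_like = [("A", "G"), ("G", "G"), ("C", "T"), ("C", "C")]
print(x_het_rate(male_like))
print(x_het_rate(female_like))
```

With only 19 variants the rate will be noisy, so it is worth comparing it against the rate over all chrX variants before trusting it.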
