Question: removing duplicate SNPs (same position) with lowest call rate
jani.p.heikkinen • 0 wrote:
I am trying to solve a problem with my genotyped array data set. For one reason or another, the data set contains SNPs with two or three different names pointing to the same position. For example:
index | SNP | pos | A1 | A2 | F_MISS |
---|---|---|---|---|---|
2046 | snp_1 | 113890304 | C | T | 0 |
2047 | snp_2 | 113890304 | C | T | 0.000422 |
2048 | snp_3 | 113890304 | C | T | 0 |
I want to build a list of SNP names to remove (so I can exclude them in PLINK).
For the SNPs above, snp_2 (the lowest call rate) and one of snp_1 or snp_3 should go on the removal list, so that a single SNP with the highest call rate remains at that position.
How would I achieve this?
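To make the intended selection rule concrete, here is a minimal sketch in plain Python. It assumes the table above has already been parsed into `(SNP, pos, F_MISS)` tuples; the function name `snps_to_remove` and the tie-breaking rule (keep the first SNP listed) are my own assumptions, not anything PLINK provides. If the data span several chromosomes, the grouping key would presumably need to include the chromosome as well as the position.

```python
from collections import defaultdict

def snps_to_remove(records):
    """Return SNP names to exclude, keeping one SNP per position.

    records: iterable of (snp_name, position, f_miss) tuples.
    For each position, the SNP with the lowest F_MISS (highest call
    rate) is kept; ties are broken by first occurrence in the input.
    """
    # Group SNPs by position (hypothetical key; add chromosome if needed).
    by_pos = defaultdict(list)
    for snp, pos, f_miss in records:
        by_pos[pos].append((snp, f_miss))

    remove = []
    for group in by_pos.values():
        if len(group) < 2:
            continue  # position is unique, nothing to remove
        # min() returns the first minimal element, so ties keep the earliest SNP.
        keep = min(group, key=lambda item: item[1])[0]
        remove.extend(snp for snp, _ in group if snp != keep)
    return remove

# The three rows from the example table above.
records = [
    ("snp_1", 113890304, 0.0),
    ("snp_2", 113890304, 0.000422),
    ("snp_3", 113890304, 0.0),
]
removal_list = snps_to_remove(records)
print("\n".join(removal_list))
```

Written one name per line to a text file, a list like this could then be passed to PLINK's `--exclude` option.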
modified 4.2 years ago by Biostar ♦♦ 20 • written 5.0 years ago by jani.p.heikkinen • 0