7.1 years ago
Simo ▴ 50

Starting from microarray data, I retrieved the same positions in other populations from the 1000 Genomes Project. In a few cases I found that the same locus has two or more different rs IDs.

Sometimes they are simply separated by a semicolon:

19 123 rs123; rs432


Other times they are reported as two different sites with different rs IDs:

19 123 rs879
19 123 rs123; rs432


How should I deal with them? And what do multiple rs IDs for a single locus mean?

Thanks

vcf SNP
This might be because rs IDs are sometimes merged, but your question is not clear. Some real examples would help.

This is an example of what I got:

CHR   POS        RS
3     104431873  rs59034722
3     104431873  rs563518351
11    2127926    rs369593278
11    2127926    rs536803896;rs143027169
11    2127926    rs555178206;rs574327794


In the microarray data some positions have no rs ID (marked as ---), so I retrieved the IDs from the 1000 Genomes data. Given the situation shown above, how can I know which rs ID should be kept for a position, and which should be removed?

Thanks

Variation doesn't work that way. A variant isn't defined by its position alone, but also by the allele change. If several rs IDs are located at the same position, it means that several different variants occur there, so you'll have to check which variant is being tested on the microarray (chromosome, position, reference and alternative allele) and then get the corresponding rs ID.
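
As a rough illustration of that lookup, here is a minimal Python sketch. It assumes the 1000 Genomes records are available as whitespace-separated VCF-style lines (CHROM, POS, ID, REF, ALT) and that the array alleles are reported on the same strand as the reference. The function names `build_index` and `lookup`, the placeholder IDs (rsA, rsB, ...) and the alleles in the example are all made up for illustration; they are not the real alleles of the rs IDs listed above.

```python
from collections import defaultdict


def build_index(vcf_lines):
    """Map (chrom, pos, ref, alt) -> list of rs IDs from VCF-style data lines."""
    index = defaultdict(list)
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, ids, ref, alts = line.split()[:5]
        # The ID column may hold several IDs separated by ';', or '.' if none.
        rs_ids = [] if ids == "." else [i.strip() for i in ids.split(";")]
        # A multi-allelic record lists its ALT alleles separated by ','.
        # Simplification: every ID on the line is attached to every ALT allele.
        for alt in alts.split(","):
            index[(chrom, int(pos), ref, alt)].extend(rs_ids)
    return index


def lookup(index, chrom, pos, ref, alt):
    """Return the rs IDs recorded for exactly this allele change (may be empty)."""
    return index.get((chrom, pos, ref, alt), [])


# Hypothetical records: three different variants at the same position.
vcf_lines = [
    "#CHROM POS ID REF ALT",
    "11 2127926 rsA G A",
    "11 2127926 rsB;rsC G GT",
    "11 2127926 rsD GTT G",
]

index = build_index(vcf_lines)
print(lookup(index, "11", 2127926, "G", "A"))
print(lookup(index, "11", 2127926, "G", "GT"))
```

If a lookup still returns more than one rs ID for the same allele change, the IDs most likely refer to the same variant (for example after a merge), and checking them in dbSNP should show which one is current.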