How to tell Plink to merge based on Hg19 coordinates, not based on SNP ID
8.8 years ago
devenvyas ▴ 740

I am merging data sets, and the SNP IDs between the two are inconsistent, likely because Affx numbers changed between annotations. For example, merging SNP lists (in R) based on SNP ID loses me ~50,000 SNPs, but merging based on coordinate will only lose me ~10,000 SNPs (i.e., 545,956 sites vs. 585,413 sites).

If I have a command such as

plink --file data1 --merge data2.ped data2.map --recode --out merge

What do I do to tell Plink to ignore the SNP IDs from data2 and merge based on coordinate, not SNP ID?

Thanks!

-Deven


You could replace the SNP ID with its position in the map or bim file, e.g. chr1:123456789


I am not quite following what you mean, and I think that method may actually take longer than just getting Plink to ignore the SNP IDs.

Currently I have:

a) the two data sets in map/ped format

b) a list of 585,413 coordinates that match

c) a map file from the failed merger containing the 545,956 sites where both the SNP id and the coordinates match

To begin with, I am not sure how to properly isolate the coordinates for the 39,457 that do not have matching SNP ids. After doing that I would need to find the old SNP id and the new SNP id for each.

Isn't there some simple way to tell Plink to ignore the SNP ids during merger?

8.8 years ago

I don't think PLINK can merge by coordinates. I think PLINK merges by SNP id because you can have more than one SNP at one genomic position.

If you want to merge by coordinate, replace the SNP id (rs123456789) with the genomic position in the map file.

a) Replace the SNP id with the genomic position in both map files, i.e. replace "rs123456789" with "chr:position". If you are using Linux, use awk: awk '{ print $1"\t"$1":"$4"\t0\t"$4 }' old-file.map > new-file.map (note that this also sets the genetic-distance column to 0).

b) Merge both datasets using the new map files. Since the SNP id is now the genomic position in both map files, you should have 585,413 SNPs in the merged file.

c) In R, join the merged map file with one of the old map files on position to recover the real SNP ids. You need to choose which old map file supplies the SNP ids, since your old map files are not identical.
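Steps (a) and (b) could be sketched roughly as follows. This assumes standard four-column .map files (chromosome, SNP id, genetic distance, bp position); the file names and the toy contents are made up for illustration. Unlike the one-liner above, it keeps the original genetic-distance column, and it saves a coordinate-to-id lookup for step (c).

```shell
#!/bin/sh
# Work in a scratch directory so the toy files do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Toy .map files (chromosome, SNP id, genetic distance, bp position); values are invented.
printf '1\tAffx-111\t0\t1000\n1\trs222\t0\t2000\n' > data1.map
printf '1\tAffx-999\t0\t1000\n1\trs222\t0\t2000\n' > data2.map

for f in data1 data2; do
    # Lookup table "chr:position <TAB> original id", kept for step (c).
    awk 'BEGIN { OFS = "\t" } { print $1 ":" $4, $2 }' "$f.map" > "$f.ids.txt"
    # Replace the SNP id (column 2) with chr:position.
    awk 'BEGIN { OFS = "\t" } { $2 = $1 ":" $4; print }' "$f.map" > "$f.coord.map"
    # Positions shared by several SNPs would now collide; list any duplicates.
    cut -f2 "$f.coord.map" | sort | uniq -d > "$f.dup_ids.txt"
done

cat data2.coord.map
```

With the renamed maps, the merge from the question would then look something like plink --ped data1.ped --map data1.coord.map --merge data2.ped data2.coord.map --recode --out merge (not run here). If either dup_ids.txt file is non-empty, those positions need to be handled by hand before merging.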

EDIT

New idea: why not replace the SNP ids in one map file with the names from the other map file, using the --update-map command?


I ended up going with the --update-map method.

I had a list of coordinates that matched (whether or not the SNP id matched) in the format Chr#-bp#. I modified the two map files, so they were simply SNPid \t Chr#-bp#. I used R to get the intersections, so I had one output table with the old IDs and one with the new IDs from which I put together a table for --update-map. After that, everything is merging as it should.
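For anyone wanting to reproduce this without R, here is a rough awk sketch of building the renaming table. It assumes four-column .map files and invented file names and contents; it is not the exact procedure described above, just one way to get the same kind of two-column table (old id, new id) for the sites whose coordinates match but whose ids differ.

```shell
#!/bin/sh
# Work in a scratch directory so the toy files do not clutter anything.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Toy .map files (chromosome, SNP id, genetic distance, bp position); values are invented.
printf '1\tAffx-111\t0\t1000\n1\trs222\t0\t2000\n2\trs333\t0\t500\n' > data1.map
printf '1\tAffx-999\t0\t1000\n1\trs222\t0\t2000\n2\trs444\t0\t700\n' > data2.map

# First file (data1.map): remember the data1 id at each Chr-bp coordinate.
# Second file (data2.map): where the coordinate matches but the id differs,
# print "data2 id <TAB> data1 id".
awk 'BEGIN { OFS = "\t" }
     NR == FNR { id[$1 "-" $4] = $2; next }
     ($1 "-" $4) in id && id[$1 "-" $4] != $2 { print $2, id[$1 "-" $4] }
' data1.map data2.map > update_ids.txt

cat update_ids.txt
```

As far as I know, PLINK 1.07 takes such a table as plink --file data2 --update-map update_ids.txt --update-name --recode --out data2_renamed, while PLINK 1.9 accepts --update-name update_ids.txt directly; check the documentation for your version before relying on either form.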

