updating .bim with new rsids
21 months ago
hi.there • 0

Hello,

I am new to plink and am working with .bim, .fam and .bed files whose rsids are outdated. Using the UCSC Genome Browser, I built a new .txt file that I hope to use to update the rsids by chromosome position:

system("./plink --bim oldBim --update-name onlySNPs.uniqLocAndld.txt 2 4 --make-just-bim -out newBim")

Unfortunately, no rsids are updated (the command runs to completion with zero updates). Does anybody know what I may be doing wrong?

The new onlySNPs.uniqLocAndld.txt was generated with the following commands, which keep one entry per ID and one per position:

curl -O https://hgdownload.gi.ucsc.edu/goldenPath/hg38/database/snp151Common.txt.gz
gunzip -c snp151Common.txt.gz | cut -f 2,4,5,12,17 | grep single.exact | cut -f 1-3 > onlySNPs.tsv
sort -k3 -u onlySNPs.tsv | sort -k1,2 -u > onlySNPs.uniqLocAndId.tsv 

Any help would be greatly appreciated.

plink chrom update rsid • 1.3k views
21 months ago

--update-name requires both old and new IDs; I'm guessing the file you provided did not include the old IDs, so plink didn't know what to update.

plink 2.0's --set-all-var-ids flag is useful here for forcing the old IDs into a position-based format that's easy to generate in the --update-name file.
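
A minimal sketch of that workflow, assuming the lookup file has three tab-separated columns (chrom, end position, rsid, as produced by the `cut -f 1-3` step in the question) and that the .bim uses chromosome codes without the `chr` prefix. For single-base SNPs the UCSC chromEnd value is the 1-based position, which is what the .bim stores. The file names `toy_lookup.tsv` and `update_name.txt`, the toy rows, and the `oldData`/`posIds` prefixes are all made up for illustration, and the plink commands are left as comments because they are untested here:

```shell
# Toy stand-in for the lookup file: chrom, chromEnd, rsid (tab-separated, made-up values).
printf 'chr1\t1000\trs1111\nchr1\t2000\trs2222\n' > toy_lookup.tsv

# Build a two-column file for --update-name: old ID (chrom:pos), then new ID (rsid).
# sub() strips the "chr" prefix so the old IDs match what the "@:#" template produces
# for a .bim with plain numeric chromosome codes (an assumption about your data).
awk 'BEGIN { FS = OFS = "\t" } { sub(/^chr/, "", $1); print $1 ":" $2, $3 }' \
    toy_lookup.tsv > update_name.txt

cat update_name.txt

# Then, with plink 2.0 (untested sketch; prefixes are placeholders):
#   plink2 --bfile oldData --set-all-var-ids @:# --make-just-bim --out posIds
#   plink2 --bed oldData.bed --bim posIds.bim --fam oldData.fam \
#          --update-name update_name.txt 2 1 --make-just-bim --out newBim
```

Here `2 1` tells --update-name that the new ID is in column 2 and the old ID in column 1. Variants whose position is not in the lookup file keep their chrom:pos ID.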


I see, thank you. I was under the impression that the chromosome end position in the .txt file was all that --update-name needed to look up and replace the rsid, given the correct fields ("2 4"). I suppose that isn't correct.

I have updated to plink 2.0, but the documentation for --set-all-var-ids is unclear to me. How would one use it in this instance? Also, for future reference, how would I reformat my onlySNPs.uniqLocAndld.txt file so that --update-name works?


You should take a step back and learn how to use some Unix text-processing tools. The most relevant ones are "cut", "paste", "head"/"tail", and to a lesser degree "sed" and "awk" ("sed" and "awk" are somewhat complicated, but you only need to be aware of their existence and learn a few simple usage patterns for now). I can see from your question that you should already be aware of "cut".

After you've done this, you should be able to come up with a coherent way to use --set-all-var-ids in this context.

21 months ago
hi.there • 0

Thank you again. I will review my Unix commands. Are there any additional resources besides the plink documentation (link below) where I can get a better feel for how --set-all-var-ids is used?

https://www.cog-genomics.org/plink/2.0/data

There are plenty of Biostars posts where --set-all-var-ids is recommended, but I could not find any complete command lines, and the documentation's explanation of how to use "$r,$a" and "@:#" is unclear to me for this use case.


Just try a few things out on a small dataset? This isn't a complicated command. (Technically, you don't even need it; you can use a Unix one-liner to do the same thing.)
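
For experimenting, here is one way such a one-liner might look on a tiny made-up dataset. It assumes a tab-delimited six-column .bim (chrom, variant ID, cM, bp position, allele 1, allele 2) and mimics the "@:#" template, where `@` stands for the chromosome code and `#` for the base-pair position. The file names `toy.bim` and `toy.posIds.bim` are hypothetical:

```shell
# Toy .bim with two variants (made-up IDs and positions).
printf '1\trs_old_a\t0\t1000\tA\tG\n1\trs_old_b\t0\t2000\tC\tT\n' > toy.bim

# Overwrite column 2 (variant ID) with chrom:pos, leaving the other columns alone.
awk 'BEGIN { FS = OFS = "\t" } { $2 = $1 ":" $4; print }' toy.bim > toy.posIds.bim

cat toy.posIds.bim
```

Writing to a new file rather than editing in place keeps the original .bim around for comparison.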


Okay. Thanks. Perhaps tinkering with the command will help me learn it.
