--set-all-var-ids misuse in PLINK 2.0
Asked 7 weeks ago by hi.there

Hello,

I have been trying to get familiar with the --set-all-var-ids command in PLINK 2.0 so that I can update the rsIDs in a .bim file based on chromosome end (chromEnd) position, but it doesn't seem to work. Could somebody point me in a better direction? I have tried:

system("./plink2 --bim originalFile.bim --set-all-var-ids @_# --make-just-bim -out newBim")

...but it seems to overwrite the rsID column with the chromEnd positions, so the new .bim file now has two position columns and no rsID column. I have tried swapping the columns of the file containing the updated rsIDs and chromEnd positions, and I have tried swapping the columns of the original .bim file. The first swap doesn't change how --set-all-var-ids updates the IDs. The second swap has no effect, because the output is written back in the original .bim column order.
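For reference, my understanding is that --set-all-var-ids does not add a column: it rewrites the variant ID field (column 2 of a .bim) for every variant from the template, with @ standing for the chromosome code and # for the base-pair position. So @_# replaces each rsID with a chromosome_position string such as 1_10500, which is why the ID column ends up looking like a second position column. The Python sketch below only imitates that effect on made-up rows; the helper `set_all_var_ids` is hypothetical, handles just @ and #, and is not PLINK's implementation.

```python
"""Illustrative imitation of what `--set-all-var-ids @_#` does to a .bim file.

Hypothetical helper, not PLINK code: only the '@' (chromosome) and
'#' (base-pair position) template placeholders are handled.
"""

# .bim columns: chrom, variant ID, cM position, bp position, allele 1, allele 2
BimRow = list[str]


def set_all_var_ids(rows: list[BimRow], template: str) -> list[BimRow]:
    """Return copies of `rows` with column 2 rebuilt from `template`."""
    out = []
    for row in rows:
        chrom, _old_id, cm, bp, a1, a2 = row
        # Every ID is overwritten; the old rsID is not kept anywhere.
        new_id = template.replace("@", chrom).replace("#", bp)
        out.append([chrom, new_id, cm, bp, a1, a2])
    return out


if __name__ == "__main__":
    # Made-up example rows.
    bim = [
        ["1", "rs123", "0", "10500", "A", "G"],
        ["1", "cnvi0000001", "0", "20500", "C", "T"],
    ]
    for row in set_all_var_ids(bim, "@_#"):
        print("\t".join(row))
```

Both rows keep six columns; only the ID field changes (rs123 becomes 1_10500).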

Additionally, I have tried PLINK 1.9's --update-name:

system("./plink --bim oldBim --update-name updateRSIDsChrome.txt 2 4 --make-just-bim -out newBim")

...but it doesn't update any rsIDs. Again, any help would be greatly appreciated.
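As I read the PLINK 1.9 documentation, `--update-name updateRSIDsChrome.txt 2 4` takes new IDs from column 2 and old IDs from column 4, and a variant is renamed only when its current .bim ID appears exactly in that old-ID column. If column 4 holds chromosome/position values while the .bim still holds rsIDs, nothing matches and nothing is renamed. The sketch below imitates only that matching rule on made-up data; `update_names` is a hypothetical helper, and PLINK's handling of duplicates and other edge cases is not reproduced.

```python
"""Illustrative imitation of the --update-name matching rule.

Hypothetical helper, not PLINK code: a variant is renamed only if its
current ID appears, character for character, in the old-ID column.
"""


def update_names(bim_ids: list[str], update_rows: list[list[str]],
                 new_col: int, old_col: int) -> tuple[list[str], int]:
    """Rename IDs using 1-based column numbers, like PLINK's flag arguments.

    Returns the updated ID list and how many IDs were changed.
    """
    mapping = {row[old_col - 1]: row[new_col - 1] for row in update_rows}
    updated, changed = [], 0
    for vid in bim_ids:
        if vid in mapping:
            updated.append(mapping[vid])
            changed += 1
        else:
            updated.append(vid)  # no exact match: ID left as-is
    return updated, changed


if __name__ == "__main__":
    # Made-up update file layout: chrom, new rsID, chromEnd, chrom_pos key.
    update = [["1", "rs999", "10500", "1_10500"]]
    # .bim still holds rsIDs: nothing in column 4 matches.
    print(update_names(["rs123", "rs456"], update, new_col=2, old_col=4))
    # After --set-all-var-ids @_#, the same update file does match.
    print(update_names(["1_10500", "rs456"], update, new_col=2, old_col=4))
```

The first call changes 0 IDs and the second changes 1, which is the reasoning behind renaming to a position-based ID first.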

Answer, 7 weeks ago:

One strategy is to break the rsID update into the following steps:

  1. Use --set-all-var-ids to update .bim IDs to something containing position and alleles.
  2. Use a short shell script to process the rsID resource to generate an --update-name input file where the old names use the --set-all-var-ids scheme you just specified, and the new names are the rsIDs you actually want.
  3. Run --update-name.
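Step 2 could look something like the Python sketch below, standing in for the shell script. It assumes the rsID resource has already been cut down to three whitespace-separated columns (chrom, chromEnd, rsID, UCSC dbSNP style) and that step 1 used the template @_#, so old IDs look like 1_10500; the answer suggests also putting alleles in the ID, which this sketch does not do. UCSC names chromosomes chr1, chrX, ..., while .bim files normally use 1, X, ..., so the chr prefix is stripped. For a single-base variant UCSC's chromEnd equals the 1-based position a .bim stores, whereas chromStart is 0-based. When a position has several rsIDs, the first one seen is kept, which is an arbitrary choice. The function name and layout are hypothetical.

```python
"""Build an --update-name input file (old ID, new ID) from a dbSNP-style table.

Assumptions (hypothetical layout, adjust to your files):
  * input rows are: chrom  chromEnd  rsID   (e.g. "chr1 10500 rs999")
  * step 1 used --set-all-var-ids @_#, so old IDs look like "1_10500"
"""
import io
from typing import TextIO


def make_update_name_lines(ucsc: TextIO) -> list[str]:
    seen: set[str] = set()
    lines = []
    for line in ucsc:
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip blank, short and header lines
        chrom, chrom_end, rsid = fields[:3]
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # UCSC "chr1" -> PLINK-style "1"
        old_id = f"{chrom}_{chrom_end}"  # same shape as the @_# template
        if old_id in seen:
            continue  # keep only the first rsID per position (arbitrary)
        seen.add(old_id)
        lines.append(f"{old_id}\t{rsid}")
    return lines


if __name__ == "__main__":
    example = io.StringIO(
        "#chrom\tchromEnd\tname\n"
        "chr1\t10500\trs999\n"
        "chr1\t10500\trs1000\n"
        "chrX\t20500\trs777\n"
    )
    for out_line in make_update_name_lines(example):
        print(out_line)
```

Written to a file, these lines put the old ID in column 1 and the rsID in column 2, which I believe is --update-name's default order; if not, the column numbers can be given explicitly as in the question.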
Reply:

Thank you. I have extracted the chromosome, rsID and chromosome position columns from the original .bim file into a separate file (I'm not sure whether that is the step 1 you suggested), and I have extracted the corresponding chromosome, rsID and chromosome position columns from the updated UCSC database.

I then used awk to join the two files on chromosome and chromosome end position (concatenated into one string key) and wrote the result to its own file:

awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' updated.txt fromOriginalBim.txt > combined.txt

...which did not produce 3 columns.
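For reference, that awk command stores columns 2 and 3 of updated.txt under the column 1 key, then prints every line of fromOriginalBim.txt followed by whatever is stored under its own column 1. An unmatched key yields an empty string, so the line only gains a trailing space; if no keys match, the output has the same columns as fromOriginalBim.txt. The Python sketch below reproduces that logic but writes NA for misses so they are visible. It assumes, as the awk does, that column 1 of both files is the combined key; the rows are made up. Note that a key built without a separator is ambiguous: chromosome 1 at 110500 and chromosome 11 at 10500 both give 1110500, so a separator such as _ is safer.

```python
"""Python equivalent of:
    awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' updated.txt fromOriginalBim.txt
except that unmatched keys get "NA NA" instead of an empty string,
and lines with fewer than 3 fields in the first file are skipped.
Assumes column 1 of both inputs is the same chromosome+position key.
"""


def join_on_first_column(updated: list[str], original: list[str]) -> list[str]:
    # First file: key -> "col2 col3" (what the awk array `a` holds).
    lookup = {}
    for line in updated:
        fields = line.split()
        if len(fields) >= 3:
            lookup[fields[0]] = f"{fields[1]} {fields[2]}"
    # Second file: every non-empty line, followed by the looked-up values.
    out = []
    for line in original:
        fields = line.split()
        if not fields:
            continue
        out.append(f"{line.rstrip()} {lookup.get(fields[0], 'NA NA')}")
    return out


if __name__ == "__main__":
    updated = ["1_10500 rs999 10500", "1_20500 rs888 20500"]
    original = ["1_10500 rs123", "1_30500 rs456"]
    for line in join_on_first_column(updated, original):
        print(line)
```

Here the first line gains two columns and the second gets NA NA, so a file full of NA would point straight at the key format.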

As a sanity check, I compared the chromosome and end-position pairs between the two files for the first 20 rows, and I have not found a single match. Do you know what I might be doing wrong? The number of rsIDs in the .bim file is quite small compared to the UCSC file. Besides rsIDs, the .bim ID column also contains IDs with a 'CNVI' prefix followed by a variant number and IDs with a 'MITO' prefix. I am new to genetic data preprocessing, so forgive me if this is a novice mistake.
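A more reliable sanity check is to count how many .bim positions occur anywhere in the UCSC table, since the first 20 rows of a small .bim and a genome-wide UCSC table will usually be different variants. The Python sketch below does that, once on the raw chromosome names and once after stripping UCSC's chr prefix. PLINK may also write X, Y and MT as 23, 24 and 26, which this sketch does not translate. If matches stay near zero after normalising names, a genome-build mismatch (for example hg19 positions against an hg38 table) is another possibility to rule out. Input layouts, names and rows are hypothetical.

```python
"""Rough overlap check between .bim positions and a UCSC-style table.

Assumed inputs (hypothetical, adjust to your files):
  bim_rows:  standard .bim fields (chrom, ID, cM, bp, allele1, allele2)
  ucsc_rows: chrom, chromEnd, rsID  (e.g. ["chr1", "10500", "rs999"])
"""


def strip_chr(chrom: str) -> str:
    return chrom[3:] if chrom.startswith("chr") else chrom


def overlap_counts(bim_rows: list[list[str]],
                   ucsc_rows: list[list[str]]) -> dict[str, int]:
    # (chromosome, position) pairs from each source
    bim_keys = {(r[0], r[3]) for r in bim_rows}
    raw_ucsc = {(r[0], r[1]) for r in ucsc_rows}
    norm_ucsc = {(strip_chr(r[0]), r[1]) for r in ucsc_rows}
    return {
        "bim_positions": len(bim_keys),
        "matched_raw": len(bim_keys & raw_ucsc),
        "matched_after_chr_strip": len(bim_keys & norm_ucsc),
    }


if __name__ == "__main__":
    bim = [["1", "rs123", "0", "10500", "A", "G"],
           ["1", "cnvi0000001", "0", "20500", "C", "T"]]
    ucsc = [["chr1", "10500", "rs999"], ["chr2", "30500", "rs888"]]
    print(overlap_counts(bim, ucsc))
```

On these made-up rows the raw comparison finds 0 matches and the normalised one finds 1, the kind of gap a naming mismatch would produce.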
