Question: Full-stop/period as SNP identifier in BIM file causes error in Plink v1.9 --read-freq command
olavur (Tórshavn, Faroe Islands) wrote, 3.3 years ago:

In the example shown below, you see that certain variants have a full-stop/period instead of an identifier.

1       rs1495237       0       4372049 T       C
1       .       0       4372921 0       T
1       .       0       4372921 0       T
1       rs1353341       0       4372992 A       G
1       rs12080695      0       4375410 A       G

When I try to run a Plink command with --read-freq, I get an error. For example:

plink --bfile output_data/results_pruned --read-freq output_data/results_freq.frq --make-bed --out temp

Gives me the output:

PLINK v1.90b3.36 64-bit (31 Mar 2016)
(C) 2005-2016 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to temp.log.
Options in effect:
  --bfile output_data/results_pruned
  --out temp
  --read-freq output_data/results_freq.frq

64386 MB RAM detected; reserving 32193 MB for main workspace.
Allocated 13581 MB successfully, after larger attempt(s) failed.
151955 variants loaded from .bim file.
48 people (26 males, 22 females) loaded from .fam.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 32 founders and 16 nonfounders present.
Calculating allele frequencies... done.
Total genotyping rate is 0.999511.
Error: Duplicate ID '.'.

I have only encountered this problem when using --read-freq.

Obviously, one thing I can do is remove all variants with these missing identifiers. Does anyone have another solution, or perhaps just an explanation?


Check whether it runs when "." is replaced with NA. NA is a more standard code for missing values than ".".

(Comment written 3.3 years ago by Santosh Anand)
sbk wrote, 3.3 years ago:

Hi @Olavur,

You can probably replace the ID column with "chr:pos:ref:alt", for example 1:4372049:T:C. You can do this for all rows if you don't need the rsID column for further analysis. This can be achieved with the following awk script:

awk 'BEGIN{FS=OFS="\t"}{$2=$1":"$4":"$5":"$6;print}' filename.bim

If you would like to keep the rsID wherever one is available, add an if condition so that only rows whose ID does not start with "rs" are rewritten:

awk 'BEGIN{FS=OFS="\t"}{if($2!~/^rs/){$2=$1":"$4":"$5":"$6};print}' filename.bim
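One caveat with the rows shown in the question: both "." variants share the same chromosome, position and alleles, so a plain chr:pos:ref:alt ID would still be duplicated. Below is a minimal sketch, assuming a tab-delimited .bim file, that rewrites only the "." IDs and appends a running count to repeated keys. The file names sample.bim and sample_fixed.bim and the ":2" suffix convention are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file reproducing rows from the question (tab-delimited).
tmpdir=$(mktemp -d)
printf '1\trs1495237\t0\t4372049\tT\tC\n' >  "$tmpdir/sample.bim"
printf '1\t.\t0\t4372921\t0\tT\n'         >> "$tmpdir/sample.bim"
printf '1\t.\t0\t4372921\t0\tT\n'         >> "$tmpdir/sample.bim"
printf '1\trs1353341\t0\t4372992\tA\tG\n' >> "$tmpdir/sample.bim"

# Rewrite only missing IDs ("."). The first occurrence of a key gets
# chr:pos:a1:a2; later occurrences get an extra ":<count>" suffix
# (an illustrative convention, not a PLINK standard).
awk 'BEGIN{FS=OFS="\t"}
$2 == "." {
    key = $1 ":" $4 ":" $5 ":" $6
    n = ++seen[key]
    $2 = (n > 1) ? (key ":" n) : key
}
{print}' "$tmpdir/sample.bim" > "$tmpdir/sample_fixed.bim"

cat "$tmpdir/sample_fixed.bim"
```

PLINK reads the .bim that shares its prefix with the .bed and .fam files, so the rewritten file has to take the original's place (after backing the original up). The .frq file passed to --read-freq also has a SNP column, so it probably needs IDs that match the new ones.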
