How do I change my SNP ID format from chr1_847228_C_T to chr1_847228 in my .bim file?
11 months ago
geno89 ▴ 10

My .bim file has SNP IDs in the format chr1_847228_C_T, and I want to change them to the format chr1_847228. The reason is that I want to use the --update-name flag to update my SNP IDs to rs IDs, and the reference file has its SNP IDs in the chr1_847228 format.

11 months ago
Sam ★ 4.4k

awk '{split($2,a,"_"); print $1,a[1]"_"a[2],$3,$4,$5,$6}' bim > new_bim

might work. It "tokenizes" the SNP ID (column 2) on _ and then builds a new ID by joining the first and second pieces (chromosome and position) with _.
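To see what split() actually returns, here is a small self-contained run on two made-up .bim rows. The file name example.bim and the rows themselves are hypothetical, chosen only to mirror the ID styles mentioned in this thread. It prints the original ID, the number of pieces split() found, and the rebuilt ID:

```shell
#!/bin/sh
# Hypothetical example rows in .bim layout (tab-separated):
# chromosome, variant ID, genetic distance (cM), base-pair position, allele 1, allele 2
printf '1\tchr1_847228_C_T\t0\t847228\tT\tC\n' > example.bim
printf '1\trs2880024\t0\t1000000\tA\tG\n' >> example.bim

# split() breaks column 2 on "_" into array a and returns the number of pieces;
# a[1] "_" a[2] rebuilds the ID from the first two pieces
awk '{n = split($2, a, "_"); print $2, n, a[1] "_" a[2]}' example.bim
# chr1_847228_C_T 4 chr1_847228
# rs2880024 1 rs2880024_
```

For an ID with no underscore, split() returns 1 and a[2] is empty, so the rebuilt ID ends with a bare _.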

Hi, thanks. It worked well to remove the alleles from chr1_847228_C_T, but my SNP ID column also contains other formats, such as rs2880024 and exm888888, and it changed them to rs2880024_ and exm888888_. How can I remove this trailing "_" from them? Thanks.

Then you can check how many pieces split() returns and only rebuild the ID when there are at least two: awk '{n=split($2,a,"_"); if(n>=2){print $1,a[1]"_"a[2],$3,$4,$5,$6}else{print $1,$2,$3,$4,$5,$6}}' bim > new_bim

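You can also use print $0 for the else branch. The difference is the separator: print $0 reproduces the original line unchanged (keeping its original separators), whereas print $1,$2,... joins the fields with awk's output field separator, which is a single space by default. I forget which separator PLINK expects by default; if you want tabs on every line, add -v OFS='\t' to the awk call.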
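An alternative sketch swaps split() for awk's sub() with an anchored regular expression, so only IDs that end in _allele_allele are shortened and everything else passes through untouched. It assumes alleles are written in upper case as A/C/G/T/N; the regex, the file names example.bim and new.bim, and the sample rows are all assumptions, not from the thread. OFS is set to a tab so the rewritten lines stay tab-separated:

```shell
#!/bin/sh
# Hypothetical input covering the three ID styles mentioned in the thread
printf '1\tchr1_847228_C_T\t0\t847228\tT\tC\n'  > example.bim
printf '1\trs2880024\t0\t1000000\tA\tG\n'      >> example.bim
printf '1\texm888888\t0\t2000000\tG\tA\n'      >> example.bim

# -v OFS='\t': when sub() changes $2, awk rebuilds the line with tabs.
# The regex (an assumption) only matches a trailing "_<alleles>_<alleles>" suffix,
# so IDs such as rs2880024 or exm888888 are left unchanged.
awk -v OFS='\t' '{ sub(/_[ACGTN]+_[ACGTN]+$/, "", $2); print }' example.bim > new.bim
cat new.bim
```

IDs with other allele encodings (for example lower-case or symbolic alleles) would not match the regex and would be kept as-is, so it is worth checking the result against your reference file before running --update-name.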
