Question

How to change SNP identifier & position to chr:start:end

5.3 years ago

OAJn8634 ▴ 60

I have a PLINK file that contains chr, rs and position for 30K variants. I would like to convert this information into chr, start and end. Is there an easy way to achieve this? I will be grateful for any advice.

SNP annotation conversion genome plink • 4.1k views

updated 5.3 years ago by zx8754 11k • written 5.3 years ago by OAJn8634 ▴ 60

Hello,

could you please give some examples of your input? Should the desired output be a BED file? I'm asking because one has to consider the 0-based vs. 1-based interval problem.

fin swimmer

5.3 years ago by finswimmer 16k

Hello. Thank you for your response. My ultimate aim is to create a .txt file that will contain chr, start and end data for my 30K variants. I will then use this .txt file for other analyses, so I really do not mind the output format as long as I can read it in R. My current bim file looks like this:

Chr         rs                  Pos   Base-pair coordinate  A1 A2 
23          rs34557243  24.7104       60425                 C  A
23          rs28419004  24.7103       60692                 T  C
23          rs28705946  230.9480      60882                 T  G

Please let me know if this is helpful. Thank you

5.3 years ago by OAJn8634 ▴ 60

score 1 · Answer 1 · 2019-01-21

5.3 years ago

finswimmer 16k

You can use awk to extract the columns with the chromosome name and the position to create a valid bed file:

$ awk -v FS="\t" -v OFS="\t" 'NR>1 {print $1, $4-1, $4, $2}' input.bim > output.bed

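The coordinates in bim files are 1-based, but BED uses 0-based, half-open intervals: the start is 0-based and the end is exclusive. That's why we subtract 1 ($4-1) from the given position to get the start coordinate, while the end coordinate stays equal to the position.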

This will create:

23  60424   60425   rs34557243
23  60691   60692   rs28419004
23  60881   60882   rs28705946
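If you are not sure whether your bim file is tab- or space-delimited, a variant without an explicit FS may be more forgiving. The following is only a sketch: sample.bim and sample.bed are placeholder names, the sample is built from the first two rows posted above, and it assumes the file has exactly one header line, as in the question.

```shell
# Build a small tab-delimited sample with one header line, mirroring the example above.
# sample.bim and sample.bed are placeholder file names.
printf 'Chr\trs\tPos\tBP\tA1\tA2\n'             >  sample.bim
printf '23\trs34557243\t24.7104\t60425\tC\tA\n' >> sample.bim
printf '23\trs28419004\t24.7103\t60692\tT\tC\n' >> sample.bim

# Without FS, awk splits on any run of spaces or tabs.
# NR>1 skips the header line; drop that condition for a header-less .bim.
# Start = position - 1 (0-based), end = position (exclusive end), then the rs ID.
awk -v OFS='\t' 'NR>1 {print $1, $4-1, $4, $2}' sample.bim > sample.bed

cat sample.bed
```

Every output line should span exactly one base (end minus start equals 1), which is a quick way to check that the conversion did what you expect.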

fin swimmer

This is perfect. Thank you very much

5.3 years ago by OAJn8634 ▴ 60

score 0 · Answer 2 · 2019-01-21

5.3 years ago

zx8754 11k

If we are going to read it into R, why create intermediate files? Just do it within R:

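library(data.table)

# skip = 1 drops the header line, so fread names the columns V1, V2, ...
# V1 is the chromosome and V4 the base-pair coordinate (1-based).
# For a SNP, start and end are the same position.
fread("myBim.txt", skip = 1)[, list(chr = V1, start = V4, end = V4)]
#    chr start   end
# 1:  23 60425 60425
# 2:  23 60692 60692
# 3:  23 60882 60882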