How to change SNP identifier & position to chr:start:end
5.3 years ago
OAJn8634 ▴ 60

I have a Plink file that contains chr, rs and position for 30K variants. I would like to find a way that would convert this information into chr, start and end. Is there an easy way of achieving this? I will be grateful for any advice.

SNP annotation conversion genome plink • 4.1k views

Hello,

could you please give some examples of your input? Should the desired output be a BED file? I'm asking because one has to consider the 0-based vs. 1-based interval issue.

fin swimmer


Hello. Thank you for your response. My ultimate aim is to create a .txt file that contains chr, start and end for my 30K variants. I will then use this .txt file for other analyses, so I really do not mind the output format as long as I can read it in R. My current bim file looks like this:

Chr         rs                  Pos   Base-pair coordinate  A1 A2 
23          rs34557243  24.7104       60425                 C  A
23          rs28419004  24.7103       60692                 T  C
23          rs28705946  230.9480      60882                 T  G

Please let me know if this is helpful. Thank you

5.3 years ago

You can use awk to extract the columns with the chromosome name and the position to create a valid bed file:

$ awk -v FS="\t" -v OFS="\t" 'NR>1 {print $1, $4-1, $4, $2}' input.bim > output.bed

Positions in a bim file are 1-based, whereas BED uses 0-based, half-open intervals: the start is 0-based and the end is exclusive. That's why we subtract 1 from the given position ($4-1) to get the start coordinate and use the position itself as the end.

This will create:

23  60424   60425   rs34557243
23  60691   60692   rs28419004
23  60881   60882   rs28705946
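
The command above assumes tab-separated columns. If the file is separated by spaces instead (an assumption; the excerpt in the question does not show which delimiter is used), a minimal sketch is to leave out the FS setting so that awk splits on any run of spaces or tabs. The file names sample.bim and sample.bed are made up for this example, and the sample rows are copied from the question:

```shell
# Build a small test file mirroring the rows shown in the question.
# Space-separated columns with a header line are an assumption here.
cat > sample.bim <<'EOF'
Chr rs Pos Base-pair coordinate A1 A2
23 rs34557243 24.7104 60425 C A
23 rs28419004 24.7103 60692 T C
23 rs28705946 230.9480 60882 T G
EOF

# Without -v FS=..., awk splits fields on any run of spaces or tabs.
# NR>1 skips the header line; drop that condition for a headerless bim file.
# Output columns: chromosome, 0-based start, end, rs identifier.
awk -v OFS="\t" 'NR>1 {print $1, $4-1, $4, $2}' sample.bim > sample.bed

cat sample.bed
```

For the three sample rows this should give the same tab-separated lines as shown above.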

fin swimmer


This is perfect. Thank you very much

5.3 years ago
zx8754 11k

If we are going to read it into R, why create intermediate files? Just do it within R:

library(data.table)

fread("myBim.txt", skip = 1)[, list(V1, V4, V4)]
#    V1    V4    V4
# 1: 23 60425 60425
# 2: 23 60692 60692
# 3: 23 60882 60882

Thank you so much! This is so helpful!
