Question: How to change SNP identifier & position to chr:start:end
OAJn8634 wrote (6 months ago):

I have a PLINK file that contains chr, rs and position for 30K variants. I would like to convert this information into chr, start and end. Is there an easy way to do this? I would be grateful for any advice.


Hello,

Could you please give some examples of your input? Should the desired output be a BED file? I'm asking because one has to consider whether the intervals are 0-based or 1-based.

fin swimmer

(reply by finswimmer, 6 months ago)

Hello. Thank you for your response. My ultimate aim is to create a .txt file that contains chr, start and end for my 30K variants, which I will then use for further analyses. So I really do not mind the output format, as long as I can read it into R. My current .bim file looks like this:

Chr  rs          Pos       Base-pair coordinate  A1  A2
23   rs34557243  24.7104   60425                 C   A
23   rs28419004  24.7103   60692                 T   C
23   rs28705946  230.9480  60882                 T   G

Please let me know if this helps. Thank you.

(reply by OAJn8634, 6 months ago)
finswimmer (Germany) wrote (6 months ago):

You can use awk to extract the columns with the chromosome name and the position and create a valid BED file:

$ awk -v FS="\t" -v OFS="\t" 'NR>1 {print $1, $4-1, $4, $2}' input.bim > output.bed

The coordinates in .bim files are 1-based, but BED uses 0-based, half-open intervals. That's why we subtract 1 ($4-1) from the given position to get the start coordinate and use the position itself as the end.

This will create:

23  60424   60425   rs34557243
23  60691   60692   rs28419004
23  60881   60882   rs28705946
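A slightly more defensive variant of the same awk idea, offered as a sketch under stated assumptions. A standard PLINK .bim file has no header line, so `NR>1` would silently drop the first variant; and if the file is space-aligned rather than tab-separated (as the example in the question may be), `FS="\t"` would not split it. The hypothetical `bim_to_bed` function below uses awk's default whitespace splitting and skips any line whose 4th column is not a positive integer. That covers a header like the one shown in the question, as well as variants with base-pair position 0, which would otherwise get a start of -1.

```shell
#!/bin/sh
# Hypothetical helper: convert a PLINK .bim file (chr, rs, cM, bp, A1, A2)
# to BED (chr, start, end, rs).
# Assumes fields are separated by spaces and/or tabs (awk default splitting).
bim_to_bed() {
    awk -v OFS='\t' '
        # skip any line whose base-pair column is not a positive integer
        # (e.g. a header line, or an unplaced variant at position 0)
        $4 !~ /^[1-9][0-9]*$/ { next }
        # BED is 0-based, half-open: a SNP at 1-based position p spans [p-1, p)
        { print $1, $4 - 1, $4, $2 }
    ' "$@"
}
```

Usage: `bim_to_bed input.bim > output.bed` (or pipe the .bim content into `bim_to_bed` with no arguments).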

fin swimmer


This is perfect. Thank you very much.

(reply by OAJn8634, 6 months ago)
zx8754 (London) wrote (6 months ago):

If we are going to read it into R, why create intermediate files? Just do it within R:

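library(data.table)

# skip = 1 drops the header line shown in the question; a standard PLINK
# .bim file has no header, so leave out skip in that case.
# A SNP covers a single base, so in 1-based inclusive coordinates start = end.
fread("myBim.txt", skip = 1, header = FALSE)[, .(chr = V1, start = V4, end = V4)]
#    chr start   end
# 1:  23 60425 60425
# 2:  23 60692 60692
# 3:  23 60882 60882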
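The question title asks for a chr:start:end label, which neither answer builds as a single string. A minimal shell sketch, assuming the same 1-based inclusive convention as the data.table output (start = end = base-pair position) and a whitespace-separated .bim file; `bim_to_label` is a hypothetical name, not part of PLINK:

```shell
#!/bin/sh
# Hypothetical helper: print each variant's rs ID next to a "chr:start:end"
# label in 1-based inclusive coordinates (start = end = bp).
# Assumes fields are separated by spaces and/or tabs (awk default splitting).
bim_to_label() {
    awk -v OFS='\t' '
        # skip a header line or any row without a positive base-pair position
        $4 !~ /^[1-9][0-9]*$/ { next }
        # rs ID, then chr:start:end for a single-base interval
        { print $2, $1 ":" $4 ":" $4 }
    ' "$@"
}
```

For a BED-style (0-based) label instead, print `$4 - 1` as the start.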

Thank you so much! This is so helpful!

(reply by OAJn8634, 5 months ago)