Question

Separating text from one column in R

4.9 years ago

e.jabbari • 0

Hi!

I appreciate this is a very basic question, but I'm new to the R game, so any help offered would be very much appreciated.

I have a column called "SNP" in a very large genetics dataset (~8 million SNPs). Every row of the SNP column has exactly the same format, e.g. x1.752566.G.A_G.

All I need to do is create 2 new columns:

"chr", which takes the number after the letter x and before the first full stop
"bp", which takes the number between the first and second full stops

Can anyone please tell me how to do this in R?

Much appreciated,
Ed
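For reference, the same split can also be done in base R without any packages. This is a minimal sketch, assuming every ID follows the x<chr>.<bp>.<ref>.<alt> pattern in the example above; the data frame `a` here is a hypothetical two-row stand-in for the real data, and the second ID is made up for illustration. Each sub() call uses a capture group to keep only the wanted piece, and it leaves the SNP column in place:

```r
# Hypothetical stand-in for the real data: a data frame `a` with a SNP column
a <- data.frame(SNP = c("x1.752566.G.A_G", "x22.16050075.A.G_A"))

# chr: the text between the leading "x" and the first "."
a$chr <- sub("^x([^.]+)\\..*$", "\\1", a$SNP)

# bp: the digits between the first and second "."
a$bp <- as.integer(sub("^x[^.]+\\.([0-9]+)\\..*$", "\\1", a$SNP))

print(a)
```

sub() returns any string that doesn't match the pattern unchanged, so it's worth checking for IDs that don't fit the expected format (e.g. with grepl()) before trusting the new columns.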

R • 1.2k views

updated 11 months ago by Ram 43k • written 4.9 years ago by e.jabbari • 0

score 2 · Answer 1 · 2019-06-17

4.9 years ago

Asaf 10k

With tidyverse:

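library(tidyverse)
a <- as_tibble(a)
# "x1.752566.G.A_G" split on "x" or "." gives "", "1", "752566", "G", "A_G";
# NA in `into` drops that piece from the output
a <- separate(a, SNP, into = c(NA, 'chr', 'pos', NA, NA), sep = "[x.]")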
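A small self-contained check of what that separate() call does, run on a hypothetical two-row tibble (the second ID is made up for illustration). The convert = TRUE argument is an addition that isn't in the original answer: it turns chr and pos into integers instead of character. If any chromosome is non-numeric (e.g. X), chr simply stays character.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data in the same format as the question
a <- tibble(SNP = c("x1.752566.G.A_G", "x22.16050075.A.G_A"))

# Splitting on "x" or "." yields 5 pieces: "", chr, pos, ref, alt.
# NA entries in `into` drop the empty leading piece and the two allele pieces.
a <- separate(a, SNP, into = c(NA, "chr", "pos", NA, NA), sep = "[x.]", convert = TRUE)

print(a)
```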

4.9 years ago by Asaf 10k

Hi Asaf, thanks so much for your reply.

The first command line was fine, but the second gave an error message saying object 'chr' not found. I guess that's correct, as I want to create both chr and pos as new columns. Do they have to be created before running the second command, and if so, how?

Ed

4.9 years ago by e.jabbari • 0

Sorry, I forgot the quotes around the column names. Edited.

4.9 years ago by Asaf 10k

Please do not add answers unless you're answering the top level question. This post belongs as a comment to Asaf's answer, and should have been added using the Add Comment button on his answer.

4.9 years ago by Ram 43k

Since tidyverse is loaded, why not just use magrittr's %>% pipe?

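# the leading NA drops the empty piece before "x"; without it chr would be "" and pos the chromosome
a <- a %>% as_tibble() %>% separate(SNP, c(NA, 'chr', 'pos', NA, NA), "[x.]")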
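In tidyr 1.3.0 and later, separate() is superseded by the separate_wider_*() family. Below is a sketch of the same split using separate_wider_regex(), assuming tidyr >= 1.3.0 and using hypothetical toy data. This is a swapped-in alternative, not what the thread used. Named patterns become new columns, while unnamed ones must match but are dropped. Setting cols_remove = FALSE keeps SNP.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same format as the question
a <- tibble(SNP = c("x1.752566.G.A_G", "x22.16050075.A.G_A"))

# The patterns together must cover the whole ID: "x", chr, ".", pos, then the rest
a <- a %>%
  separate_wider_regex(
    SNP,
    patterns = c("x", chr = "[^.]+", "\\.", pos = "[0-9]+", "\\..*"),
    cols_remove = FALSE
  ) %>%
  mutate(pos = as.integer(pos))

print(a)
```

An ID that doesn't fit the pattern raises an error by default (controlled by the too_few argument), rather than silently producing a wrong split.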

4.9 years ago by Ram 43k

Apologies Ram. Hopefully I've done it correctly this time. Your input is also appreciated. Best wishes, Ed

4.9 years ago by e.jabbari • 0

Perfectly done this time :-)

4.9 years ago by Ram 43k

Hi Asaf. This worked perfectly, thank you. However, 2 queries have arisen:

1) The original "SNP" column has disappeared and been replaced by the newly created "chr" and "pos" columns. How should the command be edited to keep the SNP column while still creating the chr and pos columns as before?

2) I used the head() function to see what the data looks like. It all looks great, but missing data (always listed as "NA") and negative values now appear in red, whereas they always appeared in black previously. Does that have any significance, and if so, how can it be resolved?

Thanks again for your ongoing help.

Best wishes, Ed

4.9 years ago by e.jabbari • 0

Before the separate() step, add a mutate() step that creates a temporary copy of the SNP column for separate() to split, like so:

a <- a %>% as_tibble() %>% mutate(snp_throwaway = SNP) %>% separate(snp_throwaway, c(NA, 'chr', 'pos', NA, NA), "[x.]")

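The red is probably because a is now a tibble, and tibbles print with colour highlighting in the R/RStudio console (NA and negative numbers in red). This only affects how the data is displayed; the underlying values are unchanged.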
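As an alternative to the throwaway copy, separate() has a remove argument, and setting remove = FALSE keeps the input column. The sketch below uses hypothetical toy data, and `beta` is a made-up column included only to show an NA and a negative value. Printing a plain data.frame confirms that the colouring is just tibble display.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data; `beta` is a made-up column with a negative value and an NA
a <- tibble(SNP = c("x1.752566.G.A_G", "x22.16050075.A.G_A"), beta = c(-0.12, NA))

# remove = FALSE keeps SNP, so no temporary copy is needed
a <- a %>% separate(SNP, into = c(NA, "chr", "pos", NA, NA), sep = "[x.]", remove = FALSE)

# A plain data.frame prints the same values without tibble's colour highlighting
print(as.data.frame(head(a)))
```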

4.9 years ago by Ram 43k

I used exactly the above but got the following error message:

Error in eval_tidy(enquo(var), var_env) : object 'sno_throwaway' not found

4.9 years ago by e.jabbari • 0

I'd made a typo; please try again. Also, always double-check code before you run it on your computer. Don't trust strangers on the internet so easily :-)