Separating text from one column in R
4.9 years ago
e.jabbari • 0

Hi!

I appreciate this is a very basic question but I'm new to the R game so any help offered would be very much appreciated.

I have a column called "SNP" in a very large genetics dataset (~8 million SNPs). Every row of the SNP column has exactly the same format, e.g. x1.752566.G.A_G.

All I need to do is to create 2 new columns:

  1. "chr" which takes the number after the letter x and before the first full stop
  2. "bp" which takes the number between the two full stops

Can anyone please tell me how to do this in R?

Much appreciated,
Ed

4.9 years ago
Asaf 10k

With tidyverse:

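library(tidyverse)
a <- as_tibble(a)  # as.tibble() is deprecated in favour of as_tibble()
# Splitting on "x" or "." gives five pieces per row; NA drops the pieces that are not needed
a <- separate(a, SNP, c(NA, 'chr', 'pos', NA, NA), "[x.]")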
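
To see what the split does before running it on 8 million rows, here is a minimal sketch on a toy tibble. The two SNP strings are made up for illustration, and `convert = TRUE` is an optional extra (not part of the answer above) that asks separate() to turn the new columns into numbers where it can:

```r
library(tidyr)
library(tibble)

# Toy stand-in for the real dataset; these values are invented for illustration.
a <- tibble(SNP = c("x1.752566.G.A_G", "x22.1234567.C.T_C"))

# Splitting "x1.752566.G.A_G" on "x" or "." yields five pieces:
# "", "1", "752566", "G", "A_G". An NA in `into` drops that piece.
out <- separate(a, SNP, into = c(NA, "chr", "pos", NA, NA),
                sep = "[x.]", convert = TRUE)
print(out)
```

Without `convert = TRUE` both new columns stay as character strings, which matters if you later want to sort or filter by position.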

Hi Asaf, Thanks so much for your reply.

The first command line was fine but I got an error message with the 2nd command line that says object 'chr' not found which I guess is correct as I want to create both chr and pos as new columns. Do they have to be created first before running the 2nd command and if so, how?

Ed

Sorry, forgot the quotes. edited

Please do not add answers unless you're answering the top level question. This post belongs as a comment to Asaf's answer, and should have been added using the Add Comment button on his answer.

Tidyverse is loaded, why not just go with magrittr's %>%?

a <- a %>% as_tibble() %>% separate(SNP, c(NA, 'chr', 'pos', NA, NA), "[x.]")

Apologies Ram. Hopefully I've done it correctly this time. Your input is also appreciated. Best wishes, Ed

Perfectly done this time :-)

Hi Asaf. This worked perfectly, thank you. However, 2 queries have arisen:

  1. The original "SNP" column has disappeared and has been replaced by the newly created "chr" and "pos" columns. How should the command be edited to keep the SNP column while still creating the chr and pos columns?
  2. I used the head function to see what the data looks like. It all looks great, but missing data (always listed as "NA") and negative values now appear in red, whereas they always appeared in black previously. Does that have any significance and, if so, how can it be resolved?

Thanks again for your ongoing help.

Best wishes, Ed

Before the separate() step, add a mutate() step that makes a temporary copy of the SNP column, and let separate() consume the copy instead of the original, like so:

a <- a %>% as_tibble() %>% mutate(snp_throwaway = SNP) %>% separate(snp_throwaway, c(NA, 'chr', 'pos', NA, NA), "[x.]")

The red is probably because a is now a tibble: tibble printing highlights NA and negative values in red. It is purely cosmetic and does not change the data.
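
As an alternative to the throwaway copy, separate() also has a `remove` argument; setting `remove = FALSE` keeps the input column. A minimal sketch on made-up data (the `kept` name and the two SNP strings are just for illustration), which also prints the result as a plain data frame in case the coloured tibble output is distracting:

```r
library(tidyr)
library(tibble)

# Toy stand-in for the real dataset; these values are invented for illustration.
a <- tibble(SNP = c("x1.752566.G.A_G", "x22.1234567.C.T_C"))

# remove = FALSE keeps the SNP column alongside the new chr and pos columns.
kept <- separate(a, SNP, into = c(NA, "chr", "pos", NA, NA),
                 sep = "[x.]", remove = FALSE)

# A plain data.frame prints without the tibble highlighting.
print(as.data.frame(kept))
```

Either approach should give the same columns; `remove = FALSE` just saves the extra mutate() step.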

I used exactly the above but got the following error message: Error in eval_tidy(enquo(var), var_env) : object 'sno_throwaway' not found

I'd made a typo - please try again. Also, always double check code before you run it on your computer - don't trust strangers on the internet so easily :-)

It worked, thank you! That's good advice too. Ed
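
For anyone who would rather not load the tidyverse, the same two columns can probably be pulled out in base R with a regular expression. This is only a sketch on made-up data; the `pattern` variable is a name chosen here, and it assumes every SNP value starts with a lowercase x followed by two dot-separated fields, as described in the question:

```r
# Toy stand-in for the real dataset; these values are invented for illustration.
a <- data.frame(SNP = c("x1.752566.G.A_G", "x22.1234567.C.T_C"),
                stringsAsFactors = FALSE)

# Group 1: text between the leading "x" and the first full stop.
# Group 2: text between the first and second full stops.
pattern <- "^x([^.]+)\\.([^.]+)\\..*$"

a$chr <- sub(pattern, "\\1", a$SNP)              # kept as character
a$pos <- as.integer(sub(pattern, "\\2", a$SNP))  # positions as integers
print(a)
```

The SNP column is left untouched here, since sub() only reads from it.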
