5.2 years ago
e.jabbari
Hi!
I appreciate this is a very basic question but I'm new to the R game so any help offered would be very much appreciated.
I have a column called "SNP" in a very large genetics dataset (~8 million SNPs). Every row of the SNP column has exactly the same format, e.g. x1.752566.G.A_G.
All I need to do is to create 2 new columns:
- "chr" which takes the number after the letter x and before the first full stop
- "bp" which takes the number between the first and second full stops
Can anyone please tell me how to do this in R?
Much appreciated,
Ed
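A minimal base-R sketch of one way to do this, assuming the data frame is called snps (a hypothetical name) and that every value follows the x<chr>.<bp>.<rest> pattern above. It uses sub() with a capture group, not the tidyr separate() approach discussed in the replies:

```r
# Hypothetical example data; in practice this is the ~8 million row dataset
snps <- data.frame(SNP = c("x1.752566.G.A_G", "x10.1234.C.T_C"),
                   stringsAsFactors = FALSE)

# chr: everything between the leading "x" and the first full stop
snps$chr <- sub("^x([^.]+)\\..*$", "\\1", snps$SNP)

# bp: everything between the first and second full stops, stored as integer
snps$bp <- as.integer(sub("^x[^.]+\\.([^.]+)\\..*$", "\\1", snps$SNP))

head(snps)
```

sub() is vectorised over the whole column, so there is no row-by-row loop, which matters with ~8 million rows. chr is left as character in case a dataset uses codes such as X; wrap it in as.integer() if all chromosomes are numeric.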
Hi Asaf, thanks so much for your reply.
The first command line was fine, but the second one gave an error message saying object 'chr' not found. I guess that is correct, since I want to create both chr and pos as new columns. Do they have to be created before running the second command, and if so, how?
Ed
Sorry, I forgot the quotes. Edited.
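On whether chr and pos must exist first: they do not. Assigning to a column name that does not exist yet creates it, as long as the name is given as a quoted string; an unquoted chr is looked up as an R object, which is what produces "object 'chr' not found". A base-R sketch (not the tidyr command from this thread), assuming a hypothetical data frame snps:

```r
# Hypothetical small data frame standing in for the real dataset
snps <- data.frame(SNP = c("x1.752566.G.A_G", "x2.98765.T.C_T"),
                   stringsAsFactors = FALSE)

# Split each SNP string at the full stops; fixed = TRUE treats "." literally
parts <- strsplit(snps$SNP, ".", fixed = TRUE)

# "chr" and "pos" do not exist yet; assigning to them creates them.
# The names are quoted strings, not bare R objects.
snps[c("chr", "pos")] <- list(
  vapply(parts, `[`, character(1), 1),  # first field, e.g. "x1"
  vapply(parts, `[`, character(1), 2)   # second field, e.g. "752566"
)

# Drop the leading "x" from chr and store pos as an integer
snps$chr <- sub("^x", "", snps$chr)
snps$pos <- as.integer(snps$pos)
```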
Please do not add answers unless you're answering the top-level question. This post belongs as a comment on Asaf's answer and should have been added using the Add Comment button on his answer.
Tidyverse is loaded, so why not just go with magrittr's %>%?
Apologies Ram. Hopefully I've done it correctly this time. Your input is also appreciated. Best wishes, Ed
Perfectly done this time :-)
Hi Asaf. This worked perfectly, thank you. However, two queries have arisen:
1) The original "SNP" column has disappeared and been replaced by the newly created "chr" and "pos" columns. How should the command be edited to keep the SNP column while still creating chr and pos as it does now?
2) I used the head function to check the data. It all looks great, but missing data (always shown as "NA") and negative values now appear in red, whereas they always appeared in black before. Does that have any significance, and if so, how can it be resolved?
Thanks again for your ongoing help.
Best wishes, Ed
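On query 1: within the tidyverse, tidyr's separate() has a remove argument (TRUE by default); setting remove = FALSE keeps the input column. In base R the problem does not arise, because adding columns with $<- leaves existing columns alone. A minimal sketch, assuming a hypothetical data frame snps and that every SNP value matches the x<chr>.<pos>. pattern:

```r
# Hypothetical stand-in for the real dataset
snps <- data.frame(SNP = c("x1.752566.G.A_G", "x3.4455.A.G_A"),
                   stringsAsFactors = FALSE)

# One regular expression with two capture groups:
# group 1 = chromosome (after "x", before the first full stop)
# group 2 = position (between the first and second full stops)
m <- regmatches(snps$SNP, regexec("^x([^.]+)\\.([^.]+)\\.", snps$SNP))

# Each list element is c(full_match, group1, group2);
# this assumes every SNP value matches (a non-match would make vapply fail)
snps$chr <- vapply(m, `[`, character(1), 2)
snps$pos <- as.integer(vapply(m, `[`, character(1), 3))

# Adding columns this way never touches snps$SNP, so it is kept
names(snps)
```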
Before the separate() step, add a mutate() step to create a temporary copy of the SNP column for separate() to use, like so:
The red is probably because the object a is now a tibble, and the tidyverse adds some rich display formatting in RStudio/R.
I used exactly the above but got the following error message: Error in eval_tidy(enquo(var), var_env) : object 'sno_throwaway' not found
I'd made a typo; please try again. Also, always double-check code before you run it on your computer. Don't trust strangers on the internet so easily :-)
It worked, thank you! That's good advice too. Ed