Merge two columns by having alternating rows
3.2 years ago

Hi Biostars,

I have been trying to merge two columns into a single column in R but I think I am missing something.

My dataframe looks something like this:

> df

Gene_name    Sequence
GAPDH        ATTTCGGGA
ENAM         GGGCTTACG
KRAS         AAATGCTTTC

I would like to create a single-column dataframe that will show the gene and the sequence right underneath like so:

> Merge                    

GAPDH                            
ATTTCGGGA
ENAM                              
GGGCTTACG
KRAS         
AAATGCTTTC

I've tried this: test <- cat(df$Gene_name, "\n", df$Sequence)

It doesn't seem to work.

Any ideas?

Many thanks, Gina

R Rstudio Dataframe • 3.7k views

Do you want an actual new line or have all in the same column?


Something like df %>% transmute(col1 = paste0(col1, ",", col2)) %>% separate_rows(col1, sep = ",")? Here col1 and col2 stand for your two columns. (The packages you'll need for this are magrittr, dplyr, and tidyr.)
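
Spelled out with the column names from the question, that suggestion might look like the sketch below. The output column name Merge and the object name merged are my own choices, and the approach assumes that neither the gene names nor the sequences contain a comma (otherwise pick a different separator).

```r
library(dplyr)   # also re-exports the %>% pipe, so magrittr need not be attached
library(tidyr)

# Example data rebuilt from the question
df <- data.frame(
  Gene_name = c("GAPDH", "ENAM", "KRAS"),
  Sequence  = c("ATTTCGGGA", "GGGCTTACG", "AAATGCTTTC")
)

merged <- df %>%
  # Glue each gene name and its sequence into one comma-separated string
  transmute(Merge = paste0(Gene_name, ",", Sequence)) %>%
  # Split every string at the comma and put each piece on its own row
  separate_rows(Merge, sep = ",")

print(merged)
```

Each input row becomes two consecutive output rows, so the gene name always sits directly above its sequence.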


I think you can use sed or something similar to convert spaces/tabs to newlines.

> data.frame(data = c(t(df)))

        data
1      GAPDH
2  ATTTCGGGA
3       ENAM
4  GGGCTTACG
5       KRAS
6 AAATGCTTTC
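
In case it is not obvious why the one-liner above interleaves the rows, here is a step-by-step sketch. The intermediate names m and flat are only for illustration. Note that t() turns the data frame into a matrix first, so any non-character columns would be coerced to character along the way.

```r
# Example data rebuilt from the question
df <- data.frame(
  Gene_name = c("GAPDH", "ENAM", "KRAS"),
  Sequence  = c("ATTTCGGGA", "GGGCTTACG", "AAATGCTTTC")
)

# t() gives a 2 x 3 character matrix:
# row 1 holds the gene names, row 2 holds the sequences
m <- t(df)

# c() flattens a matrix column by column, so each gene name
# is followed immediately by the sequence below it
flat <- c(m)

merged <- data.frame(data = flat)
print(merged)
```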

Converting df to a parsable format is better. Please try the following:

> library(Biostrings)
> df_fas=DNAStringSet(df$Sequence)
> names(df_fas)=df$Gene_name
> df_fas

DNAStringSet object of length 3:
    width seq                          names               
[1]     9 ATTTCGGGA             GAPDH
[2]     9 GGGCTTACG             ENAM
[3]    10 AAATGCTTTC            KRAS

This way, you can further manipulate the sequences as FASTA records in R.
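
If the end goal is a FASTA file (an assumption on my part, the question does not say so), a base-R sketch without Biostrings could look like this. The names fasta_lines and out_file are hypothetical, and the temporary file is only a stand-in for a real output path. With Biostrings loaded, writeXStringSet(df_fas, "some_file.fa") should give the same result.

```r
# Example data rebuilt from the question
df <- data.frame(
  Gene_name = c("GAPDH", "ENAM", "KRAS"),
  Sequence  = c("ATTTCGGGA", "GGGCTTACG", "AAATGCTTTC")
)

# rbind() stacks headers over sequences as a 2 x 3 matrix;
# c() then reads it column by column, which interleaves the two
fasta_lines <- c(rbind(paste0(">", df$Gene_name), df$Sequence))

# Hypothetical output location; replace with a real path as needed
out_file <- tempfile(fileext = ".fa")
writeLines(fasta_lines, out_file)

# Show what was written
cat(readLines(out_file), sep = "\n")
```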

3.2 years ago

Your example data:

df <- structure(list(Gene_name = c("GAPDH", "ENAM", "KRAS"), Sequence = c("ATTTCGGGA", 
"GGGCTTACG", "AAATGCTTTC")), class = "data.frame", row.names = c(NA, 
-3L))

You can pivot the data to long format to accomplish this. I use the tidyverse here.

library("dplyr")
library("tidyr")

long <- df %>%
  pivot_longer(everything()) %>%
  select(!name)

> long
# A tibble: 6 x 1
  value     
  <chr>     
1 GAPDH      
2 ATTTCGGGA
3 ENAM      
4 GGGCTTACG 
5 KRAS      
6 AAATGCTTTC