Question

Merge two columns by having alternating rows

3.2 years ago

georgia.stavrou • 0

Hi Biostars,

I have been trying to merge two columns into a single column in R but I think I am missing something.

My dataframe looks something like this:

> df

Gene_name                     Sequence    
GAPDH                            ATTTCGGGA     
ENAM                              GGGCTTACG    
KRAS                              AAATGCTTTC

I would like to create a single-column dataframe that will show the gene and the sequence right underneath like so:

> Merge                    

GAPDH                            
ATTTCGGGA
ENAM                              
GGGCTTACG
KRAS         
AAATGCTTTC

I've tried this: test <- cat(df$Gene_name, "\n", df$Sequence)

It doesn't seem to work.

Any ideas?

Many thanks, Gina

R Rstudio Dataframe • 3.7k views

ADD COMMENT • link updated 3.2 years ago by rpolicastro 13k • written 3.2 years ago by georgia.stavrou • 0

Do you want an actual new line or have all in the same column?

ADD REPLY • link 3.2 years ago by Asaf 10k

Something like df %>% transmute(col1 = paste0(col1, ",", col2)) %>% separate_rows(col1, sep = ",")? (The packages you'll need for this are magrittr, dplyr, and tidyr.)
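For reference, here is a minimal sketch of that one-liner applied to the column names from the question (Gene_name and Sequence). The output column name Merge is taken from the desired output in the question. The sketch assumes that neither column contains a comma, since the comma is used as a temporary separator.

```r
library(dplyr)  # transmute(), and re-exports the %>% pipe
library(tidyr)  # separate_rows()

# Example data rebuilt from the question
df <- data.frame(
  Gene_name = c("GAPDH", "ENAM", "KRAS"),
  Sequence  = c("ATTTCGGGA", "GGGCTTACG", "AAATGCTTTC")
)

# Glue each gene to its sequence with a comma, then split every
# glued string back into two consecutive rows.
merged <- df %>%
  transmute(Merge = paste0(Gene_name, ",", Sequence)) %>%
  separate_rows(Merge, sep = ",")

print(merged)
```

If the values could contain commas, a separator that cannot occur in the data (a tab, for instance) would be a safer choice.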

ADD REPLY • link 3.2 years ago by Dunois ★ 2.5k

I think you can use sed or something similar to convert spaces/tabs to newlines.

ADD REPLY • link 3.2 years ago by Fatima ▴ 1000

>data.frame(data=c(t(df)))

        data
1      GAPDH
2  ATTTCGGGA
3       ENAM
4  GGGCTTACG
5       KRAS
6 AAATGCTTTC

Converting df to a parsable format is better. Please try the following:

> library(Biostrings)
> df_fas=DNAStringSet(df$Sequence)
> names(df_fas)=df$Gene_name
> df_fas

DNAStringSet object of length 3:
    width seq                          names               
[1]     9 ATTTCGGGA             GAPDH
[2]     9 GGGCTTACG             ENAM
[3]    10 AAATGCTTTC            KRAS

This way, you can further manipulate the sequences as FASTA in R.

ADD REPLY • link 3.2 years ago by cpad0112 21k

score 2 · Answer 1 · 2021-02-02

Your example data

df <- structure(list(Gene_name = c("GAPDH", "ENAM", "KRAS"), Sequence = c("ATTTCGGGA", 
"GGGCTTACG", "AAATGCTTTC")), class = "data.frame", row.names = c(NA, 
-3L))

You can pivot the data to long format to accomplish this. I use the tidyverse here.

library("dplyr")
library("tidyr")

long <- df %>%
  pivot_longer(everything()) %>%
  select(!name)

> long
# A tibble: 6 x 1
  value     
  <chr>     
1 GAPDH      
2 ATTTCGGGA
3 ENAM      
4 GGGCTTACG 
5 KRAS      
6 AAATGCTTTC
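As a side note on why the original cat() attempt did not work: cat() only prints to the console and returns NULL, so test ends up empty. It also prints all gene names first and all sequences afterwards rather than alternating them. If you would rather avoid extra packages, a base R sketch along these lines should give the same interleaved column. It assumes both columns are character vectors (the as.character() calls guard against factors), and the column name Merge is just the one used in the question.

```r
# Example data rebuilt from the question
df <- data.frame(
  Gene_name = c("GAPDH", "ENAM", "KRAS"),
  Sequence  = c("ATTTCGGGA", "GGGCTTACG", "AAATGCTTTC")
)

# rbind() stacks the two vectors into a 2 x n matrix (one column per gene);
# c() then flattens it column by column, which alternates gene and sequence.
interleaved <- c(rbind(as.character(df$Gene_name), as.character(df$Sequence)))

merged <- data.frame(Merge = interleaved)
print(merged)
```

From there, writeLines(merged$Merge, "some_file.txt") would write one value per line, where the file name is only a placeholder.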