Question

Transforming big file

0

3.4 years ago

mel22 ▴ 100

Hello, I have a very large text file with 500,000 columns and 220 rows. Within each column, every cell contains 4 different values, like this:

Sample_name       "VAR1"                                "VAR2"               
Sample1           "-0.0570 0.0113 1.035 0.061"          " -0.3631 0.0065 0.842 0.045" 
Sample2           "-0.0334 1.0000 0.013 0.813"          "-0.0604 0.9639 0.052 0.764"

Now I want to transform it like this:

Var_name   Sample_name   V1        V2       V3      V4
VAR1       Sample1       -0.0570   0.0113   1.035   0.061
VAR2       Sample1       -0.3631   0.0065   0.842   0.045
VAR1       Sample2       -0.0334   1.0000   0.013   0.813

Any idea how I can do this, please?

SNP • 1.2k views

Updated 3.4 years ago by bioinformatics2020 ▴ 820 • written 3.4 years ago by mel22 ▴ 100

0

The function is called transpose. See https://stackoverflow.com/questions/4869189/how-to-transpose-a-dataset-in-a-csv-file and https://unix.stackexchange.com/questions/60590/is-there-a-command-line-utility-to-transpose-a-csv-file

3.4 years ago by karl.stamm 4.1k

score 3 · Answer 1 · 2020-11-27

3

3.4 years ago

bioinformatics2020 ▴ 820

If your data.frame is named df and looks like this:

  Sample_name                       VAR1                       VAR2
1     Sample1 -0.0570 0.0113 1.035 0.061 -0.3631 0.0065 0.842 0.045
2     Sample2 -0.0334 1.0000 0.013 0.813 -0.0604 0.9639 0.052 0.764

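Then, using tidyr:

# install.packages("tidyr")
library(tidyr)

# Stack every column except Sample_name into name/value pairs
df <- pivot_longer(df, cols = !Sample_name)
# Split each space-separated value string into four columns
df <- separate(df, col = value, sep = " ", into = c("V1", "V2", "V3", "V4"))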

The resulting data.frame will look like:

# A tibble: 4 x 6
  Sample_name name  V1      V2     V3    V4   
  <chr>       <chr> <chr>   <chr>  <chr> <chr>
1 Sample1     VAR1  -0.0570 0.0113 1.035 0.061
2 Sample1     VAR2  -0.3631 0.0065 0.842 0.045
3 Sample2     VAR1  -0.0334 1.0000 0.013 0.813
4 Sample2     VAR2  -0.0604 0.9639 0.052 0.764
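
Note that V1 to V4 come out as character columns. A minimal sketch of one way to get numeric columns instead, assuming the cells may carry stray leading spaces as in the question's example (the object name `long` is just an illustrative choice, and this has only been thought through on the toy data, not on a 500,000-column file):

```r
library(tidyr)

# Toy data copied from the question, including the leading space in one cell
df <- data.frame(
  Sample_name = c("Sample1", "Sample2"),
  VAR1 = c("-0.0570 0.0113 1.035 0.061", "-0.0334 1.0000 0.013 0.813"),
  VAR2 = c(" -0.3631 0.0065 0.842 0.045", "-0.0604 0.9639 0.052 0.764")
)

long <- pivot_longer(df, cols = !Sample_name)

# Drop stray leading/trailing spaces so the split yields exactly four pieces
long$value <- trimws(long$value)

# sep = " +" tolerates repeated spaces; convert = TRUE turns the pieces into numbers
long <- separate(long, col = value, into = c("V1", "V2", "V3", "V4"),
                 sep = " +", convert = TRUE)

str(long)
```

Without the `trimws()` step, a cell that starts with a space would split into an empty first piece and shift the values.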

3.4 years ago by bioinformatics2020 ▴ 820

0

It's a transpose. The function is called t() and doesn't need tidyr or tibbles or pivots.

3.4 years ago by karl.stamm 4.1k

2

No, that is not the right solution. Please re-read OP's question. Besides the first column, their data.frame has columns whose values they would like split on spaces into four separate columns. But let's pretend that wasn't the case:

df <- data.frame(
  Sample_name = c("Sample1", "Sample2"),
  VAR1 = c("-0.0570 0.0113 1.035 0.061", "-0.0334 1.0000 0.013 0.813"),
  VAR2 = c("-0.3631 0.0065 0.842 0.045", "-0.0604 0.9639 0.052 0.764")
)

df <- t(df)

           [,1]                         [,2]                        
Sample_name "Sample1"                    "Sample2"                   
VAR1        "-0.0570 0.0113 1.035 0.061" "-0.0334 1.0000 0.013 0.813"
VAR2        "-0.3631 0.0065 0.842 0.045" "-0.0604 0.9639 0.052 0.764"

We are left with the sample names as a row, and the values are still unsplit. With the other manipulations we would need, it would take about as many steps as the solution I posted.
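
For comparison, here is a rough base-R sketch of the whole reshaping without tidyr. It does not use t() at all; the work is in splitting each cell and stacking the columns. The names `var_cols`, `pieces`, and `long` are only illustrative, and this is a sketch on the toy data rather than something tuned for 500,000 columns:

```r
# Toy data from this thread
df <- data.frame(
  Sample_name = c("Sample1", "Sample2"),
  VAR1 = c("-0.0570 0.0113 1.035 0.061", "-0.0334 1.0000 0.013 0.813"),
  VAR2 = c("-0.3631 0.0065 0.842 0.045", "-0.0604 0.9639 0.052 0.764")
)

var_cols <- setdiff(names(df), "Sample_name")

# Build one block of rows per variable column
pieces <- lapply(var_cols, function(v) {
  # Split each cell on whitespace and bind the parts into a 4-column matrix
  parts <- do.call(rbind, strsplit(trimws(df[[v]]), " +"))
  storage.mode(parts) <- "double"   # character -> numeric, keeps the matrix shape
  colnames(parts) <- paste0("V", 1:4)
  data.frame(Var_name = v, Sample_name = df$Sample_name, parts)
})

# Stack the blocks, then group rows by sample as in the question's target layout
long <- do.call(rbind, pieces)
long <- long[order(long$Sample_name), ]
rownames(long) <- NULL

print(long)
```

Either way, the steps are the same in spirit: go from wide to long, then split each value string into four columns.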

3.4 years ago by bioinformatics2020 ▴ 820

0

Yes, exactly: the problem is not only to transpose but also to separate the columns. Thank you very much, I will test this solution.

3.4 years ago by mel22 ▴ 100