Question: Split excel column into different files
1
gravatar for goh
7 days ago by
goh10
goh10 wrote:

Hi,

I have been trying to use multiple command i can find online to spilt the columns into different files but couldnt.

I have an 10 columns excel file ( first 2 col is paramaters, the remaining 8 col is the data). I would like to split it into 8 separate files which all 8 contains first 2 col paramaters and 1 data col.

May i know does anyone have any idea to get around this ?

Thank you.

excel • 158 views
ADD COMMENTlink written 7 days ago by goh10
3

Good description of data. Please post some example data and expected output.@OP Export the text as tab separated file and install shell (bash preferably), cut in windows or wsl if you are using windows 10, try running following code with exported test.txt:

$ for i in $(seq 3 10); do  cut -f1,2,$i test.txt > file_$i.txt;done

using parallel (one needs to install gnu-parallel for this):

$ parallel  'cut -f1,2,{} test.txt > test_{}.txt' ::: {3..10}
ADD REPLYlink modified 7 days ago • written 7 days ago by cpad01127.6k

Hi,

Thanks. will try it out. The example is here. column B and E is my parameters. column K until R is my data which i would like to separate it into individual files. Thanks!

ADD REPLYlink written 6 days ago by goh10
1

Here is the code in R: Make sure that all for the files (excel files with .xls or .xlsx) are in the same format as in link furnished above (2nd column is mz, 5 is RT and 11th column onward samples). Code would create one excel file per sample and name of the excel file would be sample_"sample name".xls. It will have three columns: mz, RT and sample name.

library(readxl)
library(WriteXLS)
df=data.frame(read_xls("example.xls"))

for (i in 11:ncol(df)){
    temp_df=df[,c(2,5,i)]
    WriteXLS(temp_df, ExcelFileName = paste0("sample_",names(df)[i],".xls"))
}

if you have multiple xls files in the folder (and only xls files of interest):

setwd("~/Desktop/test") # change this to directory of interest
fn=list.files(pattern = "\\.xls") # lists files with .xls extension
library(readxl)
library(WriteXLS)

for (i in fn){
    df=data.frame(read_xls(i))
    for (j in 11:ncol(df)){
        temp_df=df[,c(2,5,j)]
        WriteXLS(temp_df, ExcelFileName = paste0(sub('\\.xls$', '', i) ,"_","sample_",names(df)[j],".xls"))
    }
    }

output files will have "xlsfilename_samplename.xls"

ADD REPLYlink modified 6 days ago • written 6 days ago by cpad01127.6k
1

in gnu-linux, one can do this from shell (and with the help of libreoffice):

for i in *.xls;
    do libreoffice --headless --convert-to csv $i --outdir .;
    nf=$(awk -F"," '{print NF;exit}' ${i%.*}.csv);
    for j in $(seq 11 18);
        do nf_head=$(awk -v j=$j -F"," '{print $j;exit}' ${i%.*}.csv); 
    cut -d"," -f 3,5,$j ${i%.*}.csv > ${i%.*}_$nf_head.csv;
    libreoffice --headless --convert-to xls ${i%.*}_$nf_head.csv  --outdir .;
    rm ${i%.*}_$nf_head.csv;
    done;
done

Since it starts with *.xls and writes output to xls files, be careful in running the script second time if there are any issues.

ADD REPLYlink modified 6 days ago • written 6 days ago by cpad01127.6k
1

By the time you found the solution you would have easily done this manually. Or is this a task which you have to do repeatedly?

The link with bioinformatics is not immediately obvious. Please elaborate.

Excel is not our 'favorite' tool but if you would convert the file to a text file you would have plenty of options.

ADD REPLYlink written 7 days ago by WouterDeCoster30k

Its a task i have to do it repeatedly on many different files. I have posted the example above.

I dont mind to convert the excel file into other types of file to get the job done, and then convert back to excel.

ADD REPLYlink written 6 days ago by goh10
1

@OP: You don't have to convert excel into other formats if your excel files are manageable in size and consists of uniform format. Both R and Python are capable of handling (input and output) excel files.

ADD REPLYlink modified 6 days ago • written 6 days ago by cpad01127.6k

Excel is most of the time not appropriate for data analysis. Only if you understand very well what you are doing...

For example it is not reproducible and doesn't track what you are doing. You cannot figure out what you did to a dataset months ago and you can silently change data without ever noticing. Keep your data in a normal file and use bash/R/Python/... and save your commands. That's reproducible and you can easily explain what happened and how you obtained a result.

ADD REPLYlink written 6 days ago by WouterDeCoster30k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1108 users visited in the last hour