Question: Split excel column into different files
1
gravatar for goh
9 weeks ago by
goh10
goh10 wrote:

Hi,

I have been trying to use multiple command i can find online to spilt the columns into different files but couldnt.

I have an 10 columns excel file ( first 2 col is paramaters, the remaining 8 col is the data). I would like to split it into 8 separate files which all 8 contains first 2 col paramaters and 1 data col.

May i know does anyone have any idea to get around this ?

Thank you.

excel • 226 views
ADD COMMENTlink written 9 weeks ago by goh10
1

By the time you found the solution you would have easily done this manually. Or is this a task which you have to do repeatedly?

The link with bioinformatics is not immediately obvious. Please elaborate.

Excel is not our 'favorite' tool but if you would convert the file to a text file you would have plenty of options.

ADD REPLYlink written 9 weeks ago by WouterDeCoster32k

Its a task i have to do it repeatedly on many different files. I have posted the example above.

I dont mind to convert the excel file into other types of file to get the job done, and then convert back to excel.

ADD REPLYlink written 9 weeks ago by goh10
1

@OP: You don't have to convert excel into other formats if your excel files are manageable in size and consists of uniform format. Both R and Python are capable of handling (input and output) excel files.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by cpad01128.8k

Excel is most of the time not appropriate for data analysis. Only if you understand very well what you are doing...

For example it is not reproducible and doesn't track what you are doing. You cannot figure out what you did to a dataset months ago and you can silently change data without ever noticing. Keep your data in a normal file and use bash/R/Python/... and save your commands. That's reproducible and you can easily explain what happened and how you obtained a result.

ADD REPLYlink written 9 weeks ago by WouterDeCoster32k
3
gravatar for cpad0112
9 weeks ago by
cpad01128.8k
India
cpad01128.8k wrote:

Good description of data. Please post some example data and expected output.@OP Export the text as tab separated file and install shell (bash preferably), cut in windows or wsl if you are using windows 10, try running following code with exported test.txt:

$ for i in $(seq 3 10); do  cut -f1,2,$i test.txt > file_$i.txt;done

using parallel (one needs to install gnu-parallel for this):

$ parallel  'cut -f1,2,{} test.txt > test_{}.txt' ::: {3..10}
ADD COMMENTlink modified 9 weeks ago • written 9 weeks ago by cpad01128.8k

Hi,

Thanks. will try it out. The example is here. column B and E is my parameters. column K until R is my data which i would like to separate it into individual files. Thanks!

ADD REPLYlink written 9 weeks ago by goh10
1

Here is the code in R: Make sure that all for the files (excel files with .xls or .xlsx) are in the same format as in link furnished above (2nd column is mz, 5 is RT and 11th column onward samples). Code would create one excel file per sample and name of the excel file would be sample_"sample name".xls. It will have three columns: mz, RT and sample name.

library(readxl)
library(WriteXLS)
df=data.frame(read_xls("example.xls"))

for (i in 11:ncol(df)){
    temp_df=df[,c(2,5,i)]
    WriteXLS(temp_df, ExcelFileName = paste0("sample_",names(df)[i],".xls"))
}

if you have multiple xls files in the folder (and only xls files of interest):

setwd("~/Desktop/test") # change this to directory of interest
fn=list.files(pattern = "\\.xls") # lists files with .xls extension
library(readxl)
library(WriteXLS)

for (i in fn){
    df=data.frame(read_xls(i))
    for (j in 11:ncol(df)){
        temp_df=df[,c(2,5,j)]
        WriteXLS(temp_df, ExcelFileName = paste0(sub('\\.xls$', '', i) ,"_","sample_",names(df)[j],".xls"))
    }
    }

output files will have "xlsfilename_samplename.xls"

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by cpad01128.8k
1

in gnu-linux, one can do this from shell (and with the help of libreoffice):

for i in *.xls;
    do libreoffice --headless --convert-to csv $i --outdir .;
    nf=$(awk -F"," '{print NF;exit}' ${i%.*}.csv);
    for j in $(seq 11 18);
        do nf_head=$(awk -v j=$j -F"," '{print $j;exit}' ${i%.*}.csv); 
    cut -d"," -f 3,5,$j ${i%.*}.csv > ${i%.*}_$nf_head.csv;
    libreoffice --headless --convert-to xls ${i%.*}_$nf_head.csv  --outdir .;
    rm ${i%.*}_$nf_head.csv;
    done;
done

Since it starts with *.xls and writes output to xls files, be careful in running the script second time if there are any issues.

ADD REPLYlink modified 9 weeks ago • written 9 weeks ago by cpad01128.8k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1402 users visited in the last hour