extract the rows from multiple excel files and print the output as the same name of input files - Shell script
2
0
Entering edit mode
2.3 years ago
Baka • 0

Hello,

I have 336 excel files and each excel file has 25 columns. I have to extract the rows from column 17 which do not have “efflux pump” (for example: extract the other rows with full entries which do not have efflux pump) and print the output as the same name of input files.

Kindly help regarding this. Thank you.

EDIT:

Below is example data from 1 file:

I have to extract kdpDE, pmr phosphoethanolamine transferase and its corresponding data and excludes entries which have efflux pump from column 17 (Q in the excel sheet).

enter image description here

linux shell-script excel • 4.3k views
ADD COMMENT
2
Entering edit mode
2.3 years ago

Try csvtk which has some subcommands that can be piped to accomplish the operations you want.

  • xlsx2csv, convert XLSX to CSV format
  • grep, grep data by selected fields with patterns/regular expressions
  • cut, select and arrange fields
  • csv2xlsx, convert CSV/TSV files to XLSX file

For batch proccessing, use GNU parallel, rush or other tools.

ADD COMMENT
2
Entering edit mode
2.3 years ago

Please do not delete post once you got suggestions. Here is another solution:

Please follow the instructions:

  1. Create a directory test
  2. Copy 5 excel files (not move) to the test directory
  3. Save following bash script in a file called 'script.sh' and place it inside test directory
  4. Make the script executable ( chmod +x script.sh) and execute the script (./script.sh) in test directory.
  5. A directory by name new will be created. Within new directory, 5 xls files will be present.
  6. Carefully check the output and see if it is as per your requirements.
  7. Run it on all excel files. Take a back up all xls files before you execute the script
  8. To run the script, you would need libreoffice (calc) installed on your machine.

Script:

#! /usr/bin/env bash

mkdir -p new
for i in *.xls;
    do libreoffice --headless --convert-to csv $i --outdir new ;
    awk -F "," '$17 !~ /efflux pump/ {print $17}' new/${i%.*}.csv  > ${i%.*}.csv ;
    libreoffice --headless --convert-to xls ${i%.*}.csv  --outdir new ;
    rm ${i%.*}.csv;
    rm new/${i%.*}.csv;
done
ADD COMMENT
0
Entering edit mode

Thank you very much @cpad0112

ADD REPLY

Login before adding your answer.

Traffic: 2426 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6