2

I have multiple excel files in this format.
They are clock in and clock out date and time.

---------------------------------
| Name     | Time               |
---------------------------------
| Person A | 03-Jul-17 8:15 AM  |
| Person A | 03-Jul-17 10:32 AM |
| Person A | 03-Jul-17 1:56 PM  |
| Person A | 03-Jul-17 6:15 PM  |
| Person A | 04-Jul-17 8:29 AM  |
| Person A | 04-Jul-17 8:58 AM  |
| Person A | 04-Jul-17 9:43 AM  |
| Person A | 04-Jul-17 1:03 PM  |
| Person A | 04-Jul-17 2:17 PM  |
| Person A | 04-Jul-17 5:58 PM  |
.
.
.
| Person A | 31-Jul-17 7:45 AM  |
| Person A | 31-Jul-17 8:10 AM  |
| Person A | 31-Jul-17 3:26 PM  |
| Person A | 31-Jul-17 7:29 PM  |
---------------------------------

I would want to extract this data and save it as a new excel file in this format:

---------------------------------------------
| Name     | Date      | Time In | Time Out |
---------------------------------------------
| Person A | 03-Jul-17 | 8:15 AM | 6:15PM   |
| Person A | 04-Jul-17 | 8:29 AM | 5:58PM   |
.
.
.
| Person A | 31-Jul-17 | 7:45 AM | 7:29PM   |
---------------------------------------------

Basically it's to arrange the data one entry per date with the earliest time for that date as the Time In and the latest time for that date as the Time Out.

There are multiple excel files in this format, and doing it manually is going to take too long.

If you want to convert them to .csv first edit then convert them back to .xlsx, it's cool.

PS: Bounty of 200 rep up for grabs.

Parto
  • 15,027
  • 24
  • 86
  • 114
  • cc @jacob-vlijm – Parto Aug 01 '17 at 07:39
  • Haha, thanks for trusting me like that :). I think you deserved your reputation, and if I post an answer, if it is useful, that is enough for me :). There are nice python libraries to edit excel files directly, but I never worked with it yet. I need to finish something urgently, and not sure where I will be after that, so I am not sure you should wait for my version of an answer :) – Jacob Vlijm Aug 01 '17 at 07:46
  • Hey @JacobVlijm you been a long time friend, that's why. It will be cool though to have your answer - I will wait. But since this is needed urgently, let me see if I can have something working in the meantime. – Parto Aug 01 '17 at 08:12
  • Don't wait! It could be over a month or so, in the meantime, please consider accepting a good answer! – Jacob Vlijm Aug 01 '17 at 08:13
  • Just a question, I assume you formatted the second column as text in excel? – Jacob Vlijm Aug 01 '17 at 10:23

3 Answers3

2

There are 2 scripts you need. The one to convert from XLS to CSV is a command xls2csv and the other one a script from github: csv2xls (another csv2xls). There are also csv2xlsx (and another csv2xlsx).

In between the 2 conversions you can edit the files using your favorite tool.

If you want to do it yourself: xlsx files (and the same applies ODT (open/libeoffice) are zipped archives and contain an XML with the data. You can unzip then and the data is in an XML. Manipulating the XML is a bit more difficult that a CSV sure but when the manipulating is automated it becomes rather efficient.

Rinzwind
  • 293,910
  • 41
  • 570
  • 710
1

I converted the files to csv and used a PHP script to parse the content creating it into how I actually wanted. The result was then saved in a new file then those files were converted back to xls then merged into one notebook.

The conversion and merging part was done manually. It's not the best solution but it's working for now.

Here's the script:

// get list of files from data directory
$files = array_diff(scandir('./data'), array('.', '..'));
foreach($files as $file):

    // get all data from the csv file and save in the $data array
    $csvFile = file('data/'.$file);
    $data = $list = [];
    foreach($csvFile as $line) {
        $data[] = str_getcsv($line);
    }
    unset($data[0]);

    // parse the data array and get the different sections: name. date and time
    foreach($data as $v) {
        $date = strtotime($v[1]);
        $list[date('d-m-Y',$date)][] = array(
            'name'=>$v[0],
            'date'=>date('d/m/Y',$date),
            'in'=>$date
        );
    }

    // create a new array and save parsed data in it with header columns
    $new = array(array('Name','Date','Time In','Time Out'));
    foreach($list as $k => $v) {
        $out = max(array_column($v, 'in'));
        $name = $v[0]['name'];
        $new[] = array(
            'name'=>ucwords(strtolower($name)),
            'date'=>$v[0]['date'],
            'in'=>date('h:i A', $v[0]['in']),
            'out'=>date('h:i A', $out)
        );
    }

    // The name of a new file in the new directory using this file name
    $filename = str_replace('.csv', '', basename($file));
    $fn = strtolower($filename.'-log.csv');

    // open the file and output the new array as CSV
    $out = fopen('new/'.$fn, 'w');
    foreach($new as $l) {
        fputcsv($out, $l, ",",'"');
    }
    fclose($out);

endforeach;
Parto
  • 15,027
  • 24
  • 86
  • 114
0

Thanks to @rinzwind I managed to create the following bash script that unzips the xlsx, uses sed to replace certain strings to something else, and then re-zip it again. The script sets autocalculate on, so that if you have any formulas, they are also updated (by default not).

#!/bin/bash
if ( [ -z $1 ] || [ -z $2 ] ); then
    echo "Usage: create-xlsx.sh <PLACEHOLDER_A> <PLACEHOLDER_B>"
    exit 1
fi

DIR=/tmp/mydir
rm -rf $DIR
mkdir -p $DIR
cd $DIR
#Note! xlsx is a zip-file
#Note! Original xlsx needs to have autocalculate set
#See https://stackoverflow.com/questions/18355691/set-xlsx-to-recalculate-formulae-on-open
unzip -d $DIR /path/to/original.xlsx
sed -i "s/calcPr iterateCount=\"100\"/calcPr calcMode=\"auto\" fullCalcOnLoad=\"1\" iterateCount=\"100\"/g" $DIR/xl/workbook.xml
sed -i "s/PLACEHOLDER_A/$1/g" $DIR/xl/worksheets/sheet1.xml
sed -i "s/PLACEHOLDER_B/$2/g" $DIR/xl/worksheets/sheet1.xml
zip -r /tmp/output.xlsx *
PHZ.fi-Pharazon
  • 269
  • 2
  • 4