Document Conversion with Unoconv | KrazyWorks

Document Conversion with Unoconv

Submitted by Igor on August 4, 2018 – 4:08 pm

The other day I ran into the “Flexible Import/Export” article by Bruce Byfield in the March 2018 issue of Linux Pro Magazine and thought it could use some more detail. So here’s some more detail.

The unoconv utility converts documents by driving LibreOffice, so LibreOffice has to be installed first. All examples below were run on RHEL 7.3. The first step is to get the latest version of LibreOffice. This is not strictly necessary, but it may save you some time and aggravation.

Remove any existing installations of LibreOffice and install the latest stable release from one of the project’s mirrors:

# Quote the globs so that yum expands them, not the shell
yum -y remove 'openoffice*' 'libreoffice*'
cd && v="6.0.4" && wget "http://ftp.utexas.edu/libreoffice/libreoffice/stable/${v}/rpm/x86_64/LibreOffice_${v}_Linux_x86-64_rpm.tar.gz"
tar xfz "LibreOffice_${v}_Linux_x86-64_rpm.tar.gz"
cd LibreOffice_${v}*_Linux_x86-64_rpm/RPMS
yum -y install *.rpm
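
The LibreOffice program directory is not the same for every package source, so it can be worth confirming where the soffice binary landed before moving on. The following is a minimal sketch: the function name find_uno_path and both candidate directories in the example call are assumptions for illustration, not something the installer guarantees.

```shell
# Print the first directory from the argument list that contains an
# executable "soffice" binary; return 1 if none of them does.
find_uno_path() {
    for dir in "$@"; do
        if [ -x "${dir}/soffice" ]; then
            printf '%s\n' "${dir}"
            return 0
        fi
    done
    return 1
}

# Example call (both paths are assumptions, adjust them for your system):
# find_uno_path /opt/libreoffice6.0/program /usr/lib64/libreoffice/program
```

Whatever directory this prints is the value to use for UNO_PATH later on.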

Now, download a more up-to-date version of unoconv and replace any copy already installed in /usr/bin. Once again, this is not necessary, but it is a good idea.

cd && git clone https://github.com/dagwieers/unoconv.git
/bin/cp -pf unoconv/unoconv /usr/bin

Add a systemd unit file for the unoconv listener and, if your system uses SELinux, set the appropriate SELinux boolean.

cat << EOF > /etc/systemd/system/unoconv.service
[Unit]
Description=Unoconv listener for document conversions
Documentation=https://github.com/dagwieers/unoconv
After=network.target remote-fs.target nss-lookup.target

[Service]
Type=simple
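# Adjust UNO_PATH to your LibreOffice program directory (upstream RPMs typically install under /opt)
Environment="UNO_PATH=/usr/lib64/libreoffice/program"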
ExecStart=/usr/bin/unoconv --listener

[Install]
WantedBy=multi-user.target
EOF

systemctl daemon-reload
systemctl enable unoconv.service
systemctl start unoconv.service
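
Once the service is started, it helps to confirm that the listener is actually accepting connections. The sketch below parses the output of "ss -ltn" for a listening socket on a given port. The helper name has_listener is hypothetical, and port 2002 is assumed to be the unoconv default; check your unoconv version if you changed it.

```shell
# Succeed if "ss -ltn"-style output on stdin shows a listening socket
# whose local address ends in ":<port>".
has_listener() {
    awk -v suffix=":$1" '
        $1 == "LISTEN" && length($4) >= length(suffix) &&
        substr($4, length($4) - length(suffix) + 1) == suffix { found = 1 }
        END { exit !found }
    '
}

# Example (2002 is assumed to be the default unoconv listener port):
# systemctl is-active unoconv.service && ss -ltn | has_listener 2002 && echo "listener is up"
```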

# Set the boolean only when SELinux is not disabled in the config file
f=/etc/sysconfig/selinux
if [ -f "${f}" ] && [ "$(grep -oP "(?<=^SELINUX=)[a-z]{1,}(?=$)" "${f}")" != "disabled" ]; then
    setsebool -P httpd_execmem on
fi
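
The check above relies on grep -P, which is not available in every grep build. As a hedged alternative, the same value can be pulled out of the config file with plain sed. The function name selinux_config_mode is an assumption made for this sketch.

```shell
# Print the value of the SELINUX= line from the given config file
# (for example "enforcing", "permissive" or "disabled").
selinux_config_mode() {
    sed -n 's/^SELINUX=\([a-z]*\)[[:space:]]*$/\1/p' "$1" | tail -n 1
}

# Example, mirroring the check above:
# f=/etc/sysconfig/selinux
# [ -f "$f" ] && [ "$(selinux_config_mode "$f")" != "disabled" ] && setsebool -P httpd_execmem on
```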

Now with the installation out of the way, here come the examples.

# Convert DOCX to PDF
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"

# Convert DOCX to password-protected PDF
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" -e EncryptFile=true -e DocumentOpenPassword=admin123 "${i}.docx"

# Convert pages 2-3 of DOCX to PDF that cannot be printed unless permissions are unlocked using a password
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" -e EncryptFile=true -e Printing=0 -e RestrictPermissions=true -e PermissionPassword=admin123 -e PageRange=2-3 "${i}.docx"

# Convert multiple Word documents in the current directory to PDF
find . -maxdepth 1 -mindepth 1 -type f -regextype posix-extended -regex '^.*\.(docx|doc)$' | while IFS= read -r i; do o="$(basename "${i%.*}").pdf"; unoconv -f pdf -o "./output/${o}" "${i}" 2>/dev/null; done
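
When converting in bulk, the output name has to be derived from the input name, with the original extension swapped for .pdf. Here is a small sketch of that step on its own. The function name pdf_target is hypothetical, and the ./output directory is taken from the examples above.

```shell
# Map an input path such as "./Annual Report.docx" to the PDF path
# under ./output, dropping only the last extension.
pdf_target() {
    name=$(basename "$1")
    printf './output/%s.pdf\n' "${name%.*}"
}

# Example use inside a loop over Word documents:
# unoconv -f pdf -o "$(pdf_target "${i}")" "${i}"
```

Only the last extension is stripped, so a file called notes.v2.doc ends up as notes.v2.pdf.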

# Convert DOCX to multiple JPG
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"
# Page count: the largest /Count value found in the PDF
j=$(strings < "./output/${i}.pdf" | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
k=1
while [ "${k}" -le "${j}" ]; do
    unoconv -f pdf -o "./output/${i}_page_${k}.pdf" -e PageRange=${k}-${k} -e UseLosslessCompression=true "./output/${i}.pdf"
    unoconv -f jpg -o "./output/${i}_page_${k}.jpg" -e Quality=94 "./output/${i}_page_${k}.pdf"
    /bin/rm "./output/${i}_page_${k}.pdf"
    (( k = k + 1 ))
done
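
The page count in the loop above comes from the largest /Count entry in the PDF. The sketch below isolates that step and uses grep -a instead of strings, so it does not depend on binutils. It assumes, like the original pipeline, that the page tree is stored uncompressed in the file. The function name pdf_max_count is made up for this example.

```shell
# Read a PDF on stdin and print the largest number that follows a
# "/Count" key. Prints nothing when no such key is found.
pdf_max_count() {
    LC_ALL=C grep -ao '/Count -\{0,1\}[0-9]\{1,\}' |
        sed 's|.*/Count -\{0,1\}||' |
        sort -rn |
        head -n 1
}

# Example:
# j=$(pdf_max_count < "./output/${i}.pdf")
```

If the result is empty, the loop should be skipped rather than run with an unset upper bound.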

# Convert DOCX to multiple JPG of specified resolution and dimensions
# Requires 'convert' utility: 
# yum -y install ImageMagick
i="Document Name"; unoconv -f pdf -o "./output/${i}.pdf" "${i}.docx"
# Page count: the largest /Count value found in the PDF
j=$(strings < "./output/${i}.pdf" | sed -n 's|.*/Count -\{0,1\}\([0-9]\{1,\}\).*|\1|p' | sort -rn | head -n 1)
k=1
while [ "${k}" -le "${j}" ]; do
    unoconv -f pdf -o "./output/${i}_page_${k}.pdf" -e PageRange=${k}-${k} -e UseLosslessCompression=true "./output/${i}.pdf"
    convert -density 400 "./output/${i}_page_${k}.pdf" -resize 2000x1500 "./output/${i}_page_${k}.jpg"
    /bin/rm "./output/${i}_page_${k}.pdf"
    (( k = k + 1 ))
done

# Convert XLSX to CSV
# Limitation: only the first sheet is converted
i="Spreadsheet Name"; unoconv -f csv -d spreadsheet -o "./output/${i}.csv" "${i}.xlsx"

You can find additional options for the unoconv utility’s PDF import/export functionality here. There was some talk about adding a command-line option to unoconv that would let the user specify the sheet name or number when converting a multi-sheet spreadsheet.

I don’t know if anything came of this. I was not able to find a version of unoconv with this capability. So as not to leave this question unanswered, here’s how you can use the xlsx2csv tool to work with multi-sheet spreadsheets.

# Convert XLSX to CSV using xlsx2csv
# https://github.com/dilshod/xlsx2csv
# Install xlsx2csv
cd && git clone https://github.com/dilshod/xlsx2csv.git && cd xlsx2csv && /bin/cp -p xlsx2csv.py /usr/bin/xlsx2csv

# Convert sheets 1-10, remove empty or non-existent sheets
for j in $(seq 1 10); do xlsx2csv -s "${j}" "${i}.xlsx" "./output/${i}_sheet_${j}.csv" 2>/dev/null; if [ $? -ne 0 ] || [ ! -s "./output/${i}_sheet_${j}.csv" ]; then /bin/rm -f "./output/${i}_sheet_${j}.csv"; fi; done

# Convert all sheets. This will create a subfolder with CSV files named after every sheet
xlsx2csv -a "${i}.xlsx" "./output/${i}"
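
Empty sheets tend to leave zero-length CSV files behind. A cleanup pass along the following lines can remove them after either conversion. This is only a sketch, and the function name prune_empty_csv is an assumption.

```shell
# Delete zero-length *.csv files directly inside the given directory.
prune_empty_csv() {
    find "$1" -maxdepth 1 -type f -name '*.csv' -size 0 -exec rm -f {} +
}

# Example:
# prune_empty_csv "./output/${i}"
```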