Creating an eBook PDF from your WordPress Posts Using Linux and WordPress Tools Only
Here is the research I did to find out how I could use WordPress, plus free tools available from the Linux repositories, to create a reasonably professional-looking summary document of my Valve Amp Posts. Along the way I wanted to learn what is involved in merging PDFs and how to overcome some of the hurdles you may face in getting your Posts formatted the way you want and collated in the right order.
If you just want to create a PDF from another source document, LibreOffice has an Export as PDF option.
The first step was to find a decent WordPress plugin to download the Posts in PDF format. This was a bit of a plugin trial and error farce for a while.
I finally found Print Friendly & PDF (PF&PDF), which did a great job, with the added bonus of letting your viewers download or email a Post for themselves using a button you place on the Post:
This gives a nice preview before you download it.
Set up the plugin button position and other features in Setup.
The formatting done by this plugin is slick, but make sure your Post is formatted well before you download it (centre your pictures, etc.), otherwise they will stay offset in the PDF, like this:
Try to homogenise your fonts across your Posts too, so that PF&PDF has an easier time giving your final PDF a consistent look in terms of font size. Colourisation and YouTube links will be lost, but a web link back to the original Post on your site will be inserted:
You can research other, more involved options for re-inserting sound and video into PDFs with other software yourself. LibreOffice also has a media insertion option when you open a PDF, but I didn't use it for editing at this stage.
Now for the important part!
To save yourself PDF re-arrangement hassle later, think carefully about the order in which you download your Posts, both in terms of alphabetical title order and timestamp, as getting it wrong can cause a headache later when sorting the files so that pdfunite merges the separate Post PDFs in the right order into the final eBook.pdf.
For example, I wanted my Posts to run in the same order they ran on the site (oldest to newest, top to bottom), so I downloaded them in that same order. But I also wanted the Main Page at the start of the book, so in retrospect it needed to be the FIRST download, giving it the oldest timestamp on my laptop. Instead I got the Posts first…
Also, I missed one Post out, which gave me a timestamp/title sorting problem later! I'll show the command options I found to fix that below.
The WordPress Posts page shows 43 Posts, which plus the Main Page makes 44 PDFs to download and merge into the final eBook:
I practised with pdfunite to see how well it stitched PDFs together, but found it complained about "damaged files" because of the white space in the file names.
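A likely mechanism, assuming the names were handed to pdfunite through an unquoted $(ls ...) like the command used further on: the shell splits that output at every space, so pdfunite is asked to open fragments such as quad-1962 rather than whole files. This bash sketch reproduces the splitting in a throwaway directory with two hypothetical files; it only counts arguments and never calls pdfunite.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Two hypothetical Post PDFs whose names still contain spaces.
touch "quad-1962 Fender Tremolux Repair.pdf" \
      "quad-1969 Marshall JMP 100W Buyers Info.pdf"

# Unquoted command substitution is split at every space and newline,
# so the two names turn into ten separate arguments.
set -- $(ls quad-*)
split_count=$#
printf 'unquoted $(ls): %d arguments\n' "$split_count"
printf '  [%s]\n' "$@"

# A glob hands each matching name over whole.
set -- quad-*
glob_count=$#
printf 'glob: %d arguments\n' "$glob_count"
```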
pdfunite - Portable Document Format (PDF) page merger
pdfunite [options] PDF-sourcefile1..PDF-sourcefilen PDF-destfile
pdfunite merges several PDF (Portable Document Format) files in order of their occurrence on command line to one PDF result file.
In other words, the first file named ends up first in the final document, and the last filename on the command line is the merged output PDF. This means that to merge all 44 files in one command, you need the * wildcard together with a listing method that follows timestamp order, as the names alone are no use here.
Now for the sorting issues. The default ls listing is alphabetical, which, as I found, may be no good at all. My downloaded Post PDF file names all began with "quad-" followed by the Post title, e.g.:
quad-1962 Fender Tremolux Repair.pdf
quad-1969 Marshall JMP 100W Buyers Info.pdf
First, the white space needs removing from all the names, sed style. If the files are all in a directory of their own, you can cd into it and run:
rename "s/ //g" *
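The rename above is the Perl rename found on Debian-based distributions such as Mint; some other distributions install the util-linux rename instead, which does not take a sed expression. As a hedged, portable alternative (assuming bash and GNU mv), this loop strips the spaces with plain shell tools, shown on hypothetical copies of the two names above in a throwaway directory.

```shell
#!/usr/bin/env bash
# Throwaway directory with hypothetical copies of the downloaded names.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "quad-1962 Fender Tremolux Repair.pdf" \
      "quad-1969 Marshall JMP 100W Buyers Info.pdf"

# For every name containing a space, delete all spaces with tr and
# rename the file. mv -n refuses to overwrite an existing file.
for f in *' '*; do
    [ -e "$f" ] || continue              # pattern matched nothing
    new=$(printf '%s' "$f" | tr -d ' ')
    mv -n -- "$f" "$new"
done

ls
```

Like rename, mv keeps each file's modification time, so the download order is still intact for the timestamp listing that follows.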
A listing by timestamp shows the files in the reverse of the top-down order I want them merged in:
stevee@AMD ~/Documents/AmpDocs $ ls -t
But the reverse switch -r fixes that.
$ ls -tr
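To see why the r matters, here is a small demonstration assuming GNU coreutils, with three hypothetical files whose alphabetical order differs from the order they were "downloaded" in; touch -d fakes the download times. A glob sorts by name, ls -t puts the newest first, and ls -tr gives the oldest-first order pdfunite needs.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical files, stamped in the order they were "downloaded".
touch -d "Jun 22 17:00" quad-Zebra.pdf     # first download (oldest)
touch -d "Jun 22 17:01" quad-Alpha.pdf
touch -d "Jun 22 17:02" quad-Middle.pdf    # last download (newest)

echo "glob (by name):";        printf '%s\n' quad-*
echo "ls -t (newest first):";  ls -t quad-*
echo "ls -tr (oldest first):"; ls -tr quad-*

# Keep the oldest-first list for later use.
oldest_first=$(ls -tr quad-*)
```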
Also, what about the file I missed, which has a much later timestamp? It was the second of two consecutive Posts.
I fixed that with touch, using the format:
touch -d "Jun 22 18:03" file.txt
so I changed its timestamp to 1 minute later than the previous Post's:
ls -l quad-Laney*P*
-rw-r--r-- 1 stevee stevee 1368722 Jun 22 18:02 quad-LaneyKlipp100WRepairandDesignStudyPart1.pdf
-rw-r--r-- 1 stevee stevee 1286586 Jun 22 18:03 quad-LaneyKlipp100WRepairandDesignStudyPart2.pdf
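Rather than typing the time by hand, GNU touch can set it relative to the previous file: when -r (reference file) is combined with a relative -d, the offset is measured from the reference file's timestamp. A sketch on empty stand-ins for the two Laney files, assuming GNU coreutils (the stand-ins and demo directory are hypothetical):

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

part1=quad-LaneyKlipp100WRepairandDesignStudyPart1.pdf
part2=quad-LaneyKlipp100WRepairandDesignStudyPart2.pdf

# Empty stand-ins: Part1 keeps its original time, Part2 was missed
# and downloaded later, so it is stamped "now".
touch -d "Jun 22 18:02" "$part1"
touch "$part2"

# With -r, the relative -d is applied to Part1's timestamp,
# putting Part2 exactly one minute after it.
touch -r "$part1" -d '+1 minute' "$part2"

ls -l --time-style=+%H:%M "$part1" "$part2"
```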
Now I can list the Posts in the order I want, but how do I feed all these files to pdfunite when it also needs the output file at the end of the same command line? I struggled with this for a while, as there is no easy way to pass the output of ls AND the final file name as arguments to pdfunite using a pipe ( | ) or redirection ( < or > ), but googling turned up this solution using $(...) command substitution:
pdfunite $(ls -tr quad-*) AllPosts.pdf
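The $(ls -tr quad-*) trick relies on the spaces having been removed first, because unquoted command substitution splits at every space. If you would rather keep the original names, a bash sketch (assuming bash 4 or later for mapfile) reads the ls output into an array, one line per name, so only newlines split names. It is shown as a dry run on hypothetical files, printing the arguments pdfunite would receive in order; the real merge line is left as a comment.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Hypothetical Post PDFs that still have spaces, oldest first.
touch -d "Jun 22 18:02" "quad-Post One.pdf"
touch -d "Jun 22 18:03" "quad-Post Two.pdf"

# One array element per line of ls output (oldest first), so spaces
# inside a name survive. Names containing newlines would still break.
mapfile -t posts < <(ls -tr -- quad-*)

# Dry run: show the argument order, sources first, destination last.
printf '%s\n' "${posts[@]}" AllPosts.pdf

# The real merge would be:
# pdfunite "${posts[@]}" AllPosts.pdf
```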
All that was missing from the final document is the Main Page, which I downloaded after the Posts (so it has a later timestamp). It can simply be prepended to the newly created AllPosts.pdf, in the right order, so that the Main Page intro sits at the beginning of the final document:
pdfunite OldValveAmpandElectricalTechPosts.pdf AllPosts.pdf AllValveAmpPosts.pdf
Now for the tedious part!
The Document Viewer's side pane may default to a column of page thumbnails, so switch it to Bookmarks. Then, as you scroll down the document, create a new Bookmark at the start of each Post and rename it (by right-clicking) from the page number to the Post/Chapter title, copying and pasting the title. Save a copy periodically as you go: