Entries in Computing (187)

Thursday
Jul 11 2019

How to make a PDF document searchable

I had a large PDF file that I wanted to be able to search quickly but it just contained images of text, not the actual text itself, so all searches in it failed.

I use Ubuntu so I reasoned that there should be some solution for me available out on the Web.  There was.  This is the one I chose:

Install pdfsandwich:

sudo apt-get install pdfsandwich

Run pdfsandwich on the file you want to become searchable:

pdfsandwich test.pdf -o test-searchable.pdf -nthreads 12 -first_page 5 -last_page 290

Here I use the -nthreads option to limit pdfsandwich to 12 threads so that it does not lock up my computer by using all 16 of its processors.  I also use the -first_page and -last_page options to exclude the table of contents and the index from the searchable area (these just produce unnecessary duplicate search results).  The -o option specifies the name of the output file.
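
The thread count can also be derived from the machine's core count rather than typed by hand. The following is only a sketch: the helper name threads_for and the choice of keeping 4 cores free are my own assumptions, not something pdfsandwich provides.

```shell
# Hypothetical helper: number of threads to hand to pdfsandwich when
# `reserve` cores out of `total` should stay free. Never goes below 1.
threads_for() {
    total=$1
    reserve=$2
    n=$(( total - reserve ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# With 16 cores and 4 kept free this prints 12, the value used above.
threads_for 16 4
```

On a real machine the first argument could come from nproc (part of GNU coreutils), for example -nthreads "$(threads_for "$(nproc)" 4)".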

The resulting file was less than a tenth the size of the original (20MB instead of 350MB), so the text and images were grainier, but they were still easily readable. And being able to search it quickly makes the document much more useful to me than the original was.

Saturday
Jan 14 2017

Functional Programming Talks

In the past few months the following lectures have greatly helped my understanding of how algebraic structures can be used to design and construct functional programs:

Saturday
Mar 19 2016

How to Reverse the Page Order in a PDF File

If someone scans the pages of a document in the wrong order then the resulting PDF file will have its pages in reverse order. On Linux this is very easy to fix. First install pdftk (if necessary):

  sudo apt-get install pdftk

Then enter:

  pdftk file1.pdf cat end-1 output file2.pdf

where file1.pdf is the name of your input file and file2.pdf the name of the output file.
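
If several scans need the same treatment, the command can be wrapped in a loop. This is a minimal sketch, assuming the files sit in the current directory; the function names reversed_name and reverse_all and the -reversed.pdf suffix are hypothetical choices of mine.

```shell
# Hypothetical helper: derive an output name, e.g. file1.pdf -> file1-reversed.pdf
reversed_name() {
    printf '%s\n' "${1%.pdf}-reversed.pdf"
}

# Reverse the page order of every PDF in the current directory.
reverse_all() {
    for f in *.pdf; do
        [ -e "$f" ] || continue              # no PDFs here: the glob stayed unexpanded
        case "$f" in
            *-reversed.pdf) continue ;;      # skip files produced by an earlier run
        esac
        pdftk "$f" cat end-1 output "$(reversed_name "$f")"
    done
}
```

Nothing runs until reverse_all is called, and the original files are left untouched.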

(Thanks to emilien at Stack Overflow)

Sunday
Jan 12 2014

Tim Bray on JavaScript

JavaScript is horrible.

> [5, 10, 1].sort();
[ 1, 10, 5]

Et cetera.

From here

Saturday
Mar 16 2013

RVM SSL Certificate Errors

The Ruby Version Manager (RVM) is one of the key components in a Ruby development environment on non-Windows platforms.

Yesterday I wanted to install it on my Ubuntu laptop.  The recommended way of doing this is to run the following command in a terminal:

curl -L https://get.rvm.io | bash -s stable

Unfortunately this gave SSL certificate verification errors.  Visiting rvm.io in Firefox and Chrome indicated that this was due to the site's security certificate having expired.

A little Googling revealed the recommended workaround, which is to use GitHub instead of rvm.io:

curl -L https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer | bash -s stable
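
A variation on the same workaround is to save the installer to a file first, so that it can be read before it is run. This is only a sketch: the helper name fetch_installer and the local file name are hypothetical, and the -f flag (fail on HTTP errors) is my addition to the original command.

```shell
# Hypothetical helper: download a script to a local file instead of piping it to bash.
fetch_installer() {
    url=$1
    out=$2
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to the given file
    curl -fL "$url" -o "$out"
}
```

It might be used as fetch_installer followed by the GitHub URL above and a file name such as rvm-installer, then less rvm-installer to inspect the script, then bash rvm-installer stable to run it.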