Thursday
Jul 11 2019

How to make a PDF document searchable

I had a large PDF file that I wanted to be able to search quickly but it just contained images of text, not the actual text itself, so all searches in it failed.

I use Ubuntu, so I reasoned that there should be some solution available for me out on the Web.  There was.  This is the one I chose:

Install pdfsandwich:

sudo apt-get install pdfsandwich

Run pdfsandwich on the file you want to become searchable:

pdfsandwich test.pdf -o test-searchable.pdf -nthreads 12 -first_page 5 -last_page 290

Here I use the -nthreads option to limit pdfsandwich to 12 threads, which stops it from locking up my computer by using all 16 of its processors.  I also use the -first_page and -last_page options to exclude the table of contents and the index from the searchable area (these just produce unnecessary duplicate search results).  The -o option specifies the name of the output file.
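
If you would rather not hard-code the thread count, something like the sketch below could work it out for you.  The threads_to_use function and the reserve of 4 processors are my own illustrative choices, not part of pdfsandwich; nproc is the standard coreutils command that reports the number of available processors.

```shell
# Hypothetical helper (not part of pdfsandwich): given the total number of
# processors and how many to keep free, print the thread count to use.
# It never prints less than 1.
threads_to_use() {
    total=$1
    reserve=$2
    n=$(( total - reserve ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Example use, assuming you want to keep 4 processors free:
# pdfsandwich test.pdf -o test-searchable.pdf -nthreads "$(threads_to_use "$(nproc)" 4)" -first_page 5 -last_page 290
```

On my 16-processor machine this gives the same 12 threads as the command above.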

The resulting file was less than a tenth the size of the original (20MB instead of 350MB), so the text and images were grainier, but they were still easily readable. And being able to search it quickly makes the document much more useful to me than the original was.

Sunday
May 26 2019

Stag Beetle

A male stag beetle, Lucanus cervus (Coleoptera: Lucanidae).

This specimen was taken on The Warren, Caversham, UK on 2019-05-25 and was released after being photographed.

Saturday
Apr 27 2019

Great Tit

A dead great tit, Parus major.

Found on the staircase of our flats. It must have got in through one of the open windows but then panicked and crashed into a closed window while trying to get out again.

Three years ago a nuthatch died in the same way.

Photos taken on 2019-04-27 in Reading, UK.

Monday
Apr 01 2019

Sciomyzid Fly

A male Sepedon sphegea (Diptera: Sciomyzidae).  Identified using the key of Rozkosny, 1984.

Specimen taken beside the Jubilee River, near Taplow, UK on 2019-03-30.

Saturday
Feb 16 2019

Sepsid Flies

A male Themira superba (Diptera: Sepsidae).  Identified using the key of Pont & Meier, 2002.

This is a female:

Specimens taken in Whiteknights Park, Reading, UK on 2018-06-02.