A DIY guide to EPUB text corrections

Teaching yesterday at the PTC, I realised how exciting it is when people discover that an EPUB is not all that different from a website - which is pretty familiar territory.

What's inside an EPUB

So, given that an EPUB is just a ZIP file with a bunch of (X)HTML files with some CSS, pictures (if your book has them), and some standard machine-generated metadata, anyone can edit an EPUB. All you need is a text editor and access to the epubcheck utility.

Cracking open an EPUB

Even if all you have at work is a locked-down Citrix-type system and Windows Notepad, you can change the extension from .epub to .zip and edit the files inside. It's even easier if you have a piece of software meant for editing HTML and CSS. If you are using Windows, Notepad++ is a good, free editor that supports useful things like search and replace across multiple files. On a Mac there are both free and paid options. If you want free software, take a look at TextWrangler. I use TextMate, which isn't free but isn't expensive either.
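If renaming and unzipping by hand gets tedious, the same step can be scripted. This is only a minimal sketch using Python's standard zipfile module; the function name unpack_epub and the file names in the usage note are my own placeholders, not part of any tool.

```python
import zipfile
from pathlib import Path


def unpack_epub(epub_path, dest_dir):
    """Extract every file in the EPUB (which is a ZIP archive) into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(epub_path) as archive:
        archive.extractall(dest)
        # Return the archive's internal paths so you can see what is inside.
        return archive.namelist()
```

Calling unpack_epub("mybook.epub", "mybook_unpacked") would leave the book's files in a folder you can open in any text editor. There is no need to rename the file to .zip first, because zipfile looks at the content rather than the extension.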

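If you are working on a Mac, you may have some issues with zipping and unzipping files. Sadly, there isn't any competent equivalent to WinZip or JZip available for Macs. You can use the built-in unzipping facilities of the Mac to extract EPUB files, and I recommend that you use EPUB-Zip to rebuild them, because a valid EPUB needs its "mimetype" file to be the first item in the archive and stored uncompressed, which general-purpose zip tools don't guarantee.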
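If you can't install a dedicated tool, rebuilding the archive can also be sketched in a few lines of Python. I am not claiming this is how EPUB-Zip works internally; it simply follows the EPUB container rule that the "mimetype" file must come first and be stored uncompressed. The function name build_epub is hypothetical, and the sketch assumes the unpacked folder has a "mimetype" file at its top level.

```python
import zipfile
from pathlib import Path


def build_epub(source_dir, epub_path):
    """Zip source_dir into epub_path, mimetype entry first and uncompressed."""
    source = Path(source_dir)
    with zipfile.ZipFile(epub_path, "w") as archive:
        # The mimetype file goes in first and is stored without compression.
        archive.write(source / "mimetype", "mimetype",
                      compress_type=zipfile.ZIP_STORED)
        for path in sorted(source.rglob("*")):
            relative = path.relative_to(source).as_posix()
            # Skip folders, the mimetype file already added, and Mac clutter.
            if not path.is_file() or relative == "mimetype" or path.name == ".DS_Store":
                continue
            archive.write(path, relative, compress_type=zipfile.ZIP_DEFLATED)
```

Write the output file somewhere outside the folder you are zipping (for example build_epub("mybook_unpacked", "mybook.epub")), otherwise the half-written EPUB could end up inside itself.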

I'm going to deal with a regular EPUB2 file here, nothing fancy. So, once you've opened up your EPUB, you will see:

  • HTML files, one for each chapter and/or part from your book (in an EPUB you force a new page by using a new HTML file) 
  • a CSS file - this is your stylesheet (to see what effect a stylesheet has, take a look at epubzengarden)
  • one or more image files, ending in .jpg, .png, .gif or .svg - you probably have at least one (the cover) 
  • a file called "toc.ncx" (navigation structure)
  • a file called "content.opf" (or something similar - the key is the .opf extension - this is the book's metadata) 
  • a folder called "META-INF" (files for container-level metadata, content, and encryption)
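Since the .opf file can have any name, it helps to know how a reading system finds it: the file META-INF/container.xml names it in a "rootfile" entry. The sketch below reads that entry straight out of the EPUB. The function name find_opf_path is my own, and the code assumes a standard container.xml with a single rootfile.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by META-INF/container.xml.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(epub_path):
    """Return the path of the .opf file as recorded in META-INF/container.xml."""
    with zipfile.ZipFile(epub_path) as archive:
        root = ET.fromstring(archive.read("META-INF/container.xml"))
    # The first rootfile element points at the book's package (.opf) file.
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    return rootfile.get("full-path")
```

For a typical book this returns something like "OEBPS/content.opf", but the folder and file name vary from one EPUB to the next.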

Edit the files

Now you can edit the HTML and CSS files as you would usually do for a website. This is particularly useful for editors or authors who want to make text corrections to their book.
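For a correction that recurs throughout the book (a misspelt character name, say), a multi-file search and replace saves a lot of clicking. Here is a rough sketch of that idea in Python, assuming the chapter files are UTF-8 and end in .html, .xhtml or .htm; the function name correct_text is hypothetical.

```python
from pathlib import Path


def correct_text(folder, old, new, extensions=(".html", ".xhtml", ".htm")):
    """Replace old with new in every HTML file under folder.

    Returns a dict mapping each changed file to the number of replacements.
    """
    changed = {}
    for path in sorted(Path(folder).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in extensions:
            continue
        # Work on bytes so the files' original line endings are preserved.
        text = path.read_bytes().decode("utf-8")
        count = text.count(old)
        if count:
            path.write_bytes(text.replace(old, new).encode("utf-8"))
            changed[str(path)] = count
    return changed
```

This is a plain string match, so it will happily change text inside tags and attributes too. Keep the search phrase specific, and check the returned counts against what you expected before zipping the book back up.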

If you are editing the HTML or CSS itself, remember that EPUB supports only a subset of these languages, so be careful (we will deal with this in a later article if there is interest).


Once you've edited your files and zipped them back up (ensuring that the file extension is changed back to .epub), you need to make sure that your EPUB is valid. You do this by using epubcheck. If you only have a few EPUB files to validate, use the IDPF's simple epubcheck website (http://validator.idpf.org/). If you have a lot of files to validate, run epubcheck locally from the command line on your PC or Mac.
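For a whole folder of books, the command-line route can be wrapped in a short script. The sketch below assumes Java is installed and that you have downloaded the epubcheck jar; the jar location, the folder name and both function names are placeholders of mine. It also assumes epubcheck exits with status 0 when it reports no errors, so check that against the version you have.

```python
import subprocess
from pathlib import Path


def epubcheck_command(epub_path, jar_path="epubcheck.jar"):
    """Build the command line that runs epubcheck on one file."""
    return ["java", "-jar", str(jar_path), str(epub_path)]


def validate_folder(folder, jar_path="epubcheck.jar"):
    """Run epubcheck on every .epub in folder; return {file name: passed?}."""
    results = {}
    for epub in sorted(Path(folder).glob("*.epub")):
        completed = subprocess.run(
            epubcheck_command(epub, jar_path),
            capture_output=True,
            text=True,
        )
        # Assumption: a zero exit status means no errors were reported.
        results[epub.name] = completed.returncode == 0
    return results
```

Something like validate_folder("finished_books", "tools/epubcheck.jar") would then give you a quick pass/fail list. For any book that fails, run the command by hand to read epubcheck's full messages.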

And that really is all there is to it!