Converting a .pdf document into an .epub document can be done, but there is no easy way to do it. I really wish that there was, because I quite often have to convert a client’s manuscript in .pdf format to an .epub/.mobi for upload to the online bookstores. There are a number of software packages that claim to be able to do this. I’ve tried them all and they all do a terrible job. If you Google “pdf to epub” you’ll get a list of these software packages. If you are curious, I invite you to try any of the packages (they usually have trial versions) and see how badly they scramble your document. If any of these packages worked, it would make my life a lot easier because I do epub conversion for a living. I would be using it, believe me.
One of the main reasons that there is no instant way to convert a .pdf to .epub (with an .epub document that actually resembles the initial .pdf document in any way whatsoever) is that an .epub file is actually a mini web site. Just like a web site, an .epub file contains pages of XHTML content code, a cascading style sheet that controls all styling and formatting, and a folder containing all images or links to all images.
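To make the "mini web site" idea concrete, here is a minimal Python sketch of how those pieces sit inside the zip archive that an .epub really is. The folder name OEBPS and the file names content.opf, chapter1.xhtml, and styles.css are my own illustrative choices, and the package file and chapter are placeholders only. A real .epub needs valid package metadata and a navigation file before an online bookstore will accept it.

```python
import zipfile

# Hypothetical file names for illustration. Only "mimetype" and
# "META-INF/container.xml" are names fixed by the EPUB container format.
SKELETON = {
    "META-INF/container.xml": (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">\n'
        "  <rootfiles>\n"
        '    <rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/>\n'
        "  </rootfiles>\n"
        "</container>\n"
    ),
    # Placeholders: a real book needs a complete package file and real XHTML.
    "OEBPS/content.opf": "<!-- package file: metadata, manifest, spine -->\n",
    "OEBPS/chapter1.xhtml": "<!-- XHTML content goes here -->\n",
    "OEBPS/styles.css": "p { text-indent: 1.5em; margin: 0; }\n",
}


def write_epub_skeleton(target, files=SKELETON):
    """Write the given files into a zip archive laid out like an .epub."""
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry must be the first file and must not be compressed.
        archive.writestr(
            zipfile.ZipInfo("mimetype"),
            "application/epub+zip",
            compress_type=zipfile.ZIP_STORED,
        )
        for name, content in files.items():
            archive.writestr(name, content, compress_type=zipfile.ZIP_DEFLATED)
```

Calling write_epub_skeleton("skeleton.epub") produces an archive you can open with any unzip tool to see the layout: content pages, a style sheet, and supporting files, just like a small web site.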
A .pdf file cannot be converted directly into a web site nor can it be converted directly into an .epub, which is also a mini web site.
Probably the most important reason that .pdf files are darn hard to properly convert to any other format is that .pdf files contain none of the original formatting information that was in the initial document (for example, a Word file). This includes all of the basic building blocks of formatting such as line breaks, paragraphs, headers/footers, and columns. All of this information is destroyed when the .pdf is created.
The .pdf file is the last stop on the conversion train. The initial Word document and the final .pdf document will look exactly the same, but the .pdf document now holds only coordinates describing where and how each object should be displayed on the page. It no longer contains any of the original formatting info.
What this means for anyone converting a .pdf to .epub is that there are no shortcuts. You need to go back and manually put all of that formatting information back in. When clients ask me to convert a .pdf document into an .epub, I always ask if the client has the original document in another format, such as Word or InDesign, which will contain all of the formatting information.
I create all of my .epub documents using InDesign or an HTML editor like Dreamweaver or Microsoft Expression Web (my favorite). I prefer using an HTML editor because I have direct contact with and total control over the XHTML and the CSS. The HTML editor provides me with the greatest ability to customize an .epub document.
Here is how I prepare a .pdf file for insertion into an HTML editor or Adobe InDesign. As I mentioned, there is no easy, automated shortcut for doing this correctly. If there were, I would be using it.
The first step is to extract all of the text out of the .pdf file. You can open the .pdf file in Adobe Reader and click Edit / Select All. After all of the text is highlighted, click Edit / Copy. Open up a text editor such as Notepad and paste all of that text in.
Probably one of the first things you will notice is that the original line breaks that were in the Word document (before it was converted to .pdf) are no longer there. Instead, the .pdf has placed a line break (a carriage return) at the right side of each printed line. You will have to go through the entire document and remove ALL of the carriage returns that do not belong. This is a huge PITA but there is no way around this. This is the first step in returning the formatting information that was destroyed when the .pdf was created.
The next step is to separate each paragraph from the others. In the text file, you can do this by adding an extra carriage return at the end of each paragraph. This creates a blank line between paragraphs so you can visually identify them.
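If you are comfortable with a little scripting, a rough first pass at the two cleanup steps above can be sketched in Python. This is not the method I describe in this article and it does not replace the manual pass. It is a heuristic that assumes a paragraph ends on a line that is noticeably shorter than the longest line and finishes with closing punctuation. The function name unwrap_lines and the 0.75 cutoff are my own illustrative choices. It will guess wrong on headings, lists, and words hyphenated across lines, so you still have to read through the result.

```python
# Characters that usually close a sentence (an assumption, tune as needed).
END_MARKS = (".", "!", "?", ":", '"', "\u201d")


def unwrap_lines(text, short_ratio=0.75):
    """Join hard-wrapped lines and put a blank line between paragraphs."""
    lines = [line.strip() for line in text.splitlines()]
    longest = max((len(line) for line in lines), default=0)

    paragraphs = []
    current = []
    for line in lines:
        if not line:
            # An existing blank line always closes the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
        ends_sentence = line.endswith(END_MARKS)
        is_short = len(line) < short_ratio * longest
        # Heuristic: a short line that ends a sentence is the last
        # line of a paragraph. Anything else is treated as wrapped text.
        if ends_sentence and is_short:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    return "\n\n".join(paragraphs)
```

Paste the text you copied out of the .pdf into a string, run it through unwrap_lines, and then check every paragraph break by eye before moving on.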
Once you have the line breaks corrected and all paragraphs separated from each other, you are ready to drop all of that text into Adobe InDesign or the HTML editor.
Several other articles in this blog describe how to create an .epub file using Adobe InDesign or an HTML editor. These articles can guide you from here.
The basic steps you’ll be taking from here are to paste all of this text into either Adobe InDesign or an HTML editor and add all other formatting and styling. In Adobe InDesign, you’ll do that by applying Character or Paragraph styles to the text. In an HTML editor, you’ll be applying CSS styles to the text.
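For the HTML editor route, the starting point is text where each paragraph is wrapped in its own tag so that a CSS style can be applied to it. Here is a minimal Python sketch of that one step, assuming the cleaned-up text uses a blank line between paragraphs. The function name paragraphs_to_xhtml and the class name body-text are hypothetical, so substitute whatever class names your style sheet defines.

```python
import re
from html import escape


def paragraphs_to_xhtml(text, css_class="body-text"):
    """Wrap each blank-line-separated paragraph in a <p> element."""
    blocks = re.split(r"\n\s*\n", text.strip())
    tagged = []
    for block in blocks:
        # Collapse any leftover line breaks or runs of spaces.
        content = " ".join(block.split())
        if content:
            # Escape &, < and > so the output stays well-formed XHTML.
            tagged.append(
                f'<p class="{css_class}">{escape(content, quote=False)}</p>'
            )
    return "\n".join(tagged)
```

A matching rule in the style sheet, such as p.body-text { text-indent: 1.5em; }, then controls how every one of those paragraphs looks. Headings, italics, block quotes, and everything else still have to be marked up by hand.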
As I mentioned, I really, really do wish there was an easier way to do this and get a professional result. The creation of a .pdf file destroys all of the formatting information that is essential for creating an .epub. You have to extract the text from the .pdf file and manually add all of that formatting information back in.