sudo apt-get install pdftohtml
sudo apt-get install lynx OR sudo apt-get install elinks
lynx filename.html OR elinks filename.html
OR for just text
pdftotext filename.pdf
less filename.txt
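If you aren't sure which of the two text browsers ended up installed, a small shell function can pick whichever one it finds. This is only a sketch; the function names pick_browser and view_html are made up for illustration.

```shell
# pick_browser: print the first text browser found on PATH (lynx, then elinks).
# Hypothetical helper, not part of any package.
pick_browser() {
    for browser in lynx elinks; do
        if command -v "$browser" >/dev/null 2>&1; then
            echo "$browser"
            return 0
        fi
    done
    echo "neither lynx nor elinks is installed" >&2
    return 1
}

# view_html FILE: open FILE in whichever browser pick_browser found.
view_html() {
    browser=$(pick_browser) || return 1
    "$browser" "$1"
}
```

Usage would be something like: view_html filename.html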
With the open source xpdf reader and its utilities we can easily convert from pdf to html (e.g. when you have a pdf that won't reflow properly on your handheld device)
sudo apt-get install xpdf
this should install not only the xpdf reader but the xpdf-utils
xpdf filename.pdf //find the first and last page you want by browsing the document
pdftohtml -c -f 157 -l 299 -nodrm filename.pdf output.html
-c = complex = export images to image files
-f = first page
-l = last page
-nodrm = remove any digital rights management stuffs
Note the output file name is optional - by default it will output the source filename-page#.html
"pdftotext" is a similar tool
"pdftohtml -h" shows the help
UNFORTUNATELY it makes each page a single html document... (about the same size as the original pdf)
There's also a nice "index" feature, filename_ind.html, and an "outline" (like a table of contents)
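The page-range export can be wrapped in a small function so that a swapped first/last page or a mistyped file name is caught before pdftohtml runs. A rough sketch only: export_range is a made-up name, and it uses just the -c, -f, -l and -nodrm flags already shown in these notes.

```shell
# export_range PDF FIRST LAST [OUT.html]
# Hypothetical wrapper around the pdftohtml call from these notes.
export_range() {
    pdf=$1
    first=$2
    last=$3
    out=${4:-}

    if [ "$first" -gt "$last" ]; then
        echo "first page ($first) is after last page ($last)" >&2
        return 1
    fi
    if [ ! -f "$pdf" ]; then
        echo "no such file: $pdf" >&2
        return 1
    fi

    # ${out:+"$out"} expands to nothing when no output name was given,
    # so pdftohtml falls back to its default naming.
    pdftohtml -c -f "$first" -l "$last" -nodrm "$pdf" ${out:+"$out"}
}
```

For example: export_range filename.pdf 157 299 output.html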
So, manually merging them from the bash command line isn't fun and makes a big file but it works...
cd /path/to/output/html/and/images
mkdir merge
cat output_filename-*.html >> output_filename_merged.html
mv output_filename_merged.html merge
mv *.png merge
Now your "merge" directory holds a self-contained single-file version of the pdf
//unfortunately the above seems to run into some funny html formatting problems (weird large text)
//i've found it was due to the last exported page... so after removing (rm) the html file of the last page
//and then running the above command again it works very well...
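One thing to watch with the manual merge: a wildcard such as output_filename-*.html expands in lexical order, so page 10 would be concatenated before page 2. A sketch of a merge that walks the page numbers explicitly instead, assuming the pages are named PREFIX-N.html as described above (merge_pages is a made-up name):

```shell
# merge_pages PREFIX OUT FIRST LAST
# Concatenate PREFIX-FIRST.html ... PREFIX-LAST.html into OUT in numeric order.
# Missing pages are skipped silently.
merge_pages() {
    prefix=$1
    out=$2
    first=$3
    last=$4

    : > "$out"                      # start with an empty output file
    n=$first
    while [ "$n" -le "$last" ]; do
        page="$prefix-$n.html"
        [ -f "$page" ] && cat "$page" >> "$out"
        n=$((n + 1))
    done
}
```

Passing a LAST that is one less than the final exported page (e.g. merge_pages output_filename merge/output_filename_merged.html 157 298) leaves out the troublesome last page without having to rm it first.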
Alternatively, use another gnu utility that is especially made for "downloading" htmls and doing stuff...
//wget recursive only 1 level
wget -r -l1 -k -O output_merged.html http://localhost/output_filename_ind.html
The above won't work unless you have a web server installed OR you could upload all of them to a website (easy way to share a huge pdf for someone to read remotely)...
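If installing a full web server just for this feels like overkill, Python 3's built-in http.server module can serve the directory temporarily. This is a sketch under assumptions: python3 and wget are installed, port 8000 is free, and serve_and_merge is a made-up name. The wget flags are the same ones used above.

```shell
# serve_and_merge DIR INDEX OUT [PORT]
# Serve DIR on localhost with python3's http.server, fetch INDEX plus one
# level of links into OUT with wget, then stop the server again.
serve_and_merge() {
    dir=$1
    index=$2
    out=$3
    port=${4:-8000}

    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    # Temporary server in the background, bound to localhost only.
    ( cd "$dir" && exec python3 -m http.server "$port" --bind 127.0.0.1 ) >/dev/null 2>&1 &
    server_pid=$!
    sleep 1                         # crude wait for the server to come up

    wget -r -l1 -k -O "$out" "http://127.0.0.1:$port/$index"
    status=$?

    kill "$server_pid" 2>/dev/null
    return "$status"
}
```

For example: serve_and_merge /path/to/output/html/and/images output_filename_ind.html output_merged.html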
Finally, yet another alternative is to download the windows pdf2html gui (which includes the pdftohtml.exe)
http://www.divshare.com/download/4115853-20e
You'll also need WINE as this is a windows app...
The whole pdf2html directory is "portable" BUT you need ghostscript (gswin32c.exe)
http://pages.cs.wisc.edu/~ghost/
In my case I have a dual boot, so after mounting the windows partition I could copy ghostscript across:
ntfs-3g /dev/hda1 /mnt/windows
cp -a /mnt/windows/Program\ Files/gs/gs8.63 /home/username/pdf2html
wine /home/username/pdf2html/pdf2htmlgui.exe
The above will prompt you for the location of the (windows) pdftohtml.exe (which should be in the .39 subdir) AND the gswin32c.exe (in the pdf2html/gs8.63/bin subdir)
I chose "complex" which means images too!
Note that once it is running, your wine CLI will display a bunch of output (e.g. page1 page2 etc.) and will finally seem to hang - it's still running but it's processing the images... wait until the pdf2htmlgui reappears!
The above will produce exactly the same output as the linux xpdf util "pdftohtml -c"...