So, another day of reading Paradise Lost, which, I admit, is a very good work of literature when read correctly. Looking for notes to supplement my reading, I found the comprehensive Barron’s Book Notes on Paradise Lost. However, as you may see on the TOC page, I would have to do a lot of clicking to read through the notes! This is unacceptable! Also, what if I wanted to put the notes on my calculator for reading? I would have to copy and paste a lot of text.
Therefore, like any good computer scientist, I decided to use the computer to merge all of the pages into one notes file. First, I used wget to quickly download all of the HTML pages. Whew! That’s half of the work! Next, I had to extract the text from the 90+ messy HTML files. This is where Python programming comes in :).
I wrote BarronNotesExtractor.py (3.43KB), which extracts the text from the HTML pages and places it in one nifty text file. I used a lot of pattern matching, which a Python guru (which I am not) could probably have done much better. Nevertheless, the script accomplishes its task very quickly (~3 seconds)! No more clicking links and watching ads!
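For the curious, here is a minimal sketch of the extract-and-merge idea. It is not BarronNotesExtractor.py itself: instead of pattern matching, it leans on the standard library's html.parser module, and the names TextExtractor, html_to_text, and merge_pages are hypothetical ones picked for illustration. It also assumes the downloaded pages sit in one folder, end in .html, and sort into reading order by filename.

```python
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the text of one HTML page, one text chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def merge_pages(input_dir, output_file):
    """Write the text of every *.html file in input_dir to output_file.

    Pages are processed in sorted filename order (an assumption about how
    the downloaded files are named). Returns the number of pages merged.
    """
    pages = sorted(Path(input_dir).glob("*.html"))
    with open(output_file, "w", encoding="utf-8") as out:
        for page in pages:
            html = page.read_text(encoding="utf-8", errors="replace")
            out.write(html_to_text(html) + "\n\n")
    return len(pages)
```

Calling something like merge_pages("barrons_pages", "paradise_lost_notes.txt") (both names made up here) would then produce the single text file. A real parser is more forgiving of messy markup than hand-rolled patterns, though this sketch keeps everything on the page, navigation links and ad text included, so some site-specific filtering would still be needed.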
Now back to work!