Showing posts with the label Lxml

Web Page Scraping Gems/Tools Available In Ruby

I'm trying to scrape web pages in a Ruby script that I'm working on. The purpose of the pr…

Parsing HTML Table Using Python - HTMLParser Or lxml

I have an HTML page which consists of a table, and I want to fetch all the values in td, tr in that …

Getting More Granular Diffs From Difflib (or A Way To Post-process A Diff To Achieve The Same Thing)

Downloading this page and making a minor edit to it, changing the first 65 in this paragraph to 68:…

What’s The Most Forgiving HTML Parser In Python?

I have some random HTML and I used BeautifulSoup to parse it, but in most of the cases (>70%) it…

Python lxml Changes Tag Hierarchy?

I'm having a small issue with lxml. I'm converting an XML doc into an HTML doc. The origina…

lxml: Cannot Import etree

I went to this page and downloaded the tar file: http://pypi.python.org/pypi/lxml/2.3.4#downloads …

How To Match A Text Node Then Follow Parent Nodes Using XPath

I'm trying to parse some HTML with XPath. Following the simplified XML example below, I want to…

Extracting P Within H1 With Python/Scrapy

I am using Scrapy to extract some data about musical concerts from websites. At least one website I…