davehansen’s posterous

Filed under

python

 

First python script

My first working Python script. It grabs the 'src' attribute of a specific image on a specific page, using XPath.

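#!/usr/bin/python -u

from lxml import etree

# Parse the page with lxml's forgiving HTML parser
parser = etree.HTMLParser()
html   = etree.parse('http://orgsci.journal.informs.org/', parser)

# Select the src of every <img> inside an <a> that is the third child of its parent
img = html.xpath('//a[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]//img/@src')

for x in img:
    print(x)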
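To see what the XPath actually matches without hitting the network, here is a minimal sketch that runs the same expression against a small inline page. The markup, the image paths, and the helper name `third_child_image_sources` are all made up for illustration; the real page's structure may well differ.

```python
from lxml import etree

# Hypothetical sample markup, invented for this example (not the real page).
SAMPLE_HTML = """
<html><body>
  <div id="nav">
    <a href="/one"><img src="/img/one.png"></a>
    <a href="/two"><img src="/img/two.png"></a>
    <a href="/three"><img src="/img/cover.png"></a>
    <a href="/four"><img src="/img/four.png"></a>
  </div>
</body></html>
"""

# Same expression as in the script: an <a> with exactly two preceding
# sibling elements (i.e. the third child), then any <img> below it.
XPATH = '//a[(((count(preceding-sibling::*) + 1) = 3) and parent::*)]//img/@src'


def third_child_image_sources(markup):
    """Return the src of each <img> inside an <a> that is its parent's third child."""
    root = etree.fromstring(markup, etree.HTMLParser())
    # xpath() returns string-like results for attribute selections;
    # convert them to plain str for easy comparison and printing.
    return [str(src) for src in root.xpath(XPATH)]


if __name__ == "__main__":
    for src in third_child_image_sources(SAMPLE_HTML):
        print(src)
```

Only the third link's image is selected here, because `count(preceding-sibling::*)` counts element siblings and ignores the whitespace between tags.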


Filed under  //   code   python  


python packages

feedparser: for parsing RSS feeds (installed)

lxml: for parsing HTML and XML (installed)
  • required libxml2-dev and libxslt-dev, installed using apt-get

urllib & urllib2: replace curl for sending HTTP requests (installed by default)
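As a rough idea of how the standard library can stand in for curl, here is a small sketch. It assumes Python 3, where urllib and urllib2 were merged into `urllib.request` and `urllib.parse`. The URL, the query parameter, the User-Agent string, and the helper names `build_request` and `fetch` are placeholders, not anything from a real script.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(base_url, params):
    """Build a GET request, roughly like `curl -G base_url -d key=value`."""
    url = base_url + "?" + urlencode(params)
    # The User-Agent value is a made-up placeholder.
    return Request(url, headers={"User-Agent": "example-script/0.1"})


def fetch(request, timeout=10):
    """Send the request and return the response body as bytes (needs network access)."""
    with urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Only builds and prints the request; call fetch(req) to actually send it.
    req = build_request("http://example.com/search", {"q": "lxml xpath"})
    print(req.get_method(), req.full_url)
```

Building the request separately from sending it makes it easy to inspect the final URL before anything goes over the wire.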

Filed under  //   modules   python  
