The page is populated by an AJAX request to the URL used below. The following scrapy shell demo should help you get your data.
scrapy shell 'http://seekingalpha.com/memcached2/hp_top_articles'
2015-06-22 10:43:26+0530 [scrapy] INFO: Scrapy 0.24.6 started (bot: scrapybot)
2015-06-22 10:43:26+0530 [scrapy] INFO: Optional features available: ssl, http11, boto, django
2015-06-22 10:43:26+0530 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2015-06-22 10:43:26+0530 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-06-22 10:43:27+0530 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-06-22 10:43:27+0530 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-06-22 10:43:27+0530 [scrapy] INFO: Enabled item pipelines:
2015-06-22 10:43:27+0530 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-06-22 10:43:27+0530 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-06-22 10:43:27+0530 [default] INFO: Spider opened
2015-06-22 10:43:27+0530 [default] DEBUG: Crawled (200) <GET http://seekingalpha.com/memcached2/hp_top_articles> (referer: None)
[s] Available Scrapy objects:
[s] crawler <scrapy.crawler.Crawler object at 0x7f2896c431d0>
[s] item {}
[s] request <GET http://seekingalpha.com/memcached2/hp_top_articles>
[s] response <200 http://seekingalpha.com/memcached2/hp_top_articles>
[s] settings <scrapy.settings.Settings object at 0x7f289ebad450>
[s] spider <Spider 'default' at 0x7f2895e356d0>
[s] Useful shortcuts:
[s] shelp() Shell help (print this help)
[s] fetch(req_or_url) Fetch request (or URL) and update local objects
[s] view(response) View response in a browser
In [1]: import json
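In [2]: cleaned_data = response.body[response.body.index('(') + 1:response.body.rindex(')')]  # str.strip() removes a set of characters, not a prefix, so slice between the outer parentheses instead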
In [3]: data = json.loads(cleaned_data)
Printing data gives you something like the following:
[{u'author_name': None,
u'author_picture': u'http://static1.cdn-seekingalpha.com/images/users_profile/003/022/051/medium_pic.png?1379847453',
u'comments_counts': u'13',
u'company_name': u'BlackBerry Ltd.',
u'id': 3273215,
u'path': u'/article/3273215-blackberry-brace-yourself-for-another-ugly-quarter',
u'publish_on': 1434944662,
u'slug': u'bbry',
u'title': u'BlackBerry: Brace Yourself For Another Ugly Quarter'},
{u'author_name': None,
u'author_picture': u'http://static.cdn-seekingalpha.com/images/users_profile/000/055/431/medium_pic.png?1379429224',
u'comments_counts': u'45',
u'company_name': None,
u'id': 3272165,
u'path': u'/article/3272165-weighing-the-week-ahead-what-does-the-greek-crisis-mean-for-financial-markets',
u'publish_on': 1434863813,
u'slug': None,
u'title': u'Weighing The Week Ahead: What Does The Greek Crisis Mean For Financial Markets'},
{u'author_name': None,
u'author_picture': u'http://static1.cdn-seekingalpha.com/images/users_profile/003/854/671/medium_pic.png?1428599641',
u'comments_counts': u'6',
u'company_name': u'Google Inc.',
u'id': 3272955,
u'path': u'/article/3272955-google-a-big-test-lies-ahead-with-the-verticalization-of-youtube',
u'publish_on': 1434908812,
u'slug': u'goog',
u'title': u'Google: A Big Test Lies Ahead With The Verticalization Of YouTube'},
...
...
}]
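The unwrapping step from the shell session can also be written as a small standalone helper. This is only a minimal sketch: the function name `strip_jsonp` and the shortened `sample_body` are made up for illustration, and it assumes the response is a single callback call wrapping one JSON value, as in the output above.

```python
import json


def strip_jsonp(payload):
    """Return the JSON text wrapped by a JSONP-style callback call.

    Hypothetical helper: assumes the payload looks like callback(<json>).
    """
    start = payload.index("(") + 1  # first "(" opens the callback's argument list
    end = payload.rindex(")")       # last ")" closes it
    return payload[start:end]


# Hypothetical, shortened payload in the same shape as the real response.
sample_body = (
    'SA.Pages.HP.TopArticles.onupdate('
    '[{"id": 3273215, "slug": "bbry", "comments_counts": "13"}]'
    ')'
)

articles = json.loads(strip_jsonp(sample_body))
for article in articles:
    print(article["id"], article["slug"])
```

Slicing between the first "(" and the last ")" avoids relying on the exact callback name. On newer Scrapy versions running on Python 3, response.body is bytes, so response.text would be the string to pass in.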
Maybe this is done with Ajax. – Tempux