You cannot use regular expressions in XPath 1.0 (even though they would certainly be useful there!). In XPath 2.0 (which lxml does not support), regular expressions can be used in some functions, for example matches() or replace().
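That said, lxml does expose the EXSLT regular-expressions extension on top of XPath 1.0, which may be enough here. A minimal sketch, assuming a small inline HTML snippet in place of the real page (the SAMPLE markup and its second and third links are made up for illustration):

```python
from lxml import html

# Hypothetical snippet standing in for the real events page.
SAMPLE = """
<div>
  <a href='/institute/event/11147'>Papel Picado Workshop Series: Session 5</a>
  <a href='/institute/event/archive'>Past events</a>
  <a href='/institute/about'>About</a>
</div>
"""

# Namespace URI of the EXSLT regular-expressions extension.
EXSLT_RE = "http://exslt.org/regular-expressions"

tree = html.fromstring(SAMPLE)

# re:test() is true when the href matches the pattern;
# the "re" prefix must be bound via the namespaces argument.
titles = tree.xpath(
    "//a[re:test(@href, '^/institute/event/[0-9]{5}$')]/text()",
    namespaces={"re": EXSLT_RE},
)
print(titles)
```

This is an lxml extension rather than standard XPath 1.0, so the same expression is not guaranteed to work in other XPath engines.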
If I understood correctly, you are looking for this piece of data:
<a href='/institute/event/11147'>Papel Picado Workshop Series: Session 5</a>
You can find these a elements with
//a[starts-with(@href,'/institute/event/')]
But note that this returns a list of elements, whereas you seem to expect a single element as the result. Please explain more clearly what exactly you need as the result.
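To illustrate the list-versus-single-element point, here is a small sketch, assuming an inline snippet instead of the live page (the second link and its id 11200 are invented for the example):

```python
from lxml import html

# Hypothetical markup; only the first link is taken from the question.
SAMPLE = """<div>
<a href='/institute/event/11147'>Papel Picado Workshop Series: Session 5</a>
<a href='/institute/event/11200'>Korean Culture Night</a>
</div>"""

tree = html.fromstring(SAMPLE)

# xpath() returns a list for a node-set, even when only one node matches.
links = tree.xpath("//a[starts-with(@href,'/institute/event/')]")

# Pick the first match explicitly, guarding against an empty list.
first = links[0] if links else None
if first is not None:
    print(first.get("href"), first.text)
```

If only one element is wanted, indexing the list (or checking that it is empty) has to be done on the Python side.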
As a suggestion, how about this:
from lxml import html
import requests

# Fetch the events page and parse it into an element tree
page = requests.get('http://web.international.ucla.edu/institute/events')
tree = html.fromstring(page.text)
# Select the text of every link whose href starts with /institute/event/
event_titles = tree.xpath('//a[starts-with(@href,"/institute/event/")]/text()')
for event_title in event_titles:
    print("Event Title:", event_title)
And the result will be
Event Title: Papel Picado Workshop Series: Session 5
Event Title: Cacahuatl: The Origins and Global Impact of Chocolate
Event Title: “Institutionalizing Numbers in Post-Colonial Africa”
Event Title: The Daniel Pearl Memorial Lecture presents A Conversation with Leon Panetta, part of the Luskin Lecture Series
Event Title: Persian Women and Other Lies: Story-telling as Historical Retrieval
Event Title: UCLA EVENT: Making Micronesia
Event Title: Teach-In: Out of Nowhere? Some Questions, Answers, and Discussion about ISIS
Event Title: Impossible Testimonies: Literature and Aesthetics in the Aftermath of the Armenian Genocide
Event Title: “Casa Grande” Film Screening
Event Title: The Headscarf Debates: Conflicts of National Belonging
Event Title: Rethinking History in Chinese Central Asia
Event Title: Screening: "REBEL: Loreta Velazquez, Civil War Soldier and Spy"
Event Title: "How Terrorism is Designed to Work"
Event Title: Matthäus Rest Talk - Dreaming of Pipes: The politics of in/visibility around Nepal’s spectral infrastructures
Event Title: The Barber of Damascus: Nouveau Literacy in the Eighteenth-Century Levant
Event Title: Representation of "Apology": a Comparative Study on Narratives by Korean and Japanese Media
Event Title: "They Can Live in the Desert but Nowhere Else": A History of the Armenian Genocide
Event Title: Colloquium: Towards a contents-platform conglomerate?
Event Title: Picturing Political Abstractions in Song/Jin Painting
Event Title: ISIS and the Enslavement and Trafficking of Women: An Evening with Dr. Khaled Abou El Fadi
Event Title: Korean Culture Night
Event Title: Genocide and Global History: A Conference on the 100th Anniversary of the Armenian Genocide
Event Title: U.S.-China: Economic Ties, Growth Strategies and Investment Opportunities
Event Title: Human Rights and the Armenian Genocide
Event Title: Gerschenkron Redux? New Evidence on Shanghai's Pre-War Stock Exchange and Its Implications for the Chinese Economy at Present
Try event_title = tree.xpath('//a[@href="/institute/event/[0-9]{5}"]/text()')
That's what I thought. I got an empty array. Let me try again.
Yup, I got Event Title: [] as the result.
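The empty result is expected: in XPath 1.0 the = operator compares @href with the literal string /institute/event/[0-9]{5}, so the brackets and braces are never treated as a pattern. One possible workaround is to select the links with plain XPath and filter them with Python's re module. A rough sketch, assuming an inline snippet in place of the real page (the "archive" link is made up):

```python
import re
from lxml import html

# Hypothetical markup standing in for the events page.
SAMPLE = """<div>
<a href='/institute/event/11147'>Papel Picado Workshop Series: Session 5</a>
<a href='/institute/event/archive'>Past events</a>
</div>"""

tree = html.fromstring(SAMPLE)

# The pattern is compared as a plain string, so nothing matches.
literal = tree.xpath('//a[@href="/institute/event/[0-9]{5}"]/text()')
print(literal)

# Filter on the Python side instead: exactly five digits after the prefix.
pattern = re.compile(r"^/institute/event/[0-9]{5}$")
titles = [a.text for a in tree.xpath("//a[@href]") if pattern.match(a.get("href"))]
print(titles)
```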