2016-03-13 4 views

Scraping with BeautifulSoup: object has no attribute

I want to extract data such as the company name and address from a website using BeautifulSoup. However, I get the following error:

Calgary's Notary Public 
Traceback (most recent call last): 
    File "test.py", line 16, in <module> 
    print item.find_all(class_='jsMapBubbleAddress').text 
AttributeError: 'ResultSet' object has no attribute 'text' 

Here is the HTML fragment. I want to extract all of the text information and write it to a CSV file. Please help me.

<div class="listing__right article hasIcon"> 
    <h3 class="listing__name jsMapBubbleName" itemprop="name"><a data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"busname","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" title="See detailed information for Calgary's Notary Public">Calgary's Notary Public</a> </h3> 
    <div class="listing__address address mainLocal"> 
     <em class="itemCounter">1</em> 
     <span class="listing__address--full" itemprop="address" itemscope="" itemtype="http://schema.org/PostalAddress"> 
     <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>, <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>, <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span> <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span></span> 
     <a class="listing__direction" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1a","lk_relevancy":"1","lk_name":"directions-step1","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/merchant/directions/100971374?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true" rel="nofollow" title="Get direction to Calgary's Notary Public">Get directions »</a> 
    </div> 
    <div class="listing__details"> 
     <p class="listing__details__teaser" itemprop="description">We offer you a convenient, quick and affordable solution for your Notary Public or Commissioner for Oaths in Calgary needs.</p> 
    </div> 
    <div class="listing__ratings--root"> 
     <div class="listing__ratings ratingWarp" itemprop="aggregateRating" itemscope="" itemtype="http://schema.org/AggregateRating"> 
     <meta content="5" itemprop="ratingValue"/> 
     <meta content="1" itemprop="ratingCount"/> 
     <span class="ypStars" data-analytics-group="stars" data-clicksent="false" data-rating="rating5" title="Ratings: 5 out of 5 stars"> 
     <span class="star1" data-analytics-name="stars" data-label="Optional : Why did you hate it?" title="I hated it"></span> 
     <span class="star2" data-analytics-name="stars" data-label="Optional : Why didn't you like it?" title="I didn't like it"></span> 
     <span class="star3" data-analytics-name="stars" data-label="Optional : Why did you like it?" title="I liked it"></span> 
     <span class="star4" data-analytics-name="stars" data-label="Optional : Why did you really like it?" title="I really liked it"></span> 
     <span class="star5" data-analytics-name="stars" data-label="Optional : Why did you love it?" title="I loved it"></span> 
     </span><a class="listing__ratings__count" data-analytics='{"lk_listing_id":"100971374","lk_non-ad-rollup":"0","lk_page_num":"1","lk_pos":"in_listing","lk_proximity":"14.5","lk_directory_heading":[{"085100":[{"00910600":"1"},{"00911000":"1"}]}],"lk_geo_tier":"in","lk_area":"left_1","lk_relevancy":"1","lk_name":"read_yp_reviews","lk_pos_num":"1","lk_se_id":"e292d1d2-f130-463d-8f0c-7dd66800dead_Tm90YXJ5_Q2FsZ2FyeSwgQUI_56","lk_ev":"link","lk_product":"l2"}' href="/bus/Alberta/Calgary/Calgary-s-Notary-Public/100971374.html?what=Notary&amp;where=Calgary%2C+AB&amp;useContext=true#ypgReviewsHeader" rel="nofollow" title="1 of Review for Calgary's Notary Public">1<span class="hidden-phone"> YP review</span></a> 
     </div> 
    </div> 
    <div class="listing__details detailsWrap"> 
     <ul> 
     <li><a href="/search/si/1/Notaries/Calgary%2C+AB" title="Notaries">Notaries</a> 
      , 
     </li> 
     <li><a href="/search/si/1/Notaries+Public/Calgary%2C+AB" title="Notaries Public">Notaries Public</a></li> 
     </ul> 
    </div> 
</div> 

There are many divs with the class listing__right article hasIcon. I use a for loop to extract the information from each one.

The Python code I have written so far:

import requests 
from bs4 import BeautifulSoup 

url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB' 
response = requests.get(url) 
content = response.content 

soup = BeautifulSoup(content) 
g_data=soup.find_all('div', attrs={'class': 'listing__right article hasIcon'}) 

for item in g_data: 
    print item.find('h3').text 
    #print item.contents[2].find_all('em', attrs={'class': 'itemCounter'})[1].text 
    print item.find_all(class_='jsMapBubbleAddress').text 

find_all returns a list, and lists in Python have no text property or attribute. Try iterating over the list returned in the last line of your code. – MrPyCharm


I only want the first matching element.


print item.find_all(class_='jsMapBubbleAddress')[0].text
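
To make the point from these comments concrete, here is a small offline sketch of the difference between find_all and find. The HTML string is a trimmed copy of the address markup from the question, and html.parser is chosen only to avoid an extra dependency; both are assumptions made for the example, and it uses Python 3 syntax rather than the Python 2 print statement shown above.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the live
# page still has this structure).
SAMPLE_HTML = """
<div class="listing__right article hasIcon">
  <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>,
  <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
item = soup.find("div", class_="listing__right")

# find_all returns a ResultSet (a list subclass), so it has to be
# indexed or iterated before .text can be used.
matches = item.find_all(class_="jsMapBubbleAddress")
first_via_index = matches[0].text

# find returns the first matching Tag directly, or None if nothing matches.
first_tag = item.find(class_="jsMapBubbleAddress")
first_via_find = first_tag.text if first_tag is not None else ""

print(first_via_index)
print(first_via_find)
```

When only the first match is needed, find avoids an IndexError on listings that have no address, at the cost of a None check.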

Answer


find_all returns a list (a ResultSet), which has no text attribute, and that is why you get the error. I'm not sure what output you are looking for, but this code seems to work fine:

import requests 
from bs4 import BeautifulSoup 

url = 'http://www.yellowpages.ca/search/si-rat/1/Notary/Calgary%2C+AB' 
response = requests.get(url) 
content = response.content 

soup = BeautifulSoup(content,"lxml") 
g_data=soup.find_all('div', attrs={'class': 'listing__right article hasIcon'}) 

for item in g_data:
    print(item.find('h3').text)
    #print item.contents[2].find_all('em', attrs={'class': 'itemCounter'})[1].text
    # find_all returns a ResultSet, so loop over it to reach each tag's text
    addresses = item.find_all(class_='jsMapBubbleAddress')
    for address in addresses:
        print(address.text)
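
The question also asks for CSV output, which the code above does not cover. One possible approach, sketched here under stated assumptions, is to collect one row per listing and hand the rows to the standard csv module. The helper name extract_rows, the column names, and the inline sample HTML (a trimmed copy of the fragment in the question) are all hypothetical, and the example uses Python 3 and html.parser so it can run without network access or lxml.

```python
import csv
import io

from bs4 import BeautifulSoup

# Trimmed copy of one listing from the question (assumption: every
# listing on the live page follows the same structure).
SAMPLE_HTML = """
<div class="listing__right article hasIcon">
  <h3 class="listing__name jsMapBubbleName" itemprop="name"><a href="#">Calgary's Notary Public</a> </h3>
  <span class="listing__address--full" itemprop="address">
    <span class="jsMapBubbleAddress" itemprop="streetAddress">340-600 Crowfoot Cres NW</span>,
    <span class="jsMapBubbleAddress" itemprop="addressLocality">Calgary</span>,
    <span class="jsMapBubbleAddress" itemprop="addressRegion">AB</span>
    <span class="jsMapBubbleAddress" itemprop="postalCode">T3G 0B4</span>
  </span>
</div>
"""

# Hypothetical column layout: itemprop values in the order we want them.
ADDRESS_FIELDS = ["streetAddress", "addressLocality", "addressRegion", "postalCode"]


def extract_rows(html):
    """Return one [name, street, city, region, postal_code] row per listing."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("div", class_="listing__right"):
        name_tag = item.find("h3")
        name = name_tag.get_text(strip=True) if name_tag is not None else ""
        # Key each address span by its itemprop so missing parts become "".
        parts = {
            span.get("itemprop"): span.get_text(strip=True)
            for span in item.find_all(class_="jsMapBubbleAddress")
        }
        rows.append([name] + [parts.get(field, "") for field in ADDRESS_FIELDS])
    return rows


rows = extract_rows(SAMPLE_HTML)

# Write to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "street", "city", "region", "postal_code"])
writer.writerows(rows)
print(buffer.getvalue())
```

To write to disk instead, the buffer could be replaced with a file opened as open("listings.csv", "w", newline=""), where the file name is only a placeholder; for the live site, response.content from the code above would be passed to extract_rows in place of SAMPLE_HTML.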