2013-06-24

I would like to use BeautifulSoup to extract all the values of the 3-hr PSI readings from 12AM to 11.59PM, for example the latest reading, which is shown in bold at 5PM. How do I extract them with BeautifulSoup in Python?

Example website: http://app2.nea.gov.sg/anti-pollution-radiation-protection/air-pollution/psi/psi-readings-over-the-last-24-hours. Can anyone show me how? Thanks in advance!

<!-- start content --> 
    <h1 class="title" id="top"> 
     PSI Readings over the last 24 Hours</h1> 
    <script type="text/javascript"> 
     var baseUrl = '/anti-pollution-radiation-protection/air-pollution/psi/psi-readings-over-the-last-24-hours'; 

     function changetime(ddl) { 
      var strTime = ddl.options[ddl.selectedIndex].value; 

      if (strTime != null) { 
       var npage = baseUrl + "/time/" + strTime + "#psi24"; 
       window.location = npage; 
      } 
     } 
    </script> 
    <h1 id="psi24"> 
     24-hr PSI Readings on 24 Jun 2013 
    </h1> 
    <p> 
     View reading for: 
     <select class="default" id="ContentPlaceHolderContent_C001_DDLTime" name="ctl00$ContentPlaceHolderContent$C001$DDLTime" onchange="changetime(this);"> 
    <option value="0000">12AM</option> 
    <option value="0100">1AM</option> 
    <option value="0200">2AM</option> 
    <option value="0300">3AM</option> 
    <option value="0400">4AM</option> 
    <option value="0500">5AM</option> 
    <option value="0600">6AM</option> 
    <option value="0700">7AM</option> 
    <option value="0800">8AM</option> 
    <option value="0900">9AM</option> 
    <option value="1000">10AM</option> 
    <option value="1100">11AM</option> 
    <option value="1200">12PM</option> 
    <option value="1300">1PM</option> 
    <option value="1400">2PM</option> 
    <option value="1500">3PM</option> 
    <option value="1600">4PM</option> 
    <option selected="selected" value="1700">5PM</option> 
    </select> 
    </p> 
    <table border="0" cellpadding="4" cellspacing="1" class="text_psinormal" width="100%"> 
    <thead> 
    <tr> 
    <th width="33%"> 
    <center><strong>Region</strong></center> 
    </th> 
    <th width="33%"> 
    <center><strong>PSI</strong></center> 
    </th> 
    <th width="34%"> 
    <center><strong>24-hr PM2.5 Concentration (µg/m<sup>3</sup>)</strong></center> 
    </th> 
    </tr> 
    </thead> 
    <tr> 
    <td align="center">North 
      </td> 
    <td align="center"> 
       61 
      </td> 
    <td align="center"> 
       47 
      </td> 
    </tr> 
    <tr> 
    <td align="center">South 
      </td> 
    <td align="center"> 
       62 
      </td> 
    <td align="center"> 
       46 
      </td> 
    </tr> 
    <tr> 
    <td align="center">East 
      </td> 
    <td align="center"> 
       55 
      </td> 
    <td align="center"> 
       39 
      </td> 
    </tr> 
    <tr> 
    <td align="center">West 
      </td> 
    <td align="center"> 
       87 
      </td> 
    <td align="center"> 
       83 
      </td> 
    </tr> 
    <tr> 
    <td align="center">Central 
      </td> 
    <td align="center"> 
       58 
      </td> 
    <td align="center"> 
       40 
      </td> 
    </tr> 
    <tr> 
    <td align="center">Overall Singapore 
      </td> 
    <td align="center"> 
       55-87 
      </td> 
    <td align="center"> 
       39-83 
      </td> 
    </tr> 
    </table> 
    <div> 
    </div> 
    <div> 
    <h1>3-hr PSI Readings from 12AM to 11.59PM on 
          24 Jun 2013</h1> 
    <table border="0" cellpadding="4" cellspacing="1" width="100%"> 
    <tr> 
    <td align="center" width="16%"> 
    <strong>Time</strong> 
    </td> 
    <td align="center" width="7%"><strong>12AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>1AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>2AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>3AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>4AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>5AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>6AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>7AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>8AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>9AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>10AM</strong> 
    </td> 
    <td align="center" width="7%"><strong>11AM</strong> 
    </td> 
    </tr> 
    <tr> 
    <td align="center"> 
    <strong>3-hr PSI</strong> 
    </td> 
    <td align="center"> 
         76 
        </td> 
    <td align="center"> 
         70 
        </td> 
    <td align="center"> 
         64 
        </td> 
    <td align="center"> 
         59 
        </td> 
    <td align="center"> 
         54 
        </td> 
    <td align="center"> 
         51 
        </td> 
    <td align="center"> 
         48 
        </td> 
    <td align="center"> 
         47 
        </td> 
    <td align="center"> 
         47 
        </td> 
    <td align="center"> 
         47 
        </td> 
    <td align="center"> 
         49 
        </td> 
    <td align="center"> 
         52 
        </td> 
    </tr> 
    <tr> 
    <td align="center" width="16%"> 
    <strong>Time</strong> 
    </td> 
    <td align="center" width="7%"><strong>12PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>1PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>2PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>3PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>4PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>5PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>6PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>7PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>8PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>9PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>10PM</strong> 
    </td> 
    <td align="center" width="7%"><strong>11PM</strong> 
    </td> 
    </tr> 
    <tr> 
    <td align="center"> 
    <strong>3-hr PSI</strong> 
    </td> 
    <td align="center"> 
         54 
        </td> 
    <td align="center"> 
         59 
        </td> 
    <td align="center"> 
         65 
        </td> 
    <td align="center"> 
         72 
        </td> 
    <td align="center"> 
         79 
        </td> 
    <td align="center"> 
    <strong style="font-size:14px;">82</strong> 
    </td> 
    <td align="center"> 
         - 
        </td> 
    <td align="center"> 
         - 
        </td> 
    <td align="center"> 
         - 
        </td> 
    <td align="center"> 
         - 
        </td> 
    <td align="center"> 
         - 
        </td> 
    <td align="center"> 
         - 
        </td> 
    </tr> 
    </table> 
    </div> 
    <div class="sfContentBlock"> 
    <p class="table-caption">Hourly updates of 3-hr PSI readings are provided from 12am to 11:59pm. The 3hr PSI readings are calculated based on PM10 concentrations only</p> 
    </div> 
    <div> 
    </div> 
    <div class="backToTop"> 
    <a href="#top">Back to Top</a> 
    </div> 
    </div> 
    </div> 
    <!-- end content --> 

Answer

You should really show what you have tried yourself, but here is the code:

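from pprint import pprint
from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen instead

from bs4 import BeautifulSoup


url = "http://app2.nea.gov.sg/anti-pollution-radiation-protection/air-pollution/psi/psi-readings-over-the-last-24-hours"
# Name the parser explicitly so the result does not depend on which parsers are installed.
web_soup = BeautifulSoup(urlopen(url), "html.parser")

# The 3-hr PSI table is the first table inside the third <div> of the "c1" container.
table = web_soup.find(name="div", attrs={'class': 'c1'}).find_all(name="div")[2].find_all('table')[0]

# Collect the stripped text of every cell, row by row.
table_rows = []
for row in table.find_all('tr'):
    table_rows.append([td.text.strip() for td in row.find_all('td')])

# Rows alternate: even rows hold the time labels, odd rows hold the readings.
# The first column pairs the row headers too, which gives the 'Time': '3-hr PSI' entry.
data = {}
for tr_index, tr in enumerate(table_rows):
    if tr_index % 2 == 0:
        for td_index, td in enumerate(tr):
            data[td] = table_rows[tr_index + 1][td_index]

pprint(data)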

This prints:

{'10AM': '49', 
'10PM': '-', 
'11AM': '52', 
'11PM': '-', 
'12AM': '76', 
'12PM': '54', 
'1AM': '70', 
'1PM': '59', 
'2AM': '64', 
'2PM': '65', 
'3AM': '59', 
'3PM': '72', 
'4AM': '54', 
'4PM': '79', 
'5AM': '51', 
'5PM': '82', 
'6AM': '48', 
'6PM': '79', 
'7AM': '47', 
'7PM': '-', 
'8AM': '47', 
'8PM': '-', 
'9AM': '47', 
'9PM': '-', 
'Time': '3-hr PSI'} 

Sorry, I am not very good at Python. How do you print only one value? For example, how do I get the 5AM value, 51? I only need one value. Thanks in advance!
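A single reading can be looked up by its time label, since `data` in the answer is an ordinary dict keyed by strings such as '5AM'. The sketch below is only an illustration: it uses a few entries copied from the printed output instead of a live fetch, and `reading_at` is a hypothetical helper name, not part of the answer's code.

```python
# A few entries copied from the printed output above (not a live fetch).
data = {'4AM': '54', '5AM': '51', '6AM': '48', '7PM': '-'}


def reading_at(readings, label):
    """Return the reading for a time label as an int.

    Returns None if the label is missing or the reading is not
    published yet (the page shows '-' for those slots).
    """
    value = readings.get(label)
    if value is None or not value.isdigit():
        return None
    return int(value)


print(data['5AM'])              # plain dict lookup; the value is a string
print(reading_at(data, '5AM'))  # same reading converted to an int
print(reading_at(data, '7PM'))  # None, because the slot still holds '-'
```

The plain lookup `data['5AM']` is enough if a string is fine; the helper only adds the int conversion and the handling of empty slots.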

Is there a way to get only the latest updated PSI reading? It is shown in **bold** on the website.
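One possible approach, assuming the page keeps marking the latest reading the way the question's HTML does (a `<strong>` tag with an inline `style` attribute, while the header cells use a bare `<strong>`), is to search for that tag and then read the time label from the same column of the row above. The sketch runs on a trimmed copy of the question's markup rather than the live page, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the 3-hr PSI markup from the question (not a live fetch).
html = """
<table>
<tr>
<td><strong>Time</strong></td>
<td><strong>4PM</strong></td>
<td><strong>5PM</strong></td>
<td><strong>6PM</strong></td>
</tr>
<tr>
<td><strong>3-hr PSI</strong></td>
<td> 79 </td>
<td><strong style="font-size:14px;">82</strong></td>
<td> - </td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Assumption: only the latest reading is wrapped in a <strong> that has a style attribute.
latest = soup.find("strong", style=True)
latest_value = latest.text.strip()

# Locate the column of that cell, then read the label from the preceding row.
cell = latest.find_parent("td")
row = cell.find_parent("tr")
column = next(i for i, td in enumerate(row.find_all("td")) if td is cell)
time_label = row.find_previous_sibling("tr").find_all("td")[column].text.strip()

print(time_label, latest_value)
```

If the site changes how it highlights the latest value, the `find` call returns None and the selector needs adjusting; an alternative that does not depend on styling is to take the last slot, in time order, whose reading is not '-'.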

Check this out; I asked this question.
