2015-06-09 2 views
Parsing an HTML table with Nokogiri in Ruby

I have an HTML table that looks like this:

<table id="TTdata" border="0" cellspacing="0" cellpadding="3" align="center"> 
    <tbody> 
     <tr class="TTdata_ltblue"> 
     <td class="ctr"><b>#</b></td> 
     <td class="ctr"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=YEAR">YEAR</a><img src="/images/up.gif"></b></td> 
     <td class="ctr" title="Player's name."><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=NAME">NAME</a></b></td> 
     <td class="ctr" title="how many pitches a catcher had a chance/need to frame"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=FR_CHANCES">FR_CHANCES</a></b></td> 
     <td class="ctr" title="the number of strikes the catcher is expected to have received according to RPM"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=PREDICTED_STRIKES">PREDICTED_STRIKES</a></b></td> 
     <td class="ctr" title="the number of strikes the catcher actually received"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=ACTUAL_STRIKES">ACTUAL_STRIKES</a></b></td> 
     <td class="ctr" title="the difference between actual and predicted strikes received by the catcher"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=EXTRA_STRIKES">EXTRA_STRIKES</a></b></td> 
     <td class="ctr" title="runs RPM credits to the catcher, using the ball-strike context to calculated run value"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=FR_RUNS_ADDED_BY_COUNT">FR_RUNS_ADDED_BY_COUNT</a><img src="/images/down.gif"></b></td> 
     <td class="ctr" title="how many runs RPM would assign using a generic .14 runs available per frame"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=FR_RUNS_ADDED_BY_CALL">FR_RUNS_ADDED_BY_CALL</a></b></td> 
     <td class="ctr" title="pitches the catcher received that could have resulted in a wild pitch or passed ball; this is when runners are on base or a dropped third strike is possible"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=BL_CHANCES">BL_CHANCES</a></b></td> 
     <td class="ctr"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=PREDICTED_PBWP">PREDICTED_PBWP</a></b></td> 
     <td class="ctr" title="the run value accumulated from preventing wild pitches and passed balls (.28 per PB/WP saved)"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=BL_RUNS_ADDED">BL_RUNS_ADDED</a></b></td> 
     <td class="ctr" title="the number of passed balls and wild pitches allowed by the catcher"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=ACTUAL_PBWP">ACTUAL_PBWP</a></b></td> 
     <td class="ctr" title="the difference between actual and predicted passed balls and wild pitches allowed by the catcher 
      "><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&amp;newsort1column=PBWP_SAVED">PBWP_SAVED</a></b></td> 
     </tr> 
     <tr class="TTdata"> 
     <td>1.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Yasmani+Grandal" target="_blank">Yasmani Grandal</a></td> 
     <td class="right">2295</td> 
     <td class="right">871.5</td> 
     <td class="right">925</td> 
     <td class="right">53.5</td> 
     <td class="right">8.0</td> 
     <td class="right">8.0</td> 
     <td class="right">1097</td> 
     <td class="right">18.0</td> 
     <td class="right">0.0</td> 
     <td class="right">18</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>2.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Buster+Posey" target="_blank">Buster Posey</a></td> 
     <td class="right">2601</td> 
     <td class="right">1,011.4</td> 
     <td class="right">1,056</td> 
     <td class="right">44.6</td> 
     <td class="right">6.6</td> 
     <td class="right">6.6</td> 
     <td class="right">1232</td> 
     <td class="right">10.0</td> 
     <td class="right">0.0</td> 
     <td class="right">10</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>3.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Francisco+Cervelli" target="_blank">Francisco Cervelli</a></td> 
     <td class="right">2629</td> 
     <td class="right">989.0</td> 
     <td class="right">1,033</td> 
     <td class="right">44.0</td> 
     <td class="right">6.5</td> 
     <td class="right">6.5</td> 
     <td class="right">1357</td> 
     <td class="right">14.0</td> 
     <td class="right">0.0</td> 
     <td class="right">14</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>4.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Mike+Zunino" target="_blank">Mike Zunino</a></td> 
     <td class="right">2828</td> 
     <td class="right">1,128.8</td> 
     <td class="right">1,169</td> 
     <td class="right">40.2</td> 
     <td class="right">6.0</td> 
     <td class="right">6.0</td> 
     <td class="right">1325</td> 
     <td class="right">19.0</td> 
     <td class="right">0.0</td> 
     <td class="right">19</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>5.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Caleb+Joseph" target="_blank">Caleb Joseph</a></td> 
     <td class="right">2713</td> 
     <td class="right">993.9</td> 
     <td class="right">1,031</td> 
     <td class="right">37.1</td> 
     <td class="right">5.5</td> 
     <td class="right">5.5</td> 
     <td class="right">1315</td> 
     <td class="right">9.0</td> 
     <td class="right">0.0</td> 
     <td class="right">9</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>6.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Chris+Iannetta" target="_blank">Chris Iannetta</a></td> 
     <td class="right">2158</td> 
     <td class="right">847.5</td> 
     <td class="right">884</td> 
     <td class="right">36.5</td> 
     <td class="right">5.4</td> 
     <td class="right">5.4</td> 
     <td class="right">1078</td> 
     <td class="right">15.0</td> 
     <td class="right">0.0</td> 
     <td class="right">15</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>7.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Jason+Castro" target="_blank">Jason Castro</a></td> 
     <td class="right">2679</td> 
     <td class="right">1,068.9</td> 
     <td class="right">1,105</td> 
     <td class="right">36.1</td> 
     <td class="right">5.4</td> 
     <td class="right">5.4</td> 
     <td class="right">1378</td> 
     <td class="right">18.0</td> 
     <td class="right">0.0</td> 
     <td class="right">18</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>8.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Miguel+Montero" target="_blank">Miguel Montero</a></td> 
     <td class="right">1977</td> 
     <td class="right">785.8</td> 
     <td class="right">820</td> 
     <td class="right">34.2</td> 
     <td class="right">5.1</td> 
     <td class="right">5.1</td> 
     <td class="right">972</td> 
     <td class="right">11.0</td> 
     <td class="right">0.0</td> 
     <td class="right">11</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>9.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Martin+Maldonado" target="_blank">Martin Maldonado</a></td> 
     <td class="right">2343</td> 
     <td class="right">906.0</td> 
     <td class="right">940</td> 
     <td class="right">34.0</td> 
     <td class="right">5.1</td> 
     <td class="right">5.1</td> 
     <td class="right">1193</td> 
     <td class="right">17.0</td> 
     <td class="right">0.0</td> 
     <td class="right">17</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>10.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Tyler+Flowers" target="_blank">Tyler Flowers</a></td> 
     <td class="right">2191</td> 
     <td class="right">833.4</td> 
     <td class="right">865</td> 
     <td class="right">31.6</td> 
     <td class="right">4.7</td> 
     <td class="right">4.7</td> 
     <td class="right">1305</td> 
     <td class="right">13.0</td> 
     <td class="right">0.0</td> 
     <td class="right">13</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>11.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Rene+Rivera" target="_blank">Rene Rivera</a></td> 
     <td class="right">2632</td> 
     <td class="right">1,043.1</td> 
     <td class="right">1,070</td> 
     <td class="right">26.9</td> 
     <td class="right">4.0</td> 
     <td class="right">4.0</td> 
     <td class="right">1331</td> 
     <td class="right">18.0</td> 
     <td class="right">0.0</td> 
     <td class="right">18</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>12.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Russell+Martin" target="_blank">Russell Martin</a></td> 
     <td class="right">2919</td> 
     <td class="right">1,121.3</td> 
     <td class="right">1,148</td> 
     <td class="right">26.7</td> 
     <td class="right">4.0</td> 
     <td class="right">4.0</td> 
     <td class="right">1470</td> 
     <td class="right">27.0</td> 
     <td class="right">0.0</td> 
     <td class="right">27</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>13.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Kevin+Plawecki" target="_blank">Kevin Plawecki</a></td> 
     <td class="right">1826</td> 
     <td class="right">744.0</td> 
     <td class="right">770</td> 
     <td class="right">26.0</td> 
     <td class="right">3.9</td> 
     <td class="right">3.9</td> 
     <td class="right">886</td> 
     <td class="right">9.0</td> 
     <td class="right">0.0</td> 
     <td class="right">9</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>14.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=David+Ross" target="_blank">David Ross</a></td> 
     <td class="right">941</td> 
     <td class="right">339.6</td> 
     <td class="right">361</td> 
     <td class="right">21.4</td> 
     <td class="right">3.2</td> 
     <td class="right">3.2</td> 
     <td class="right">519</td> 
     <td class="right">5.0</td> 
     <td class="right">0.0</td> 
     <td class="right">5</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>15.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Roberto+Perez" target="_blank">Roberto Perez</a></td> 
     <td class="right">1969</td> 
     <td class="right">776.5</td> 
     <td class="right">789</td> 
     <td class="right">12.5</td> 
     <td class="right">1.9</td> 
     <td class="right">1.9</td> 
     <td class="right">1090</td> 
     <td class="right">12.0</td> 
     <td class="right">0.0</td> 
     <td class="right">12</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>16.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Welington+Castillo" target="_blank">Welington Castillo</a></td> 
     <td class="right">1047</td> 
     <td class="right">410.6</td> 
     <td class="right">420</td> 
     <td class="right">9.4</td> 
     <td class="right">1.4</td> 
     <td class="right">1.4</td> 
     <td class="right">499</td> 
     <td class="right">4.0</td> 
     <td class="right">0.0</td> 
     <td class="right">4</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>17.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Hank+Conger" target="_blank">Hank Conger</a></td> 
     <td class="right">1000</td> 
     <td class="right">405.2</td> 
     <td class="right">414</td> 
     <td class="right">8.8</td> 
     <td class="right">1.3</td> 
     <td class="right">1.3</td> 
     <td class="right">511</td> 
     <td class="right">4.0</td> 
     <td class="right">0.0</td> 
     <td class="right">4</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata_ltgrey"> 
     <td>18.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Josh+Thole" target="_blank">Josh Thole</a></td> 
     <td class="right">476</td> 
     <td class="right">168.8</td> 
     <td class="right">177</td> 
     <td class="right">8.2</td> 
     <td class="right">1.2</td> 
     <td class="right">1.2</td> 
     <td class="right">275</td> 
     <td class="right">4.0</td> 
     <td class="right">0.0</td> 
     <td class="right">4</td> 
     <td class="right">0.0</td> 
     </tr> 
     <tr class="TTdata"> 
     <td>19.</td> 
     <td class="right">2015</td> 
     <td><a href="/player_search.php?search_name=Tucker+Barnhart" target="_blank">Tucker Barnhart</a></td> 
     <td class="right">934</td> 
     <td class="right">351.4</td> 
     <td class="right">357</td> 
     <td class="right">5.6</td> 
     <td class="right">0.8</td> 
     <td class="right">0.8</td> 
     <td class="right">410</td> 
     <td class="right">4.0</td> 
     <td class="right">0.0</td> 
     <td class="right">4</td> 
     <td class="right">0.0</td> 
     </tr> 
    </tbody> 
</table> 

In this case, I'm interested in getting each "player", which sits in a table row with the class TTdata or TTdata_ltgrey. That can be done with the following:

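require 'nokogiri'
require 'open-uri'

html = open(url) # url is the page address; newer Rubies use URI.open(url)
doc = Nokogiri::HTML(html)

# Data rows alternate between the classes TTdata and TTdata_ltgrey
doc.css('.TTdata, .TTdata_ltgrey').each do |catcher|
    # parse here
end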

My problem is that none of the td elements have classes associated with them. I only know that TD 1 is the position, TD 2 is the year, and TD 3 is the name.

What is the right way to access each td while iterating, so that I can build a model/hash of name/value pairs for each row?

It's very important to show what you have tried to write to solve the problem. It helps us, because we can fix your code instead of spending time writing everything from scratch, and it helps you, because you don't have to work someone else's code into your own. It also lets us know that you actually tried something rather than giving up partway. –

Answer

Here is one approach I tried. You can take it further from here to fit your needs:

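require 'nokogiri'
require 'pp'

doc = Nokogiri::HTML.parse(File.read("#{__dir__}/out1.html"))

# The second class is TTdata_ltgrey (with "lt"), as in the markup
data = doc.css('.TTdata, .TTdata_ltgrey').map do |tr|
    # Pair the three keys with the text of the first three cells of the row
    %i(position year name).zip(tr.css("td:nth-child(-n+3)").map(&:text)).to_h
end

pp data

Output

[{:position=>"1.", :year=>"2015", :name=>"Yasmani Grandal"},
 {:position=>"2.", :year=>"2015", :name=>"Buster Posey"},
 {:position=>"3.", :year=>"2015", :name=>"Francisco Cervelli"},
 {:position=>"4.", :year=>"2015", :name=>"Mike Zunino"},
 {:position=>"5.", :year=>"2015", :name=>"Caleb Joseph"},
 {:position=>"6.", :year=>"2015", :name=>"Chris Iannetta"},
 {:position=>"7.", :year=>"2015", :name=>"Jason Castro"},
 {:position=>"8.", :year=>"2015", :name=>"Miguel Montero"},
 {:position=>"9.", :year=>"2015", :name=>"Martin Maldonado"},
 {:position=>"10.", :year=>"2015", :name=>"Tyler Flowers"},
 {:position=>"11.", :year=>"2015", :name=>"Rene Rivera"},
 {:position=>"12.", :year=>"2015", :name=>"Russell Martin"},
 {:position=>"13.", :year=>"2015", :name=>"Kevin Plawecki"},
 {:position=>"14.", :year=>"2015", :name=>"David Ross"},
 {:position=>"15.", :year=>"2015", :name=>"Roberto Perez"},
 {:position=>"16.", :year=>"2015", :name=>"Welington Castillo"},
 {:position=>"17.", :year=>"2015", :name=>"Hank Conger"},
 {:position=>"18.", :year=>"2015", :name=>"Josh Thole"},
 {:position=>"19.", :year=>"2015", :name=>"Tucker Barnhart"}]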

Testing it now. If I may ask, what is the '%i' prefix? – randombits

@randombits Sure, it's shorthand syntax for creating an array of symbols. –
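
To illustrate the comment above, a minimal sketch of what the %i literal produces and how zip pairs it with cell values (the values are copied from the first row of the table):

```ruby
keys = %i(position year name)

# The literal is equivalent to spelling out each symbol by hand.
puts keys == [:position, :year, :name]

# zip pairs each key with the value at the same index.
pairs = keys.zip(["1.", "2015", "Yasmani Grandal"])
p pairs
```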

Also, does this require a particular version of Ruby? I get this error: NoMethodError: undefined method 'to_h' for [[:position, "1."], [:year, "2015"], [:name, "Yasmani Grandal"]]:Array using Ruby 2.0.0 – randombits
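
The error above is consistent with Array#to_h having been added in Ruby 2.1, so it is not available on 2.0.0. A minimal sketch of a workaround that should behave the same on older versions is to build the hash with Hash[] instead:

```ruby
pairs = [[:position, "1."], [:year, "2015"], [:name, "Yasmani Grandal"]]

# Hash[] accepts an array of [key, value] pairs and predates Array#to_h.
row = Hash[pairs]
puts row[:name]
```

In the answer's code, that would mean wrapping the zip expression in Hash[...] rather than calling .to_h on it.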
