2015-12-21 5 views
2

How can I extract all the content in a «td»? Web scraper - Python

<td> 
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 
    <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span> 
</td> 

I tried this:

desc = data.xpath("//td/text()")
print(desc)

But it returns only the first sentence:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 

I would like the output in the following format:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents! 

I also tried:

desc = data.xpath("//td//text()")
print(desc)

The result looks like this:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 
8 entire dolls per set! Octuple the presents! 

I would prefer the following:

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents! 
+0

Shouldn't it be '//td//text()'? – smac89

+0

See my revised question. – kevin

+2

'desc.replace("\n", " ")'? – DJanssens

Answer

2

This worked:

desc = data.xpath("//td")[0]   # xpath() returns a list, so take the first <td>
print(desc.text_content())     # text_content() exists on lxml.html elements
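
For anyone who wants to try the idea without the original page, here is a minimal sketch using only the standard library. It assumes the markup from the question is held in a string (the names `html`, `raw` and `desc` are only for illustration) and uses `xml.etree.ElementTree`, whose `itertext()` plays roughly the role that `text_content()` plays above; lxml elements offer `itertext()` as well. The whitespace cleanup at the end is an extra step, added because the text around the `<span>` still carries the newline and indentation from the source.

```python
import xml.etree.ElementTree as ET

# The markup from the question, kept in a string for illustration only.
html = """<td>
    Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"!
    <span class="excitingNote">8 entire dolls per set! Octuple the presents!</span>
</td>"""

td = ET.fromstring(html)

# itertext() yields every text node under <td>, including the one inside <span>.
raw = "".join(td.itertext())

# split() with no arguments splits on any run of whitespace, so re-joining with
# a single space drops the newlines and indentation between the two parts.
desc = " ".join(raw.split())
print(desc)
```

This prints both sentences on one line, which is the format asked for in the question. The snippet here happens to be well-formed XML; for real-world HTML a tolerant parser such as lxml.html is likely the safer choice.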