NLTK stemmer returns a list of NoneTypes
I am preprocessing text data (Twitter data, to be precise), but whenever I apply the NLTK stemmer I get a list of NoneTypes. I can't work out why this happens, and I have no idea how to fix it.
Here is what my text data looks like at each stage of processing:
Before processing:
In [10]:
import pandas as pd
import numpy as np
import glob
import os
import nltk
dir = "C:\Users\Anonymous\Desktop\KAGA FOLDER\Hashtags"
train = np.array(pd.read_csv(os.path.join(dir,"train.csv")))[:,1]
def clean_the_text(data):
    alist = []
    data = nltk.word_tokenize(data)
    for j in data:
        alist.append(j.rstrip('\n'))
    alist = " ".join(alist)
    return alist
def stemmer(data):
    stemmer = nltk.stem.PorterStemmer()
    new_list = []
    new_list = [new_list.append(stemmer.stem(word)) for word in data]
    return new_list
def loop_data(data):
    for i in range(len(data)):
        data[i] = clean_the_text(data[i])
    return data
train
Out[10]:
array(['Jazz for a Rainy Afternoon: {link}',
'RT: @mention: I love rainy days.',
'Good Morning Chicago! Time to kick the Windy City in the nuts and head back West!',
...,
'OMG #WeatherForecast for tomm 80 degrees & Sunny <=== #NeedThat #Philly #iMustSeeItToBelieveIt yo',
"@mention Oh no! We had cold weather early in the week, but now it's getting warmer! Hoping the rain holds out to Saturday!",
'North Cascades Hwy to reopen Wed.: quite late after a long, deep winter. Only had to clear snow 75 ft deep {link}'], dtype=object)
After tokenizing and cleaning the text:
In [12]:
train = loop_data(train)
train
Out[12]:
array(['Jazz for a Rainy Afternoon : { link }',
'RT : @ mention : I love rainy days .',
'Good Morning Chicago ! Time to kick the Windy City in the nuts and head back West !',
...,
'OMG # WeatherForecast for tomm 80 degrees & Sunny & lt ; === # NeedThat # Philly # iMustSeeItToBelieveIt yo',
"@ mention Oh no ! We had cold weather early in the week , but now it 's getting warmer ! Hoping the rain holds out to Saturday !",
'North Cascades Hwy to reopen Wed. : quite late after a long , deep winter. Only had to clear snow 75 ft deep { link }'], dtype=object)
And finally, after stemming:
In [13]:
train = stemmer(train)
train
Out[13]:
[None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
None,
Thanks. That really is the problem. – Learner
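The `None` values most likely come from the list comprehension in `stemmer`: `list.append()` modifies the list in place and returns `None`, so the comprehension collects one `None` per element, and that result then overwrites `new_list`. A second issue is that `train` holds whole tweets, so the stemmer receives entire strings rather than single words. Below is a minimal sketch of the failure and one possible fix. The names `broken_stemmer`, `stem_tweets`, and `toy_stem` are hypothetical, and `toy_stem` is only a stand-in so the example runs without NLTK installed; in the question's setting you would pass `nltk.stem.PorterStemmer().stem` instead.

```python
def broken_stemmer(data, stem):
    # Mirrors the code in the question: list.append() returns None,
    # so the comprehension yields one None per input element.
    new_list = []
    new_list = [new_list.append(stem(word)) for word in data]
    return new_list


def stem_tweets(data, stem):
    # Build the result directly from the comprehension. Each tweet is split
    # on whitespace first, so the stemmer sees single tokens, not whole tweets.
    return [" ".join(stem(token) for token in tweet.split()) for tweet in data]


def toy_stem(word):
    # Hypothetical stand-in for a real stemmer (assumption, not NLTK's logic):
    # lowercases the word and strips trailing "s" characters.
    return word.lower().rstrip("s")


tweets = ["RT : I love rainy days .", "Jazz for a Rainy Afternoon"]
print(broken_stemmer(tweets, toy_stem))  # every entry is None
print(stem_tweets(tweets, toy_stem))     # one stemmed string per tweet
```

The fix is either to return the comprehension itself or to use a plain `for` loop with `append`, but not to combine the two.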