2016-01-15 3 views
1

Я работал с оболочкой python (SKLEARN) для VW, но не мог понять, как использовать пространства имен, поэтому решил обойти tovw() и создать свой собственный форматированный список.Vowpal Wabbit Python Sklearn - Predict vw Формат

Сначала я экспортировал текстовый файл для учебных и тестовых файлов, работал с vw через терминал и все работало хорошо. Затем я попытался запустить то же самое с оболочкой python. Обучение модели, похоже, работало, но я получаю сообщение об ошибке, когда пытаюсь предсказать. Невозможно ли предсказать с файлом, уже отформатированным как vw?

Это код, который используется для экспорта данных, а также для создания списков

vwX=X_train["Response"].astype('str')+' |'\ 
'a '+\ 
'Product_Info_4:'+X_train["Product_Info_4"].astype('str')+' '\ 
'Ins_Age:'+X_train["Ins_Age"].astype('str')+' '\ 
'Ht:'+X_train["Ht"].astype('str')+' '\ 
'Wt:'+X_train["Wt"].astype('str')+' '\ 
'BMI:'+X_train["BMI"].astype('str')+' '\ 
'Employment_Info_1:'+X_train["Employment_Info_1"].astype('str')+' '\ 
'Employment_Info_4:'+X_train["Employment_Info_4"].astype('str')+' '\ 
'Employment_Info_6:'+X_train["Employment_Info_6"].astype('str')+' '\ 
'Insurance_History_5:'+X_train["Insurance_History_5"].astype('str')+' '\ 
'Family_Hist_2:'+X_train["Family_Hist_2"].astype('str')+' '\ 
'Family_Hist_3:'+X_train["Family_Hist_3"].astype('str')+' '\ 
'Family_Hist_4:'+X_train["Family_Hist_4"].astype('str')+' '\ 
'Family_Hist_5:'+X_train["Family_Hist_5"].astype('str')+' '\ 
'Medical_History_1:'+X_train["Medical_History_1"].astype('str')+' '\ 
'Medical_History_15:'+X_train["Medical_History_15"].astype('str')+' '\ 
'Medical_History_24:'+X_train["Medical_History_24"].astype('str')+' '\ 
'Medical_History_32:'+X_train["Medical_History_32"].astype('str')+' '\ 
'|b '+\ 
np.where(X_train["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\ 
np.where(X_train["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\ 
np.where(X_train["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\ 
np.where(X_train["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\ 
np.where(X_train["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\ 
np.where(X_train["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\ 
np.where(X_train["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\ 
np.where(X_train["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\ 
np.where(X_train["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\ 
np.where(X_train["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\ 
np.where(X_train["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\ 
np.where(X_train["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\ 
np.where(X_train["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\ 
np.where(X_train["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\ 
np.where(X_train["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\ 
np.where(X_train["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\ 
np.where(X_train["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\ 
np.where(X_train["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\ 
np.where(X_train["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\ 
np.where(X_train["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\ 
np.where(X_train["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\ 
np.where(X_train["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\ 
np.where(X_train["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\ 
np.where(X_train["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\ 
np.where(X_train["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\ 
np.where(X_train["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\ 
np.where(X_train["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\ 
np.where(X_train["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\ 
np.where(X_train["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\ 
np.where(X_train["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\ 
np.where(X_train["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\ 
np.where(X_train["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\ 
np.where(X_train["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\ 
np.where(X_train["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\ 
np.where(X_train["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\ 
np.where(X_train["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\ 
np.where(X_train["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\ 
np.where(X_train["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\ 
np.where(X_train["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\ 
np.where(X_train["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\ 
np.where(X_train["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\ 
np.where(X_train["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\ 
np.where(X_train["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\ 
np.where(X_train["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\ 
np.where(X_train["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\ 
np.where(X_train["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\ 
np.where(X_train["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\ 
np.where(X_train["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\ 
' |c '+\ 
"Product_Info_1_"+X_train["Product_Info_1"].astype('str')+' '\ 
"Product_Info_2_"+X_train["Product_Info_2"].astype('str')+' '\ 
"Product_Info_3_"+X_train["Product_Info_3"].astype('str')+' '\ 
"Product_Info_5_"+X_train["Product_Info_5"].astype('str')+' '\ 
"Product_Info_6_"+X_train["Product_Info_6"].astype('str')+' '\ 
"Product_Info_7_"+X_train["Product_Info_7"].astype('str')+' '\ 
"Employment_Info_2_"+X_train["Employment_Info_2"].astype('str')+' '\ 
"Employment_Info_3_"+X_train["Employment_Info_3"].astype('str')+' '\ 
"Employment_Info_5_"+X_train["Employment_Info_5"].astype('str')+' '\ 
"InsuredInfo_1_"+X_train["InsuredInfo_1"].astype('str')+' '\ 
"InsuredInfo_2_"+X_train["InsuredInfo_2"].astype('str')+' '\ 
"InsuredInfo_3_"+X_train["InsuredInfo_3"].astype('str')+' '\ 
"InsuredInfo_4_"+X_train["InsuredInfo_4"].astype('str')+' '\ 
"InsuredInfo_5_"+X_train["InsuredInfo_5"].astype('str')+' '\ 
"InsuredInfo_6_"+X_train["InsuredInfo_6"].astype('str')+' '\ 
"InsuredInfo_7_"+X_train["InsuredInfo_7"].astype('str')+' '\ 
"Insurance_History_1_"+X_train["Insurance_History_1"].astype('str')+' '\ 
"Insurance_History_2_"+X_train["Insurance_History_2"].astype('str')+' '\ 
"Insurance_History_3_"+X_train["Insurance_History_3"].astype('str')+' '\ 
"Insurance_History_4_"+X_train["Insurance_History_4"].astype('str')+' '\ 
"Insurance_History_7_"+X_train["Insurance_History_7"].astype('str')+' '\ 
"Insurance_History_8_"+X_train["Insurance_History_8"].astype('str')+' '\ 
"Insurance_History_9_"+X_train["Insurance_History_9"].astype('str')+' '\ 
"Family_Hist_1_"+X_train["Family_Hist_1"].astype('str')+' '\ 
"Medical_History_2_"+X_train["Medical_History_2"].astype('str')+' '\ 
"Medical_History_3_"+X_train["Medical_History_3"].astype('str')+' '\ 
"Medical_History_4_"+X_train["Medical_History_4"].astype('str')+' '\ 
"Medical_History_5_"+X_train["Medical_History_5"].astype('str')+' '\ 
"Medical_History_6_"+X_train["Medical_History_6"].astype('str')+' '\ 
"Medical_History_7_"+X_train["Medical_History_7"].astype('str')+' '\ 
"Medical_History_8_"+X_train["Medical_History_8"].astype('str')+' '\ 
"Medical_History_9_"+X_train["Medical_History_9"].astype('str')+' '\ 
"Medical_History_10_"+X_train["Medical_History_10"].astype('str')+' '\ 
"Medical_History_11_"+X_train["Medical_History_11"].astype('str')+' '\ 
"Medical_History_12_"+X_train["Medical_History_12"].astype('str')+' '\ 
"Medical_History_13_"+X_train["Medical_History_13"].astype('str')+' '\ 
"Medical_History_14_"+X_train["Medical_History_14"].astype('str')+' '\ 
"Medical_History_16_"+X_train["Medical_History_16"].astype('str')+' '\ 
"Medical_History_17_"+X_train["Medical_History_17"].astype('str')+' '\ 
"Medical_History_18_"+X_train["Medical_History_18"].astype('str')+' '\ 
"Medical_History_19_"+X_train["Medical_History_19"].astype('str')+' '\ 
"Medical_History_20_"+X_train["Medical_History_20"].astype('str')+' '\ 
"Medical_History_21_"+X_train["Medical_History_21"].astype('str')+' '\ 
"Medical_History_22_"+X_train["Medical_History_22"].astype('str')+' '\ 
"Medical_History_23_"+X_train["Medical_History_23"].astype('str')+' '\ 
"Medical_History_25_"+X_train["Medical_History_25"].astype('str')+' '\ 
"Medical_History_26_"+X_train["Medical_History_26"].astype('str')+' '\ 
"Medical_History_27_"+X_train["Medical_History_27"].astype('str')+' '\ 
"Medical_History_28_"+X_train["Medical_History_28"].astype('str')+' '\ 
"Medical_History_29_"+X_train["Medical_History_29"].astype('str')+' '\ 
"Medical_History_30_"+X_train["Medical_History_30"].astype('str')+' '\ 
"Medical_History_31_"+X_train["Medical_History_31"].astype('str')+' '\ 
"Medical_History_33_"+X_train["Medical_History_33"].astype('str')+' '\ 
"Medical_History_34_"+X_train["Medical_History_34"].astype('str')+' '\ 
"Medical_History_35_"+X_train["Medical_History_35"].astype('str')+' '\ 
"Medical_History_36_"+X_train["Medical_History_36"].astype('str')+' '\ 
"Medical_History_37_"+X_train["Medical_History_37"].astype('str')+' '\ 
"Medical_History_38_"+X_train["Medical_History_38"].astype('str')+' '\ 
"Medical_History_39_"+X_train["Medical_History_39"].astype('str')+' '\ 
"Medical_History_40_"+X_train["Medical_History_40"].astype('str')+' '\ 
"Medical_History_41_"+X_train["Medical_History_41"].astype('str') 

vwX.to_csv('train.vw',mode='a', header=False,index=False) 



vwX_T='1'+ ' |'\ 
'a '+\ 
'Product_Info_4:'+X_test["Product_Info_4"].astype('str')+' '\ 
'Ins_Age:'+X_test["Ins_Age"].astype('str')+' '\ 
'Ht:'+X_test["Ht"].astype('str')+' '\ 
'Wt:'+X_test["Wt"].astype('str')+' '\ 
'BMI:'+X_test["BMI"].astype('str')+' '\ 
'Employment_Info_1:'+X_test["Employment_Info_1"].astype('str')+' '\ 
'Employment_Info_4:'+X_test["Employment_Info_4"].astype('str')+' '\ 
'Employment_Info_6:'+X_test["Employment_Info_6"].astype('str')+' '\ 
'Insurance_History_5:'+X_test["Insurance_History_5"].astype('str')+' '\ 
'Family_Hist_2:'+X_test["Family_Hist_2"].astype('str')+' '\ 
'Family_Hist_3:'+X_test["Family_Hist_3"].astype('str')+' '\ 
'Family_Hist_4:'+X_test["Family_Hist_4"].astype('str')+' '\ 
'Family_Hist_5:'+X_test["Family_Hist_5"].astype('str')+' '\ 
'Medical_History_1:'+X_test["Medical_History_1"].astype('str')+' '\ 
'Medical_History_15:'+X_test["Medical_History_15"].astype('str')+' '\ 
'Medical_History_24:'+X_test["Medical_History_24"].astype('str')+' '\ 
'Medical_History_32:'+X_test["Medical_History_32"].astype('str')+' '\ 
'|b '+\ 
np.where(X_test["Medical_Keyword_1"] ==0,''," Medical_Keyword_1")+\ 
np.where(X_test["Medical_Keyword_2"] ==0,''," Medical_Keyword_2")+\ 
np.where(X_test["Medical_Keyword_3"] ==0,''," Medical_Keyword_3")+\ 
np.where(X_test["Medical_Keyword_4"] ==0,''," Medical_Keyword_4")+\ 
np.where(X_test["Medical_Keyword_5"] ==0,''," Medical_Keyword_5")+\ 
np.where(X_test["Medical_Keyword_6"] ==0,''," Medical_Keyword_6")+\ 
np.where(X_test["Medical_Keyword_7"] ==0,''," Medical_Keyword_7")+\ 
np.where(X_test["Medical_Keyword_8"] ==0,''," Medical_Keyword_8")+\ 
np.where(X_test["Medical_Keyword_9"] ==0,''," Medical_Keyword_9")+\ 
np.where(X_test["Medical_Keyword_10"] ==0,''," Medical_Keyword_10")+\ 
np.where(X_test["Medical_Keyword_11"] ==0,''," Medical_Keyword_11")+\ 
np.where(X_test["Medical_Keyword_12"] ==0,''," Medical_Keyword_12")+\ 
np.where(X_test["Medical_Keyword_13"] ==0,''," Medical_Keyword_13")+\ 
np.where(X_test["Medical_Keyword_14"] ==0,''," Medical_Keyword_14")+\ 
np.where(X_test["Medical_Keyword_15"] ==0,''," Medical_Keyword_15")+\ 
np.where(X_test["Medical_Keyword_16"] ==0,''," Medical_Keyword_16")+\ 
np.where(X_test["Medical_Keyword_17"] ==0,''," Medical_Keyword_17")+\ 
np.where(X_test["Medical_Keyword_18"] ==0,''," Medical_Keyword_18")+\ 
np.where(X_test["Medical_Keyword_19"] ==0,''," Medical_Keyword_19")+\ 
np.where(X_test["Medical_Keyword_20"] ==0,''," Medical_Keyword_20")+\ 
np.where(X_test["Medical_Keyword_21"] ==0,''," Medical_Keyword_21")+\ 
np.where(X_test["Medical_Keyword_22"] ==0,''," Medical_Keyword_22")+\ 
np.where(X_test["Medical_Keyword_23"] ==0,''," Medical_Keyword_23")+\ 
np.where(X_test["Medical_Keyword_24"] ==0,''," Medical_Keyword_24")+\ 
np.where(X_test["Medical_Keyword_25"] ==0,''," Medical_Keyword_25")+\ 
np.where(X_test["Medical_Keyword_26"] ==0,''," Medical_Keyword_26")+\ 
np.where(X_test["Medical_Keyword_27"] ==0,''," Medical_Keyword_27")+\ 
np.where(X_test["Medical_Keyword_28"] ==0,''," Medical_Keyword_28")+\ 
np.where(X_test["Medical_Keyword_29"] ==0,''," Medical_Keyword_29")+\ 
np.where(X_test["Medical_Keyword_30"] ==0,''," Medical_Keyword_30")+\ 
np.where(X_test["Medical_Keyword_31"] ==0,''," Medical_Keyword_31")+\ 
np.where(X_test["Medical_Keyword_32"] ==0,''," Medical_Keyword_32")+\ 
np.where(X_test["Medical_Keyword_33"] ==0,''," Medical_Keyword_33")+\ 
np.where(X_test["Medical_Keyword_34"] ==0,''," Medical_Keyword_34")+\ 
np.where(X_test["Medical_Keyword_35"] ==0,''," Medical_Keyword_35")+\ 
np.where(X_test["Medical_Keyword_36"] ==0,''," Medical_Keyword_36")+\ 
np.where(X_test["Medical_Keyword_37"] ==0,''," Medical_Keyword_37")+\ 
np.where(X_test["Medical_Keyword_38"] ==0,''," Medical_Keyword_38")+\ 
np.where(X_test["Medical_Keyword_39"] ==0,''," Medical_Keyword_39")+\ 
np.where(X_test["Medical_Keyword_40"] ==0,''," Medical_Keyword_40")+\ 
np.where(X_test["Medical_Keyword_41"] ==0,''," Medical_Keyword_41")+\ 
np.where(X_test["Medical_Keyword_42"] ==0,''," Medical_Keyword_42")+\ 
np.where(X_test["Medical_Keyword_43"] ==0,''," Medical_Keyword_43")+\ 
np.where(X_test["Medical_Keyword_44"] ==0,''," Medical_Keyword_44")+\ 
np.where(X_test["Medical_Keyword_45"] ==0,''," Medical_Keyword_45")+\ 
np.where(X_test["Medical_Keyword_46"] ==0,''," Medical_Keyword_46")+\ 
np.where(X_test["Medical_Keyword_47"] ==0,''," Medical_Keyword_47")+\ 
np.where(X_test["Medical_Keyword_48"] ==0,''," Medical_Keyword_48")+\ 
' |c '+\ 
"Product_Info_1_"+X_test["Product_Info_1"].astype('str')+' '\ 
"Product_Info_2_"+X_test["Product_Info_2"].astype('str')+' '\ 
"Product_Info_3_"+X_test["Product_Info_3"].astype('str')+' '\ 
"Product_Info_5_"+X_test["Product_Info_5"].astype('str')+' '\ 
"Product_Info_6_"+X_test["Product_Info_6"].astype('str')+' '\ 
"Product_Info_7_"+X_test["Product_Info_7"].astype('str')+' '\ 
"Employment_Info_2_"+X_test["Employment_Info_2"].astype('str')+' '\ 
"Employment_Info_3_"+X_test["Employment_Info_3"].astype('str')+' '\ 
"Employment_Info_5_"+X_test["Employment_Info_5"].astype('str')+' '\ 
"InsuredInfo_1_"+X_test["InsuredInfo_1"].astype('str')+' '\ 
"InsuredInfo_2_"+X_test["InsuredInfo_2"].astype('str')+' '\ 
"InsuredInfo_3_"+X_test["InsuredInfo_3"].astype('str')+' '\ 
"InsuredInfo_4_"+X_test["InsuredInfo_4"].astype('str')+' '\ 
"InsuredInfo_5_"+X_test["InsuredInfo_5"].astype('str')+' '\ 
"InsuredInfo_6_"+X_test["InsuredInfo_6"].astype('str')+' '\ 
"InsuredInfo_7_"+X_test["InsuredInfo_7"].astype('str')+' '\ 
"Insurance_History_1_"+X_test["Insurance_History_1"].astype('str')+' '\ 
"Insurance_History_2_"+X_test["Insurance_History_2"].astype('str')+' '\ 
"Insurance_History_3_"+X_test["Insurance_History_3"].astype('str')+' '\ 
"Insurance_History_4_"+X_test["Insurance_History_4"].astype('str')+' '\ 
"Insurance_History_7_"+X_test["Insurance_History_7"].astype('str')+' '\ 
"Insurance_History_8_"+X_test["Insurance_History_8"].astype('str')+' '\ 
"Insurance_History_9_"+X_test["Insurance_History_9"].astype('str')+' '\ 
"Family_Hist_1_"+X_test["Family_Hist_1"].astype('str')+' '\ 
"Medical_History_2_"+X_test["Medical_History_2"].astype('str')+' '\ 
"Medical_History_3_"+X_test["Medical_History_3"].astype('str')+' '\ 
"Medical_History_4_"+X_test["Medical_History_4"].astype('str')+' '\ 
"Medical_History_5_"+X_test["Medical_History_5"].astype('str')+' '\ 
"Medical_History_6_"+X_test["Medical_History_6"].astype('str')+' '\ 
"Medical_History_7_"+X_test["Medical_History_7"].astype('str')+' '\ 
"Medical_History_8_"+X_test["Medical_History_8"].astype('str')+' '\ 
"Medical_History_9_"+X_test["Medical_History_9"].astype('str')+' '\ 
"Medical_History_10_"+X_test["Medical_History_10"].astype('str')+' '\ 
"Medical_History_11_"+X_test["Medical_History_11"].astype('str')+' '\ 
"Medical_History_12_"+X_test["Medical_History_12"].astype('str')+' '\ 
"Medical_History_13_"+X_test["Medical_History_13"].astype('str')+' '\ 
"Medical_History_14_"+X_test["Medical_History_14"].astype('str')+' '\ 
"Medical_History_16_"+X_test["Medical_History_16"].astype('str')+' '\ 
"Medical_History_17_"+X_test["Medical_History_17"].astype('str')+' '\ 
"Medical_History_18_"+X_test["Medical_History_18"].astype('str')+' '\ 
"Medical_History_19_"+X_test["Medical_History_19"].astype('str')+' '\ 
"Medical_History_20_"+X_test["Medical_History_20"].astype('str')+' '\ 
"Medical_History_21_"+X_test["Medical_History_21"].astype('str')+' '\ 
"Medical_History_22_"+X_test["Medical_History_22"].astype('str')+' '\ 
"Medical_History_23_"+X_test["Medical_History_23"].astype('str')+' '\ 
"Medical_History_25_"+X_test["Medical_History_25"].astype('str')+' '\ 
"Medical_History_26_"+X_test["Medical_History_26"].astype('str')+' '\ 
"Medical_History_27_"+X_test["Medical_History_27"].astype('str')+' '\ 
"Medical_History_28_"+X_test["Medical_History_28"].astype('str')+' '\ 
"Medical_History_29_"+X_test["Medical_History_29"].astype('str')+' '\ 
"Medical_History_30_"+X_test["Medical_History_30"].astype('str')+' '\ 
"Medical_History_31_"+X_test["Medical_History_31"].astype('str')+' '\ 
"Medical_History_33_"+X_test["Medical_History_33"].astype('str')+' '\ 
"Medical_History_34_"+X_test["Medical_History_34"].astype('str')+' '\ 
"Medical_History_35_"+X_test["Medical_History_35"].astype('str')+' '\ 
"Medical_History_36_"+X_test["Medical_History_36"].astype('str')+' '\ 
"Medical_History_37_"+X_test["Medical_History_37"].astype('str')+' '\ 
"Medical_History_38_"+X_test["Medical_History_38"].astype('str')+' '\ 
"Medical_History_39_"+X_test["Medical_History_39"].astype('str')+' '\ 
"Medical_History_40_"+X_test["Medical_History_40"].astype('str')+' '\ 
"Medical_History_41_"+X_test["Medical_History_41"].astype('str') 

vwX_T.to_csv('test.vw',mode='a', header=False,index=False) 

#create a list to be used below in pyvw 
vwX_lst=[] 
vwX_T_lst=[] 

for j in vwX.values: 
    vwX_lst.append(j) 

for j in vwX_T.values: 
    vwX_T_lst.append(j) 

Тогда я тренировалась модель, которая, казалось, хорошо работать:

import sys 
sys.path.append('/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python') 

import pyvw 
import sklearn_vw as slvw 

import numpy as np 
import pandas as pd 

from sklearn.cross_validation import train_test_split,KFold 
from sklearn.preprocessing import OneHotEncoder 
from sklearn.preprocessing import StandardScaler 

import ml_metrics 

mod=slvw.VWRegressor(passes=5, quadratic="aa ab") 
mod.fit(X=vwX_lst,convert_to_vw=False) 

preds = mod.predict (X = vwX_T_lst, convert_to_vw = False)

--------------------------------------------------------------------------- 
IndexError        Traceback (most recent call last) 
<ipython-input-128-e43aa19fc8f5> in <module>() 
    16 mod=slvw.VWRegressor(passes=5, quadratic="aa ab") 
    17 mod.fit(X=vwX_lst,convert_to_vw=False) 
---> 18 preds=mod.predict(X=vwX_T_lst,convert_to_vw=False) 
    19 

/home/anaconda/lib/python2.7/site-packages/vowpal_wabbit/python/sklearn_vw.pyc in predict(self, X, convert_to_vw) 
    255    ex.set_test_only(True) 
    256    ex.learn() 
--> 257    y[idx] = ex.get_simplelabel_prediction() 
    258    ex.finish() 
    259 

IndexError: index 1 is out of bounds for axis 0 with size 1 

ответ

2

Я просто наткнулся на это сам. Проблема заключается в 10 строках кода выше, где вы получаете ошибку. В нем говорится:

try: 
    num_samples = X.shape[0] if X.ndim > 1 else 1 
except AttributeError: 
    num_samples = 1 

num_samples затем используется для инициализации пустой Numpy массив такого размера:

y = np.empty([num_samples]) 

Так что, если X не имеет атрибут ndim или если X.ndim == 1, то sum_samples устанавливается в 1, и ваш массив нп инициализируется с размером 1.

Таким образом, когда второй счет предсказания получает положить в у вас получить индекс из оценки погрешности здесь:

y[idx] = ex.get_simplelabel_prediction() 

Я это исправил, изменив попробовать/за исключением кода использовать длину X:

try: 
    num_samples = X.shape[0] if X.ndim > 1 else len(X) 
except AttributeError: 
    num_samples = len(X) 
Смежные вопросы