2016-03-23 2 views
1

Python pandas DataFrame: remove a column from the header

I have the following code:

import pandas as pd

data = pd.read_csv('audit_nor.csv')
d1 = pd.get_dummies(data) 
header = d1.columns.values 
print(header) 
print(type(header)) 

The output looks like this:

['ID' 'Age' 'Income' 'Deductions' 'Hours' 'Adjustment' 'Adjusted' 
'Employment_Consultant' 'Employment_PSFederal' 'Employment_PSLocal' 
'Employment_PSState' 'Employment_Private' 'Employment_SelfEmp' 
'Employment_Unemployed' 'Employment_Volunteer' 'Education_Associate' 
'Education_Bachelor' 'Education_College' 'Education_Doctorate' 
'Education_HSgrad' 'Education_Master' 'Education_Preschool' 
'Education_Professional' 'Education_Vocational' 'Education_Yr10' 
'Education_Yr11' 'Education_Yr12' 'Education_Yr5t6' 'Education_Yr7t8' 
'Education_Yr9' 'Marital_Absent' 'Marital_Divorced' 'Marital_Married' 
'Marital_Married-spouse-absent' 'Marital_Unmarried' 'Marital_Widowed' 
'Occupation_Cleaner' 'Occupation_Clerical' 'Occupation_Executive' 
'Occupation_Farming' 'Occupation_Machinist' 'Occupation_Professional' 
'Occupation_Repair' 'Occupation_Sales' 'Occupation_Service' 
'Occupation_Support' 'Occupation_Transport' 'Sex_Female' 'Sex_Male' 
'Accounts_Cuba' 'Accounts_England' 'Accounts_Germany' 'Accounts_India' 
'Accounts_Indonesia' 'Accounts_Iran' 'Accounts_Ireland' 'Accounts_Jamaica' 
'Accounts_Malaysia' 'Accounts_Mexico' 'Accounts_Philippines' 
'Accounts_Portugal' 'Accounts_UnitedStates' 'Accounts_Vietnam'] 
<type 'numpy.ndarray'> 

I am trying to remove 'ID' from the header so that I can drop the entire 'ID' column from the DataFrame. I tried:

columns = header.delete('ID') 

but I get this error:

AttributeError: 'numpy.ndarray' object has no attribute 'delete' 

What is the right way to solve this? Thanks!

+0

What is the exception (that is, can you post the error)? – ChrisP

+0

The error message has been added above. Thanks! – Edamame

Answer

2

You can use numpy.delete together with numpy.where, which finds the index of 'ID':

import numpy as np 

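# np.where returns a tuple of index arrays, one per dimension
print(np.where(header == 'ID'))
(array([0], dtype=int64),)

# np.delete returns a new array without the element at that index
columns = np.delete(header, np.where(header == 'ID')[0])
print(columns)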
['Age' 'Income' 'Deductions' 'Hours' 'Adjustment' 'Adjusted' 
'Employment_Consultant' 'Employment_PSFederal' 'Employment_PSLocal' 
'Employment_PSState' 'Employment_Private' 'Employment_SelfEmp' 
'Employment_Unemployed' 'Employment_Volunteer' 'Education_Associate' 
'Education_Bachelor' 'Education_College' 'Education_Doctorate' 
'Education_HSgrad' 'Education_Master' 'Education_Preschool' 
'Education_Professional' 'Education_Vocational' 'Education_Yr10' 
'Education_Yr11' 'Education_Yr12' 'Education_Yr5t6' 'Education_Yr7t8' 
'Education_Yr9' 'Marital_Absent' 'Marital_Divorced' 'Marital_Married' 
'Marital_Married-spouse-absent' 'Marital_Unmarried' 'Marital_Widowed' 
'Occupation_Cleaner' 'Occupation_Clerical' 'Occupation_Executive' 
'Occupation_Farming' 'Occupation_Machinist' 'Occupation_Professional' 
'Occupation_Repair' 'Occupation_Sales' 'Occupation_Service' 
'Occupation_Support' 'Occupation_Transport' 'Sex_Female' 'Sex_Male' 
'Accounts_Cuba' 'Accounts_England' 'Accounts_Germany' 'Accounts_India' 
'Accounts_Indonesia' 'Accounts_Iran' 'Accounts_Ireland' 'Accounts_Jamaica' 
'Accounts_Malaysia' 'Accounts_Mexico' 'Accounts_Philippines' 
'Accounts_Portugal' 'Accounts_UnitedStates' 'Accounts_Vietnam'] 
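
To make the behaviour easier to check, here is a minimal sketch on a shortened header. The three-element array is a made-up stand-in for d1.columns.values, not the real data. It shows that np.delete returns a new array and leaves the original untouched.

```python
import numpy as np

# Toy header standing in for d1.columns.values (hypothetical, shortened).
header = np.array(['ID', 'Age', 'Income'], dtype=object)

# np.where returns a tuple of index arrays; take the first (and only) one.
idx = np.where(header == 'ID')[0]

# np.delete builds a new array; `header` itself is not modified.
columns = np.delete(header, idx)

print(columns)
print(header)
```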

Or you can use a list comprehension to filter out ID:

columns = [x for x in header if x != 'ID']
print(columns)
['Age', 'Income', 'Deductions', 'Hours', 'Adjustment', 'Adjusted', 'Employment_Consultant', 'Employment_PSFederal', 'Employment_PSLocal', 'Employment_PSState', 'Employment_Private', 'Employment_SelfEmp', 'Employment_Unemployed', 'Employment_Volunteer', 'Education_Associate', 'Education_Bachelor', 'Education_College', 'Education_Doctorate', 'Education_HSgrad', 'Education_Master', 'Education_Preschool', 'Education_Professional', 'Education_Vocational', 'Education_Yr10', 'Education_Yr11', 'Education_Yr12', 'Education_Yr5t6', 'Education_Yr7t8', 'Education_Yr9', 'Marital_Absent', 'Marital_Divorced', 'Marital_Married', 'Marital_Married-spouse-absent', 'Marital_Unmarried', 'Marital_Widowed', 'Occupation_Cleaner', 'Occupation_Clerical', 'Occupation_Executive', 'Occupation_Farming', 'Occupation_Machinist', 'Occupation_Professional', 'Occupation_Repair', 'Occupation_Sales', 'Occupation_Service', 'Occupation_Support', 'Occupation_Transport', 'Sex_Female', 'Sex_Male', 'Accounts_Cuba', 'Accounts_England', 'Accounts_Germany', 'Accounts_India', 'Accounts_Indonesia', 'Accounts_Iran', 'Accounts_Ireland', 'Accounts_Jamaica', 'Accounts_Malaysia', 'Accounts_Mexico', 'Accounts_Philippines', 'Accounts_Portugal', 'Accounts_UnitedStates', 'Accounts_Vietnam'] 
# if you need to filter df by these columns
df = df[columns] 
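
As an end-to-end illustration of the list-comprehension approach, the sketch below builds a tiny frame, one-hot encodes it and keeps every column except ID. The frame contents and the name raw are invented for the example. Only the column names ID, Age and Sex echo the question.

```python
import pandas as pd

# Hypothetical miniature of the audit data; values are made up.
raw = pd.DataFrame({
    'ID': [1, 2],
    'Age': [30, 41],
    'Sex': ['Male', 'Female'],
})

# One-hot encode the categorical column, as in the question.
d1 = pd.get_dummies(raw)

# Keep every column name except 'ID', then select those columns.
columns = [c for c in d1.columns if c != 'ID']
d1 = d1[columns]

print(list(d1.columns))
```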

Or slice the array to skip the first element (this only works if ID is the first element of header):

columns = header[1:]
print(columns)
['Age' 'Income' 'Deductions' 'Hours' 'Adjustment' 'Adjusted' 
'Employment_Consultant' 'Employment_PSFederal' 'Employment_PSLocal' 
'Employment_PSState' 'Employment_Private' 'Employment_SelfEmp' 
'Employment_Unemployed' 'Employment_Volunteer' 'Education_Associate' 
'Education_Bachelor' 'Education_College' 'Education_Doctorate' 
'Education_HSgrad' 'Education_Master' 'Education_Preschool' 
'Education_Professional' 'Education_Vocational' 'Education_Yr10' 
'Education_Yr11' 'Education_Yr12' 'Education_Yr5t6' 'Education_Yr7t8' 
'Education_Yr9' 'Marital_Absent' 'Marital_Divorced' 'Marital_Married' 
'Marital_Married-spouse-absent' 'Marital_Unmarried' 'Marital_Widowed' 
'Occupation_Cleaner' 'Occupation_Clerical' 'Occupation_Executive' 
'Occupation_Farming' 'Occupation_Machinist' 'Occupation_Professional' 
'Occupation_Repair' 'Occupation_Sales' 'Occupation_Service' 
'Occupation_Support' 'Occupation_Transport' 'Sex_Female' 'Sex_Male' 
'Accounts_Cuba' 'Accounts_England' 'Accounts_Germany' 'Accounts_India' 
'Accounts_Indonesia' 'Accounts_Iran' 'Accounts_Ireland' 'Accounts_Jamaica' 
'Accounts_Malaysia' 'Accounts_Mexico' 'Accounts_Philippines' 
'Accounts_Portugal' 'Accounts_UnitedStates' 'Accounts_Vietnam'] 

# if you need to filter df by these columns
df = df[columns] 
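
Because slicing depends on the position of ID, a defensive variant might check the first element before slicing and fall back to a boolean mask otherwise. This is only a sketch on a made-up header. The fallback is an addition for illustration, not something the original code does.

```python
import numpy as np

# Toy header (hypothetical); in the question this is d1.columns.values.
header = np.array(['ID', 'Age', 'Income'], dtype=object)

if header[0] == 'ID':
    # 'ID' is where we expect it, so a slice is enough.
    columns = header[1:]
else:
    # Position-independent fallback: keep everything that is not 'ID'.
    columns = header[header != 'ID']

print(columns)
```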

But if all you need is to remove the ID column, use drop:

df = df.drop('ID', axis=1) 
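
A short sketch of drop on a made-up two-column frame follows. It assumes a pandas version that accepts the columns= keyword, which is equivalent to passing axis=1. drop returns a new DataFrame, so the result has to be assigned. With errors='ignore', a column that is already missing does not raise.

```python
import pandas as pd

# Hypothetical frame; values are made up for the example.
df = pd.DataFrame({'ID': [1, 2], 'Age': [30, 41]})

# Equivalent to df.drop('ID', axis=1); the original df is left unchanged.
dropped = df.drop(columns=['ID'])

# Dropping again would raise KeyError unless errors='ignore' is passed.
dropped_again = dropped.drop(columns=['ID'], errors='ignore')

print(list(dropped.columns))
```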