2017-01-17 5 views

pandas df locate: keep only the first element

I want to get the value of one column based on the value of another column in the same row.

Example:

For biz_id = 123, I want to get the corresponding biz_name.

df:

biz_id biz_name 
123  chew 
456  bite 
123  chew 

Code:

df['biz_name'].loc[df['biz_id'] == 123] 

This returns:

chew 
chew 

How do I get only a single value, 'chew', as a string?

Answer


You can use iloc or iat to select the first value of the Series:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0]) 
chew 

Or:

print (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0]) 
chew 
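For reference, here is a minimal self-contained sketch of the iloc/iat approach, using the sample data from the question. The helper name first_match and its default argument are hypothetical. They are not part of pandas and are added only to show one way an empty selection could be handled.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

def first_match(frame, biz_id, default=None):
    """Return the first biz_name for biz_id, or default if no row matches (hypothetical helper)."""
    names = frame.loc[frame["biz_id"] == biz_id, "biz_name"]
    # .iat[0] on an empty Series raises IndexError, so check for emptiness first
    return names.iat[0] if not names.empty else default

print(first_match(df, 123))  # chew
print(first_match(df, 999))  # None
```

Without such a guard, both .iloc[0] and .iat[0] raise IndexError when no row matches the condition.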

With query:

print (df.query('biz_id == 123')['biz_name'].iloc[0]) 
chew 
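If the ID comes from a variable rather than a literal, query can reference it with the @ prefix. A small sketch, assuming the sample data from the question; the variable name target is illustrative only.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

target = 123  # illustrative variable, not from the original post

# '@target' refers to the Python variable of that name inside the query string
name = df.query("biz_id == @target")["biz_name"].iloc[0]
print(name)  # chew
```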

Or select the first value of a list or NumPy array:

print (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0]) 
chew 

print (df.loc[df['biz_id'] == 123, 'biz_name'].values[0]) 
chew 
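The list and array variants can be sketched in one place as follows. This assumes a pandas version that provides Series.to_numpy() (0.24 or later), which the pandas documentation recommends over .values. The next(iter(...), None) idiom is an addition here, shown as one possible way to avoid an IndexError when nothing matches.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

names = df.loc[df["biz_id"] == 123, "biz_name"]

first_from_list = names.tolist()[0]     # first element of a plain Python list
first_from_array = names.to_numpy()[0]  # first element of a NumPy array

# next() with a default returns None instead of raising when the selection is empty
missing = next(iter(df.loc[df["biz_id"] == 999, "biz_name"]), None)

print(first_from_list, first_from_array, missing)  # chew chew None
```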

Timings:

In [18]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iloc[0]) 
1000 loops, best of 3: 399 µs per loop 

In [19]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].iat[0]) 
The slowest run took 4.16 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 391 µs per loop 

In [20]: %timeit (df.query('biz_id == 123')['biz_name'].iloc[0]) 
The slowest run took 4.39 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 1.75 ms per loop 

In [21]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].tolist()[0]) 
The slowest run took 4.18 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 384 µs per loop 

In [22]: %timeit (df.loc[df['biz_id'] == 123, 'biz_name'].values[0]) 
The slowest run took 5.32 times longer than the fastest. This could mean that an intermediate result is being cached. 
1000 loops, best of 3: 370 µs per loop 

In [23]: %timeit (df.loc[df.biz_id.eq(123).idxmax(), 'biz_name']) 
1000 loops, best of 3: 517 µs per loop 
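These numbers depend on the data size, pandas version, and hardware, so they are worth re-measuring locally. A rough sketch of how the comparison could be rerun outside IPython with the standard timeit module, assuming the sample data from the question; the names candidates and results and the repetition count are arbitrary choices.

```python
import timeit
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

# Each entry wraps one of the approaches discussed here
candidates = {
    "iloc": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iloc[0],
    "iat": lambda: df.loc[df["biz_id"] == 123, "biz_name"].iat[0],
    "query": lambda: df.query("biz_id == 123")["biz_name"].iloc[0],
    "tolist": lambda: df.loc[df["biz_id"] == 123, "biz_name"].tolist()[0],
    "values": lambda: df.loc[df["biz_id"] == 123, "biz_name"].values[0],
    "idxmax": lambda: df.loc[df["biz_id"].eq(123).idxmax(), "biz_name"],
}

number = 1000  # calls per measurement
results = {}
for label, func in candidates.items():
    seconds = timeit.timeit(func, number=number)
    results[label] = seconds / number  # average seconds per call
    print(f"{label:>7}: {results[label] * 1e6:.1f} µs per call")
```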

Use idxmax to get the index of the first maximum value. The comparison yields a boolean Series whose maximum is True, so idxmax returns the label of the first matching row:

df.loc[df.biz_id.eq(123).idxmax(), 'biz_name'] 

'chew' 
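One caveat with this approach: if no row matches, the boolean Series is all False and idxmax still returns the first index label, so the lookup silently returns the wrong row. A minimal sketch of the problem and of one possible guard, assuming the sample data from the question; the helper first_name_or_none is hypothetical.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"biz_id": [123, 456, 123],
                   "biz_name": ["chew", "bite", "chew"]})

mask = df["biz_id"].eq(999)  # no row has this id, so every value is False
# idxmax on an all-False Series returns the first index label (0 here)
print(df.loc[mask.idxmax(), "biz_name"])  # chew, although 999 is absent

def first_name_or_none(frame, biz_id):
    """Guarded idxmax lookup (hypothetical helper): None when nothing matches."""
    matches = frame["biz_id"].eq(biz_id)
    return frame.loc[matches.idxmax(), "biz_name"] if matches.any() else None

print(first_name_or_none(df, 456))  # bite
print(first_name_or_none(df, 999))  # None
```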