2017-02-13 2 views
4

I'm fairly new to Python and need help with this: using Python, I need to convert the data from several columns into a single column and repeat column A for each value.

The data I have is in CSV format, like this:

 Month YEAR  AZ-Phoenix CA-Los Angeles CA-San Diego CA-San Francisco CO-Denver DC-Washington 
    January 1987   59.33  54.67  46.61   50.20 
    February 1987   59.65  54.89  46.87   49.96  64.77 

The city columns need to be combined and shown in columns 2 and 3, with column 1 repeated n times.

The output should be:

 
    Month YEAR       
    January 1987 AZ-Phoenix 
    January 1987 CA-Los Angeles  59.33 
    January 1987 CA-San Diego  54.67 
    January 1987 CA-San Francisco 46.61 
    January 1987 CO-Denver  50.20 

How can this be done with the csv reader?

Answers

2

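Use read_csv with a tab separator (\t). If the columns are separated by two or more whitespace characters instead, use the setup from piRSquared's answer. With tabs:

import pandas as pd 

# 'data.csv' is a placeholder; pass the path of your own file
df = pd.read_csv('data.csv', sep='\t') 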

I think you need:

df = df.set_index('YEAR').stack(dropna=False).reset_index() 
df.columns = ['YEAR','A','B'] 
print (df) 
      YEAR     A  B 
0 January 1987  AZ-Phoenix 59.33 
1 January 1987 CA-Los Angeles 54.67 
2 January 1987   CA-San 46.61 
3 January 1987    Diego 50.20 
4 January 1987 CA-San Francisco NaN 
5 January 1987   CO-Denver NaN 
6 January 1987  DC-Washington NaN 
7 February 1987  AZ-Phoenix 59.65 
8 February 1987 CA-Los Angeles 54.89 
9 February 1987   CA-San 46.87 
10 February 1987    Diego 49.96 
11 February 1987 CA-San Francisco 64.77 
12 February 1987   CO-Denver NaN 
13 February 1987  DC-Washington NaN 

# if you need to remove rows with NaN 
df = df.set_index('YEAR').stack().reset_index() 
df.columns = ['YEAR','A','B'] 
print (df) 
      YEAR     A  B 
0 January 1987  AZ-Phoenix 59.33 
1 January 1987 CA-Los Angeles 54.67 
2 January 1987   CA-San 46.61 
3 January 1987    Diego 50.20 
4 February 1987  AZ-Phoenix 59.65 
5 February 1987 CA-Los Angeles 54.89 
6 February 1987   CA-San 46.87 
7 February 1987    Diego 49.96 
8 February 1987 CA-San Francisco 64.77 
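
For anyone who wants to try the stack approach without a file on disk, here is a minimal self-contained sketch. The inline tab-separated sample and the names RAW, wide and tidy are made up for illustration, and the alignment of the values with the city columns is an assumption. Because the way stack treats missing values differs between pandas versions, the sketch drops them explicitly with dropna():

```python
import io

import pandas as pd

# Hypothetical tab-separated sample; the last cell of the second row is empty.
RAW = (
    "YEAR\tAZ-Phoenix\tCA-Los Angeles\tCA-San Diego\n"
    "January 1987\t59.33\t54.67\t46.61\n"
    "February 1987\t59.65\t54.89\t\n"
)

wide = pd.read_csv(io.StringIO(RAW), sep="\t")

# Move the city columns into rows, one row per (YEAR, city) pair.
# dropna() removes the empty cell regardless of how stack() handled it.
tidy = wide.set_index("YEAR").stack().dropna().reset_index()
tidy.columns = ["YEAR", "A", "B"]

print(tidy)
```

Leaving out the dropna() call keeps or drops the empty cell depending on the pandas version, so being explicit seems the safer choice here.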

Another solution with melt:

df = pd.melt(df, id_vars='YEAR', value_name='B', var_name='A') 
print (df) 
      YEAR     A  B 
0 January 1987  AZ-Phoenix 59.33 
1 February 1987  AZ-Phoenix 59.65 
2 January 1987 CA-Los Angeles 54.67 
3 February 1987 CA-Los Angeles 54.89 
4 January 1987   CA-San 46.61 
5 February 1987   CA-San 46.87 
6 January 1987    Diego 50.20 
7 February 1987    Diego 49.96 
8 January 1987 CA-San Francisco NaN 
9 February 1987 CA-San Francisco 64.77 
10 January 1987   CO-Denver NaN 
11 February 1987   CO-Denver NaN 
12 January 1987  DC-Washington NaN 
13 February 1987  DC-Washington NaN 


# if you need to remove rows with NaN 
df = pd.melt(df, id_vars='YEAR', value_name='B', var_name='A').dropna(subset=['B']) 
print (df) 
      YEAR     A  B 
0 January 1987  AZ-Phoenix 59.33 
1 February 1987  AZ-Phoenix 59.65 
2 January 1987 CA-Los Angeles 54.67 
3 February 1987 CA-Los Angeles 54.89 
4 January 1987   CA-San 46.61 
5 February 1987   CA-San 46.87 
6 January 1987    Diego 50.20 
7 February 1987    Diego 49.96 
9 February 1987 CA-San Francisco 64.77 
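
Since the question asks about the csv reader, the same reshaping can also be sketched with only the standard library. This is one possible approach, not part of the pandas solutions above; the function name unpivot, its parameters and the inline sample are hypothetical, and the values stay strings because the csv module does not convert types:

```python
import csv
import io

# Hypothetical tab-separated sample; the last cell of the second row is empty.
RAW = (
    "YEAR\tAZ-Phoenix\tCA-Los Angeles\tCA-San Diego\n"
    "January 1987\t59.33\t54.67\t46.61\n"
    "February 1987\t59.65\t54.89\t\n"
)


def unpivot(lines, id_field="YEAR", delimiter="\t"):
    """Yield (id, column name, value) for every non-empty cell."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    for row in reader:
        for name in reader.fieldnames:
            if name == id_field:
                continue
            value = row[name]
            if value:  # skip empty or missing cells
                yield row[id_field], name, value


rows = list(unpivot(io.StringIO(RAW)))
for record in rows:
    print(*record, sep="\t")
```

To read from a real file, pass an open file object (opened with newline='') instead of the StringIO.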
+0

Does df need an import statement for pandas? And can this be used with the csv reader? – Viv

+0

Yes, exactly. Give me some time – jezrael

+0

# Approach 1 works just fine. Thanks! – Viv

2

Option 1
Use pd.melt

pd.melt(df, 'YEAR') 

      YEAR   variable value 
0 January 1987  AZ-Phoenix 59.33 
1 February 1987  AZ-Phoenix 59.65 
2 January 1987 CA-Los Angeles 54.67 
3 February 1987 CA-Los Angeles 54.89 
4 January 1987  CA-San Diego 46.61 
5 February 1987  CA-San Diego 46.87 
6 January 1987 CA-San Francisco 50.20 
7 February 1987 CA-San Francisco 49.96 
8 January 1987   CO-Denver NaN 
9 February 1987   CO-Denver 64.77 
10 January 1987  DC-Washington NaN 
11 February 1987  DC-Washington NaN 

Option 2
Reconstruct with numpy tools

import numpy as np

# value columns in their original order (everything except YEAR)
cols = df.columns.drop('YEAR')
pd.DataFrame(dict(
     YEAR=df.YEAR.values.repeat(len(cols)), 
     B=df[cols].values.ravel(), 
     A=np.tile(cols.values, len(df)), 
    ))[['YEAR', 'A', 'B']] 


      YEAR     A  B 
0 January 1987  AZ-Phoenix 59.33 
1 January 1987 CA-Los Angeles 54.67 
2 January 1987  CA-San Diego 46.61 
3 January 1987 CA-San Francisco 50.20 
4 January 1987   CO-Denver NaN 
5 January 1987  DC-Washington NaN 
6 February 1987  AZ-Phoenix 59.65 
7 February 1987 CA-Los Angeles 54.89 
8 February 1987  CA-San Diego 46.87 
9 February 1987 CA-San Francisco 49.96 
10 February 1987   CO-Denver 64.77 
11 February 1987  DC-Washington NaN 
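
As a sanity check on option 2, the following sketch rebuilds the long table with numpy and compares it with the pd.melt result. The small frame and the names wide, value_cols, by_numpy and by_melt are made up for illustration. The two results differ only in row order (numpy goes row by row, melt column by column), so both are sorted before comparing:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value.
wide = pd.DataFrame({
    "YEAR": ["January 1987", "February 1987"],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
    "CO-Denver": [np.nan, 64.77],
})

# Value columns in their original order, so names line up with ravel().
value_cols = wide.columns.drop("YEAR")

by_numpy = pd.DataFrame({
    "YEAR": wide["YEAR"].to_numpy().repeat(len(value_cols)),
    "A": np.tile(value_cols.to_numpy(), len(wide)),
    "B": wide[value_cols].to_numpy().ravel(),
})

by_melt = pd.melt(wide, id_vars="YEAR", var_name="A", value_name="B")

# Same rows, different order: sort both before comparing.
key = ["YEAR", "A"]
pd.testing.assert_frame_equal(
    by_numpy.sort_values(key).reset_index(drop=True),
    by_melt.sort_values(key).reset_index(drop=True),
    check_dtype=False,
)
print(by_numpy)
```

Taking the column names with columns.drop keeps them in file order. A sorted list of names would only line up with the values if the columns already happen to be in alphabetical order.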

Setup

# 'data.csv' is a placeholder; pass the path of your own file
df = pd.read_csv('data.csv', sep=r'\s{2,}', engine='python') 
+0

January and 1987 are two different columns, and with the first code the month is lost: January is not displayed, and only 1987, AZ-Phoenix, 59.33 is unpivoted. How can I make sure January is included as well? – Viv
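
One possible way to handle the case from the last comment, where Month and YEAR are separate columns, is to pass both as identifier columns. This is a sketch on a made-up frame; the names wide, tidy and stacked are hypothetical, and A and B follow the naming used in the first answer:

```python
import pandas as pd

# Hypothetical sample with Month and YEAR as two separate columns.
wide = pd.DataFrame({
    "Month": ["January", "February"],
    "YEAR": [1987, 1987],
    "AZ-Phoenix": [59.33, 59.65],
    "CA-Los Angeles": [54.67, 54.89],
})

# melt: list every identifier column in id_vars.
tidy = pd.melt(wide, id_vars=["Month", "YEAR"], var_name="A", value_name="B")

# stack: put both identifier columns into the index first.
stacked = wide.set_index(["Month", "YEAR"]).stack().reset_index()
stacked.columns = ["Month", "YEAR", "A", "B"]

print(tidy)
print(stacked)
```

Both results keep Month and YEAR on every row. They differ only in row order: melt goes column by column, stack row by row.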
