2016-05-25 3 views
2

Creating a corpus from different JSON files

I would like to create a corpus made up of the bodies of different articles stored in JSON format. They are in separate files named after the year, for example:

import json

with open('Scot_2005.json') as f: 
    data = [json.loads(line) for line in f] 

corresponds to the Scotsman newspaper for 2005. The remaining files for this newspaper are named similarly: APJ_2006 .... APJ2015. I also have another newspaper, the Scottish Daily Mail, which only covers 2014-2015: SDM_2014, SDM_2015. I would like to create a single list with the bodies of all these articles:

doc_set = [d['body'] for d in data] 

My problem is looping over the first part of the code I posted so that data holds all the articles, not only those from a given newspaper in a particular year. Any ideas on how to accomplish this? In my attempt I tried using pandas like this:

for i in range(2005,2016): 
    df = pandas.DataFrame([json.loads(l) for l in open('Scot_%d.json' % i)]) 

doc_set = df.body 

The problems with this method, as far as I can tell, are that it does not add all the years, and that I am not sure how to include the other newspapers, whose time ranges differ from 2005-15. The output of this method looks like this:

date 
2015-12-31 The Institute of Directors (IoD) has added its... 
2015-12-31 It is startling to see how much the Holyrood l... 
2015-12-31 A hike in interest rates in the new year will ... 
2015-12-31 The First Minister has resolved to make 2016 a... 
2015-12-30 The Scottish Government announced yesterday th... 
2015-12-30 The Footsie closed lower amid falling oil pric... 
2015-12-28 BEFORE we start the guessing game for 2016, a ... 
2015-12-27 AS WE ushered in 2015, few would have predicte... 
2015-12-23 No matter how hard Derek McInnes and his Aberd... 
2015-12-21 THE HEAD of a Scottish Government task force s... 
2015-12-17 A Scottish local authority has fought off a le... 
2015-12-17 Markets lifted after the Federal Reserve hiked... 
2015-12-17 Significant increases in UK quotas for fish in... 
2015-12-17 WAR of words with Donald Trump suggests its ti... 
2015-12-16 SCOTLAND'S national performance companies have... 
2015-12-15 Markets jumped ahead of what investors expect ... 
2015-12-14 Political uncertainty in back seat as transpor... 
2015-12-11 The International Monetary Fund (IMF) has warn... 
2015-12-08 Scotland has a "spring in its step" with the j... 
2015-12-07 London's leading share index struggled for dir... 
2015-12-03 REDUCING carbon is just the start of it, write... 
2015-11-26 One of the country's most prized salmon rivers... 
2015-11-23 Tax and legislative changes undermine strong f... 
2015-11-23 A second House of Lords committee has called f... 
2015-11-14 At first glance, Scotland's economic performan... 
2015-11-13 THE United States has long been viewed as the ... 
2015-11-12 IT IS vital for a new governance group to rest... 
2015-11-12 Former SSE chief Ian Marchant has criticised r... 
2015-11-11 Telecoms firm TalkTalk said it will take a hit... 
2015-11-09 Improvements to consumer rights legislation ma... 
            ...       
2015-02-25 Traders baulked at an assault on the 7,000 lev... 
2015-02-24 BRITISH military personnel are to be deployed ... 
2015-02-20 DAVID Cameron has announced a £859 million inv... 
2015-02-16 Falling oil prices and slowing inflation have ... 
2015-02-14 DEFENCE spending cuts and falling oil prices h... 
2015-02-14 Brent crude rallied to a 2015 high and helped ... 
2015-02-12 THE HOUSING markets in Scotland and Northern I... 
2015-02-10 INVESTMENT in Scotland's commercial property m... 
2015-02-09 Investors took flight after Greece's new gover... 
2015-02-01 Experts say large numbers are delaying decisio... 
2015-01-29 MORE than 300 jobs are at risk after Tesco sai... 
2015-01-27 THE Three Bears have hit out at the Rangers bo... 
2015-01-21 GEORGE Osborne has challenged the right of SNP... 
2015-01-19 Employment figures this week should show Briti... 
2015-01-19 Why haven't petrol pump prices fallen as fast ... 
2015-01-18 Without an agreement on immediate action, the... 
2015-01-17 A SECOND independence referendum could be trig... 
2015-01-14 THE RETAILER, which like its rivals has come u... 
2015-01-14 HOUSE prices in Scotland rose by more than 4 p... 
2015-01-13 HOUSE builder Taylor Wimpey is preparing for a... 
2015-01-13 Supermarket group Sainsbury's today said it wo... 
2015-01-13 INFLATION has tumbled to its lowest level on r... 
2015-01-12 BUSINESSES are bullish about their ­prospects ... 
2015-01-11 FOR decades, oil has dripped through our natio... 
2015-01-09 Shares in the housebuilding sector fell heavil... 
2015-01-08 THE Bank of England is expected to leave inter... 
2015-01-05 COMPANIES in Scotland are more optimistic abou... 
2015-01-04 UK is doing OK, but uncertainty looms on mid-y... 
2015-01-02 The London market began the new year in a subd... 
2015-01-02 The famous election mantra of Bill Clinton's c... 
Name: body, dtype: object 
+2

So where is the [mcve] of *your attempt* to do this, and what is the problem with it? – jonrsharpe

+1

I don't see any attempt to loop over the newspaper names or the years. Maybe try that? – jonrsharpe

+0

@jonrshape, I have just updated the question; as you can see, I am using Pandas. I can't create the list. –

Answer

1

If you have a list of files:

file_name_list = ('Scot_2005.json', 'APJ_2006.json') 

You can append to a list like this:

import json

data = list() 
for file_name in file_name_list: 
    with open(file_name, 'r') as json_file: 
        # each line of a file is one JSON document
        for line in json_file: 
            data.append(json.loads(line)) 
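The question also asks how to cover newspapers with different year ranges. One possible way to build file_name_list for that case is sketched below. The prefixes and ranges are assumptions based on the question's description (Scot for 2005-2015, SDM for 2014-2015), and build_file_names is a hypothetical helper, not part of any library; adjust both to the real file names.

```python
# Hypothetical mapping: file-name prefix -> years covered (range end is exclusive).
# These prefixes and ranges are assumptions taken from the question's description.
YEAR_RANGES = {
    "Scot": range(2005, 2016),
    "SDM": range(2014, 2016),
}


def build_file_names(year_ranges):
    """Return names like 'Scot_2005.json' for every prefix and every year."""
    return [
        "%s_%d.json" % (prefix, year)
        for prefix, years in year_ranges.items()
        for year in years
    ]


file_name_list = build_file_names(YEAR_RANGES)
```

The resulting file_name_list can then be passed to a loop that opens each file and appends its parsed lines to data.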

If you want to build file_name_list programmatically, you can use the glob library.
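A minimal sketch of the glob approach, assuming every file has one JSON document per line and every document has a 'body' key, as in the question. The function names load_articles and extract_bodies, and the default pattern '*.json', are made up for illustration.

```python
import glob
import json


def load_articles(pattern="*.json"):
    """Load every line-delimited JSON file matching pattern into one list."""
    data = []
    # sorted() makes the file order deterministic; glob itself gives no ordering guarantee
    for file_name in sorted(glob.glob(pattern)):
        with open(file_name, "r") as json_file:
            for line in json_file:
                if line.strip():  # skip blank lines
                    data.append(json.loads(line))
    return data


def extract_bodies(data):
    """Return the 'body' field of every article, as in the question's doc_set."""
    return [d["body"] for d in data]
```

Usage would then be something like doc_set = extract_bodies(load_articles('*.json')). A narrower pattern such as 'SDM_*.json' would restrict the corpus to one newspaper.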

+0

Thanks, it works well! –
