2013-06-17 2 views
0

Why doesn't this regular expression match?

I have a file in the following format:

/* No comment provided by engineer. */ 
"Logout Successful!" = "Logout Successful!"; 

/* No comment provided by engineer. */ 
"London" = "London"; 

/* No comment provided by engineer. */ 
"Low Balance" = "Low Balance"; 

/* No comment provided by engineer. */ 
"Low-Cost Call" = "Low-Cost Call"; 

/* No comment provided by engineer. */ 
"Making A Low Cost Call" = "Making A Low Cost Call"; 

/* No comment provided by engineer. */ 
"Making FREE Calls" = "Making FREE Calls"; 

/* No comment provided by engineer. */ 
"MNO" = "MNO"; 

/* No comment provided by engineer. */ 
"more free credit" = "more free credit"; 

/* No comment provided by engineer. */ 
"My Phone Number" = "My Phone Number"; 

/* No comment provided by engineer. */ 
"My Purchase is Missing" = "My Purchase is Missing"; 

/* No comment provided by engineer. */ 
"Next" = "Next"; 

/* No comment provided by engineer. */ 
"NO" = "NO"; 

/* No comment provided by engineer. */ 
"No" = "No"; 

/* No comment provided by engineer. */ 
"No Balance" = "No Balance"; 

/* No comment provided by engineer. */ 
"Post Successful" = "Post Successful"; 

/* No comment provided by engineer. */ 
"Post to %d %@ Facebook Wall" = "Post to %1$d %2$@ Facebook Wall"; 

/* No comment provided by engineer. */ 
"Post to Facebook Wall" = "Post to Facebook Wall"; 

/* No comment provided by engineer. */ 
"Post To My Facebook Wall" = "Post To My Facebook Wall"; 

/* No comment provided by engineer. */ 
"Post to My Wall" = "Post to My Wall"; 

/* No comment provided by engineer. */ 
"Posted" = "Posted"; 

/* No comment provided by engineer. */ 
"Posting" = "Posting"; 

/* No comment provided by engineer. */ 
"Posting to Your Facebook Wall..." = "Posting to Your Facebook Wall..."; 

/* No comment provided by engineer. */ 
"PQRS" = "PQRS"; 

/* No comment provided by engineer. */ 
"Proceed" = "Proceed"; 

/* No comment provided by engineer. */ 
"Proceed, Don't Show Again" = "Proceed, Don't Show Again"; 

/* No comment provided by engineer. */ 
"Processing..." = "Processing..."; 

/* No comment provided by engineer. */ 
"Purchase History" = "Purchase History"; 

/* No comment provided by engineer. */ 
"Rates" = "Rates"; 

/* No comment provided by engineer. */ 
"Remind me later" = "Remind me later"; 

/* No comment provided by engineer. */ 
"Restart" = "Restart"; 

/* No comment provided by engineer. */ 
"Retry Failed" = "Retry Failed"; 

/* No comment provided by engineer. */ 
"Return to %@ after each call ends" = "Return to %@ after each call ends"; 

/* No comment provided by engineer. */ 
"Return To App After Call" = "Return To App After Call"; 

/* No comment provided by engineer. */ 
"Roaming Support" = "Roaming Support"; 

/* No comment provided by engineer. */ 
"Roaming Warning!" = "Roaming Warning!"; 

/* No comment provided by engineer. */ 
"Searching..." = "Searching..."; 

/* No comment provided by engineer. */ 
"See The Time In Any Country" = "See The Time In Any Country"; 

/* No comment provided by engineer. */ 
"Select All" = "Select All"; 

/* No comment provided by engineer. */ 
"Select the number for an iPhone with %@" = "Select the number for an iPhone with %@"; 

/* No comment provided by engineer. */ 
"Send" = "Send"; 

/* No comment provided by engineer. */ 
"Send a Text Message" = "Send a Text Message"; 

/* No comment provided by engineer. */ 
"Sending..." = "Sending..."; 

/* No comment provided by engineer. */ 
"Settings" = "Settings"; 

/* No comment provided by engineer. */ 
"Show All" = "Show All"; 

/* No comment provided by engineer. */ 
"Show Me How" = "Show Me How"; 

/* No comment provided by engineer. */ 
"Show Selected" = "Show Selected"; 

/* No comment provided by engineer. */ 
"Sign In" = "Sign In"; 

/* No comment provided by engineer. */ 
"Signing in..." = "Signing in..."; 

/* No comment provided by engineer. */ 
"Skip" = "Skip"; 

/* No comment provided by engineer. */ 
"SMS" = "SMS"; 

/* No comment provided by engineer. */ 
"Speed Dial & Favorites" = "Speed Dial & Favorites"; 

/* No comment provided by engineer. */ 
"Store" = "Store"; 

/* No comment provided by engineer. */ 
"Success" = "Success"; 

/* No comment provided by engineer. */ 
"Success!" = "Success!"; 

/* No comment provided by engineer. */ 
"Support" = "Support"; 

/* No comment provided by engineer. */ 
"System Status" = "System Status"; 

/* No comment provided by engineer. */ 
"Tapjoy Offers" = "Tapjoy Offers"; 

/* No comment provided by engineer. */ 
"Tell %d Friend%@" = "Tell %1$d Friend%2$@"; 

/* No comment provided by engineer. */ 
"Tell Facebook Friends" = "Tell Facebook Friends"; 

/* No comment provided by engineer. */ 
"Tell Friends" = "Tell Friends"; 

/* No comment provided by engineer. */ 
"Tell Friends About %@" = "Tell Friends About %@"; 

/* No comment provided by engineer. */ 
"Tell via E-Mail" = "Tell via E-Mail"; 

/* No comment provided by engineer. */ 
"Tell via SMS" = "Tell via SMS"; 

/* No comment provided by engineer. */ 
"Test Call" = "Test Call"; 

/* No comment provided by engineer. */ 
"Text Message" = "Text Message"; 

/* No comment provided by engineer. */ 
"Try Again" = "Try Again"; 

/* No comment provided by engineer. */ 
"Turning Caller ID ON/OFF" = "Turning Caller ID ON/OFF"; 

/* No comment provided by engineer. */ 
"TUV" = "TUV"; 

/* No comment provided by engineer. */ 
"Tweet to Friends" = "Tweet to Friends"; 

/* No comment provided by engineer. */ 
"Unable to Call" = "Unable to Call"; 

/* No comment provided by engineer. */ 
"Unable to Check Talk Time" = "Unable to Check Talk Time"; 

/* No comment provided by engineer. */ 
"Unable to connect." = "Unable to connect."; 

/* No comment provided by engineer. */ 
"Unable to Create Account" = "Unable to Create Account"; 

/* No comment provided by engineer. */ 
"Unable to Purchase" = "Unable to Purchase"; 

/* No comment provided by engineer. */ 
"Unable to Sign In" = "Unable to Sign In"; 

/* No comment provided by engineer. */ 
"Unknown" = "Unknown"; 

/* No comment provided by engineer. */ 
"unknown caller" = "unknown caller"; 

/* No comment provided by engineer. */ 
"Unselect All" = "Unselect All"; 

/* No comment provided by engineer. */ 
"Updating Your Phone Number" = "Updating Your Phone Number"; 

/* No comment provided by engineer. */ 
"VoIP %@" = "VoIP %@"; 

/* No comment provided by engineer. */ 
"WARNING!" = "WARNING!"; 

I want to parse this with a regular expression to get just the keys and values, without the surrounding quotes, into a dictionary:

import re

def load_replacement_dict(file_name):
    # Python 2 code: the file is read as a plain byte string.
    with open(file_name, 'r') as f:
        content = f.read()

    resultDict = {}
    dictionary_regex = re.compile('"([^"]*)" = "([^"]*)"')

    for result in dictionary_regex.finditer(content):
        resultDict[result.group(1)] = result.group(2)

    for key, value in resultDict.items():
        print (key + " = " + value).decode('utf-8')

    return resultDict

The first subgroup matches, but as soon as I add anything after it, the pattern stops matching. I have tried a literal space and \s, and nothing seems to match the whitespace around the equals sign. What am I missing here?

EDIT: I found that if I remove the Unicode byte order mark from the start of the file, the regular expression works. That is obviously not a solution, but perhaps it is a hint as to how the regular expression could be modified?

+0

Regexplanet (http://www.regexplanet.com/advanced/python/index.html) may help you. For what it's worth, I don't know what you do or don't want to match, but the regular expression in your code is missing the literal double quotes around the second capture group.

+0

Edited to fix the quotes. I had them, but I was removing things while trying to get it to work and forgot to re-read the question.

+0

Is it possible for the engineer to provide a comment?

Answers

1

In the end this was an encoding problem: the file was UTF-16. After changing the open call to:

with codecs.open(file_name, 'r', 'utf-16') as f: 

the regular expression works fine.
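
To make the cause concrete: in UTF-16 every ASCII character is stored as two bytes, so in the undecoded data a NUL byte sits between each quote and the space that follows it. The first group still matches because `[^"]*` happily consumes NUL bytes, but the literal ` = ` cannot. The following is a minimal sketch, assuming Python 3 (where the built-in `open` accepts `encoding`); the names `PAIR_RE`, `BYTES_RE` and `RAW` are illustrative and not from the original post.

```python
import re

# Same pattern as in the question, applied to decoded text.
PAIR_RE = re.compile(r'"([^"]*)" = "([^"]*)"')

def load_replacement_dict(file_name, encoding="utf-16"):
    """Parse '"key" = "value";' lines into a dict of key -> value."""
    # The utf-16 codec consumes the byte order mark and uses it
    # to pick the byte order, so no manual BOM stripping is needed.
    with open(file_name, "r", encoding=encoding) as f:
        content = f.read()
    return {m.group(1): m.group(2) for m in PAIR_RE.finditer(content)}

# Illustration of the failure: the same pattern on raw UTF-16 bytes.
RAW = '"London" = "London";'.encode("utf-16-le")
BYTES_RE = re.compile(rb'"([^"]*)" = "([^"]*)"')

# RAW starts with b'"\x00L\x00...', so after the closing quote of the key
# comes a NUL byte, not the space the pattern expects.
no_match_on_raw_bytes = BYTES_RE.search(RAW) is None
```

Decoding first and matching afterwards keeps the pattern itself unchanged, which is why no modification of the regular expression was needed.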

3

To avoid problems with escaped quotes, you can use this:

"((?:[^"]+|(?<=\\)")*)" = "((?:[^"]+|(?<=\\)")*)" 

5

It seems to me that what you are trying to achieve may be easier with string methods than with regular expressions:

>>> s = '"A Key With \"quotes\" in it" = " Another Value "' 
>>> l,r = [v.strip().strip('"').strip() for v in s.split('=')] 
>>> l,r 
('A Key With "quotes" in it', 'Another Value') 

The escaping will be preserved; it is lost above only because of how I created the string. If I read the text from a file, this is what happens:

In [1]: lines = open('x.txt').read().splitlines() 

In [2]: for s in lines: print [v.strip().strip('"').strip() for v in s.split('=')] 
    ...: 
['Some Key', 'Some Value'] 
['Another Key', 'Another Value'] 
['A Key With \\"quotes\\" in it', 'Another Value'] 
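
One caveat with splitting on every `=` is that it breaks as soon as a key or value itself contains an equals sign. A possible variation, not the answerer's code, is to split only on the first `" = "` separator with `str.partition`. This is a minimal sketch assuming Python 3 and one pair per line ending in `";`; the helper names `parse_pair` and `parse_lines` are hypothetical.

```python
def parse_pair(line):
    """Return (key, value) for a '"key" = "value";' line, or None otherwise."""
    line = line.strip()
    # Skip comments, blank lines and anything not shaped like a pair.
    if not (line.startswith('"') and line.endswith('";')):
        return None
    # Drop the opening quote and the closing '";',
    # then split on the first '" = "' separator only.
    key, sep, value = line[1:-2].partition('" = "')
    return (key, value) if sep else None

def parse_lines(lines):
    """Build a dict from an iterable of lines, ignoring non-pair lines."""
    pairs = (parse_pair(line) for line in lines)
    return dict(p for p in pairs if p is not None)
```

As in the session above, backslash escapes inside keys and values are left untouched.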

+0

This is the best approach.

+0

s should be '"A Key With \\"quotes\\" in it" = " Another Value "'. – falsetru

+0

Keys and values must remain escaped.

1

You are not checking for the quotes around the value in your regular expression, which is why it cannot match. Also, to handle an escaped quote inside a key or value, I believe this should cover it:

dictionary_regex = re.compile(r'"((?:(?:\\")|[^"])*)" = "((?:(?:\\")|[^"])*)"') 
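
To see what the escaped-quote alternative buys, here is a small comparison against the simpler `[^"]*` pattern. It is a sketch assuming Python 3; the sample line and the names `SIMPLE_RE` and `ESCAPED_RE` are made up for illustration, while the second pattern is the one from this answer.

```python
import re

# Pattern that stops at the first quote, escaped or not.
SIMPLE_RE = re.compile(r'"([^"]*)" = "([^"]*)"')
# Pattern from this answer: a backslash-quote pair counts as content.
ESCAPED_RE = re.compile(r'"((?:(?:\\")|[^"])*)" = "((?:(?:\\")|[^"])*)"')

line = r'"A Key With \"quotes\" in it" = "Another Value";'

# The simple pattern latches onto the last escaped quote and
# returns only the tail of the key.
simple_groups = SIMPLE_RE.search(line).groups()
# -> (' in it', 'Another Value')

# The escaped-aware pattern returns the whole key, escapes intact.
escaped_groups = ESCAPED_RE.search(line).groups()
# -> ('A Key With \\"quotes\\" in it', 'Another Value')
```

Note that the simple pattern does not fail on such a line; it silently produces a truncated key, which is harder to notice than a missing match.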

0

With the sample key/value pairs that were posted, the following regular expression seems to work:

re.compile('"(.*)" = "(.*)"') 

Am I missing something?
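
One thing worth checking is how the greedy `.*` behaves when a line holds more than one pair. With one pair per line, as in the posted sample, both patterns agree, because `.` does not match a newline. The sketch below assumes Python 3; the two-pair line is a made-up input, not taken from the question's file.

```python
import re

GREEDY_RE = re.compile(r'"(.*)" = "(.*)"')
STRICT_RE = re.compile(r'"([^"]*)" = "([^"]*)"')

# One pair per line: both patterns give the same result.
sample = '"London" = "London";\n"NO" = "NO";'
greedy_sample = GREEDY_RE.findall(sample)
strict_sample = STRICT_RE.findall(sample)

# Two pairs on one line (hypothetical input): the greedy pattern
# stretches the first group up to the last '" = "' on the line.
two_pairs = '"a" = "b"; "c" = "d";'
greedy_two = GREEDY_RE.findall(two_pairs)   # [('a" = "b"; "c', 'd')]
strict_two = STRICT_RE.findall(two_pairs)   # [('a', 'b'), ('c', 'd')]
```

So for files laid out like the sample the greedy version is fine; it only becomes a problem if the one-pair-per-line layout is not guaranteed.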
