2015-07-09 4 views
2

How can I extract a PNG embedded in an Excel XLS file?

I regularly have to extract images that were pasted into Excel files. Unfortunately, these files are in the despised XLS format. Since the simple unzip trick does not work on them, I decided to try writing a small Python script for it.

(Extracting the images is painful, because I actually have to copy and paste them into Paint to save them. There is no "Save As..." or "Export" button.)

If you look at a PNG (or already know this), you will see that it basically starts with the \x89PNG signature and ends with an IEND chunk.
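To make that layout concrete, here is a minimal sketch of a chunk walker, roughly what pngcheck does when it lists chunks. It relies only on the published PNG layout (8-byte signature, then chunks made of a big-endian length, a 4-byte type, the data and a CRC). The function name `iter_png_chunks` is made up for this illustration.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_png_chunks(data):
    """Yield (type, offset, length, crc_ok) for each chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        crc_bytes = data[pos + 8 + length:pos + 12 + length]
        if len(body) < length or len(crc_bytes) < 4:
            raise ValueError("truncated chunk at offset {}".format(pos))
        stored_crc = struct.unpack(">I", crc_bytes)[0]
        # The CRC covers the chunk type and the chunk data, not the length field
        crc_ok = zlib.crc32(ctype + body) & 0xFFFFFFFF == stored_crc
        yield ctype, pos, length, crc_ok
        pos += 12 + length
        if ctype == b"IEND":
            break
```

Running it over an extracted file should show the first chunk whose CRC fails, which is where foreign bytes are likely to have slipped in.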

So I tried the following code:

import sys

def info(s):
    print("[i] " + s)

def error(s):
    print("[!] " + s)

info("Opening file: " + sys.argv[1])
with open(sys.argv[1], 'rb') as f:
    buf = f.read()
info("File read")

# Locate the 8-byte PNG signature
offset_s = buf.find(b'\x89PNG\x0D\x0A\x1A\x0A')
if offset_s == -1:
    error("PNG not found")
    sys.exit(-1)
else:
    info("PNG start found at offset: {}".format(offset_s))
# Locate the first IEND chunk type after the signature
offset_e = buf.find(b'IEND', offset_s)
if offset_e == -1:
    error("PNG end not found")
    sys.exit(-1)
else:
    offset_e += 8  # include "IEND" and its 4-byte CRC
    info("PNG end found at offset: {}".format(offset_e))

with open("out.png", "wb") as f:
    f.write(buf[offset_s:offset_e])
info("Written to out.png")

This does extract the data. But the PNG data is corrupted (in the IDAT chunk), so the image does not display properly. Here is the output of a pngcheck run:

File: out.png (221879 bytes) 
    chunk IHDR at offset 0x0000c, length 0 
    1366 x 768 image, 24-bit RGB, non-interlaced 
    chunk sRGB at offset 0x00025, length 0 
    rendering intent = perceptual 
    chunk pHYs at offset 0x00032, length 0: 3780x3780 pixels/meter (96 dpi) 
    chunk IDAT at offset 0x00047, length 0 
    zlib: deflated, 32K window, fast compression 
    CRC error in chunk IDAT (actual 632dd60d, should be 5985ed29) 
    Chunk name fffffffb 02 ffffff8a 5e doesn't conform to naming rules. 
    chunk ?? at offset 0x10008, length 0 

Do you think (or know for a fact; I could not find this information when I searched) that Excel stores PNG files with specific (or even proprietary) filters or compression algorithms?

Any idea how I can make this work?

Edit - further investigation: I took the analysis further. I took a larger image, put it in an empty Excel file and saved it as XLS.

Then I extracted it with my previous tool and wrote a new one to identify the 4-byte items added by Excel. Here is the code:

import sys

def info(s):
    print("[i] " + s)

def die(s):
    print("[!] " + s)
    sys.exit(-1)

info("Opening original file: " + sys.argv[1])
i = 0  # current offset in the changed (extracted) file
with open(sys.argv[1], 'rb') as original:
    info("Opening changed file: " + sys.argv[2])
    with open(sys.argv[2], 'rb') as changed:
        o_byte = original.read(1)
        c_byte = changed.read(1)
        while o_byte != b"":
            if c_byte == b"":
                die("Error reading from changed file.")
            # On a mismatch, assume 4 bytes were inserted: report and skip them
            while c_byte != o_byte:
                extra = c_byte + changed.read(3)
                if len(extra) < 4:
                    die("Changed file ended inside an inserted block.")
                info("{:08X} - Found diff: ".format(i)
                     + " ".join("0x{:02X}".format(b) for b in extra))
                i += 4
                c_byte = changed.read(1)
            o_byte = original.read(1)
            c_byte = changed.read(1)
            i += 1

Running this against my original PNG and the PNG extracted from the XLS, I get the following output:

[i] Opening original file: test1.PNG 
[i] Opening changed file: out.png 
[i] 00001FAB - Found diff: 0xEB 0x00 0x20 0x20 
[i] 00003FCF - Found diff: 0x3C 0x00 0x20 0x20 
[i] 00005FF3 - Found diff: 0x3C 0x00 0x20 0x20 
[i] 00008017 - Found diff: 0x3C 0x00 0x20 0x20 
[i] 000090BE - Found diff: 0x81 0x00 0x00 0x00 
[i] 000090C2 - Found diff: 0x82 0x00 0x00 0x00 
[i] 000090C6 - Found diff: 0x83 0x00 0x00 0x00 
[i] 000090CA - Found diff: 0x84 0x00 0x00 0x00 
[i] 000090CE - Found diff: 0x85 0x00 0x00 0x00 
[i] 000090D2 - Found diff: 0x86 0x00 0x00 0x00 
[i] 000090D6 - Found diff: 0x87 0x00 0x00 0x00 
[i] 000090DA - Found diff: 0x88 0x00 0x00 0x00 
[i] 000090DE - Found diff: 0x89 0x00 0x00 0x00 
[i] 000090E2 - Found diff: 0x8A 0x00 0x00 0x00 
[i] 000090E6 - Found diff: 0x8B 0x00 0x00 0x00 
[i] 000090EA - Found diff: 0x8C 0x00 0x00 0x00 
[i] 000090EE - Found diff: 0x8D 0x00 0x00 0x00 
[i] 000090F2 - Found diff: 0x8E 0x00 0x00 0x00 
[i] 000090F6 - Found diff: 0x8F 0x00 0x00 0x00 
[i] 000090FA - Found diff: 0x90 0x00 0x00 0x00 
[i] 000090FE - Found diff: 0x91 0x00 0x00 0x00 
[i] 00009102 - Found diff: 0x92 0x00 0x00 0x00 
[i] 00009106 - Found diff: 0x93 0x00 0x00 0x00 
[i] 0000910A - Found diff: 0x94 0x00 0x00 0x00 
[i] 0000910E - Found diff: 0x95 0x00 0x00 0x00 
[i] 00009112 - Found diff: 0x96 0x00 0x00 0x00 
[i] 00009116 - Found diff: 0x97 0x00 0x00 0x00 
[i] 0000911A - Found diff: 0x98 0x00 0x00 0x00 
[i] 0000911E - Found diff: 0x99 0x00 0x00 0x00 
[i] 00009122 - Found diff: 0x9A 0x00 0x00 0x00 
[i] 00009126 - Found diff: 0x9B 0x00 0x00 0x00 
[i] 0000912A - Found diff: 0x9C 0x00 0x00 0x00 
[i] 0000912E - Found diff: 0x9D 0x00 0x00 0x00 
[i] 00009132 - Found diff: 0x9E 0x00 0x00 0x00 
[i] 00009136 - Found diff: 0x9F 0x00 0x00 0x00 
[i] 0000913A - Found diff: 0xA0 0x00 0x00 0x00 
[i] 0000913E - Found diff: 0xA1 0x00 0x00 0x00 
[i] 00009142 - Found diff: 0xA2 0x00 0x00 0x00 
[i] 00009146 - Found diff: 0xA3 0x00 0x00 0x00 
[i] 0000914A - Found diff: 0xA4 0x00 0x00 0x00 
[i] 0000914E - Found diff: 0xA5 0x00 0x00 0x00 
[i] 00009152 - Found diff: 0xA6 0x00 0x00 0x00 
[i] 00009156 - Found diff: 0xA7 0x00 0x00 0x00 
[i] 0000915A - Found diff: 0xA8 0x00 0x00 0x00 
[i] 0000915E - Found diff: 0xA9 0x00 0x00 0x00 
[i] 00009162 - Found diff: 0xAA 0x00 0x00 0x00 
[i] 00009166 - Found diff: 0xAB 0x00 0x00 0x00 
[i] 0000916A - Found diff: 0xAC 0x00 0x00 0x00 
[i] 0000916E - Found diff: 0xAD 0x00 0x00 0x00 
[i] 00009172 - Found diff: 0xAE 0x00 0x00 0x00 
[i] 00009176 - Found diff: 0xAF 0x00 0x00 0x00 
[i] 0000917A - Found diff: 0xB0 0x00 0x00 0x00 
[i] 0000917E - Found diff: 0xB1 0x00 0x00 0x00 
[i] 00009182 - Found diff: 0xB2 0x00 0x00 0x00 
[i] 00009186 - Found diff: 0xB3 0x00 0x00 0x00 
[i] 0000918A - Found diff: 0xB4 0x00 0x00 0x00 
[i] 0000918E - Found diff: 0xB5 0x00 0x00 0x00 
[i] 00009192 - Found diff: 0xB6 0x00 0x00 0x00 
[i] 00009196 - Found diff: 0xB7 0x00 0x00 0x00 
[i] 0000919A - Found diff: 0xB8 0x00 0x00 0x00 
[i] 0000919E - Found diff: 0xB9 0x00 0x00 0x00 
[i] 000091A2 - Found diff: 0xBA 0x00 0x00 0x00 
[i] 000091A6 - Found diff: 0xBB 0x00 0x00 0x00 
[i] 000091AA - Found diff: 0xBC 0x00 0x00 0x00 
[i] 000091AE - Found diff: 0xBD 0x00 0x00 0x00 
[i] 000091B2 - Found diff: 0xBE 0x00 0x00 0x00 
[i] 000091B6 - Found diff: 0xBF 0x00 0x00 0x00 
[i] 000091BA - Found diff: 0xC0 0x00 0x00 0x00 
[i] 000091BE - Found diff: 0xC1 0x00 0x00 0x00 
[i] 000091C2 - Found diff: 0xC2 0x00 0x00 0x00 
[i] 000091C6 - Found diff: 0xC3 0x00 0x00 0x00 
[i] 000091CA - Found diff: 0xC4 0x00 0x00 0x00 
[i] 000091CE - Found diff: 0xC5 0x00 0x00 0x00 
[i] 000091D2 - Found diff: 0xC6 0x00 0x00 0x00 
[i] 000091D6 - Found diff: 0xC7 0x00 0x00 0x00 
[i] 000091DA - Found diff: 0xC8 0x00 0x00 0x00 
[i] 000091DE - Found diff: 0xC9 0x00 0x00 0x00 
[i] 000091E2 - Found diff: 0xCA 0x00 0x00 0x00 
[i] 000091E6 - Found diff: 0xCB 0x00 0x00 0x00 
[i] 000091EA - Found diff: 0xCC 0x00 0x00 0x00 
[i] 000091EE - Found diff: 0xCD 0x00 0x00 0x00 
[i] 000091F2 - Found diff: 0xCE 0x00 0x00 0x00 
[i] 000091F6 - Found diff: 0xCF 0x00 0x00 0x00 
[i] 000091FA - Found diff: 0xD0 0x00 0x00 0x00 
[i] 000091FE - Found diff: 0xD1 0x00 0x00 0x00 
[i] 00009202 - Found diff: 0xD2 0x00 0x00 0x00 
[i] 00009206 - Found diff: 0xD3 0x00 0x00 0x00 
[i] 0000920A - Found diff: 0xD4 0x00 0x00 0x00 
[i] 0000920E - Found diff: 0xD5 0x00 0x00 0x00 
[i] 00009212 - Found diff: 0xD6 0x00 0x00 0x00 
[i] 00009216 - Found diff: 0xD7 0x00 0x00 0x00 
[i] 0000921A - Found diff: 0xD8 0x00 0x00 0x00 
[i] 0000921E - Found diff: 0xD9 0x00 0x00 0x00 
[i] 00009222 - Found diff: 0xDA 0x00 0x00 0x00 
[i] 00009226 - Found diff: 0xDB 0x00 0x00 0x00 
[i] 0000922A - Found diff: 0xDC 0x00 0x00 0x00 
[i] 0000922E - Found diff: 0xDD 0x00 0x00 0x00 
[i] 00009232 - Found diff: 0xDE 0x00 0x00 0x00 
[i] 00009236 - Found diff: 0xDF 0x00 0x00 0x00 
[i] 0000923A - Found diff: 0xE0 0x00 0x00 0x00 
[i] 0000923E - Found diff: 0xE1 0x00 0x00 0x00 
[i] 00009242 - Found diff: 0xE2 0x00 0x00 0x00 
[i] 00009246 - Found diff: 0xE3 0x00 0x00 0x00 
[i] 0000924A - Found diff: 0xE4 0x00 0x00 0x00 
[i] 0000924E - Found diff: 0xE5 0x00 0x00 0x00 
[i] 00009252 - Found diff: 0xE6 0x00 0x00 0x00 
[i] 00009256 - Found diff: 0xE7 0x00 0x00 0x00 
[i] 0000925A - Found diff: 0xE8 0x00 0x00 0x00 
[i] 0000925E - Found diff: 0xE9 0x00 0x00 0x00 
[i] 00009262 - Found diff: 0xEA 0x00 0x00 0x00 
[i] 00009266 - Found diff: 0xEB 0x00 0x00 0x00 
[i] 0000926A - Found diff: 0xEC 0x00 0x00 0x00 
[i] 0000926E - Found diff: 0xED 0x00 0x00 0x00 
[i] 00009272 - Found diff: 0xEE 0x00 0x00 0x00 
[i] 00009276 - Found diff: 0xEF 0x00 0x00 0x00 
[i] 0000927A - Found diff: 0xF0 0x00 0x00 0x00 
[i] 0000927E - Found diff: 0xF1 0x00 0x00 0x00 
[i] 00009282 - Found diff: 0xF2 0x00 0x00 0x00 
[i] 00009286 - Found diff: 0xF3 0x00 0x00 0x00 
[i] 0000928A - Found diff: 0xFE 0xFF 0xFF 0xFF 
[i] 0000928E - Found diff: 0xFE 0xFF 0xFF 0xFF 
[i] 00009292 - Found diff: 0xF6 0x00 0x00 0x00 
[i] 00009296 - Found diff: 0xFE 0xFF 0xFF 0xFF 
[i] 0000929A - Found diff: 0xFE 0xFF 0xFF 0xFF 
[i] 0000929E - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 000092A2 - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 000092A6 - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 000092AA - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 000092AE - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 000092B2 - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 000092B6 - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 000092BA - Found diff: 0xFF 0xFF 0xFF 0xFF 
[i] 0000A23B - Found diff: 0x3C 0x00 0x20 0x20 
[i] 0000C25F - Found diff: 0x3C 0x00 0x20 0x20 
[i] 0000E283 - Found diff: 0x3C 0x00 0x20 0x20 
[i] 000102A7 - Found diff: 0x3C 0x00 0x20 0x20 
[i] 000122CB - Found diff: 0x3C 0x00 0x20 0x20 
[i] 000142EF - Found diff: 0x3C 0x00 0x20 0x20 
[i] 00016313 - Found diff: 0x3C 0x00 0x20 0x20 
[i] 00018337 - Found diff: 0x3C 0x00 0x20 0x20 
[i] 0001A35B - Found diff: 0x3C 0x00 0x0D 0x0B 

Who the hell is this 0x3C guy? And why does Excel start counting at some point? (0x81, 0x82, 0x83 ...)

Edit - additional pointers: 0x003C appears to be the identifier of the CONTINUE record in the Excel file format, as described in https://www.openoffice.org/sc/excelfileformat.pdf
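If the inserted bytes really are record headers, each one should decode as a little-endian 16-bit record identifier followed by a little-endian 16-bit payload size, which is how the linked document describes BIFF records. The snippet below is only a sketch of that reading; `decode_record_header` is a name invented here.

```python
import struct

def decode_record_header(four_bytes):
    """Split 4 bytes into (record id, payload length), both little-endian uint16."""
    return struct.unpack("<HH", four_bytes)

rec_id, length = decode_record_header(bytes([0x3C, 0x00, 0x20, 0x20]))
print("id=0x{:04X} length={}".format(rec_id, length))  # id=0x003C length=8224
```

This reading fits the log: consecutive headers sit 0x2024 bytes apart (0x3FCF - 0x1FAB), which is 8224 payload bytes plus the 4-byte header, and the last header (0x3C 0x00 0x0D 0x0B) would announce a shorter final payload of 2829 bytes.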

And the counting could be the SSAT table of the compound document, but I am not sure.
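One way to test that guess: allocation tables in a compound document are arrays of signed little-endian 32-bit sector ids, where -1 (FF FF FF FF) marks a free sector and -2 (FE FF FF FF) ends a chain. The run in the log has exactly 128 such entries, i.e. 512 bytes, which would fit one whole sector of an ordinary sector allocation table lying in the middle of the image data; that is an inference from the log, not something confirmed. A minimal sketch, with hypothetical helper names:

```python
import struct

FREE_SECTOR = -1     # FF FF FF FF
END_OF_CHAIN = -2    # FE FF FF FF

def parse_sat_sector(sector):
    """Decode one allocation-table sector into signed little-endian 32-bit ids."""
    count = len(sector) // 4
    return list(struct.unpack("<{}i".format(count), sector[:count * 4]))

def follow_chain(sat, start):
    """Return the sector ids of the chain that begins at `start`."""
    chain = []
    sid = start
    while sid >= 0:
        if sid >= len(sat) or len(chain) > len(sat):
            raise ValueError("broken sector chain")
        chain.append(sid)
        sid = sat[sid]  # each entry names the next sector of the same stream
    return chain
```

Reading a stream by following its chain, instead of scanning the raw file linearly, would skip table sectors like this one.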

But I still do not know what 0xEB is.
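A possible reading, stated as an assumption: Microsoft's XLS documentation lists record id 0x00EB as MSODRAWINGGROUP, the record that holds embedded drawing data, and large payloads spill over into further records. Under that assumption, the sketch below walks the records of a Workbook stream and glues the drawing payloads back together before looking for the PNG. It expects the stream to have been read out of the compound document already (that step is not shown), and all function names are made up for this example.

```python
import struct

MSODRAWINGGROUP = 0x00EB  # assumed meaning of the 0xEB header seen in the diff
CONTINUE = 0x003C
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_records(stream):
    """Yield (record id, payload) for each BIFF record in a Workbook stream."""
    pos = 0
    while pos + 4 <= len(stream):
        rec_id, length = struct.unpack("<HH", stream[pos:pos + 4])
        yield rec_id, stream[pos + 4:pos + 4 + length]
        pos += 4 + length

def drawing_group_data(stream):
    """Concatenate MSODRAWINGGROUP payloads and the CONTINUE records after them."""
    parts = []
    in_group = False
    for rec_id, payload in iter_records(stream):
        if rec_id == MSODRAWINGGROUP:
            in_group = True
        elif rec_id != CONTINUE:
            in_group = False
        if in_group:
            parts.append(payload)
    return b"".join(parts)

def extract_first_png(stream):
    """Return the first PNG found in the joined drawing data, or None."""
    data = drawing_group_data(stream)
    start = data.find(PNG_SIGNATURE)
    if start == -1:
        return None
    end = data.find(b"IEND", start)
    if end == -1:
        return None
    return data[start:end + 8]  # "IEND" plus its 4-byte CRC
```

Searching for the literal bytes IEND can in principle hit compressed image data by accident, so walking the PNG chunk by chunk would be the more careful way to find the end.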

+1

After creating a small test document: if I compare the stored length of the IDAT chunk with its stated size, I also get a mismatch: it is 8 bytes larger. Excel must be inserting something inside the embedded binaries. – usr2564301

+0

OK, I can confirm that: in my file the difference is 4 bytes. Excel added 4 bytes at '0x1FAA'. Apart from that, the file was identical. I suppose Excel adds some kind of padding or information inside the chunk every 'n' bytes. – DaLynX

Answer

0

If you are on Windows, or use a VM for this task, you can use the COM interface for this; you can even drive it from Python with pywin32. Take a look at this question, for example: Export Charts from Excel as images using Python.

+0

Thanks for the answer. That would work, but I am looking for a way that does not require an actual Excel installation. – DaLynX
