2015-10-09

Convert an XML file to a CSV file using Python

I am trying to convert an XML file to a CSV file. I tried a bash script, awk, and xmlstarlet without success, and now I am trying Python, still without success. Below is my sample XML file:

<items><item> 
<Name>demo title 1</Name> 
<FileType>image</FileType> 
<ReleaseDate>15 May 2015</ReleaseDate> 
<Quality> 
HDRiP</Quality> 
<size>2848292</size> 
<Rating>6.6</Rating> 
<Genre>Comedy, 
Music</Genre> 
<Cast>rules bank demo, 
anademo demo 2, 
Hai demo 3, 
Ale Demo 4</Cast> 
<Languages>English</Languages> 
<Subtitles> 
hindi</Subtitles> 
<FileName>demo title 1 fname</FileName> 
<FileSize>1.4GB</FileSize> 
<NoOfFiles>5</NoOfFiles> 
<UploadTime>4 months</UploadTime> 
<DateOfDataCapture>May 29, 2015</DateOfDataCapture> 
<TimesDownloaded>2,339</TimesDownloaded> 
<UpVotes>+742</UpVotes> 
<DownVotes>-37</DownVotes> 
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType> 
<Summary>this is demo pics 
collected for wallpapers only it is free available on many app and urls. 

Written by 

demo1.Cdemo324.78K 

report summary</Summary> 
</item><item> 
<Name>demo title 2</Name> 
<FileType>image</FileType> 
<ReleaseDate>16 May 2015</ReleaseDate> 
<Quality> 
HDRiP</Quality> 
<size>2855292</size> 
<Rating>6.9</Rating> 
<Genre>Comedy, 
Music</Genre> 
<Cast>rules bank demo, 
anademo demo 12, 
Hai demo 13, 
Ale Demo 14</Cast> 
<Languages>English</Languages> 
<Subtitles> 
hindi</Subtitles> 
<FileName>demo title 2 fname</FileName> 
<FileSize>1.3GB</FileSize> 
<NoOfFiles>5</NoOfFiles> 
<UploadTime>4 months</UploadTime> 
<DateOfDataCapture>May 29, 2015</DateOfDataCapture> 
<TimesDownloaded>2,339</TimesDownloaded> 
<UpVotes>+742</UpVotes> 
<DownVotes>-37</DownVotes> 
<MediaType>[1080p, 720p, Blu-Ray, BDRip, HDRiP, DVD, DVDRip, x264, WEB-DL, Cam]</MediaType> 
<Summary>this is demo pics 2 
collected for wallpapers only it is free available on many app and urls. 

Written by 

demo2.C2demo324.78K 

report summary</Summary> 
</item> 
</items> 

I want to convert this into a CSV file in which each <item> record is on a single line.

When I use an XML parser, the records are converted to CSV, but my tag values span multiple lines and contain newline characters, so the CSV output is broken across lines in the same way. Below is a sample of the converted CSV file:
demo title 1,image,15 May 2015, 
HDRiP, 
2848292,6.6,Comedy, 
Music,rules bank demo, 
anademo demo 2, 
Hai demo 3, 
Ale Demo 4,English 

I want each newline character to be replaced by a space so that all fields of a single item are saved in one row of the CSV file.

I also tried a Python XML parser and xml2csv, but still no luck. Please suggest how I can read the XML file and replace these unwanted newline characters with spaces.
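The replacement asked for here, turning every line break inside a tag value into a single space, can be sketched on a plain string as follows. This is a minimal illustration and not part of the original post; the helper name `collapse_whitespace` is hypothetical.

```python
def collapse_whitespace(value):
    """Replace each run of whitespace (including newlines) with one space.

    Hypothetical helper for illustration; None is treated as an empty string.
    """
    # str.split() with no arguments splits on any run of whitespace and
    # drops leading/trailing whitespace, so joining with " " yields one line.
    return " ".join((value or "").split())


# A multi-line value shaped like the <Cast> element in the sample XML.
cast = "rules bank demo, \nanademo demo 2, \nHai demo 3, \nAle Demo 4"
print(collapse_whitespace(cast))
# prints: rules bank demo, anademo demo 2, Hai demo 3, Ale Demo 4
```

Applying such a helper to each element's text before writing the row should keep every `<item>` on one CSV line.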


Please take a look at [editing-help](http://stackoverflow.com/editing-help). – Cyrus

Answer


Try this:

    import csv
    from lxml import etree

    # in: xml with trader joe's locations
    # out: csv with trader joe's locations

    out = input("Name for output file: ")
    if out.strip() == "":
        out = "trader-joes-all-locations.csv"

    out_data = []

    # use recover=True to ignore errors in the XML
    # examples of errors in this XML:
    #   missing "<" in opening tag:
    #     fax></fax>
    #   missing "</" in closing tag:
    #     <uid>1429860810uid>
    #
    # also ignore blank text
    parser = etree.XMLParser(recover=True, remove_blank_text=True)

    # xml on disk...could also pass etree.parse a URL
    file_name = "trader-joes-all-locations.xml"

    # use lxml to read and parse xml
    root = etree.parse(file_name, parser)

    # element names with data to keep
    tag_list = [
        "name", "address1", "address2", "beer", "city", "comingsoon", "hours",
        "latitude", "longitude", "phone", "postalcode", "spirits", "state", "wine",
    ]

    # add field names by copying tag_list
    out_data.append(tag_list[:])


    def missing_location(p):
        # True when either coordinate element is absent
        lat = p.find("latitude")
        lon = p.find("longitude")
        return lat is None or lon is None


    # pull info out of each poi node
    def get_poi_info(p):
        # if latitude or longitude doesn't exist, skip
        if missing_location(p):
            print("\tMissing location for %s" % p.findtext("name"))
            return None
        info = []
        for tag in tag_list:
            node = p.find(tag)
            if node is not None and node.text:
                if tag == "latitude" or tag == "longitude":
                    info.append(round(float(node.text), 5))
                else:
                    info.append(node.text)
            else:
                info.append("")
        return info


    print("\nreading xml...")

    # get all <poi> elements
    pois = root.findall(".//poi")
    for p in pois:
        poi_info = get_poi_info(p)
        if poi_info:
            out_data.append(poi_info)

    print("finished xml, writing file...")

    # newline="" lets the csv module control line endings
    with open(out, "w", newline="", encoding="utf-8") as out_file:
        csv_writer = csv.writer(out_file, quoting=csv.QUOTE_MINIMAL)
        for row in out_data:
            csv_writer.writerow(row)

    print("wrote %s\n" % out)
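
The script above targets a different file (Trader Joe's locations), so for the XML in the question the same approach might look like the sketch below. It uses only the standard library (`xml.etree.ElementTree` and `csv`) rather than lxml. The function name `items_to_rows`, the shortened field list, and the inline sample are assumptions made for illustration. Collapsing whitespace in each value is the step that keeps every `<item>` on one CSV row.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed subset of the tag names from the question; extend as needed.
FIELDS = ["Name", "FileType", "ReleaseDate", "Quality", "Genre", "Cast"]

# Shortened copy of the question's sample, kept inline so the sketch runs as-is.
SAMPLE_XML = """<items><item>
<Name>demo title 1</Name>
<FileType>image</FileType>
<ReleaseDate>15 May 2015</ReleaseDate>
<Quality>
HDRiP</Quality>
<Genre>Comedy,
Music</Genre>
<Cast>rules bank demo,
anademo demo 2</Cast>
</item></items>"""


def items_to_rows(root, fields=FIELDS):
    """Yield one list per <item>, with newlines collapsed to spaces."""
    for item in root.iter("item"):
        row = []
        for tag in fields:
            # findtext returns the default when the tag is missing.
            text = item.findtext(tag, default="") or ""
            # split() + join() replaces every whitespace run with one space.
            row.append(" ".join(text.split()))
        yield row


root = ET.fromstring(SAMPLE_XML)
rows = list(items_to_rows(root))

# Written to an in-memory buffer here; for a real file one could use
# open("out.csv", "w", newline="") instead (the file name is hypothetical).
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerow(FIELDS)
writer.writerows(rows)
print(buffer.getvalue())
```

Values that contain commas, such as Genre and Cast, are quoted by `csv.writer`, so each still occupies a single column.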