2015-02-28 2 views

Problems creating an Avro .avsc schema

I am having trouble creating an Avro schema. My schema is below.

twitter.avsc:

{ 
    "type" : "record", 
    "name" : "twitter_schema", 
    "namespace" : "com.miguno.avro", 
    "fields" : [ 
    { "name" : "_id", "type" : "record", "doc" : "Values of the indexes/id tweets"}, 
    { "name" : "nome","type" : "string","doc" : "Name of the user account on Twitter.com" }, 
    { "name" : "tweet", "type" : "string","doc" : "The content of the user's Twitter message" }, 
    { "name" : "datahora", "type" : "string","doc" : "Unix epoch time in seconds"} 

    ], 
    "doc:" : "A schema for storing Twitter messages" 
} 

When I try to convert tweet.json to .avro, I get the following error message:

Exception in thread "main" org.apache.avro.SchemaParseException: "record" is not a defined name. The type of the "_id" field must be a defined name or a {"type": ...} expression. 
    at org.apache.avro.Schema.parse(Schema.java:1199) 
    at org.apache.avro.Schema$Parser.parse(Schema.java:965) 
    at org.apache.avro.Schema$Parser.parse(Schema.java:938) 
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:82) 
    at org.apache.avro.tool.Main.run(Main.java:84) 
    at org.apache.avro.tool.Main.main(Main.java:73) 

Below is the .json file I am trying to convert to .avro.

tweet.json:

{ "_id" : { "$oid" : "54d148b471eb130b1e8b4567" }, "nome" : "Marco Correia", "tweet" : "Globo repassará R$ 300 milhões /clubes http://t.co/SQIjscDolU Vão entrar 45 milhões /Flamengo nesse Mês e Março e o clube não tem Grana!Sei", "datahora" : "Tue Feb 03 22:15:54 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4568" }, "nome" : "FLUMINENSE F.C.", "tweet" : "Jornalheiros - Flamengo x Barra Mansa - Transmissão ao vivo (04/02/2015, 22:00, Maracanã) http://t.co/BYQk3swWqf", "datahora" : "Tue Feb 03 22:15:44 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4569" }, "nome" : "VaiRio - O Globo", "tweet" : "Praia do Flamengo tem fluxo bom no sentido Botafogo, na altura da Rua Dois de Dezembro http://t.co/lWe3IEvAp2", "datahora" : "Tue Feb 03 22:15:44 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456a" }, "nome" : "PC Filho ★★★★", "tweet" : "Jornalheiros - Flamengo x Barra Mansa - Transmissão ao vivo (04/02/2015, 22:00, Maracanã) http://t.co/NArNpqy3tz", "datahora" : "Tue Feb 03 22:15:43 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456b" }, "nome" : "ATL Sports Bar", "tweet" : "SCORE ALERT: #Basketball #Livescore @ScoresPro: (-NBB) #Flamengo Bc vs #Minas: 41-30", "datahora" : "Tue Feb 03 22:15:38 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456c" }, "nome" : "FlamengoNews", "tweet" : " Parcial dos quartos:\n1ºQ - @Flamengo 26x13 Minas\n2ºQ - Flamengo 15x17 Minas", "datahora" : "Tue Feb 03 22:15:33 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456d" }, "nome" : "VaiRio - O Globo", "tweet" : "Rua Mário Ribeiro com trânsito lento no sentido Lagoa, altura do C. R. Flamengo http://t.co/SzhrtTTMz1", "datahora" : "Tue Feb 03 22:15:33 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456e" }, "nome" : "carols", "tweet" : "RT @Flamengo: Esse dia foi LOUCO http://t.co/tEdwRX3bsN", "datahora" : "Tue Feb 03 22:15:30 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b456f" }, "nome" : "walisson rodrigues ", "tweet" : "RT @Esp_Interativo: Alô, torcida do @Flamengo! O EI plus estará ABERTO na web para a transmissão do Jogando em Casa com Rodrigo Caetano! ht…", "datahora" : "Tue Feb 03 22:15:28 +0000 2015" } 
{ "_id" : { "$oid" : "54d148b471eb130b1e8b4570" }, "nome" : "Adélio", "tweet" : "Flamengo: eu sou o fã número 520 #365Scores veio e torce por ele também! http://t.co/Fa4ToFWdMB", "datahora" : "Tue Feb 03 22:15:24 +0000 2015" } 

Answer


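A field's "type" must be a primitive type, the name of a user-defined Avro type that has already been defined, or a full {"type": ...} definition written inline. "record" on its own is not a type name: a record has to be defined first, with its own name and fields, and only then can it be used. The AVSC should be one of the following: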

{ 
"type": "record", 
"name": "twitter_schema", 
"namespace": "com.miguno.avro", 
"fields": [ 
    { 
     "name": "_id", 
     "type": { 
      "type": "record", 
      "name": "id_schema", 
      "namespace": "com.miguno.avro", 
      "fields": [ 
       { 
        "name": "id_name", 
        "type": "string", 
        "doc": "Value of the indexes/id name tweets" 
       }, 
       { 
        "name": "id_value", 
        "type": "string", 
        "doc": "Value of the indexes/id value tweets" 
       } 
      ], 
      "doc": "A schema for storing Values of the indexes/id tweets" 
     }, 
     "doc": "Values of the indexes/id tweets" 
    }, 
    { 
     "name": "nome", 
     "type": "string", 
     "doc": "Name of the user account on Twitter.com" 
    }, 
    { 
     "name": "tweet", 
     "type": "string", 
     "doc": "The content of the user's Twitter message" 
    }, 
    { 
     "name": "datahora", 
     "type": "string", 
     "doc": "Unix epoch time in seconds" 
    } 
], 
"doc": "A schema for storing Twitter messages" 
} 

or

{ 
"type": "record", 
"name": "twitter_schema", 
"namespace": "com.miguno.avro", 
"fields": [ 
    { 
     "name": "_id", 
     "type": { 
      "type": "array", 
      "items": "string" 
     }, 
     "doc": "Values of the indexes/id tweets" 
    }, 
    { 
     "name": "nome", 
     "type": "string", 
     "doc": "Name of the user account on Twitter.com" 
    }, 
    { 
     "name": "tweet", 
     "type": "string", 
     "doc": "The content of the user's Twitter message" 
    }, 
    { 
     "name": "datahora", 
     "type": "string", 
     "doc": "Unix epoch time in seconds" 
    } 
], 
"doc": "A schema for storing Twitter messages" 
} 
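
The rule about field types can be illustrated with a small checker. This is only a minimal sketch in Python, not Avro's actual parser: the function name `undefined_types` is hypothetical, namespaces and full names are ignored, and the schemas are trimmed versions of the ones in this thread. It checks only that every type reference resolves to a primitive or to a name defined earlier; it does not check whether a schema matches the shape of the records in tweet.json.

```python
import json

# Primitive type names from the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}
# Complex types that introduce a name which later references may use.
NAMED = {"record", "enum", "fixed"}


def undefined_types(schema, known=None, path="$"):
    """Return (path, name) pairs for type references that are neither
    primitive nor the name of a type defined earlier in the schema.
    Simplified: namespaces are ignored (an assumption of this sketch)."""
    known = set() if known is None else known
    problems = []
    if isinstance(schema, str):
        # A bare string must be a primitive or an already defined name.
        if schema not in PRIMITIVES and schema not in known:
            problems.append((path, schema))
    elif isinstance(schema, list):
        # A JSON array is a union: check every branch.
        for i, branch in enumerate(schema):
            problems += undefined_types(branch, known, f"{path}[{i}]")
    elif isinstance(schema, dict):
        kind = schema.get("type")
        if kind in NAMED:
            # The definition registers its name before its fields are read.
            known.add(schema["name"])
            if kind == "record":
                for field in schema.get("fields", []):
                    field_path = f"{path}.{field['name']}"
                    problems += undefined_types(field["type"], known, field_path)
        elif kind == "array":
            problems += undefined_types(schema["items"], known, path + ".items")
        elif kind == "map":
            problems += undefined_types(schema["values"], known, path + ".values")
        else:
            # For example {"type": "string"}: check the inner reference.
            problems += undefined_types(kind, known, path)
    return problems


# Trimmed version of the schema from the question: "_id" uses a bare "record".
broken = json.loads("""
{"type": "record", "name": "twitter_schema", "fields": [
    {"name": "_id", "type": "record"},
    {"name": "nome", "type": "string"}]}
""")

# Trimmed version of the first fix: the record is defined inline with a name.
fixed_record = json.loads("""
{"type": "record", "name": "twitter_schema", "fields": [
    {"name": "_id", "type": {"type": "record", "name": "id_schema", "fields": [
        {"name": "id_name", "type": "string"},
        {"name": "id_value", "type": "string"}]}},
    {"name": "nome", "type": "string"}]}
""")

# Trimmed version of the second fix: an array of strings.
fixed_array = json.loads("""
{"type": "record", "name": "twitter_schema", "fields": [
    {"name": "_id", "type": {"type": "array", "items": "string"}},
    {"name": "nome", "type": "string"}]}
""")

print(undefined_types(broken))        # [('$._id', 'record')]
print(undefined_types(fixed_record))  # []
print(undefined_types(fixed_array))   # []
```

The first call reports the same field that the SchemaParseException complains about, while both corrected schemas pass because the type of "_id" is now a {"type": ...} expression rather than the bare word "record".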