2013-11-17

"Hello World" for Hadoop/HBase?

I have just set up my first cluster: 4 nodes running the HadoopDataPlatform 2.0 stack.

Is there a good "Hello World" program for getting started with

  • HBase?
  • Pig?
  • Hive?

The actual product problem I am ultimately solving is too complex to reproduce here, even partially. I am hoping to find some good getting-started documents that go a bit deeper than http://hbase.apache.org/book/quickstart.html.

I think Hive and Pig are competitors in the food chain, but we will evaluate both against our specific use cases before settling on one.



(You are likely to get a better answer if you share what you have looked at so far.)

Some introductory tutorials for Pig, Hive and HBase:

  • http://hortonworks.com/hadoop-tutorial/hello-world-an-introduction-to-hadoop-hcatalog-hive-and-pig/#pig
  • http://pig.apache.org/docs/r0.8.1/tutorial.html
  • https://cwiki.apache.org/confluence/display/Hive/Tutorial
  • http://gethue.tumblr.com/post/58181985680/hadoop-tutorial-how-to-create-example-tables-in-hbase

Good books such as "Programming Pig" by Alan Gates and "Programming Hive" are also available if you want to dig deeper.

The claim that Pig and Hive are rivals in the food chain is not really accurate. You can use them together very effectively: Pig to process unstructured data, grouping and transforming it into structured output, and Hive QL (which resembles SQL) to run ad hoc queries over the structured data that Pig produces.
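To make that division of labour concrete, here is a minimal plain-Java analogy; it does not use Pig or Hive at all. The class name `PigThenHiveSketch`, its methods and the sample input lines are hypothetical. `groupAndCount` plays the role of a Pig GROUP/COUNT step that turns free-form lines into structured (user, count) records, and `usersWithAtLeast` plays the role of an ad hoc Hive QL query over those records.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical plain-Java analogy of a Pig step feeding a Hive query.
// Neither Pig nor Hive is used here; only the shape of the data flow is shown.
public final class PigThenHiveSketch {

    // "Pig-like" step: turn free-form "user action ..." lines into structured
    // (user, count) records, similar in spirit to GROUP ... BY plus COUNT.
    public static Map<String, Long> groupAndCount(List<String> rawLines) {
        return rawLines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())     // skip blank lines
                .map(line -> line.split("\\s+")[0])  // first token is the user
                .collect(Collectors.groupingBy(
                        user -> user, TreeMap::new, Collectors.counting()));
    }

    // "Hive-like" ad hoc query over the structured records, roughly:
    //   SELECT user FROM counts WHERE n >= minCount
    public static List<String> usersWithAtLeast(Map<String, Long> counts, long minCount) {
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList(
                "alice login", "bob login", "   ", "alice view page1", "alice logout");
        Map<String, Long> counts = groupAndCount(raw);
        System.out.println(counts);                      // {alice=3, bob=1}
        System.out.println(usersWithAtLeast(counts, 2)); // [alice]
    }
}
```

In a real deployment the Pig step would typically write its structured output to HDFS, where a Hive table defined over that data could then be queried; the sketch only mirrors the shape of that flow in memory.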

Also, besides Pig (which has its own DSL, called Pig Latin), several other MapReduce abstractions are available, such as Scalding/Scoobi for Scala, or Cascading and Crunch for Java. What they give you is the ability to program in a single language at a good level of abstraction.
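What these libraries abstract away is, roughly, the raw map, shuffle and reduce plumbing. The sketch below spells out those three phases for a word count in plain, single-process Java. It does not use the API of Scalding, Scoobi, Cascading or Crunch, and the class and method names are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process illustration of the MapReduce phases that
// higher-level libraries hide behind a more compact API.
public final class MapShuffleReduceSketch {

    // Map phase: emit (word, 1) for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, sorted by key
    // (Hadoop's shuffle also sorts map output by key before reducing).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(emitted).entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("Hello World", "hello HBase")));
        // prints {hbase=1, hello=2, world=1}
    }
}
```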


I think this answer comes a bit late for Ajeet, since you asked the question more than a year ago, but for others looking for a simple "Hello World" for the HBase API (v1.0), here is a new one that was just posted on Gist.GitHub.com: https://gist.github.com/dvimont/a7791f61c4ba788fd827

Here are the full contents of the gist (updated 2016-04-15):

/**
 * This brief HELLO WORLD Java program is meant to enable you to very quickly
 * gain a rudimentary, hands-on understanding of how data (and metadata) is
 * stored and retrieved in HBase via the "client API".
 * ================
 * For those coming to the HBase world with previous experience in traditional
 * RDBMS databases, it is essential to realize that Tables, Rows, and Columns
 * in HBase, while bearing some resemblance to their namesakes in the RDBMS
 * world, differ markedly in their structures and functionality.
 * **Column Families**
 * As you can see in the code below, when you use the Admin#createTable method,
 * besides providing a TableName, you also must specify at least one "Column
 * Family" (denoted in the code by the class HColumnDescriptor).
 * In HBase, Columns are all grouped by Column Family, with all Columns in a
 * family being physically stored together. Theoretically, you could have a
 * large number of Column Families, but the present HBase architecture
 * actually has a practical limitation of no more than three or four per Table.
 * **Versioning**
 * In the code below, a "maxVersions" value of 3 is assigned to the
 * Column Family, which means that versioning has been enabled for all Columns
 * in the family: when a Column is updated, the 2 most recent *previous* values
 * for that Column are still retrievable, each designated by a timestamp.
 * These individual versioned instances are sometimes referred to as the Cells
 * of a Column. The retrieval of multiple versions (Cells) of the same Column is
 * performed below in the #getAndPrintAllCellVersions method.
 * **Columns**
 * It is important to note (in the most striking departure from RDBMS norms)
 * that Columns themselves are NOT part of the Table definition. Columns are
 * "defined" on-the-fly as each row is <put> (i.e., inserted/updated) into the
 * database. There is also NO datatyping of each Column: HBase accepts any
 * byte-array of any length/format you wish to store in any Column. This means
 * that it is the APPLICATION, not the database, that must keep track of Column
 * NAMES AND DATATYPES. In the RDBMS world, the database (i.e., database
 * administrator) manages column metadata; in the HBase world, the application
 * (i.e., application designer/programmer) manages column metadata.
 * **Rows**
 * Rows are inserted, accessed, and physically ordered exclusively by Row ID
 * (the conceptual equivalent of an RDBMS primary key). When a "scan" is
 * performed to access multiple contiguous rows, those rows will always be
 * returned in Row ID order (either ascending or descending).
 * ==============================
 * Importantly, your ability to run this code requires that you have successfully
 * installed and started a standalone implementation of HBase on the machine
 * on which this program is to be run.
 * The recommended steps to take to run this program are:
 * (1) Install a "standalone" configuration of the current stable release of
 *  HBase on your machine following the instructions provided at:
 *   https://hbase.apache.org/book.html#quickstart
 *  (If you are installing on a Windows machine, it is strongly recommended
 *  that you NOT bother trying to do an installation using the documented
 *  Cygwin option [which has proven to be faulty and is apparently not
 *  kept up-to-date with new releases of HBase], but instead install and
 *  run a virtual Unix environment [e.g., Ubuntu] in a virtual machine
 *  such as VirtualBox, and install HBase in that environment.)
 * (2) Copy this code into a new project in your favorite IDE, set up the
 *  CLASSPATH as documented below, and use this code as your launchpad into
 *  effective utilization of the HBase Client API. Run and modify this code
 *  as extensively as you need to in order to build and deepen your
 *  understanding of how to store and retrieve data (and metadata!) in HBase.
 *  Refer to the HBase javadocs (https://hbase.apache.org/apidocs/)
 *  to extend this code and explore functionality not demonstrated in the
 *  code below.
 * This code was developed in coordination with HBase release;
 * compatibility with subsequent releases is hoped for, but by no means
 * guaranteed.
 * =========================
 * To fulfill CLASSPATH requirements to compile/run this program:
 * -- the CLASSPATH must include the directory in which hbase-site.xml (i.e.,
 *  the HBase startup parameters file) is stored for your currently-running
 *  instance of HBase (e.g., '/usr/local/hbase/hbase-').
 *  [In NetBeans, this would be set in Project Properties/Libraries/Run.]
 * -- the CLASSPATH should also include the HBase library (e.g. "HBase_1.0.1.1").
 *  [In NetBeans, you can include this library in your project's
 *  "Compile-time Libraries" list.]
 */
package org.prettygoodexamples.hellohbase; 

import java.io.IOException; 
import java.util.Map.Entry; 
import java.util.NavigableMap; 
import org.apache.hadoop.hbase.HColumnDescriptor; 
import org.apache.hadoop.hbase.HTableDescriptor; 
import org.apache.hadoop.hbase.NamespaceDescriptor; 
import org.apache.hadoop.hbase.NamespaceNotFoundException; 
import org.apache.hadoop.hbase.TableName; 
import org.apache.hadoop.hbase.client.Admin; 
import org.apache.hadoop.hbase.client.Connection; 
import org.apache.hadoop.hbase.client.ConnectionFactory; 
import org.apache.hadoop.hbase.client.Delete; 
import org.apache.hadoop.hbase.client.Get; 
import org.apache.hadoop.hbase.client.Put; 
import org.apache.hadoop.hbase.client.Result; 
import org.apache.hadoop.hbase.client.Table; 
import org.apache.hadoop.hbase.util.Bytes; 

/**
 * Successful running of this application requires access to an active instance
 * of HBase. For install instructions for a standalone instance of HBase, please
 * refer to https://hbase.apache.org/book.html#quickstart
 */
public final class HelloHBase { 

    protected static final String MY_NAMESPACE_NAME = "myTestNamespace"; 
    static final TableName MY_TABLE_NAME = TableName.valueOf("myTestTable"); 
    static final byte[] MY_COLUMN_FAMILY_NAME = Bytes.toBytes("cf"); 
    static final byte[] MY_FIRST_COLUMN_QUALIFIER 
      = Bytes.toBytes("myFirstColumn"); 
    static final byte[] MY_SECOND_COLUMN_QUALIFIER 
      = Bytes.toBytes("mySecondColumn"); 
    static final byte[] MY_ROW_ID = Bytes.toBytes("rowId01"); 

    public static void main(final String[] args) throws IOException {
        final boolean deleteAllAtEOJ = true;

        /*
         * ConnectionFactory#createConnection() automatically looks for
         * hbase-site.xml (HBase configuration parameters) on the system's
         * CLASSPATH, to enable creation of Connection to HBase via Zookeeper.
         */
        try (Connection connection = ConnectionFactory.createConnection();
                Admin admin = connection.getAdmin()) {

            admin.getClusterStatus(); // assure connection successfully established
            System.out.println("\n*** Hello HBase! -- Connection has been "
                    + "established via Zookeeper!!\n");

            createNamespaceAndTable(admin);

            System.out.println("Getting a Table object for [" + MY_TABLE_NAME
                    + "] with which to perform CRUD operations in HBase.");
            try (Table table = connection.getTable(MY_TABLE_NAME)) {

                putRowToTable(table);
                getAndPrintRowContents(table);

                if (deleteAllAtEOJ) {
                    deleteRow(table);
                }
            }

            if (deleteAllAtEOJ) {
                deleteNamespaceAndTable(admin);
            }
        }
    }

    /**
     * Invokes Admin#createNamespace and Admin#createTable to create a namespace
     * with a table that has one column-family.
     *
     * @param admin Standard Admin object
     * @throws IOException If IO problem encountered
     */
    static void createNamespaceAndTable(final Admin admin) throws IOException {

        if (!namespaceExists(admin, MY_NAMESPACE_NAME)) {
            System.out.println("Creating Namespace [" + MY_NAMESPACE_NAME + "].");

            admin.createNamespace(NamespaceDescriptor
                    .create(MY_NAMESPACE_NAME).build());
        }
        if (!admin.tableExists(MY_TABLE_NAME)) {
            System.out.println("Creating Table [" + MY_TABLE_NAME.getNameAsString()
                    + "], with one Column Family ["
                    + Bytes.toString(MY_COLUMN_FAMILY_NAME) + "].");

            admin.createTable(new HTableDescriptor(MY_TABLE_NAME)
                    .addFamily(new HColumnDescriptor(MY_COLUMN_FAMILY_NAME)));
        }
    }

    /**
     * Invokes Table#put to store a row (with two new columns created 'on the
     * fly') into the table.
     *
     * @param table Standard Table object (used for CRUD operations).
     * @throws IOException If IO problem encountered
     */
    static void putRowToTable(final Table table) throws IOException {

        table.put(new Put(MY_ROW_ID).addColumn(MY_COLUMN_FAMILY_NAME,
                MY_FIRST_COLUMN_QUALIFIER,
                Bytes.toBytes("Hello")).addColumn(MY_COLUMN_FAMILY_NAME,
                        MY_SECOND_COLUMN_QUALIFIER,
                        Bytes.toBytes("World!")));

        System.out.println("Row [" + Bytes.toString(MY_ROW_ID)
                + "] was put into Table ["
                + table.getName().getNameAsString() + "] in HBase;\n"
                + " the row's two columns (created 'on the fly') are: ["
                + Bytes.toString(MY_COLUMN_FAMILY_NAME) + ":"
                + Bytes.toString(MY_FIRST_COLUMN_QUALIFIER)
                + "] and [" + Bytes.toString(MY_COLUMN_FAMILY_NAME) + ":"
                + Bytes.toString(MY_SECOND_COLUMN_QUALIFIER) + "]");
    }

    /**
     * Invokes Table#get and prints out the contents of the retrieved row.
     *
     * @param table Standard Table object
     * @throws IOException If IO problem encountered
     */
    static void getAndPrintRowContents(final Table table) throws IOException {

        Result row = table.get(new Get(MY_ROW_ID));

        System.out.println("Row [" + Bytes.toString(row.getRow())
                + "] was retrieved from Table ["
                + table.getName().getNameAsString()
                + "] in HBase, with the following content:");

        // Outer map: column family -> (column qualifier -> latest value).
        for (Entry<byte[], NavigableMap<byte[], byte[]>> colFamilyEntry
                : row.getNoVersionMap().entrySet()) {
            String columnFamilyName = Bytes.toString(colFamilyEntry.getKey());

            System.out.println(" Columns in Column Family [" + columnFamilyName
                    + "]:");

            for (Entry<byte[], byte[]> columnNameAndValueMap
                    : colFamilyEntry.getValue().entrySet()) {

                System.out.println(" Value of Column [" + columnFamilyName + ":"
                        + Bytes.toString(columnNameAndValueMap.getKey()) + "] == "
                        + Bytes.toString(columnNameAndValueMap.getValue()));
            }
        }
    }

    /**
     * Checks to see whether a namespace exists.
     *
     * @param admin Standard Admin object
     * @param namespaceName Name of namespace
     * @return true If namespace exists
     * @throws IOException If IO problem encountered
     */
    static boolean namespaceExists(final Admin admin, final String namespaceName)
            throws IOException {
        try {
            // Throws NamespaceNotFoundException if the namespace is absent.
            admin.getNamespaceDescriptor(namespaceName);
        } catch (NamespaceNotFoundException e) {
            return false;
        }
        return true;
    }

    /**
     * Invokes Table#delete to delete test data (i.e. the row).
     *
     * @param table Standard Table object
     * @throws IOException If IO problem is encountered
     */
    static void deleteRow(final Table table) throws IOException {
        System.out.println("Deleting row [" + Bytes.toString(MY_ROW_ID)
                + "] from Table ["
                + table.getName().getNameAsString() + "].");
        table.delete(new Delete(MY_ROW_ID));
    }

    /**
     * Invokes Admin#disableTable, Admin#deleteTable, and Admin#deleteNamespace to
     * disable/delete Table and delete Namespace.
     *
     * @param admin Standard Admin object
     * @throws IOException If IO problem is encountered
     */
    static void deleteNamespaceAndTable(final Admin admin) throws IOException {
        if (admin.tableExists(MY_TABLE_NAME)) {
            System.out.println("Disabling/deleting Table ["
                    + MY_TABLE_NAME.getNameAsString() + "].");
            admin.disableTable(MY_TABLE_NAME); // Disable a table before deleting it.
            admin.deleteTable(MY_TABLE_NAME);
        }
        if (namespaceExists(admin, MY_NAMESPACE_NAME)) {
            System.out.println("Deleting Namespace [" + MY_NAMESPACE_NAME + "].");
            admin.deleteNamespace(MY_NAMESPACE_NAME);
        }
    }
}

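The header comment of the gist describes HBase's data model in prose. As a hedged illustration only, the following in-memory toy (a hypothetical `ToyHBaseModel` class, not the HBase client API and not part of the gist) mirrors three of its points: columns are created on the fly by `put` with no schema, each column keeps at most `maxVersions` timestamped cells that come back newest first, and rows are kept in unsigned lexicographic byte order of their Row IDs, which is why `rowId10` sorts after `rowId02`. Real HBase persists data in store files and discards excess versions lazily; none of that is modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy of the HBase data model (NOT the HBase client API):
// row ID -> "family:qualifier" -> timestamp (newest first) -> value bytes.
public final class ToyHBaseModel {

    // Unsigned lexicographic byte[] ordering, used here to order row IDs.
    static final Comparator<byte[]> BYTES_ORDER = (a, b) -> {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    };

    private final int maxVersions;
    private final NavigableMap<byte[], TreeMap<String, TreeMap<Long, byte[]>>> rows =
            new TreeMap<>(BYTES_ORDER);

    public ToyHBaseModel(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    // Columns are created on the fly: no table schema lists them in advance.
    public void put(String rowId, String family, String qualifier, long ts, String value) {
        TreeMap<Long, byte[]> cells = rows
                .computeIfAbsent(utf8(rowId), r -> new TreeMap<>())
                .computeIfAbsent(family + ":" + qualifier,
                        c -> new TreeMap<>(Collections.reverseOrder()));
        cells.put(ts, utf8(value));
        while (cells.size() > maxVersions) {
            cells.pollLastEntry(); // drop the oldest version
        }
    }

    // All retained versions (cells) of one column, newest first.
    public List<String> getVersions(String rowId, String family, String qualifier) {
        List<String> out = new ArrayList<>();
        TreeMap<String, TreeMap<Long, byte[]>> row = rows.get(utf8(rowId));
        if (row == null) {
            return out;
        }
        TreeMap<Long, byte[]> cells = row.get(family + ":" + qualifier);
        if (cells == null) {
            return out;
        }
        for (byte[] v : cells.values()) {
            out.add(new String(v, StandardCharsets.UTF_8));
        }
        return out;
    }

    // A full "scan": row IDs in ascending unsigned byte order.
    public List<String> scanRowIds() {
        List<String> out = new ArrayList<>();
        for (byte[] rowId : rows.keySet()) {
            out.add(new String(rowId, StandardCharsets.UTF_8));
        }
        return out;
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ToyHBaseModel t = new ToyHBaseModel(3);
        for (long ts = 1; ts <= 4; ts++) {
            t.put("rowId01", "cf", "myFirstColumn", ts, "v" + ts);
        }
        t.put("rowId10", "cf", "anyNewColumn", 1, "x");
        t.put("rowId02", "cf", "mySecondColumn", 1, "y");
        System.out.println(t.getVersions("rowId01", "cf", "myFirstColumn")); // [v4, v3, v2]
        System.out.println(t.scanRowIds()); // [rowId01, rowId02, rowId10]
    }
}
```

With `maxVersions` set to 3, the fourth write to `myFirstColumn` evicts the oldest cell, leaving the current value plus the 2 most recent previous values, which matches the versioning behaviour the header comment describes.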