
Running HQL from Python without using the Hive Standalone Server


The Ex CS Grad Student: Running HQL from Python without using the Hive Standalone Server


To use Hive from a language other than Java (say, Python), you normally have to go through the Hive Standalone Server. The main disadvantage of the Hive Standalone Server is that it is currently single-threaded [HIVE-80]. It is also inconvenient to run and maintain an additional server.

 

We can avoid the server by using Jython (and possibly JRuby). Jython lets us call Hive's Java client library (its JDBC driver) directly to execute the HQL query and retrieve the results. We can then process the results in pure Python.
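Once a query has run, the rows come back as a java.sql.ResultSet, whose next(), getInt() and getString() methods Jython exposes as ordinary Python methods. A minimal sketch of draining such a result set into plain Python tuples, assuming JDBC's 1-based column numbering; the helper name fetch_rows and the FakeResultSet class (a stand-in so the sketch also runs under ordinary Python) are hypothetical:

```python
def fetch_rows(result_set, getters):
    """Drain a JDBC-style result set into a list of Python tuples.

    `getters` names one accessor per column (e.g. "getInt", "getString").
    JDBC columns are numbered from 1, hence the i + 1 below.
    """
    rows = []
    while result_set.next():
        rows.append(tuple(getattr(result_set, name)(i + 1)
                          for i, name in enumerate(getters)))
    return rows


class FakeResultSet(object):
    """Hypothetical stand-in for java.sql.ResultSet, for use outside Jython."""

    def __init__(self, rows):
        self._rows = rows
        self._pos = -1

    def next(self):
        # Like JDBC: advance the cursor and report whether a row is available.
        self._pos += 1
        return self._pos < len(self._rows)

    def getInt(self, column):
        return int(self._rows[self._pos][column - 1])

    def getString(self, column):
        return str(self._rows[self._pos][column - 1])


rows = fetch_rows(FakeResultSet([(1, "one"), (2, "two")]), ["getInt", "getString"])
print(rows)  # [(1, 'one'), (2, 'two')]
```

Under Jython you would pass the real ResultSet instead, for example fetch_rows(stmt.executeQuery("SELECT * FROM testjython"), ["getInt", "getString"]), and work with the resulting list using ordinary Python code.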



Let us try it out:



STEP 1:

Download and install Jython.
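The script in step 4 imports Java classes, so it must be run with the jython launcher rather than CPython. A small guard can make a script fail early with a clear message. This sketch assumes Jython reports a sys.platform value beginning with "java", and the function names are hypothetical:

```python
import sys


def running_on_jython():
    """True when the interpreter is Jython, whose sys.platform starts with "java"."""
    return sys.platform.startswith("java")


def require_jython():
    """Exit with a helpful message unless running under Jython."""
    if not running_on_jython():
        sys.stderr.write("This script imports Java classes; run it with jython.\n")
        sys.exit(1)
```

Calling require_jython() as the first statement of the step 4 script turns an obscure "No module named java" import error into a direct hint.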



STEP 2:

Make sure you have the following jars and directories in your CLASSPATH.

  • hive-service-0.6.0.jar

  • libfb303.jar

  • log4j-1.2.15.jar

  • antlr-runtime-3.0.1.jar

  • jdo2-api-2.3-SNAPSHOT.jar

  • commons-logging-1.0.4.jar

  • datanucleus-core-1.1.2.jar

  • datanucleus-enhancer-1.1.2.jar

  • datanucleus-rdbms-1.1.2.jar

  • hive-exec-0.6.0.jar

  • hive-jdbc-0.6.0.jar

  • hive-metastore-0.6.0.jar

  • derby.jar

  • commons-lang-2.4.jar

  • hadoopcore/hadoop-0.20.0/hadoop-0.20.0-core.jar

  • /usr/lib/hadoop-0.20/lib/mysql-connector-java-5.0.8-bin.jar

  • conf (this is your hive installation's build/dist/conf directory)

Jar locations and versions may differ in your Hive installation.
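Assembling this CLASSPATH by hand is error-prone. Below is a minimal sketch that joins the entries with the platform's path separator and reports any that do not exist. It assumes base_dir is your Hive build/dist directory with the jars under lib/; that layout, the helper name build_classpath and the shortened entry list are assumptions to adapt to your installation:

```python
import os

# A few of the entries from the list above (assumed lib/ layout); add the rest.
HIVE_ENTRIES = [
    "lib/hive-service-0.6.0.jar",
    "lib/hive-jdbc-0.6.0.jar",
    "lib/hive-exec-0.6.0.jar",
    "lib/hive-metastore-0.6.0.jar",
    "conf",
]


def build_classpath(entries, base_dir):
    """Return (classpath_string, missing_paths) for the given jar/dir entries.

    Relative entries are resolved against base_dir; os.path.join leaves
    absolute entries (such as the mysql-connector path) unchanged.
    """
    found, missing = [], []
    for entry in entries:
        path = os.path.join(base_dir, entry)
        if os.path.exists(path):
            found.append(path)
        else:
            missing.append(path)
    return os.pathsep.join(found), missing
```

For example, call build_classpath(HIVE_ENTRIES, "/path/to/hive/build/dist"), fix anything reported as missing, and export the returned string as CLASSPATH before starting jython.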



STEP 3:

Create a test data file /tmp/test.dat with the following lines:

1:one
2:two
3:three
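If you prefer to generate the file, here is a minimal sketch that writes each (key, value) pair as a colon-delimited line, matching the table's FIELDS TERMINATED BY ':' clause. The helper name write_test_data is hypothetical:

```python
def write_test_data(path, rows):
    """Write (key, value) pairs as "key:value" lines, one row per line."""
    with open(path, "w") as out:
        for key, value in rows:
            out.write("%d:%s\n" % (key, value))


# The three rows used in this walkthrough.
SAMPLE_ROWS = [(1, "one"), (2, "two"), (3, "three")]
```

Running write_test_data("/tmp/test.dat", SAMPLE_ROWS) produces exactly the three lines listed above.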



STEP 4:

Run the following script with Jython:

from java.lang import Class, ClassNotFoundException, System
from java.sql import DriverManager

driverName = "org.apache.hadoop.hive.jdbc.HiveDriver"

# Load the Hive JDBC driver; its jars must be on the CLASSPATH (step 2)
try:
    Class.forName(driverName)
except ClassNotFoundException, e:
    print "Unable to load %s" % driverName
    System.exit(1)

# "jdbc:hive://" with no host or port runs Hive inside this JVM,
# so no Hive Standalone Server is needed
conn = DriverManager.getConnection("jdbc:hive://")
stmt = conn.createStatement()

# Drop the table (uncomment when re-running, since CREATE TABLE fails if it exists)
#stmt.executeQuery("DROP TABLE testjython")

# Create a table whose fields are separated by ':'
res = stmt.executeQuery("CREATE TABLE testjython (key int, value string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':'")

# Show tables
res = stmt.executeQuery("SHOW TABLES")
print "List of tables:"
while res.next():
    print res.getString(1)

# Load some data
res = stmt.executeQuery("LOAD DATA LOCAL INPATH '/tmp/test.dat' INTO TABLE testjython")

# SELECT the data
res = stmt.executeQuery("SELECT * FROM testjython")
print "Listing contents of table:"
while res.next():
    print res.getInt(1), res.getString(2)

# Release JDBC resources
stmt.close()
conn.close()



You should see the following output, amid a lot of debug log messages:

1 one
2 two
3 three

