Hive 2 Connection Woes …

This post is mostly for my reference, but if you are having trouble connecting to a metastore with HiveServer2… this might help.

A database is required for a Hive install as the “metastore” repository and, IMPORTANTLY, that database is NOT provided with the Hive download. It is worth mentioning that the Hive metastore can be housed in any of the following databases: Derby (the embedded metastore – don’t use it, since it has unacceptable limitations such as allowing only a single concurrent user), MySQL, PostgreSQL, Oracle, and MS SQL Server, and possibly more.

It also appears that for local and remote metastores (we will configure a local metastore) a JDBC driver is also required; it is not supplied either, and most importantly these facts are either undocumented or poorly documented at best.

So, with so many options, I’m not arguing with the fact that the Hive install does not supply a DB for the metastore (an absolutely critical part of a working Hive installation)… but what should be made clear, and is not in the Hive documentation, is what the PREREQUISITES are for running Hive. Generally, the prerequisites are:

  • Java Install – documented elsewhere – here’s one way to install Java – http://www.lonzodb.com/?p=3939
  • Hadoop – Hive runs on top of the Hadoop libraries, so at least those are needed (this VM has a full Hadoop 3.1.1 install)
  • Install Hive – documented here
  • A database, remote or local – local is documented in this post
  • Configure the database and Hive to work together – documented here for MySQL / MariaDB

These prerequisites are not supplied in the Hive download, so below I document how you might install and configure the missing pieces so that they work with Hive 2. Hive will not work without a metastore.

Installing and configuring any Hive metastore has been a lot like pulling teeth – no fun. As I have said elsewhere, “no wonder Cloudera and HortonWorks are somewhat popular… you don’t have to configure most of this bull-stuff yourself… it should not be this hard.”

What I’m documenting below is very basic, and not very secure: install a DB for the metastore, then configure the database and Hive 2 so that Hive 2 connects to a MariaDB / MySQL database on a RHEL 7 clone and allows the use of some basic SQL. I haven’t tested a MapReduce job yet, if that is still how Hive 2 does things… let’s find out.

  • install MariaDB (a MySQL clone) on CentOS 7 / RHEL 7 / Oracle Linux 7

yum install mariadb
yum install mariadb-server
systemctl start mariadb
systemctl enable mariadb
  • create a mysql user with grants – from a root Linux account with no security defined yet…

$ mysql
grant all PRIVILEGES on *.* TO 'hive'@'localhost' IDENTIFIED BY 'YES' WITH GRANT OPTION;
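Before moving on, it is worth a quick sanity check that the new account actually connects (the password is the “YES” from the grant above):

# connect as the new hive account and run a trivial query
mysql -u hive -pYES -e "select version();"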
  • install Hive 2 (really version 3.1) – note I installed with the root Linux account to /opt/hive31 – see details below – this is the easy part

# get the Hive 3.1 download from hive.apache.org - download the tar.gz - a version like this:
# apache-hive-3.1.0-bin.tar.gz
sudo cp apache-hive-3.1.0-bin.tar.gz /opt
# can be installed elsewhere - /opt is just where I put it
cd /opt
tar xzvf apache-hive-3.1.0-bin.tar.gz
mv apache-hive-3.1.0-bin hive31
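A quick check that everything unpacked where expected – the usual tools live under bin:

ls /opt/hive31/bin
# should list beeline, hive, hiveserver2, schematool, among others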

Here is what my .bashrc and .bash_profile for my hadoop Linux user look like (and yes, Hadoop 3.1.1 was installed on the VM before Hive):

[hadoop@single-node ~]$ cat .bashrc
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# Uncomment the following line if you don't like systemctl's auto-paging feature:
# export SYSTEMD_PAGER=
# User specific aliases and functions
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
export HIVE_HOME=/opt/hive31
export HIVE_CONF_DIR=/opt/hive31/conf
export PATH=$PATH:$HIVE_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/*:.
export CLASSPATH=$CLASSPATH:$HIVE_HOME/lib/*:.
[hadoop@single-node ~]$ cat .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
	. ~/.bashrc
fi
export JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:bin/java::")
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
# User specific environment and startup programs
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export PATH
  • Don’t forget to “source” what you just defined

source ~/.bashrc
source ~/.bash_profile
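A quick echo confirms the environment took effect:

echo $HIVE_HOME
which hive
# should print /opt/hive31 and /opt/hive31/bin/hive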
  • install JDBC drivers using a tar.gz download – copy the driver to $HIVE_HOME/lib

cd ~/Downloads
tar xzvf mysql-connector-java-8.0.12.tar.gz
cd mysql-connector-java-8.0.12
sudo cp mysql-connector-java-8.0.12.jar $HIVE_HOME/lib
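Verify the driver jar actually landed in Hive’s lib directory:

ls -l $HIVE_HOME/lib/mysql-connector-java-8.0.12.jar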
  • add configuration info for Hive 2 to $HIVE_HOME/conf/hive-env.sh – add the following line (see the note after it):

export HADOOP_HOME=/home/hadoop/hadoop
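Note that hive-env.sh does not exist out of the box – the download only ships hive-env.sh.template – so the whole step looks something like this:

cd $HIVE_HOME/conf
cp hive-env.sh.template hive-env.sh
echo 'export HADOOP_HOME=/home/hadoop/hadoop' >> hive-env.sh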
  • Add configuration info for Hive 2 to $HIVE_HOME/conf/hive-site.xml to define the metastore connection – JDBC URL, MySQL username/password, etc.

Here is my entire $HIVE_HOME/conf/hive-site.xml

[hadoop@single-node conf]$ cat hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <!-- Hive Configuration can either be stored in this file or in the hadoop configuration files  -->
  <!-- that are implied by Hadoop setup variables.                                                -->
  <!-- Aside from Hadoop setup variables - this file is provided as a convenience so that Hive    -->
  <!-- users do not have to edit hadoop configuration files (that may be managed as a centralized -->
  <!-- resource).                                                                                 -->
  <!-- Hive Execution Parameters -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://127.0.0.1/metastore?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
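  <!-- NOTE: with MySQL Connector/J 8.x the driver class is com.mysql.cj.jdbc.Driver; -->
  <!-- the legacy name above still works, but logs a deprecation warning              -->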
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>YES</value>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
    <description>Authentication type</description>
  </property>
</configuration>
  • use schematool to initialize the metastore schema with the following:

$HIVE_HOME/bin/schematool -dbType mysql -initSchema -verbose
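If you want to confirm the schema actually landed in MariaDB, the following should work (schematool reads its connection settings from the hive-site.xml above, and “metastore” is the database name from the JDBC URL):

$HIVE_HOME/bin/schematool -dbType mysql -info
mysql -u hive -pYES -e 'show tables in metastore;' | head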
  • start hiveserver2

$HIVE_HOME/bin/hiveserver2
# note the command above starts the hiveserver2 in the foreground so that I could see the output, optionally use:
nohup $HIVE_HOME/bin/hiveserver2 &
# to put it in the background
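To confirm HiveServer2 actually came up, check for its listeners – it binds port 10000 for JDBC by default, and Hive 2+ also serves a web UI on port 10002:

ss -ltn | grep -E ':10000|:10002'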
  • kill hiveserver2 – no way to gracefully shut this beast down?  Does not look like it…

[root@single-node ~]# cat ./kill-hive2.bash
hive2_pid=`pgrep -f org.apache.hive.service.server.HiveServer2`
if [[ -n "$hive2_pid" ]]
then
    echo "Found hiveserver2 PID -- $hive2_pid"
    kill $hive2_pid
    sleep 2
    # if the process is still around, use kill -9
    if ps -p $hive2_pid > /dev/null ; then
        echo "Initial kill failed, killing with -9"
        kill -9 $hive2_pid
    fi
    echo "Hiveserver2 stopped successfully"
else
    echo "Hiveserver2 process not found, hiveserver2 is not running!"
fi
meta_pid=`pgrep -f org.apache.hadoop.hive.metastore.HiveMetaStore`
if [[ -n "$meta_pid" ]]
then
    echo "Found Hive metastore PID -- $meta_pid"
    kill $meta_pid
    sleep 2
    # if the process is still around, use kill -9
    if ps -p $meta_pid > /dev/null ; then
        echo "Initial kill failed, killing with -9"
        kill -9 $meta_pid
    fi
    echo "Hive metastore stopped successfully"
else
    echo "Hive metastore process not found, Hive metastore is not running!"
fi
  • and insecurely connect using the beeline CLI (just press return for username and password):

$HIVE_HOME/bin/beeline
!connect jdbc:hive2://
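The URL above uses beeline’s embedded mode; to go through the network listener instead, connect to port 10000 (again, just press return for username and password, since hive.server2.authentication is NONE):

$HIVE_HOME/bin/beeline
!connect jdbc:hive2://localhost:10000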

Note:

On a connect from beeline – connect shown below – I am getting the following warning, but the connect succeeds, so… one less thing to worry about.

!connect jdbc:hive2://
WARN DataNucleus.MetaData: Metadata has jdbc-type of null yet this is not valid. Ignored
Connected to: Apache Hive (version 3.1.0)
Driver: Hive JDBC (version 3.1.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://> select current_date()
. . . . . . . . > ;
OK
+-------------+
|     _c0     |
+-------------+
| 2018-08-30  |
+-------------+
1 row selected (2.569 seconds)
0: jdbc:hive2://> show databases;
OK
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (0.24 seconds)

# here is a little create-table-and-load-data test
# before I create the table in beeline, here is what the data looks like

[hadoop@single-node hive-stuff]$ pwd
/home/hadoop/hive-stuff
[hadoop@single-node hive-stuff]$ cat bulljunk.txt
1801	MickeyMantle	555000	HomeRunner
1802	Lonzo	195000	Blogger

# the file is ‘\t’ – tab delimited and records are terminated by a ‘\n’ – line feed

0: jdbc:hive2://> CREATE TABLE IF NOT EXISTS guys ( empid int, name String,
. . . . . . . . > salary String, positionTitle String)
. . . . . . . . > COMMENT 'Employee details'
. . . . . . . . > ROW FORMAT DELIMITED
. . . . . . . . > FIELDS TERMINATED BY '\t'
. . . . . . . . > LINES TERMINATED BY '\n'
. . . . . . . . > STORED AS TEXTFILE;
OK
0: jdbc:hive2://> LOAD DATA LOCAL INPATH '/home/hadoop/hive-stuff/bulljunk.txt'
. . . . . . . . > OVERWRITE INTO TABLE guys;
Loading data to table default.guys
18/08/30 21:07:37 [HiveServer2-Background-Pool: Thread-77]: WARN metastore.ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
18/08/30 21:07:38 [HiveServer2-Background-Pool: Thread-77]: WARN metastore.ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
OK
No rows affected (0.643 seconds)
0: jdbc:hive2://> show databases;
18/08/30 21:09:27 [a73e84e3-bb48-47a6-bdb3-ac7298d270fb main]: WARN metastore.ObjectStore: datanucleus.autoStartMechanismMode is set to unsupported value null . Setting it to value: ignored
OK
+----------------+
| database_name  |
+----------------+
| default        |
+----------------+
1 row selected (0.034 seconds)
0: jdbc:hive2://> show tables;
OK
+-----------+
| tab_name  |
+-----------+
| employee  |
| guys      |
+-----------+
2 rows selected (0.032 seconds)

OK, data loaded – I hope – let’s get the count. Hive runs a MapReduce job when I count the data, as shown below.

0: jdbc:hive2://> select count(*) from guys;
18/08/30 21:09:46 [HiveServer2-Background-Pool: Thread-96]: WARN ql.Driver: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20180830210946_40782547-160b-4be1-97f6-1aeba420e159
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
WARN  : Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
18/08/30 21:09:48 [HiveServer2-Background-Pool: Thread-96]: WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
Starting Job = job_1535660525578_0001, Tracking URL = http://localhost:8088/proxy/application_1535660525578_0001/
Kill Command = /home/hadoop/hadoop/bin/mapred job  -kill job_1535660525578_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
18/08/30 21:10:00 [HiveServer2-Background-Pool: Thread-96]: WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
2018-08-30 21:09:59,993 Stage-1 map = 0%,  reduce = 0%
2018-08-30 21:10:07,464 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.51 sec
2018-08-30 21:10:13,725 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.8 sec
MapReduce Total cumulative CPU time: 7 seconds 800 msec
Ended Job = job_1535660525578_0001
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 7.8 sec   HDFS Read: 12600 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 800 msec
OK
+------+
| _c0  |
+------+
| 2    |
+------+
1 row selected (29.339 seconds)

# but no MapReduce when I select the data… hmmm – see the note at the end of the post

0: jdbc:hive2://> select * from guys;
OK
+-------------+---------------+--------------+---------------------+
| guys.empid  |   guys.name   | guys.salary  | guys.positiontitle  |
+-------------+---------------+--------------+---------------------+
| 1801        | MickeyMantle  | 555000       | HomeRunner          |
| 1802        | Lonzo         | 195000       | Blogger             |
+-------------+---------------+--------------+---------------------+
2 rows selected (0.125 seconds)
0: jdbc:hive2://>
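About that last observation: for trivial queries like a bare select * (no aggregates, no joins), Hive uses a “fetch task” that just reads the table’s files directly, so no MapReduce job is needed. The hive.fetch.task.conversion property controls this; you can inspect it, or set it to none from beeline if you want to force the job to run:

0: jdbc:hive2://> set hive.fetch.task.conversion;
0: jdbc:hive2://> set hive.fetch.task.conversion=none;
0: jdbc:hive2://> select * from guys;
-- with conversion set to none, even the simple select launches a MapReduce job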

For more Big Data stuff, see the other posts here on lonzodb.
