
Spark Thrift Server fails to start with Hive 4.0.1 — multiple packaging issues #146

@lapazDBL

Description


Environment

  • Clemlab version: 1.3.1.0-292
  • OS: Linux 5.14.0-427.13.1.el9_4.x86_64 (RHEL 9.4)
  • Java: 1.8.0_391
  • Spark: 3.5.6.1.3.1.0-292
  • Hive: 4.0.1.1.3.1.0-292
  • Hadoop: 3.4.1

Summary

The Spark 3 Thrift Server (HiveThriftServer2) fails to start out-of-the-box in Clemlab 1.3.1.0-292. Three distinct packaging bugs were identified:

  1. Missing standalone-metastore directory — the directory referenced by spark.sql.hive.metastore.jars does not exist
  2. Reflection incompatibility with Hive 4.0 — ReflectionUtils$.setSuperField assumes the Hive 2.x/3.x class hierarchy
  3. Missing repackaged Guava classes — org.sparkproject.guava.collect.MultimapBuilder not found

Bug 1: Missing standalone-metastore directory

Symptom

WARN HiveUtils: Hive jar path '/usr/odp/current/spark3-client/standalone-metastore/*' does not exist.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/metastore/api/AlreadyExistsException

Root cause

The default configuration sets spark.sql.hive.metastore.jars = /usr/odp/current/spark3-client/standalone-metastore/* and spark.sql.hive.metastore.version = 3.0, but the directory /usr/odp/current/spark3-client/standalone-metastore/ is never created during installation.

Analysis

The IsolatedClientLoader uses a dedicated classloader with JARs from the standalone-metastore directory. Populating this directory with all Hive JARs from /usr/odp/1.3.1.0-292/hive/lib/ introduces a LinkageError:

java.lang.LinkageError: loader constraint violation: loader (instance of sun/misc/Launcher$AppClassLoader)
previously initiated loading for a different type with name "org/apache/hadoop/hive/common/io/SessionStream"

This occurs because SessionStream exists in both:

  • hive-common-4.0.1.1.3.1.0-292.jar in the main Spark classpath (/usr/odp/current/spark3-thriftserver/jars/)
  • hive-exec-4.0.1.1.3.1.0-292.jar (uber-jar) in the standalone-metastore directory

The hive-exec JAR shipped with Hive is a fat/uber JAR that embeds classes from hive-common (including SessionStream), while the Spark Thrift Server classpath ships hive-exec-4.0.1.1.3.1.0-292-core.jar (a stripped variant without embedded dependencies). Because each classloader resolves its own copy of SessionStream, the JVM throws a LinkageError as soon as the type crosses the classloader boundary during SparkSQLEnv$.init().
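One way to confirm this kind of duplication is to intersect the class-entry lists of the two jars. The sketch below is illustrative: main() builds synthetic demo jars, since the real archives only exist on the cluster, where you would instead point duplicateEntries at hive-common-4.0.1.1.3.1.0-292.jar and the hive-exec uber-jar.

```java
// Sketch: list .class entries present in BOTH jars, to spot duplicated types
// such as org/apache/hadoop/hive/common/io/SessionStream.class.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

public class DuplicateClassScan {

    /** Class-file entries that appear in both archives. */
    static Set<String> duplicateEntries(Path jarA, Path jarB) throws IOException {
        Set<String> common = classEntries(jarA);
        common.retainAll(classEntries(jarB));
        return common;
    }

    static Set<String> classEntries(Path jar) throws IOException {
        Set<String> names = new HashSet<>();
        try (ZipFile zf = new ZipFile(jar.toFile())) {
            zf.stream()
              .map(ZipEntry::getName)
              .filter(n -> n.endsWith(".class"))
              .forEach(names::add);
        }
        return names;
    }

    /** Builds a tiny throwaway jar containing the given (empty) entries. */
    static Path demoJar(String... entries) throws IOException {
        Path p = Files.createTempFile("demo", ".jar");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(p))) {
            for (String e : entries) {
                out.putNextEntry(new ZipEntry(e));
                out.closeEntry();
            }
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        // Synthetic stand-ins for hive-common and the hive-exec uber-jar.
        Path common = demoJar("org/apache/hadoop/hive/common/io/SessionStream.class");
        Path exec = demoJar("org/apache/hadoop/hive/common/io/SessionStream.class",
                            "org/apache/hadoop/hive/ql/exec/Task.class");
        System.out.println(duplicateEntries(common, exec));
        // prints [org/apache/hadoop/hive/common/io/SessionStream.class]
    }
}
```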

Workaround attempted

Using builtin mode (spark.sql.hive.metastore.jars = builtin, spark.sql.hive.metastore.version = 4.0.1.1.3.1.0-292) bypasses the standalone-metastore directory entirely and avoids the classloader isolation issues. However, this leads to Bug 2.
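Expressed as configuration, the builtin-mode workaround corresponds to the following spark-defaults.conf entries (version string is this build's; it still requires the Bug 2 fix to get past the reflection errors):

```properties
spark.sql.hive.metastore.jars       builtin
spark.sql.hive.metastore.version    4.0.1.1.3.1.0-292
```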

Suggested fix

Either:

  • Ship a correctly populated standalone-metastore directory using the -core variant of hive-exec plus all necessary dependencies (thrift, fb303, hive-common, hive-metastore, hive-shims, hive-shims-0.23, etc.) without including classes that conflict with the main Spark classpath
  • Or set the default configuration to spark.sql.hive.metastore.jars = builtin and spark.sql.hive.metastore.version = 4.0.1.1.3.1.0-292, which requires fixing Bug 2

Bug 2: ReflectionUtils$.setSuperField incompatible with Hive 4.0 class hierarchy

Symptom

With builtin mode:

ERROR HiveThriftServer2: Error starting HiveThriftServer2
java.lang.NoSuchFieldException: hiveConf
    at java.lang.Class.getDeclaredField(Class.java:2070)
    at org.apache.spark.sql.hive.thriftserver.ReflectionUtils$.setAncestorField(ReflectionUtils.scala:27)
    at org.apache.spark.sql.hive.thriftserver.ReflectionUtils$.setSuperField(ReflectionUtils.scala:22)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIService.init(SparkSQLCLIService.scala:48)

After fixing hiveConf, a second reflection error appears:

java.lang.NoSuchFieldException: cliService
    at org.apache.spark.sql.hive.thriftserver.ReflectionUtils$.setAncestorField(ReflectionUtils.scala:27)
    at org.apache.spark.sql.hive.thriftserver.ReflectionUtils$.setSuperField(ReflectionUtils.scala:22)
    at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2.init(HiveThriftServer2.scala:133)

Root cause

ReflectionUtils$.setSuperField(obj, fieldName, value) calls setAncestorField(obj, 1, fieldName, value) with a hardcoded level of 1, meaning it always looks for the field in the immediate superclass only.

In Hive 4.0, the class hierarchy changed. The hiveConf field, which was previously in CLIService (1 level up from SparkSQLCLIService), was moved to AbstractService (3 levels up):

SparkSQLCLIService → CLIService → CompositeService → AbstractService (has hiveConf)

Meanwhile, cliService is in HiveServer2 (1 level up from HiveThriftServer2), so it needs level 1:

HiveThriftServer2 → HiveServer2 (has cliService) → CompositeService → AbstractService

The hardcoded drop(1) in setSuperField cannot accommodate both depths. The related method getAncestorField, which ReflectedCompositeService also calls, suffers from the same fixed-level assumption.
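The depth problem can be reproduced with a toy hierarchy. The classes below are illustrative stand-ins (ServiceD → ServiceC → ServiceB → ServiceA mirroring SparkSQLCLIService → CLIService → CompositeService → AbstractService), not the real Hive classes; the walker demonstrates why searching the whole chain works at any depth:

```java
// Toy model of the fix: walk the whole superclass chain instead of a fixed
// level. Only ServiceA declares hiveConf, like AbstractService in Hive 4.
import java.lang.reflect.Field;

public class HierarchyReflection {
    static class ServiceA { private Object hiveConf; } // field 3 levels up
    static class ServiceB extends ServiceA {}
    static class ServiceC extends ServiceB {}
    static class ServiceD extends ServiceC {}

    /** Sets a field wherever it is declared in the hierarchy. */
    static void setFieldInHierarchy(Object obj, String name, Object value)
            throws NoSuchFieldException, IllegalAccessException {
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);
                f.setAccessible(true);
                f.set(obj, value);
                return;
            } catch (NoSuchFieldException ignored) { /* keep climbing */ }
        }
        throw new NoSuchFieldException(name);
    }

    /** Reads a field wherever it is declared in the hierarchy. */
    static Object getFieldInHierarchy(Object obj, String name)
            throws NoSuchFieldException, IllegalAccessException {
        for (Class<?> c = obj.getClass(); c != null; c = c.getSuperclass()) {
            try {
                Field f = c.getDeclaredField(name);
                f.setAccessible(true);
                return f.get(obj);
            } catch (NoSuchFieldException ignored) { /* keep climbing */ }
        }
        throw new NoSuchFieldException(name);
    }

    public static void main(String[] args) throws Exception {
        Object svc = new ServiceD();
        // A fixed one-level lookup, as in the original setSuperField, would
        // throw NoSuchFieldException here: hiveConf is three levels up.
        setFieldInHierarchy(svc, "hiveConf", "conf-object");
        System.out.println(getFieldInHierarchy(svc, "hiveConf")); // prints conf-object
    }
}
```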

Workaround applied

Recompiled ReflectionUtils.scala to search the entire class hierarchy iteratively instead of jumping to a fixed level:

package org.apache.spark.sql.hive.thriftserver

object ReflectionUtils {
  def setSuperField(obj: AnyRef, fieldName: String, fieldValue: AnyRef): Unit = {
    setFieldInHierarchy(obj, fieldName, fieldValue)
  }

  // The level parameter is kept for signature compatibility but ignored;
  // the whole hierarchy is searched instead.
  def setAncestorField(obj: AnyRef, level: Int, fieldName: String, fieldValue: AnyRef): Unit = {
    setFieldInHierarchy(obj, fieldName, fieldValue)
  }

  def getSuperField[T](obj: AnyRef, fieldName: String): T = {
    getFieldInHierarchy[T](obj, fieldName)
  }

  def getAncestorField[T](obj: AnyRef, level: Int, fieldName: String): T = {
    getFieldInHierarchy[T](obj, fieldName)
  }

  private def setFieldInHierarchy(obj: AnyRef, fieldName: String, fieldValue: AnyRef): Unit = {
    var clazz: Class[_] = obj.getClass
    while (clazz != null) {
      try {
        val f = clazz.getDeclaredField(fieldName)
        f.setAccessible(true)
        f.set(obj, fieldValue)
        return
      } catch {
        case _: NoSuchFieldException => clazz = clazz.getSuperclass
      }
    }
    throw new NoSuchFieldException(fieldName)
  }

  private def getFieldInHierarchy[T](obj: AnyRef, fieldName: String): T = {
    var clazz: Class[_] = obj.getClass
    while (clazz != null) {
      try {
        val f = clazz.getDeclaredField(fieldName)
        f.setAccessible(true)
        return f.get(obj).asInstanceOf[T]
      } catch {
        case _: NoSuchFieldException => clazz = clazz.getSuperclass
      }
    }
    throw new NoSuchFieldException(fieldName)
  }
}

Compiled with:

java -cp /usr/odp/1.3.1.0-292/spark3/jars/scala-compiler-2.12.18.jar:\
/usr/odp/1.3.1.0-292/spark3/jars/scala-library-2.12.18.jar:\
/usr/odp/1.3.1.0-292/spark3/jars/scala-reflect-2.12.18.jar \
scala.tools.nsc.Main \
-classpath /usr/odp/1.3.1.0-292/spark3/jars/scala-library-2.12.18.jar \
-d /tmp/fix2 /tmp/fix2/ReflectionUtils.scala

cd /tmp/fix2
jar uf /usr/odp/current/spark3-thriftserver/jars/spark-hive-thriftserver_2.12-3.5.6.1.3.1.0-292.jar \
  org/apache/spark/sql/hive/thriftserver/ReflectionUtils$.class \
  org/apache/spark/sql/hive/thriftserver/ReflectionUtils.class

Suggested fix

Update ReflectionUtils in spark-hive-thriftserver to search the entire class hierarchy instead of using a hardcoded level. The iterative approach shown above is backward-compatible with Hive 2.x/3.x since it will still find the field regardless of which superclass contains it.


Bug 3: Missing org.sparkproject.guava.collect.MultimapBuilder

Symptom

After fixing Bug 2, the Thrift Server fails with:

java.lang.NoClassDefFoundError: org/sparkproject/guava/collect/MultimapBuilder
    at org.apache.hive.service.cli.operation.OperationManager.<init>(OperationManager.java:83)
    at org.apache.hive.service.cli.session.SessionManager.<init>(SessionManager.java:77)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLSessionManager.<init>(SparkSQLSessionManager.scala:36)

Root cause

The spark-hive-thriftserver_2.12-3.5.6.1.3.1.0-292.jar embeds Hive 4 service classes (OperationManager, SessionManager, etc.) that reference Guava under the repackaged namespace org.sparkproject.guava. However, the repackaged Guava classes shipped in spark-network-common_2.12-3.5.6.1.3.1.0-292.jar do not include MultimapBuilder.

The standard Guava JAR (guava-32.0.1-jre.jar) contains MultimapBuilder under com.google.common.collect, but the embedded Hive classes expect it under org.sparkproject.guava.collect.
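A quick way to verify which namespace actually resolves is a Class.forName probe. This is a generic sketch; run it with the Thrift Server's jars on the classpath (e.g. java -cp '/usr/odp/current/spark3-thriftserver/jars/*:.' ClasspathProbe, path taken from this report) to see which variant is missing:

```java
// Sketch: probe whether a class name resolves on the current classpath,
// to confirm which Guava relocation the embedded Hive classes can see.
public class ClasspathProbe {

    /** True if the named class can be loaded from the current classpath. */
    static boolean isPresent(String fqcn) {
        try {
            Class.forName(fqcn, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "com.google.common.collect.MultimapBuilder",      // plain Guava
            "org.sparkproject.guava.collect.MultimapBuilder"  // Spark-relocated Guava
        };
        for (String fqcn : candidates) {
            System.out.println((isPresent(fqcn) ? "present: " : "MISSING: ") + fqcn);
        }
    }
}
```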

Suggested fix

Either:

  • Include a complete repackaged Guava JAR (org.sparkproject.guava) that covers all classes needed by the embedded Hive 4 service classes
  • Or recompile the embedded Hive service classes to use the standard Guava namespace (com.google.common) which is already present in the classpath

Impact

The Spark Thrift Server is completely non-functional in Clemlab 1.3.1.0-292. No workaround was found that resolves all three issues simultaneously.

Services NOT affected: Hive (standalone), Spark shell (spark3-shell), PySpark (pyspark3), Spark SQL (spark3-sql), spark3-submit, and all other cluster services work correctly. Only the Spark Thrift Server (HiveThriftServer2) is broken.

Steps to reproduce

  1. Install Clemlab 1.3.1.0-292 with Spark 3 and Hive
  2. Start the Spark Thrift Server from Ambari
  3. Check logs: tail -f /var/log/spark3/spark-spark-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-*.out

The service will fail with one of the errors described above depending on configuration.
