Hive for Big Data: JVM Reuse Configuration and Resource Savings

大数据阿木 · published 10 days ago


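Abstract:

Hive is a widely used data warehouse tool, and its performance and resource consumption become a central concern when it processes very large datasets. JVM (Java Virtual Machine) reuse is one way to improve Hive performance and save resources. This article looks at how JVM reuse is configured for Hive and uses code to show how the settings can be applied in practice.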

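I. Introduction

Hive is part of the Apache Hadoop ecosystem and is used mainly to process large datasets. When it runs on MapReduce, Hive compiles a query into one or more MapReduce jobs, and by default every map or reduce task in those jobs starts in its own JVM process. For jobs made up of many short tasks, the repeated JVM startup and teardown wastes resources and slows the query down. JVM reuse lets one JVM run several tasks in sequence, which reduces that overhead and improves query efficiency.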
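To make the saving concrete, the arithmetic can be sketched in a few lines of Java. This is a simplified model rather than Hive or Hadoop code: the class name `JvmReuseEstimate`, the method `jvmLaunches`, and the job size of 1000 tasks are all illustrative assumptions, and the model ignores how tasks are spread across nodes and slots.

```java
/**
 * Simplified model of how many task JVMs a MapReduce job starts.
 * Illustrative only: this is not Hive or Hadoop code.
 */
final class JvmReuseEstimate {

    private JvmReuseEstimate() {
    }

    /**
     * Returns the number of JVM launches needed for {@code tasks} tasks when
     * each JVM runs up to {@code tasksPerJvm} tasks back to back.
     */
    static long jvmLaunches(long tasks, int tasksPerJvm) {
        if (tasks < 0 || tasksPerJvm < 1) {
            throw new IllegalArgumentException("tasks must be >= 0 and tasksPerJvm >= 1");
        }
        // Ceiling division: a partially filled JVM still has to be started.
        return (tasks + tasksPerJvm - 1) / tasksPerJvm;
    }

    public static void main(String[] args) {
        long tasks = 1000; // hypothetical job size
        System.out.println("no reuse:    " + jvmLaunches(tasks, 1));
        System.out.println("reuse of 10: " + jvmLaunches(tasks, 10));
    }
}
```

Under this model a job of 1000 tasks starts 1000 JVMs without reuse and 100 JVMs when each JVM handles 10 tasks. Real savings depend on how long the tasks run relative to JVM startup time.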

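II. JVM Reuse Configuration

1. Setting the JVM reuse parameters

JVM reuse is enabled through configuration parameters that can be set from Hive. The key parameters are listed below; only the last one controls JVM reuse itself, while the others tune parallelism and reducer counts:

- hive.exec.parallel: enables parallel execution of independent stages within a query.

- hive.exec.parallel.thread.number: the maximum number of stages (jobs) that may run in parallel when parallel execution is enabled.

- hive.exec.reducers.bytes.per.reducer: the amount of input data, in bytes, assigned to each reducer.

- hive.exec.reducers.max: the upper limit on the number of reducers.

- mapred.job.reuse.jvm.num.tasks: the number of tasks one JVM may run in sequence before it exits (default 1; -1 means no limit). This is a Hadoop MapReduce property set from Hive rather than a `hive.exec.*` setting, and it takes effect only on the classic MRv1 runtime; MapReduce on YARN does not reuse task JVMs in this way.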
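As a rough illustration of how these settings fit together, the sketch below renders them as session-level `SET key=value;` statements and estimates how the two reducer parameters interact. The class `HiveSessionSettings` and both method names are hypothetical helpers, not part of Hive. The reducer estimate (input size divided by bytes per reducer, rounded up, at least 1, capped at the maximum) is my reading of Hive's heuristic and should be checked against the Hive version in use. The JVM reuse entry uses Hadoop's `mapred.job.reuse.jvm.num.tasks` property name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for preparing Hive session settings. Not part of Hive. */
final class HiveSessionSettings {

    private HiveSessionSettings() {
    }

    /** Renders each entry as a Hive "SET key=value;" line, in insertion order. */
    static String toSetStatements(Map<String, String> settings) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            sb.append("SET ").append(e.getKey()).append('=').append(e.getValue()).append(";\n");
        }
        return sb.toString();
    }

    /** Estimates the reducer count: ceil(input / bytesPerReducer), at least 1, at most maxReducers. */
    static int estimateReducers(long inputBytes, long bytesPerReducer, int maxReducers) {
        if (inputBytes < 0 || bytesPerReducer < 1 || maxReducers < 1) {
            throw new IllegalArgumentException("invalid reducer sizing arguments");
        }
        long byData = inputBytes / bytesPerReducer + (inputBytes % bytesPerReducer == 0 ? 0 : 1);
        return (int) Math.min((long) maxReducers, Math.max(1L, byData));
    }

    public static void main(String[] args) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("hive.exec.parallel", "true");
        settings.put("hive.exec.parallel.thread.number", "4");
        settings.put("hive.exec.reducers.bytes.per.reducer", "500000000");
        settings.put("hive.exec.reducers.max", "10");
        settings.put("mapred.job.reuse.jvm.num.tasks", "10"); // Hadoop MRv1 property
        System.out.print(toSetStatements(settings));

        // 2 GB of input at 500 MB per reducer, capped at 10 reducers.
        System.out.println(estimateReducers(2_000_000_000L, 500_000_000L, 10));
    }
}
```

The generated statements can be pasted at the top of a Hive script so that they apply only to that session, which keeps the cluster-wide defaults untouched.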

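2. Tuning the JVM options

Beyond the configuration parameters above, the task JVMs themselves can be tuned:

- Set a suitable maximum heap size (-Xmx) and initial heap size (-Xms).

- Use the G1 garbage collector (-XX:+UseG1GC) to improve garbage collection efficiency.

- Adjust other JVM startup flags, for example -XX:+DisableExplicitGC, which makes the JVM ignore explicit System.gc() calls (it does not turn off GC logging).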
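The options above are usually passed to the task JVMs as a single string. The sketch below assembles such a string; `ChildJvmOpts` and `build` are hypothetical names, the heap sizes are placeholder values rather than recommendations, and the sketch assumes the string is then assigned to Hadoop's `mapred.child.java.opts` property (newer Hadoop releases split this into `mapreduce.map.java.opts` and `mapreduce.reduce.java.opts`).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles a JVM option string for task JVMs. */
final class ChildJvmOpts {

    private ChildJvmOpts() {
    }

    /** Builds an option string such as "-Xms512m -Xmx1024m -XX:+UseG1GC". */
    static String build(int initialHeapMb, int maxHeapMb, boolean useG1, boolean disableExplicitGc) {
        if (initialHeapMb < 1 || maxHeapMb < initialHeapMb) {
            throw new IllegalArgumentException("need 1 <= initialHeapMb <= maxHeapMb");
        }
        List<String> opts = new ArrayList<>();
        opts.add("-Xms" + initialHeapMb + "m"); // initial heap size
        opts.add("-Xmx" + maxHeapMb + "m");     // maximum heap size
        if (useG1) {
            opts.add("-XX:+UseG1GC");           // select the G1 collector
        }
        if (disableExplicitGc) {
            opts.add("-XX:+DisableExplicitGC"); // ignore explicit System.gc() calls
        }
        return String.join(" ", opts);
    }

    public static void main(String[] args) {
        // Placeholder sizes; choose values that fit the cluster's container memory.
        System.out.println("SET mapred.child.java.opts=" + build(512, 1024, true, true) + ";");
    }
}
```

Keeping the maximum heap below the memory granted to each task container matters more than any single flag, since a JVM that outgrows its container is killed by the cluster.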

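III. Code Implementation

The following simple example defines a placeholder UDF and sets the parameters listed above on a Hadoop `Configuration` object: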

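```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.Text;

public class MyUDF extends GenericUDF {

    // Declares the return type of the function: a writable string (Text).
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        return PrimitiveObjectInspectorFactory.writableStringObjectInspector;
    }

    // Called once per row; GenericUDF passes its arguments as DeferredObject[].
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        // Query logic
        return new Text("Result");
    }

    @Override
    public String getDisplayString(String[] children) {
        return "MyUDF(" + String.join(", ", children) + ")";
    }
}

// Package-private so that both classes can live in MyUDF.java.
class Main {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Parallel stage execution and reducer sizing.
        conf.setBoolean("hive.exec.parallel", true);
        conf.setInt("hive.exec.parallel.thread.number", 4);
        conf.setLong("hive.exec.reducers.bytes.per.reducer", 500000000L);
        conf.setInt("hive.exec.reducers.max", 10);
        // JVM reuse: Hadoop's MRv1 property; each task JVM runs up to 10 tasks.
        conf.setInt("mapred.job.reuse.jvm.num.tasks", 10);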

conf.set("hive.exec.reducers.bytes.per.reducer",