klarna / Hiverunner

Licence: apache-2.0
An Open Source unit test framework for Hive queries based on JUnit 4 and 5

Programming Languages

java

Projects that are alternatives of or similar to Hiverunner

HiveRunner
An Open Source unit test framework for Hive queries based on JUnit 4 and 5
Stars: ✭ 244 (+8.44%)
Mutual labels:  hive, junit, test-framework
mutant-swarm
Mutation testing framework and code coverage for Hive SQL
Stars: ✭ 20 (-91.11%)
Mutual labels:  hive, junit
page-content-tester
Paco is a Java based framework for non-blocking and highly parallelized Dom testing.
Stars: ✭ 13 (-94.22%)
Mutual labels:  junit, test-framework
Truth
Fluent assertions for Java and Android
Stars: ✭ 2,359 (+948.44%)
Mutual labels:  junit, test-framework
Junit5
✅ The 5th major version of the programmer-friendly testing framework for Java and the JVM
Stars: ✭ 4,929 (+2090.67%)
Mutual labels:  junit, test-framework
Hive Jdbc Uber Jar
Hive JDBC "uber" or "standalone" jar based on the latest Apache Hive version
Stars: ✭ 188 (-16.44%)
Mutual labels:  hive
Japa
Embedable test runner for Node.js
Stars: ✭ 202 (-10.22%)
Mutual labels:  test-framework
Hive
Lightweight and blazing fast key-value database written in pure Dart.
Stars: ✭ 2,681 (+1091.56%)
Mutual labels:  hive
Video Recorder Java
This library allows easily record video of your UI tests by just putting couple annotations.
Stars: ✭ 179 (-20.44%)
Mutual labels:  junit
Mastering Junit5
A comprehensive collection of test examples created with JUnit 5
Stars: ✭ 223 (-0.89%)
Mutual labels:  junit
Helicalinsight
Helical Insight software is world’s first Open Source Business Intelligence framework which helps you to make sense out of your data and make well informed decisions.
Stars: ✭ 214 (-4.89%)
Mutual labels:  hive
Reporting
Zebrunner Reporting Tool
Stars: ✭ 198 (-12%)
Mutual labels:  junit
Vim Themis
A testing framework for Vim script.
Stars: ✭ 189 (-16%)
Mutual labels:  test-framework
Androidunittest
Save time & clear your unit tests on Android !
Stars: ✭ 205 (-8.89%)
Mutual labels:  junit
Catch2
A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
Stars: ✭ 14,330 (+6268.89%)
Mutual labels:  test-framework
Tutorial
Spring Boot的例子,包含RESTful API, MVC, JMS, Cache, Mybatis, Cache, Websocket...
Stars: ✭ 215 (-4.44%)
Mutual labels:  junit
Rstest
Fixture-based test framework for Rust
Stars: ✭ 182 (-19.11%)
Mutual labels:  test-framework
Fastlane Plugin Test center
🎯 The best fastlane plugin to understand and tame misbehaving iOS tests 🎉
Stars: ✭ 214 (-4.89%)
Mutual labels:  junit
Bats Core
Bash Automated Testing System
Stars: ✭ 2,820 (+1153.33%)
Mutual labels:  junit
Dotnet Testcontainers
A library to support tests with throwaway instances of Docker containers for all compatible .NET Standard versions.
Stars: ✭ 195 (-13.33%)
Mutual labels:  test-framework

HiveRunner

Welcome to HiveRunner - Zero installation open source unit testing of Hive applications.

Watch the HiveRunner teaser on youtube!

HiveRunner is an open source unit test framework based on JUnit (4 and 5) that enables test-driven development of HiveQL without the need for any installed dependencies. All you need to do is add HiveRunner to your pom.xml like any other library and you're good to go.

HiveRunner is under constant development. We use it extensively in all our Hive projects. Please feel free to suggest improvements both as pull requests and as written requests.

A word from the inventors

HiveRunner enables you to write Hive SQL as releasable, tested artifacts. It will require you to parametrize and modularize your HiveQL in order to make it testable. The bits and pieces of code should then be wired together with an orchestration/workflow/build tool of your choice (e.g. Oozie, Pentaho, Talend or Maven) to be runnable in your environment.

So, even though your current Hive SQL probably won't run off the shelf within HiveRunner, we believe the enforced testability and enabling of a TDD workflow will do as much good to the scripting world of SQL as it has for the Java community.

Cook Book

1. Include HiveRunner

HiveRunner is published to Maven Central. To start using it, add a HiveRunner dependency to your pom file:

<dependency>
    <groupId>com.klarna</groupId>
    <artifactId>hiverunner</artifactId>
    <version>[HIVERUNNER VERSION]</version>
    <scope>test</scope>
</dependency>

Alternatively, if you want to build from source, clone this repo and build with:

 mvn install

Then add the dependency as mentioned above.

Also explicitly add the Surefire plugin and configure forkMode=always to avoid OutOfMemoryError when running large test suites.

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.21.0</version>
    <configuration>
        <forkMode>always</forkMode>
    </configuration>
</plugin>

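If this does not solve the OOM issues, try increasing the -Xmx and -XX:MaxPermSize settings. Note that -XX:MaxPermSize only has an effect on Java 7 and earlier, since the permanent generation was removed in Java 8. For example: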

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.21.0</version>
    <configuration>
        <forkCount>1</forkCount>
        <reuseForks>false</reuseForks>
        <argLine>-Xmx2048m -XX:MaxPermSize=512m</argLine>
    </configuration>
</plugin>

(Please note that the forkMode option is deprecated; use forkCount and reuseForks instead.)

With forkCount and reuseForks it is possible to reduce the test execution time drastically, depending on your hardware. A plugin configuration that uses one fork per CPU core and reuses the forked JVMs would look like:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.21.0</version>
    <configuration>
        <forkCount>1C</forkCount>
        <reuseForks>true</reuseForks>
        <argLine>-Xmx2048m -XX:MaxPermSize=512m</argLine>
    </configuration>
</plugin>
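
As a rough illustration of what a value such as 1C means, the sketch below resolves a forkCount string into a number of forked JVMs: a plain number is used as is, while a trailing C multiplies the number by the available CPU cores. The class and method names are hypothetical and the rounding behaviour is an assumption; Surefire's own implementation may differ in detail.

```java
// Illustrative sketch only: not Surefire's actual code.
final class ForkCountSketch {

    /**
     * Resolves a forkCount value such as "2" or "1C" into a number of forks.
     * A trailing 'C' means "multiply by the number of CPU cores".
     * Assumption: fractional results are truncated, but never below one fork.
     */
    static int resolveForks(String forkCount, int availableCores) {
        String value = forkCount.trim();
        if (value.endsWith("C")) {
            double multiplier = Double.parseDouble(value.substring(0, value.length() - 1));
            return Math.max(1, (int) (multiplier * availableCores));
        }
        return Integer.parseInt(value);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("forkCount=1C on this machine -> " + resolveForks("1C", cores) + " forks");
    }
}
```

On a four-core machine, 1C would therefore resolve to four forks, each running its own share of the test classes.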

By default, HiveRunner uses MapReduce (mr) as the execution engine for Hive. If you wish to run using Tez, set the system property hiveconf_hive.execution.engine to 'tez'.

(Any Hive conf property may be overridden by prefixing it with 'hiveconf_')

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.21.0</version>
        <configuration>
            <systemProperties>
                <hiveconf_hive.execution.engine>tez</hiveconf_hive.execution.engine>
                <hiveconf_hive.exec.counters.pull.interval>1000</hiveconf_hive.exec.counters.pull.interval>
            </systemProperties>
        </configuration>
    </plugin>
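
To make the naming convention concrete, here is a minimal sketch of how properties prefixed with 'hiveconf_' could be turned into Hive conf overrides. It only models the convention described above; the class and method names are hypothetical and this is not HiveRunner's internal code.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Illustrative sketch only: models the 'hiveconf_' naming convention, not HiveRunner's internals.
final class HiveConfOverridesSketch {

    static final String PREFIX = "hiveconf_";

    /** Collects every property named hiveconf_<key> and returns it as <key> -> value. */
    static Map<String, String> overridesFrom(Properties properties) {
        Map<String, String> overrides = new TreeMap<>();
        for (String name : properties.stringPropertyNames()) {
            if (name.startsWith(PREFIX)) {
                // Strip the prefix so the remainder is the plain Hive conf key.
                overrides.put(name.substring(PREFIX.length()), properties.getProperty(name));
            }
        }
        return overrides;
    }

    public static void main(String[] args) {
        // Run with e.g. -Dhiveconf_hive.execution.engine=tez to see the override picked up.
        System.out.println(overridesFrom(System.getProperties()));
    }
}
```

Under this convention, the system property hiveconf_hive.execution.engine=tez maps to the Hive conf key hive.execution.engine with the value tez.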

Timeout

HiveRunner can be configured to time out tests after a given period and to retry them a limited number of times. This only works with StandaloneHiveRunner; the feature is not available in the HiveRunnerExtension (HiveRunner 5.x and up). It exists to work around the bug https://issues.apache.org/jira/browse/TEZ-2475, which at times causes test cases not to terminate due to a lost DAG reference. The timeout feature is configured via the 'enableTimeout', 'timeoutSeconds' and 'timeoutRetries' properties. A configuration that enables timeouts after 30 seconds and allows 2 retries would look like:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.21.0</version>
    <configuration>
        <systemProperties>
            <enableTimeout>true</enableTimeout>
            <timeoutSeconds>30</timeoutSeconds>
            <timeoutRetries>2</timeoutRetries>
        </systemProperties>
    </configuration>
</plugin>
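
The semantics of a timeout combined with retries can be sketched in plain Java as follows. This is a generic illustration built on java.util.concurrent, not HiveRunner's implementation; the class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: shows timeout-and-retry semantics, not HiveRunner's implementation.
final class TimeoutRetrySketch {

    /**
     * Runs the task, cancelling it if it exceeds the timeout, and retries up to
     * 'retries' more times. Throws TimeoutException if every attempt times out.
     */
    static <T> T runWithTimeout(Callable<T> task, long timeoutMillis, int retries) throws Exception {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            for (int attempt = 0; attempt <= retries; attempt++) {
                Future<T> future = executor.submit(task);
                try {
                    return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    future.cancel(true); // interrupt the hanging attempt before retrying
                }
            }
            throw new TimeoutException("No attempt finished within " + timeoutMillis + " ms");
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // 30 second timeout and 2 retries, mirroring the configuration above.
        String result = runWithTimeout(() -> "done", 30_000, 2);
        System.out.println(result);
    }
}
```

With 2 retries a task is attempted at most three times, so a test that hangs on every attempt fails after roughly three times the configured timeout.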

Logging

HiveRunner uses SLF4J so you should configure logging in your tests using any compatible logging framework.

2. Look at the examples

Look at the com.klarna.hiverunner.examples.HelloHiveRunnerTest reference test case to get a feeling for what a typical test case looks like in JUnit 5. To find JUnit 4 versions of the examples, look at com.klarna.hiverunner.examples.junit4.HelloHiveRunnerTest.

If you're put off by the verbosity of the annotations, there's always the possibility to use HiveShell in a more interactive mode. The com.klarna.hiverunner.SerdeTest adds a resource (test data) interactively with HiveShell instead of using annotations.

Annotations and interactive mode can be mixed and matched; however, you'll always need to include the com.klarna.hiverunner.annotations.HiveSQL annotation, e.g.:

     @HiveSQL(files = {"serdeTest/create_table.sql", "serdeTest/hql_custom_serde.sql"}, autoStart = false)
     public HiveShell hiveShell;

Note that autoStart = false is needed for interactive mode. It can be left out when running with annotations only.

Sequence files

If you work with sequence files (or anything other than regular text files), make sure to take a look at ResourceOutputStreamTest for an example of how to use the method HiveShell#getResourceOutputStream to manage test input data.

Programmatically create test input data

Test data can be programmatically inserted into any Hive table using HiveShell.insertInto(...). This seamlessly handles different storage formats and partitioning types allowing you to focus on the data required by your test scenarios:

hiveShell.execute("create database test_db");
hiveShell.execute("create table test_db.test_table ("
    + "c1 string,"
    + "c2 string,"
    + "c3 string"
    + ")"
    + "partitioned by (p1 string)"
    + "stored as orc");

hiveShell.insertInto("test_db", "test_table")
    .withColumns("c1", "p1").addRow("v1", "p1")       // add { "v1", null, null, "p1" }
    .withAllColumns().addRow("v1", "v2", "v3", "p1")  // add { "v1", "v2", "v3", "p1" }
    .copyRow().set("c1", "v4")                        // add { "v4", "v2", "v3", "p1" }
    .addRowsFromTsv(file)                             // parses TSV data out of a file resource
    .addRowsFrom(file, fileParser)                    // parses custom data out of a file resource
    .commit();

See com.klarna.hiverunner.examples.InsertTestDataTest for working examples.
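
The comment on the withColumns(...) line above says that a row given for only some columns is padded with nulls for the rest. The following sketch models that mapping in plain Java. It is an illustration of the idea under the assumption that partition columns come last in the schema, as in the example table; the class and method names are hypothetical and this is not HiveRunner's code.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: models how a partial row maps onto the full table schema.
final class PartialRowSketch {

    /** Places each value at the position of its column in the table; unset columns stay null. */
    static Object[] expand(List<String> tableColumns, List<String> selectedColumns, Object... values) {
        if (selectedColumns.size() != values.length) {
            throw new IllegalArgumentException("Expected " + selectedColumns.size() + " values");
        }
        Object[] row = new Object[tableColumns.size()];
        for (int i = 0; i < values.length; i++) {
            int position = tableColumns.indexOf(selectedColumns.get(i));
            if (position < 0) {
                throw new IllegalArgumentException("Unknown column: " + selectedColumns.get(i));
            }
            row[position] = values[i];
        }
        return row;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("c1", "c2", "c3", "p1");
        Object[] row = expand(table, Arrays.asList("c1", "p1"), "v1", "p1");
        System.out.println(Arrays.toString(row)); // prints [v1, null, null, p1]
    }
}
```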

3. Understand a little bit of the order of execution

In default mode, HiveRunner sets up and starts the HiveShell before the test method is invoked. If autoStart is set to false, the HiveShell must be started manually from within the test method. Either way, HiveRunner performs the following steps when start is invoked:

  1. Merge any @HiveProperties from the test case with the Hive conf
  2. Start the HiveServer with the merged conf
  3. Copy all @HiveResource data into the temp file area for the test
  4. Execute all fields annotated with @HiveSetupScript
  5. Execute the script files given in the @HiveSQL annotation

The HiveShell field annotated with @HiveSQL will always be injected before the test method is invoked.

Hive version compatibility

  • This version of HiveRunner is built for Hive 3.1.2.

  • For Hive 2.x support please use HiveRunner 5.2.1.

  • Command shell emulations are provided to closely match the behaviour of both the Hive CLI and Beeline interactive shells. The desired emulation can be specified in your pom.xml file like so:

      <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.21.0</version>
          <configuration>
              <systemProperties>
                  <!-- Defaults to HIVE_CLI, other options include BEELINE and HIVE_CLI_PRE_V200 -->
                  <commandShellEmulator>BEELINE</commandShellEmulator>
              </systemProperties>
          </configuration>
      </plugin>
    

    Or it can be provided on the command line using a system property:

    mvn -DcommandShellEmulator=BEELINE test
    

Future work and Limitations

  • HiveRunner does not allow the add jar statement. It is considered bad practice to keep environment specific code together with the business logic that targets HiveRunner. Keep environment specific stuff in separate files and use your build/orchestration/workflow tool to run the right files in the right order in the right environment. When running HiveRunner, all SerDes available on the classpath of the IDE/maven will be available.

  • HiveRunner runs Hive and Hive runs on top of Hadoop, and Hadoop has limited support for Windows machines. Installing Cygwin might help out.

  • Some of the HiveRunner annotations should probably be rebuilt to be more test method specific. E.g. Resources may be described on a test method basis instead of for a whole test case. Feedback is always welcome!

  • Currently the HiveServer spins up and tears down for every test method. As a performance option it should be possible to clean the HiveServer and metastore between each test method invocation. The choice should probably be exposed to the test writer. By switching between different strategies, side effects/leakage can be ruled out during test case debugging.

Known Issues

UnknownHostException

I've had issues with UnknownHostException on OS X after upgrading my system or running docker. Usually a restart of my machine solved it, but last time I got some corporate stuff installed the restarts stopped working and I kept getting UnknownHostExceptions. Following this simple guide solved my problem: http://crunchify.com/getting-java-net-unknownhostexception-nodename-nor-servname-provided-or-not-known-error-on-mac-os-x-update-your-privateetchosts-file/

Tez queries do not terminate

Tez will at times forget the process id of a random DAG, which causes the query to never terminate. To get around this, HiveRunner implements timeout and retry functionality:

     <plugin>
         <groupId>org.apache.maven.plugins</groupId>
         <artifactId>maven-surefire-plugin</artifactId>
         <version>2.21.0</version>
         <configuration>
             <systemProperties>
                 <enableTimeout>true</enableTimeout>
                 <timeoutSeconds>30</timeoutSeconds>
                 <timeoutRetries>2</timeoutRetries>
             </systemProperties>
         </configuration>
     </plugin>

Make sure to set timeoutSeconds to the duration of the slowest test in your test suite, plus some padding.

Contact

Mailing List

If you would like to ask any questions about or discuss HiveRunner please join our mailing list at

https://groups.google.com/forum/#!forum/hive-runner-user

Tags

Hive Hadoop HiveRunner HDFS Unit test JUnit SQL HiveSQL HiveQL

Legal

This project is available under the Apache 2.0 License.

Copyright 2013-2021 Klarna AB.
