marklogic / marklogic-contentpump

MarkLogic Connector for Hadoop and MarkLogic Contentpump (mlcp)

MarkLogic Content Pump and MarkLogic Connector for Hadoop

MarkLogic Content Pump (mlcp) is a command-line tool that provides the fastest way to import, export, and copy data to or from MarkLogic databases. Core features of mlcp include:

  • Bulk load billions of local files
  • Split and load large, aggregate XML files or delimited text
  • Bulk load billions of triples or quads from RDF files
  • Archive and restore database contents across environments
  • Export data from a database to a file system
  • Copy subsets of data between databases

You can run mlcp across many threads on a single machine or across many nodes in a cluster. mlcp can also run against MarkLogic clusters hosted on AWS or Azure.
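
As a rough illustration of a multi-threaded import on a single machine, the sketch below assembles an mlcp import command and prints it for review instead of running it. The host, port, credentials, input path, and thread count are placeholder assumptions, and so are the MLCP_BIN, ML_USER, and ML_PASS environment variables; mlcp.sh is the launcher script from the mlcp binary distribution.

```shell
# Sketch only: every value below is a placeholder, not a setting from this project.
build_import_cmd() {
  # $1 = directory of files to load, $2 = number of client threads
  set -- "${MLCP_BIN:-mlcp.sh}" import \
    -host localhost -port 8000 \
    -username "${ML_USER:-admin}" -password "${ML_PASS:-changeme}" \
    -input_file_path "$1" \
    -thread_count "$2"
  # Print the assembled command on one line so it can be checked before use.
  printf '%s\n' "$*"
}

build_import_cmd /data/incoming 8
```

Printing the command first makes it easy to confirm the connection details and thread count before pointing mlcp at a real database.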

The MarkLogic Connector for Hadoop is an extension to Hadoop’s MapReduce framework that lets you communicate efficiently with a MarkLogic database from within a Hadoop job. As of 10.0-5, the Hadoop Connector is no longer shipped as a separate release, but mlcp still uses it as an internal dependency.

Release Notes

What's New in mlcp and Hadoop Connector 10.0.9

  • Upgrade log4j from 2.17.0 to 2.17.1 to mitigate security vulnerability CVE-2021-44832.
  • Upgrade dependencies to fix security vulnerabilities.
  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.8.2

  • Upgrade log4j from 1.2.17 to 2.17.0 to mitigate security vulnerability CVE-2019-17571.

What's New in mlcp and Hadoop Connector 10.0.8

  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.7

  • Upgrade Hadoop Library to 2.7.2.
  • Upgrade dependencies to fix security vulnerabilities.
  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.6

  • Add auto-scaling capability (scale-out/scale-in) for mlcp import, for use by DHS.
  • Add new command line options: -max_thread_percentage, -polling_init_delay, -polling_period.
  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.5

  • Retry document inserts when a commit fails, making mlcp more robust.
  • Support passing a Java Keystore through the mlcp command line for TLS Client Authentication connections.
  • Refactor the mlcp repo so that the Hadoop Connector is no longer a separate release.
  • Add initial server thread polling for mlcp import.
  • Add a new command line option -max_threads.
  • Disable mlcp distributed mode.
  • Upgrade dependencies to fix security vulnerabilities.
  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.4

  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.3

  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.2

  • Bug fixes.

What's New in mlcp and Hadoop Connector 10.0.1

  • Library upgrade.
  • Bug fixes.

Getting Started

Documentation

For official product documentation, please refer to:

Wiki pages of this project contain useful information when you work on development:

Required Software

Build

Steps to build mlcp:

$ git clone https://github.com/marklogic/marklogic-contentpump.git
$ cd marklogic-contentpump
$ mvn clean package -DskipTests=true

The build writes to the respective deliverable directory under the root directory marklogic-contentpump/.
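
Before running the build steps, it can help to confirm that the tools they invoke are on the PATH. The following is a minimal sketch: the tool list (git, mvn, java) is inferred from the commands shown here plus Maven's need for a JDK, not taken from the project's required-software list, and missing_tools is a hypothetical helper name.

```shell
# Print the name of each requested tool that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list; adjust to match the project's actual requirements.
missing=$(missing_tools git mvn java)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing >&2
else
  echo "All build tools found"
fi
```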

For information on contributing to this project, see CONTRIBUTING.md. For information on developing this project, see the project wiki page.

Tests

The unit tests included in this repository are designed to provide illustrative examples of the APIs and to sanity check external contributions. MarkLogic Engineering runs a more comprehensive set of unit, integration, and performance tests internally. To run the unit tests, execute the following command from the marklogic-contentpump/ root directory:

$ mvn test

For detailed information about running unit tests, see Guideline to Run Tests.

Have a question? Need help?

If you have questions about mlcp or the Hadoop Connector, ask on StackOverflow. Tag your question with mlcp and marklogic. If you find a bug or would like to propose a new capability, file a GitHub issue.

Support

mlcp and the Hadoop Connector are maintained by MarkLogic Engineering and distributed under the Apache 2.0 license. They are designed for use in production applications with MarkLogic Server. Everyone is encouraged to file bug reports, feature requests, and pull requests through GitHub. This input is critical and will be carefully considered. However, we can’t promise a specific resolution or timeframe for any request. In addition, MarkLogic provides technical support for release tags of mlcp and the Hadoop Connector to licensed customers under the terms outlined in the Support Handbook. For more information or to sign up for support, visit help.marklogic.com.