italia / daf-kylo

License: AGPL-3.0
Kylo integration with PDND (previously DAF).



Daf-Kylo for PDND (Piattaforma Digitale Nazionale Dati), previously DAF (Data & Analytics Framework)

To install and use this repository, you can deploy all the components on a shared Cloudera edge node.

Daf-Kylo is a data lake platform built on Apache Hadoop and Spark. It provides a data lake solution enabling self-service data ingest, data preparation, and data discovery. Kylo integrates best practices around metadata capture, security, and data quality. Apache NiFi provides a flexible data processing framework for building batch or streaming pipeline templates, and for enabling self-service features.

What is the PDND (previously DAF)?

PDND stands for "Piattaforma Digitale Nazionale Dati" (Italian Digital Data Platform), previously known as Data & Analytics Framework (DAF).

In brief, it is an attempt to establish a central Chief Data Officer (CDO) for the Government and Public Administration. Its main goal is to promote data exchange among Italian Public Administrations (PAs), to support the diffusion of open data, and to enable data-driven policies. You can find more about the PDND on the official Digital Transformation Team website.

What is Daf-Kylo?

The Daf-Kylo repository contains the set of components used to deploy and manage the PDND data ingestion process.

The repository is organized as follows:

  • /docker contains the Dockerfiles used to build the images of the Daf-Kylo components;
  • /kubernetes contains the YAML files used to deploy pods and services on Kubernetes;
  • /kylo contains Kylo material such as the API documentation for the integration with the PDND Portal, the Kylo templates, and the Kylo patches;
  • /nifi contains the NiFi templates and the customized processors used in the ingestion process;
  • /scripts contains utility scripts to manage pods, logs, and other Kubernetes resources.

Prerequisites

Project dependencies

Project dependencies can be found by clicking on this link.

Project components

Project Daf-Kylo depends on the following components.

  • ActiveMQ version 5.15.1, available here;
  • Elasticsearch version 5.6.4, available here;
  • MariaDB version 10.3, available here;
  • Spark version 2.2.0, available here;
  • Kylo-Services version 9.1.0, available here;
  • Kylo-UI version 9.1.0, available here;
  • NiFi version 1.7.0, available here.

How to install and use Daf-Kylo

macOS and Linux

Installing Daf-Kylo on Unix-like systems requires a package manager such as Homebrew. You can download and install Homebrew by following the instructions on the official Homebrew website. Once Homebrew is installed, a few steps complete the setup. The first step is the Homebrew Cask installation. Open a terminal and type the following command to install Homebrew Cask:

brew tap caskroom/cask  

Then, update all formulas and Homebrew itself by typing

brew update  

Last, install kubectl, rpm, make and Git by typing

brew install kubectl rpm make git  

How to build Daf-Kylo

To build most of the Docker images, the Kylo code (both source and compiled) is required. To obtain it, download and compile it using the Makefile, by typing the following commands (for the production and test environments):

Production

make -f Makefile daf-kylo
make -f Makefile build-kylo  

Test

make -f Makefile.test daf-kylo
make -f Makefile.test build-kylo  

Login to nexus repository

docker login nexus.daf.teamdigitale.it

Build Docker images of the components

Once this is completed, you can build every image (production and test environments) by typing the following commands:

Production

make activemq
make mysql 
make kylo-services  
make kylo-ui  
make nifi  

Test

make -f Makefile.test activemq
make -f Makefile.test mysql  
make -f Makefile.test kylo-services  
make -f Makefile.test kylo-ui  
make -f Makefile.test nifi  

Push Docker images to local artifactory repository

Please ensure the Docker client has been configured and the images have been tagged correctly beforehand. Instructions can be found in the TeamDigitale onboarding guides 'Setup Docker' and 'Push Docker Image'.

Once configuration and tagging are done, an image can be pushed by typing: docker push [repositoryurl:repositoryport/artifact:version]

for instance:

Production

./nexus_push.sh prod [namespace]

Test

./nexus_push.sh test [namespace]
The [namespace] is optional.
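The generic push pattern above can also be sketched as a small helper that only composes the command. All concrete values in the example call (registry port, image name, version) are illustrative assumptions:

```shell
# push_image_cmd composes (but does not run) the docker push invocation
# from its parts: repository URL, repository port, artifact name and version.
push_image_cmd() {
  local repo_url="$1" repo_port="$2" artifact="$3" version="$4"
  echo "docker push ${repo_url}:${repo_port}/${artifact}:${version}"
}

# Example call; port, artifact and version are hypothetical:
push_image_cmd nexus.daf.teamdigitale.it 443 daf-kylo/nifi 1.7.0
```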

Deploy components in kubernetes cluster

Please ensure kubectl has been configured beforehand. Instructions can be found in the TeamDigitale onboarding guide, 'Setup Kubernetes'.

Production

After configuration is done, deployment into the Kubernetes cluster can be performed by typing ./playbook.sh [environment] [component] [namespace].

As an example, ./playbook.sh prod activemq [namespace].

Pod deletion can be performed by typing: ./cleanup.sh [environment] [component] [namespace].

As an example,
./cleanup.sh prod activemq [namespace]

Test

For instance:
./playbook.sh test activemq [namespace]

To delete, use ./cleanup.sh [environment] [component]; for instance:
./cleanup.sh test activemq [namespace]

MySQL Configuration

By default, the kylo database is not created in the MySQL container, so you have to create it manually.
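A minimal sketch of that manual step, assuming the database is named kylo and the MariaDB root user is used (both assumptions; check kylo-services' application.properties for the names your deployment expects). The pod name mysql-0 is hypothetical:

```shell
# Compose the SQL statement and the kubectl command that would run it inside
# the MariaDB pod; the command is echoed so it can be reviewed before running.
SQL="CREATE DATABASE IF NOT EXISTS kylo CHARACTER SET utf8 COLLATE utf8_general_ci;"
CMD="kubectl exec -it mysql-0 -- mysql -u root -p -e \"$SQL\""
echo "$CMD"
```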

LDAP Configuration

To configure LDAP authentication, edit the config maps kylo-services.yaml and kylo-ui.yaml as follows. config-map/kylo-services.yaml should contain:

Production

    security.auth.ldap.server.uri=ldap://idm.daf.gov.it:389/cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.authDn=uid=admin,cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.password=xxxxxx

Test

    security.auth.ldap.server.uri=ldap://idm.teamdigitale.test:389/cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.authDn=uid=application,cn=users,cn=accounts,dc=daf,dc=gov,dc=it
    security.auth.ldap.server.password=xxxxxx

After these two changes redeploy as follows:

kubectl delete -f config-map/kylo-services.yaml  
kubectl delete -f config-map/kylo-ui.yaml  
  
kubectl apply -f config-map/kylo-services.yaml  
kubectl apply -f config-map/kylo-ui.yaml  

The above example does not take the [namespace] into account.

  1. Go to idm.teamdigitale.test and create a user such as dladmin with a password

After these steps you are able to log into the Kylo UI!

As pointed out above, once this is done the LDAP login will temporarily be replaced by the default login, which allows you to log in with the default user dladmin/thinkbig. This is needed to create users with the same names as those existing in LDAP, in order to grant them permissions (the same functionality for groups is currently being fixed by R&D). Once the users (or groups) are created, change config-map/kylo-services.yaml and config-map/kylo-ui.yaml back and redeploy again. LDAP is now good to go.

Bootstrap note

When Kylo starts for the first time it needs Liquibase to create the Kylo DB, so make sure that the application.properties in kylo-services' config map contains:

liquibase.enabled=true  

How to view logs
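Logs can be inspected with the utility scripts in /scripts, or directly with kubectl. A minimal sketch of the direct approach, assuming the pods are labeled app=<component> in the /kubernetes manifests (an assumption; adapt the selector to your deployment):

```shell
# logs_cmd composes (but does not run) the kubectl command used to follow a
# component's logs; the "app=<component>" label selector is an assumption.
logs_cmd() {
  local component="$1" namespace="${2:-default}"
  echo "kubectl logs -f -l app=${component} -n ${namespace}"
}

# Example: follow the kylo-services logs in the default namespace.
logs_cmd kylo-services
```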

Custom Processors

Here you can find additional information about custom processors created for the DAF.

How to contribute

Contributions are welcome. Feel free to open issues and submit a pull request at any time, but please read our handbook first.

License

Copyright (c) 2019 Presidenza del Consiglio dei Ministri

This program is a free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.
