elastic / Eland

Licence: apache-2.0

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Programming Languages

python

Labels

machine-learning elasticsearch data-analysis pandas big-data scikit-learn etl dataframe lightgbm

Projects that are alternatives of or similar to Eland

Mars

Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

Stars: ✭ 2,308 (+882.13%)

Mutual labels: dataframe, pandas, scikit-learn, lightgbm

Zat

Zeek Analysis Tools (ZAT): Processing and analysis of Zeek network data with Pandas, scikit-learn, Kafka and Spark

Stars: ✭ 303 (+28.94%)

Mutual labels: data-analysis, pandas, scikit-learn

Dominando-Pandas

This repository is intended for learning the Pandas library.

Stars: ✭ 22 (-90.64%)

Mutual labels: pandas, data-analysis, dataframe

Data Science Ipython Notebooks

Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

Stars: ✭ 22,048 (+9282.13%)

Mutual labels: pandas, big-data, scikit-learn

dataquest-guided-projects-solutions

My dataquest project solutions

Stars: ✭ 35 (-85.11%)

Mutual labels: scikit-learn, pandas, data-analysis

dflib

In-memory Java DataFrame library

Stars: ✭ 50 (-78.72%)

Mutual labels: etl, data-analysis, dataframe

Pandastable

Table analysis in Tkinter using pandas DataFrames.

Stars: ✭ 376 (+60%)

Mutual labels: dataframe, data-analysis, pandas

Koalas

Koalas: pandas API on Apache Spark

Stars: ✭ 3,044 (+1195.32%)

Mutual labels: dataframe, pandas, big-data

Foxcross

AsyncIO serving for data science models

Stars: ✭ 18 (-92.34%)

Mutual labels: dataframe, pandas, scikit-learn

Mlcourse.ai

Open Machine Learning Course

Stars: ✭ 7,963 (+3288.51%)

Mutual labels: data-analysis, pandas, scikit-learn

Setl

A simple Spark-powered ETL framework that just works 🍺

Stars: ✭ 79 (-66.38%)

Mutual labels: data-analysis, big-data, etl

hamilton

A scalable general purpose micro-framework for defining dataflows. You can use it to create dataframes, numpy matrices, python objects, ML models, etc.

Stars: ✭ 612 (+160.43%)

Mutual labels: etl, pandas, dataframe

datascienv

datascienv is a package that helps you set up your environment in a single line of code with all dependencies. It also includes pyforest, which provides a single-line import of all required ML libraries.

Stars: ✭ 53 (-77.45%)

Mutual labels: scikit-learn, pandas, lightgbm

Arch-Data-Science

Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision

Stars: ✭ 92 (-60.85%)

Mutual labels: scikit-learn, pandas, lightgbm

PracticalMachineLearning

A collection of ML-related material including notebooks, code, and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in speech, not as in free food) or open source.

Stars: ✭ 60 (-74.47%)

Mutual labels: scikit-learn, pandas, data-analysis

Adam qas

ADAM - A Question Answering System. Inspired by IBM Watson

Stars: ✭ 330 (+40.43%)

Mutual labels: pandas, elasticsearch, scikit-learn

Dataframe

C++ DataFrame for statistical, financial, and ML analysis, written in modern C++ using native types and continuous memory storage, with no pointers involved

Stars: ✭ 828 (+252.34%)

Mutual labels: dataframe, data-analysis, pandas

Dat8

General Assembly's 2015 Data Science course in Washington, DC

Stars: ✭ 1,516 (+545.11%)

Mutual labels: data-analysis, pandas, scikit-learn

Pbpython

Code, Notebooks and Examples from Practical Business Python

Stars: ✭ 1,724 (+633.62%)

Mutual labels: data-analysis, pandas, scikit-learn

Data Science Notebook

📖 Every great thought and action has a humble beginning

Stars: ✭ 196 (-16.6%)

Mutual labels: data-analysis, pandas

About

Eland is a Python Elasticsearch client for exploring and analyzing data in Elasticsearch with a familiar Pandas-compatible API.

Where possible, the package uses existing Python APIs and data structures to make it easy to switch between numpy, pandas, or scikit-learn and their Elasticsearch-powered equivalents. In general, the data resides in Elasticsearch and not in memory, which allows Eland to work with large datasets stored in Elasticsearch.

Eland also provides tools to upload trained machine learning models from common libraries such as scikit-learn, XGBoost, and LightGBM into Elasticsearch.

Getting Started

Eland can be installed from PyPI with Pip:

$ python -m pip install eland

Eland can also be installed from Conda Forge with Conda:

$ conda install -c conda-forge eland

Supported Versions

Supports Python 3.6+ and Pandas 1.0.0+
Supports Elasticsearch 7.x+ clusters; 7.6 or later is recommended for all features to work.
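
The version requirements above can be turned into a small pre-flight check. The following is only a sketch: the helper names (`parse_version`, `es_support_level`) and the wording of the returned labels are made up for illustration and are not part of Eland. It uses nothing beyond the minimum versions stated above.

```python
import re
import sys

# Minimum versions as stated in this README.
MIN_PYTHON = (3, 6)
MIN_ES = (7, 0)
RECOMMENDED_ES = (7, 6)


def parse_version(text):
    """Turn a string such as '7.6.2' into (7, 6, 2).

    Hypothetical helper: keeps the leading digits of each dotted piece,
    stops at the first piece without digits, and pads to three parts.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def es_support_level(es_version):
    """Classify an Elasticsearch version string against the stated minimums."""
    version = parse_version(es_version)
    if version < MIN_ES:
        return "unsupported"
    if version < RECOMMENDED_ES:
        return "supported, some features may not work"
    return "recommended"


python_ok = sys.version_info[:2] >= MIN_PYTHON
print("Python version OK:", python_ok)
print("Elasticsearch 7.5.2:", es_support_level("7.5.2"))
```

In practice the Elasticsearch version string would come from the cluster itself; here it is passed in by hand so the sketch runs without a server.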

Connecting to Elasticsearch

Eland uses the low-level Elasticsearch Python client (elasticsearch-py) to connect to Elasticsearch. This client supports a range of connection and authentication options.

You can pass either an instance of elasticsearch.Elasticsearch to Eland APIs or a string containing the host to connect to:

import eland as ed

# Connecting to an Elasticsearch instance running on 'localhost:9200'
df = ed.DataFrame("localhost:9200", es_index_pattern="flights")

# Connecting to an Elastic Cloud instance
from elasticsearch import Elasticsearch

es = Elasticsearch(
    cloud_id="cluster-name:...",
    http_auth=("elastic", "<password>")
)
df = ed.DataFrame(es, es_index_pattern="flights")

DataFrames in Eland

eland.DataFrame wraps an Elasticsearch index in a Pandas-like API and defers all processing and filtering of data to Elasticsearch instead of your local machine. This means you can process large amounts of data within Elasticsearch from a Jupyter Notebook without overloading your machine.

➤ Eland DataFrame API documentation

➤ Advanced examples in a Jupyter Notebook

>>> import eland as ed

>>> # Connect to 'flights' index via localhost Elasticsearch node
>>> df = ed.DataFrame('localhost:9200', 'flights')

# eland.DataFrame instance has the same API as pandas.DataFrame
# except all data is in Elasticsearch. See .info() memory usage.
>>> df.head()
   AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
0      841.265642      False  ...         0 2018-01-01 00:00:00
1      882.982662      False  ...         0 2018-01-01 18:27:00
2      190.636904      False  ...         0 2018-01-01 17:11:14
3      181.694216       True  ...         0 2018-01-01 10:33:28
4      730.041778      False  ...         0 2018-01-01 05:13:00

[5 rows x 27 columns]

>>> df.info()
<class 'eland.dataframe.DataFrame'>
Index: 13059 entries, 0 to 13058
Data columns (total 27 columns):
 #   Column              Non-Null Count  Dtype         
---  ------              --------------  -----         
 0   AvgTicketPrice      13059 non-null  float64       
 1   Cancelled           13059 non-null  bool          
 2   Carrier             13059 non-null  object        
...      
 24  OriginWeather       13059 non-null  object        
 25  dayOfWeek           13059 non-null  int64         
 26  timestamp           13059 non-null  datetime64[ns]
dtypes: bool(2), datetime64[ns](1), float64(5), int64(2), object(17)
memory usage: 80.0 bytes
Elasticsearch storage usage: 5.043 MB

# Filtering of rows using comparisons
>>> df[(df.Carrier=="Kibana Airlines") & (df.AvgTicketPrice > 900.0) & (df.Cancelled == True)].head()
     AvgTicketPrice  Cancelled  ... dayOfWeek           timestamp
8        960.869736       True  ...         0 2018-01-01 12:09:35
26       975.812632       True  ...         0 2018-01-01 15:38:32
311      946.358410       True  ...         0 2018-01-01 11:51:12
651      975.383864       True  ...         2 2018-01-03 21:13:17
950      907.836523       True  ...         2 2018-01-03 05:14:51

[5 rows x 27 columns]
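
To make "defers filtering to Elasticsearch" more concrete, the sketch below builds an Elasticsearch Query DSL body that expresses the same three conditions as the filter above. This is an illustration only, not Eland's internal code: the helper functions (`term`, `greater_than`, `all_of`) are hypothetical, and the exact query Eland generates may differ. The `bool`, `term`, and `range` clauses are standard Query DSL.

```python
import json


def term(field, value):
    """Exact-match clause, roughly the analogue of `df.field == value`."""
    return {"term": {field: value}}


def greater_than(field, value):
    """Range clause, roughly the analogue of `df.field > value`."""
    return {"range": {field: {"gt": value}}}


def all_of(*clauses):
    """AND the clauses together, roughly the analogue of `&` between masks."""
    return {"bool": {"must": list(clauses)}}


query = all_of(
    term("Carrier", "Kibana Airlines"),
    greater_than("AvgTicketPrice", 900.0),
    term("Cancelled", True),
)

# A body of this shape could be sent to the index's _search endpoint.
# size=5 loosely mirrors .head(); this is an assumption for illustration.
search_body = {"query": query, "size": 5}
print(json.dumps(search_body, indent=2))
```

The point of the sketch is that only a small query description leaves the client; the matching rows are selected inside Elasticsearch.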

# Running aggregations across an index
>>> df[['DistanceKilometers', 'AvgTicketPrice']].aggregate(['sum', 'min', 'std'])
     DistanceKilometers  AvgTicketPrice
sum        9.261629e+07    8.204365e+06
min        0.000000e+00    1.000205e+02
std        4.578263e+03    2.663867e+02
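
An aggregation like the one above can likewise be expressed as a single Elasticsearch search request that returns only summary values. The sketch below is a hypothetical illustration, not Eland's implementation: `build_agg_body` and the aggregation names it generates are invented here. It relies on the standard `sum`, `min`, and `extended_stats` aggregations, the last of which reports a standard deviation among other statistics.

```python
import json


def build_agg_body(fields):
    """Build a search body requesting sum, min, and extended stats per field.

    Hypothetical helper; the aggregation names are arbitrary labels.
    """
    aggs = {}
    for field in fields:
        aggs[f"{field}_sum"] = {"sum": {"field": field}}
        aggs[f"{field}_min"] = {"min": {"field": field}}
        # extended_stats includes std_deviation in its response.
        aggs[f"{field}_stats"] = {"extended_stats": {"field": field}}
    # size=0 asks for aggregation results only, with no documents returned.
    return {"size": 0, "aggs": aggs}


body = build_agg_body(["DistanceKilometers", "AvgTicketPrice"])
print(json.dumps(body, indent=2))
```

Because `size` is 0, the response would carry a handful of numbers rather than the 13059 documents, which is why this kind of call stays cheap on the client side.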

Machine Learning in Eland

Eland can serialize trained models from the scikit-learn, XGBoost, and LightGBM libraries and import them into Elasticsearch for use as inference models.

➤ Eland Machine Learning API documentation

>>> from xgboost import XGBClassifier
>>> from eland.ml import MLModel

# Train and exercise an XGBoost ML model locally.
# training_data is assumed to be a (features, labels) pair with five feature columns.
>>> xgb_model = XGBClassifier(booster="gbtree")
>>> xgb_model.fit(training_data[0], training_data[1])

>>> xgb_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

# Import the model into Elasticsearch
>>> es_model = MLModel.import_model(
...     es_client="localhost:9200",
...     model_id="xgb-classifier",
...     model=xgb_model,
...     feature_names=["f0", "f1", "f2", "f3", "f4"],
... )

# Exercise the ML model in Elasticsearch with the training data
>>> es_model.predict(training_data[0])
[0 1 1 0 1 0 0 0 1 0]

Stars: ✭ 235