sckott / Elastic_data

Elasticsearch datasets ready for bulk loading

elastic datasets

This is a collection of smallish datasets to use for playing with Elasticsearch.

You can only fit so much data in an R package. The R client for Elasticsearch that we maintain, elastic, comes with some data, but it's nice to have more, so here it is.

See also nodbi for working with Elasticsearch from R.

Datasets

  • plos_everything.json
  • plos_introductions.json
  • plos_data.json
  • geonames_elastic_bulk.zip - too big for GitHub, hosted on Dropbox
  • gbif_data.json
  • gbif_geo.json
  • gbif_geopoint.json
  • gbif_geoshape.json
  • gbif_geosmall.json
  • shakespeare_data.json
  • omdb.json

Loading into ES

These datasets are formatted for bulk loading into Elasticsearch via the bulk API.
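For orientation, the bulk format is newline-delimited JSON in which each document takes two lines: an action line naming the target index, followed by the document itself. The sketch below builds such a payload in base R. The helper `to_bulk_lines()` and the two sample records are hypothetical and not part of this repository.

```r
# Hypothetical helper: turn pre-serialized JSON documents into bulk-format lines.
# `docs` is a character vector with one JSON document per element.
to_bulk_lines <- function(docs, index, type = NULL) {
  ids <- seq_along(docs)
  meta <- if (is.null(type)) {
    sprintf('{"index":{"_index":"%s","_id":%d}}', index, ids)
  } else {
    sprintf('{"index":{"_index":"%s","_type":"%s","_id":%d}}', index, type, ids)
  }
  # Interleave action lines and document lines: meta1, doc1, meta2, doc2, ...
  as.vector(rbind(meta, docs))
}

docs <- c('{"name":"Oslo"}', '{"name":"Lima"}')  # made-up sample records
bulk <- to_bulk_lines(docs, index = "geonames", type = "record")
writeLines(bulk)
```

The bulk endpoint expects the payload to end with a newline, which `writeLines()` supplies when the lines are written to a file.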

geonames

geonames_elastic_bulk.zip is about 70 .json files in Elasticsearch bulk format. It was prepared from the Geonames database at http://download.geonames.org/export/dump/. The original data from Geonames was licensed under a Creative Commons Attribution 3.0 License, see http://creativecommons.org/licenses/by/3.0/.

To load the geonames data into Elasticsearch, use whatever client you prefer. In R, for example, you could do the following.

First, create the index and set the geo_shape mapping:

body <- '{
 "mappings": {
   "record": {
     "properties": {
         "location" : {"type" : "geo_shape"}
      }
   }
 }
}'
index_create(index='geonames', body=body)

This should return:

#> $acknowledged
#> [1] TRUE

Note: the index type is record, and the index name is geonames. Both the index and the index type are set in the JSON files.

Then use a for loop to load each file. AFAIK there is a limit on the size of file you can load (let me know if there's a way around it), which is why there are a bunch of JSON files instead of one big file.

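# Install and load the elastic R client
devtools::install_github("ropensci/elastic")
library("elastic")
# List the unzipped bulk files with their full paths
files <- list.files("path/to/unzipped/files", full.names = TRUE)
for (i in seq_along(files)) {
  # invisible() suppresses the output printed by each bulk call
  invisible(docs_bulk(files[i]))
}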

The docs_bulk() function uses the /_bulk endpoint to POST data to an index called geonames on your ES server. Each bulk call prints a lot of output, so the loop wraps it in invisible() to avoid thousands of printed lines.
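On the file-size question above: as far as I know, the limit comes from the maximum HTTP request size Elasticsearch accepts (the `http.max_content_length` setting, 100mb by default), so a large bulk file has to be sent in pieces. The sketch below shows one way to split a bulk file in base R while keeping each action line together with its document. The helper `split_bulk_file()` and its parameters are hypothetical, and it assumes every document occupies exactly two lines.

```r
# Hypothetical helper: split one large bulk file into smaller bulk files.
# Assumes each document is exactly two lines (action line + source line),
# which holds for "index" actions.
split_bulk_file <- function(path, docs_per_file = 100000, out_dir = tempdir()) {
  lines <- readLines(path)
  stopifnot(length(lines) %% 2 == 0)
  n_docs <- length(lines) / 2
  # Assign each document to a chunk, then repeat the label for both of its lines
  chunk_of_doc <- ceiling(seq_len(n_docs) / docs_per_file)
  chunks <- split(lines, rep(chunk_of_doc, each = 2))
  out <- file.path(out_dir, sprintf("chunk%d.json", seq_along(chunks)))
  for (i in seq_along(chunks)) writeLines(chunks[[i]], out[i])
  out
}

# Small self-contained demo with made-up records
demo <- tempfile(fileext = ".json")
writeLines(c(
  '{"index":{"_index":"geonames","_type":"record","_id":1}}', '{"name":"Oslo"}',
  '{"index":{"_index":"geonames","_type":"record","_id":2}}', '{"name":"Lima"}',
  '{"index":{"_index":"geonames","_type":"record","_id":3}}', '{"name":"Pune"}'
), demo)
parts <- split_bulk_file(demo, docs_per_file = 2)
length(parts)
```

Each resulting file can then be passed to docs_bulk() in a loop. Note that the helper reads the whole input file into memory, so a very large file would need a streaming approach instead.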

Check that it worked:

Search("geonames")$hits$total
#> [1] 6646030

You should have about 6.6 million records.
