stestagg / Pytubes

Licence: MIT
A module for getting data into python from large data sources

Programming Languages

python
cpp
cpp11
cython

Projects that are alternatives to, or similar to, Pytubes

Datasets
TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...
Stars: ✭ 3,094 (+1786.59%)
Mutual labels:  data, numpy
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (+66.46%)
Mutual labels:  data, numpy
Python
This repository helps you understand python from the scratch.
Stars: ✭ 285 (+73.78%)
Mutual labels:  data, numpy
Pyaudiodsptools
Numpy Audio DSP Tools
Stars: ✭ 154 (-6.1%)
Mutual labels:  numpy
Rnn lstm from scratch
How to build RNNs and LSTMs from scratch with NumPy.
Stars: ✭ 156 (-4.88%)
Mutual labels:  numpy
Cheatsheets.pdf
📚 Various cheatsheets in PDF
Stars: ✭ 159 (-3.05%)
Mutual labels:  numpy
Stats
A well tested and comprehensive Golang statistics library package with no dependencies.
Stars: ✭ 2,196 (+1239.02%)
Mutual labels:  data
Hottbox
HOTTBOX: Higher Order Tensors ToolBOX.
Stars: ✭ 153 (-6.71%)
Mutual labels:  data
Pywt
We're moving. Please visit https://github.com/PyWavelets
Stars: ✭ 161 (-1.83%)
Mutual labels:  numpy
Gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Stars: ✭ 2,006 (+1123.17%)
Mutual labels:  data
React Native Quiet
🤫 Quiet for React Native.
Stars: ✭ 158 (-3.66%)
Mutual labels:  data
Orjson
Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy
Stars: ✭ 2,595 (+1482.32%)
Mutual labels:  numpy
Kalman Filter
Kalman Filter implementation in Python using Numpy only in 30 lines.
Stars: ✭ 161 (-1.83%)
Mutual labels:  numpy
Color recognition
🎨 Color recognition & classification & detection on webcam stream / on video / on single image using K-Nearest Neighbors (KNN) is trained with color histogram features by OpenCV.
Stars: ✭ 154 (-6.1%)
Mutual labels:  numpy
Dop
JavaScript implementation for Distributed Object Protocol
Stars: ✭ 163 (-0.61%)
Mutual labels:  data
Anaconda Project
Tool for encapsulating, running, and reproducing data science projects
Stars: ✭ 153 (-6.71%)
Mutual labels:  data
Py
Repository to store sample python programs for python learning
Stars: ✭ 4,154 (+2432.93%)
Mutual labels:  numpy
Gasyori100knock
image processing codes to understand algorithm
Stars: ✭ 1,988 (+1112.2%)
Mutual labels:  numpy
Holiday Cn
📅🇨🇳 Chinese statutory public holiday data, scraped automatically every day from State Council announcements
Stars: ✭ 157 (-4.27%)
Mutual labels:  data
Numsca
numsca is numpy for scala
Stars: ✭ 160 (-2.44%)
Mutual labels:  numpy

pytubes

Source: https://github.com/stestagg/pytubes

Pytubes is a library that optimizes loading datasets into memory.

At its core is a set of specialized C++ classes that can be chained together to load and manipulate data using a standard iterator pattern. Around this is a Cython extension module that makes defining and configuring a tube simple and straightforward.
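To illustrate the chaining idea described above, here is a minimal pure-Python sketch of a lazy, chainable iterator wrapper. It is not the pytubes implementation (which is C++ behind a Cython layer); the class name `Chain` and its methods `map` and `keep_if` are hypothetical and exist only for this example.

```python
import json
from typing import Callable, Iterable, Iterator


class Chain:
    """Hypothetical stand-in for a tube: wraps an iterable lazily.

    Each method returns a new Chain, so steps can be strung together
    and nothing is processed until the result is iterated.
    """

    def __init__(self, source: Iterable):
        self._source = source

    def __iter__(self) -> Iterator:
        return iter(self._source)

    def map(self, fn: Callable) -> "Chain":
        # Apply fn to every item, one at a time, on demand.
        return Chain(fn(item) for item in self)

    def keep_if(self, predicate: Callable) -> "Chain":
        # Pass through only the items for which predicate is true.
        return Chain(item for item in self if predicate(item))


# Assumed sample input: one JSON document per line.
lines = ['{"country_code": "GB"}', '{"country_code": "FR"}', '{}']

codes = (Chain(lines)
         .map(json.loads)                           # parse each line
         .map(lambda rec: rec.get("country_code"))  # extract one field
         .keep_if(lambda code: code is not None))   # drop missing values

result = sorted(codes)
print(result)
```

Each step only pulls items from the step before it when asked, which is the same standard iterator pattern the text refers to, just without the C++ speed.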

Simple Example

    from tubes import Each
    import glob

    tube = (Each(glob.glob("*.json"))      # Iterate over some filenames
            .read_files()                  # Read each file, chunk by chunk
            .split()                       # Split the file, line-by-line
            .json()                        # Parse JSON
            .get('country_code', 'null'))  # Extract field named 'country_code'

    set(tube)                              # Collect results in a set
    # {'A1', 'AD', 'AE', 'AF', 'AG', 'AL', 'AM', 'AO', 'AP', ...}

More Complex Example

    from tubes import Each
    import glob

    x = (Each(glob.glob('*.jsonz'))
        .map_files()
        .gunzip()
        .split(b'\n')
        .json()
        .enumerate()
        .skip_unless(lambda x: x.slot(1).get('country_code', '""').to(str).equals('GB'))
        .multi(lambda x: (
            x.slot(0),
            x.slot(1).get('timestamp', 'null'),
            x.slot(1).get('country_code', 'null'),
            x.slot(1).get('url', 'null'),
            x.slot(1).get('file', '{}').get('filename', 'null'),
            x.slot(1).get('file', '{}').get('project'),
            x.slot(1).get('details', '{}').get('installer', '{}').get('name', 'null'),
            x.slot(1).get('details', '{}').get('python', 'null'),
            x.slot(1).get('details', '{}').get('system', 'null'),
            x.slot(1).get('details', '{}').get('system', '{}').get('name', 'null'),
            x.slot(1).get('details', '{}').get('cpu', 'null'),
            x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('lib', 'null'),
            x.slot(1).get('details', '{}').get('distro', '{}').get('libc', '{}').get('version', 'null'),
        ))
    )

    print(list(x)[-3])
    # (15,612,767,
    #  '2017-12-14 09:33:31 UTC',
    #  'GB',
    #  '/packages/29/9b/25ef61e948321296f029f53c9f67cc2b54e224db509eb67ce17e0df6044a/certifi-2017.11.5-py2.py3-none-any.whl',
    #  'certifi-2017.11.5-py2.py3-none-any.whl',
    #  'certifi',
    #  'pip',
    #  '2.7.5',
    #  {'name': 'Linux', 'release': '2.6.32-696.10.3.el6.x86_64'},
    #  'Linux',
    #  'x86_64',
    #  'glibc',
    #  '2.17')
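For readers unfamiliar with the steps in the example above, the following plain-Python sketch shows roughly the same shape of pipeline using only the standard library: decompress, split into lines, parse JSON, number the records, keep those with country code 'GB', and pull out a few nested fields. It is an approximation, not a description of how pytubes works internally. The helper names `nested_get` and `gb_rows` are hypothetical, numbering from 0 is an assumption, and missing fields are simply reported as `None`.

```python
import gzip
import json


def nested_get(record, *keys, default=None):
    """Walk nested dicts, returning default as soon as a key is missing."""
    current = record
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def gb_rows(gzipped_blobs):
    """Yield (index, timestamp, filename, installer) for records from 'GB'."""
    # Decompress each blob and split it into non-empty lines, lazily.
    lines = (line
             for blob in gzipped_blobs
             for line in gzip.decompress(blob).split(b"\n")
             if line)
    # Number every parsed record (from 0, an assumption), then filter.
    for index, record in enumerate(map(json.loads, lines)):
        if record.get("country_code") != "GB":
            continue
        yield (
            index,
            record.get("timestamp"),
            nested_get(record, "file", "filename"),
            nested_get(record, "details", "installer", "name"),
        )


# Assumed sample data, built in memory instead of read from *.jsonz files.
records = [
    {"country_code": "FR", "timestamp": "t0"},
    {"country_code": "GB", "timestamp": "t1",
     "file": {"filename": "a.whl"},
     "details": {"installer": {"name": "pip"}}},
    {"country_code": "GB", "timestamp": "t2"},
]
blob = gzip.compress("\n".join(json.dumps(r) for r in records).encode())

rows = list(gb_rows([blob]))
print(rows)
```

Because the records are numbered before filtering, the index in each row refers to the record's position in the full input, not its position among the rows that were kept.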
