benhoyt / Scandir

Licence: bsd-3-clause
Better directory iterator and faster os.walk(), now in the Python 3.5 stdlib

scandir, a better directory iterator and faster os.walk()

.. image:: https://img.shields.io/pypi/v/scandir.svg
   :target: https://pypi.org/project/scandir/
   :alt: scandir on PyPI (Python Package Index)

.. image:: https://github.com/benhoyt/scandir/actions/workflows/tests.yml/badge.svg
   :target: https://github.com/benhoyt/scandir/actions/workflows/tests.yml
   :alt: GitHub Actions Tests

scandir() is a directory iteration function like os.listdir(), except that instead of returning a list of bare filenames, it yields DirEntry objects that include file type and stat information along with the name. Using scandir() increases the speed of os.walk() by 2-20 times (depending on the platform and file system) by avoiding unnecessary calls to os.stat() in most cases.

Now included in a Python near you!

``scandir`` has been included in the Python 3.5 standard library as ``os.scandir()``, and the related performance improvements to ``os.walk()`` have also been included. So if you're lucky enough to be using Python 3.5 or later (3.5 was released on September 13, 2015), you get the benefit immediately. Otherwise, just download this module from `PyPI <https://pypi.python.org/pypi/scandir>`_, install it with ``pip install scandir``, and then do something like this in your code:

.. code-block:: python

    # Use the built-in version of scandir/walk if possible, otherwise
    # use the scandir module version
    try:
        from os import scandir, walk
    except ImportError:
        from scandir import scandir, walk

`PEP 471 <https://www.python.org/dev/peps/pep-0471/>`_, which is the PEP that proposes including ``scandir`` in the Python standard library, was `accepted <https://mail.python.org/pipermail/python-dev/2014-July/135561.html>`_ in July 2014 by Victor Stinner, the BDFL-delegate for the PEP.

This scandir module is intended to work on Python 2.7+ and Python 3.4+ (and it has been tested on those versions).

Background

Python's built-in os.walk() is significantly slower than it needs to be, because -- in addition to calling listdir() on each directory -- it calls stat() on each file to determine whether the filename is a directory or not. But both FindFirstFile / FindNextFile on Windows and readdir on Linux/OS X already tell you whether the files returned are directories or not, so no further stat system calls are needed. In short, you can reduce the number of system calls from about 2N to N, where N is the total number of files and directories in the tree.
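
To make the 2N-versus-N difference concrete, here is a minimal sketch that counts subdirectories both ways. The two helper names are made up for this illustration and are not part of the module, and the sketch assumes ``os.scandir()`` is available (Python 3.5+); with the backport you would import ``scandir`` as shown earlier.

.. code-block:: python

    import os

    def count_dirs_listdir(path):
        """Count subdirectories with listdir(): one extra stat per entry."""
        count = 0
        for name in os.listdir(path):
            # os.path.isdir() performs a stat() system call for every name
            if os.path.isdir(os.path.join(path, name)):
                count += 1
        return count

    def count_dirs_scandir(path):
        """Count subdirectories with scandir(): usually no extra stat calls."""
        count = 0
        for entry in os.scandir(path):
            # is_dir() normally uses the file type the OS already returned
            if entry.is_dir():
                count += 1
        return count

Both functions return the same number; the second one simply asks the operating system fewer questions.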

In practice, removing all those extra system calls makes os.walk() about 7-50 times as fast on Windows, and about 3-10 times as fast on Linux and Mac OS X. So we're not talking about micro-optimizations. See more benchmarks in the "Benchmarks" section below.

Somewhat relatedly, many people have also asked for a version of os.listdir() that yields filenames as it iterates instead of returning them as one big list. This improves memory efficiency for iterating very large directories.
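
As a rough illustration of why lazy iteration helps, the sketch below takes only the first few entries of a directory without building the full list. ``first_entries`` is a hypothetical helper, not part of this module, and the ``with`` form assumes the standard library ``os.scandir()`` on Python 3.6 or later, where the iterator can be used as a context manager.

.. code-block:: python

    import os
    from itertools import islice

    def first_entries(path, limit=10):
        """Return the names of at most `limit` entries found in `path`."""
        # The context manager closes the directory handle even though
        # the iterator is not exhausted.
        with os.scandir(path) as entries:
            return [entry.name for entry in islice(entries, limit)]

The names come back in system-dependent order, so which entries count as the "first" ones may differ between platforms.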

So as well as a faster walk(), scandir adds a new scandir() function. They're pretty easy to use, but see "The API" below for the full docs.

Benchmarks

Below are results showing how many times faster ``scandir.walk()`` is than ``os.walk()`` on various systems, found by running ``benchmark.py`` with no arguments:

====================  ==============  =============
System version        Python version  Times as fast
====================  ==============  =============
Windows 7 64-bit      2.7.7 64-bit    10.4
Windows 7 64-bit SSD  2.7.7 64-bit    10.3
Windows 7 64-bit NFS  2.7.6 64-bit    36.8
Windows 7 64-bit SSD  3.4.1 64-bit    9.9
Windows 7 64-bit SSD  3.5.0 64-bit    9.5
Ubuntu 14.04 64-bit   2.7.6 64-bit    5.8
Mac OS X 10.9.3       2.7.5 64-bit    3.8
====================  ==============  =============

All of the above tests were done using the fast C version of scandir (source code in _scandir.c).

Note that the gains are less than the above on smaller directories and greater on larger directories. This is why benchmark.py creates a test directory tree with a standardized size.

The API

walk()


The API for ``scandir.walk()`` is exactly the same as ``os.walk()``, so just
`read the Python docs <https://docs.python.org/3.5/library/os.html#os.walk>`_.
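
Since the interface is the same, any existing ``os.walk()`` loop works unchanged. Here's a small sketch that counts files in a tree while skipping hidden directories; ``count_files`` is a name invented for this example, and it uses ``os.walk`` directly (swap in the ``walk`` imported from this module on older Pythons).

.. code-block:: python

    import os

    def count_files(top):
        """Count files under `top`, ignoring directories that start with '.'."""
        total = 0
        for dirpath, dirnames, filenames in os.walk(top):
            # Editing dirnames in place tells walk() not to descend into
            # the removed directories (this is standard os.walk behaviour).
            dirnames[:] = [d for d in dirnames if not d.startswith('.')]
            total += len(filenames)
        return total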

scandir()

The full docs for ``scandir()`` and the ``DirEntry`` objects it yields are available in the Python documentation `here <https://docs.python.org/3.5/library/os.html#os.scandir>`_. But below is a brief summary as well.

scandir(path='.') -> iterator of DirEntry objects for given path

Like listdir, scandir calls the operating system's directory iteration system calls to get the names of the files in the given path, but it's different from listdir in two ways:

  • Instead of returning bare filename strings, it returns lightweight DirEntry objects that hold the filename string and provide simple methods that allow access to the additional data the operating system may have returned.

  • It returns a generator instead of a list, so that scandir acts as a true iterator instead of returning the full list immediately.

scandir() yields a DirEntry object for each file and sub-directory in path. Just like listdir, the '.' and '..' pseudo-directories are skipped, and the entries are yielded in system-dependent order. Each DirEntry object has the following attributes and methods:

  • name: the entry's filename, relative to the scandir path argument (corresponds to the return values of os.listdir)

  • path: the entry's full path name (not necessarily an absolute path) -- the equivalent of os.path.join(scandir_path, entry.name)

  • is_dir(*, follow_symlinks=True): similar to pathlib.Path.is_dir(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases; doesn't follow symbolic links if follow_symlinks is False

  • is_file(*, follow_symlinks=True): similar to pathlib.Path.is_file(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases; doesn't follow symbolic links if follow_symlinks is False

  • is_symlink(): similar to pathlib.Path.is_symlink(), but the return value is cached on the DirEntry object; doesn't require a system call in most cases

  • stat(*, follow_symlinks=True): like os.stat(), but the return value is cached on the DirEntry object; doesn't require a system call on Windows (except for symlinks); doesn't follow symbolic links (like os.lstat()) if follow_symlinks is False

  • inode(): return the inode number of the entry; the return value is cached on the DirEntry object
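
As a sketch of how these methods combine, the function below adds up the size of every file in a directory tree using ``path``, ``is_dir()`` and ``stat()``. ``get_tree_size`` is an illustrative helper rather than part of the module's API, and it assumes ``os.scandir()`` from Python 3.5+.

.. code-block:: python

    import os

    def get_tree_size(path):
        """Return the total size in bytes of all files under `path`."""
        total = 0
        for entry in os.scandir(path):
            if entry.is_dir(follow_symlinks=False):
                # Recurse into real directories only, not symlinked ones
                total += get_tree_size(entry.path)
            else:
                # Like os.lstat(): a symlink counts as the link itself
                total += entry.stat(follow_symlinks=False).st_size
        return total

Passing ``follow_symlinks=False`` keeps the walk from looping through symlinked directories, at the cost of counting a symlink as the link itself rather than its target.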

Here's a very simple example of scandir() showing use of the DirEntry.name attribute and the DirEntry.is_dir() method:

.. code-block:: python

    import os

    def subdirs(path):
        """Yield directory names not starting with '.' under given path."""
        for entry in os.scandir(path):
            if not entry.name.startswith('.') and entry.is_dir():
                yield entry.name

This ``subdirs()`` function will be significantly faster with ``scandir`` than an equivalent written with ``os.listdir()`` and ``os.path.isdir()``, on both Windows and POSIX systems, especially on medium-sized or large directories.

Further reading

  • `The Python docs for scandir <https://docs.python.org/3.5/library/os.html#os.scandir>`_
  • `PEP 471 <https://www.python.org/dev/peps/pep-0471/>`_, the (now-accepted) Python Enhancement Proposal that proposed adding ``scandir`` to the standard library -- a lot of details here, including rejected ideas and previous discussion

Flames, comments, bug reports

Please send flames, comments, and questions about scandir to Ben Hoyt:

http://benhoyt.com/

File bug reports for the version in the Python 3.5 standard library `here <https://docs.python.org/3.5/bugs.html>`_, or file bug reports or feature requests for this module at the GitHub project page:

https://github.com/benhoyt/scandir
