
jpeddicord / Askalono

Licence: apache-2.0
A tool & library to detect open source licenses from texts

Programming Language: Rust


Projects that are alternatives of or similar to Askalono

tools-python
A Python library to parse, validate and create SPDX documents.
Stars: ✭ 65 (-67.66%)
Mutual labels:  licensing
Cryptolens Golang
Client API to access the functionality of Cryptolens Software Licensing API
Stars: ✭ 20 (-90.05%)
Mutual labels:  licensing
Pelock Software Protection And Licensing Sdk
Software copy protection against cracking & reverse engineering with anti-cracking & anti-debugging techniques. Software license key system with time trial options.
Stars: ✭ 109 (-45.77%)
Mutual labels:  licensing
Pyprotect
A lightweight python code protector, makes your python project harder to reverse engineer
Stars: ✭ 317 (+57.71%)
Mutual labels:  licensing
Copyright
Copyright is a simple application for updating all the copyright info in your Swift or Obj-C projects.
Stars: ✭ 5 (-97.51%)
Mutual labels:  licensing
Opendefinition
Open Definition source
Stars: ✭ 87 (-56.72%)
Mutual labels:  licensing
netlicensing.io
Labs64 NetLicensing - Innovative License Management Solution
Stars: ✭ 13 (-93.53%)
Mutual labels:  licensing
License List Data
Various data formats for the SPDX License List including RDFa, HTML, Text, and JSON
Stars: ✭ 182 (-9.45%)
Mutual labels:  licensing
Licensing
Microsoft 365 licensing diagrams
Stars: ✭ 891 (+343.28%)
Mutual labels:  licensing
Licensingviewcontroller
📃 UIViewController subclass with a simple API for displaying licensing information.
Stars: ✭ 107 (-46.77%)
Mutual labels:  licensing
Licensecc
Software licensing, copy protection in C++. It has few dependencies and it's cross-platform.
Stars: ✭ 363 (+80.6%)
Mutual labels:  licensing
Licensed
A Ruby gem to cache and verify the licenses of dependencies
Stars: ✭ 690 (+243.28%)
Mutual labels:  licensing
Specs
COALA IP is a blockchain-ready, community-driven protocol for intellectual property licensing.
Stars: ✭ 98 (-51.24%)
Mutual labels:  licensing
licensor
write licenses to stdout
Stars: ✭ 138 (-31.34%)
Mutual labels:  licensing
Cargo About
📜 Cargo plugin to generate list of all licenses for a crate 🦀
Stars: ✭ 148 (-26.37%)
Mutual labels:  licensing
awesome-open-source-licensing
Cool links, tools & papers related to Open Source Licensing
Stars: ✭ 17 (-91.54%)
Mutual labels:  licensing
Scancode Toolkit
🔎 ScanCode detects licenses, copyrights, package manifests & dependencies and more by scanning code ... to discover and inventory open source and third-party packages used in your code.
Stars: ✭ 1,134 (+464.18%)
Mutual labels:  licensing
Licensepp
Software licensing done right - license++ is a cross platform software licensing library that uses digital signatures to secure use of your application and its licensing
Stars: ✭ 185 (-7.96%)
Mutual labels:  licensing
Npm License Crawler
Analyzes license information for multiple node.js modules (package.json files) as part of your software project.
Stars: ✭ 168 (-16.42%)
Mutual labels:  licensing
Ethicalsource.dev
Home of the Organization for Ethical Source
Stars: ✭ 105 (-47.76%)
Mutual labels:  licensing

askalono

askalono is a library and command-line tool to help detect license texts. It's designed to be fast, accurate, and to support a wide variety of license texts.

askalono crate documentation

Notice

This tool does not provide legal advice and it is not a lawyer. It endeavors to match your input to a database of similar license texts and tell you what it thinks is a close match. However, it can't tell you that the given license is authoritative over a project, nor can it tell you what to do with a license once it's identified. You are not entitled to rely on the accuracy of the output of this tool, and should seek independent legal advice for any licensing questions that may arise from using this tool.

Usage

On the command line

Pre-built binaries are available in the Releases section on GitHub. Rust developers may also install a copy by running cargo install askalono-cli.

Basic usage:

askalono id <filename>

where <filename> is a file (not a folder) containing license text to analyze. In many projects, this file is called LICENSE or COPYING. askalono will analyze the text and output what it thinks the license is.

If askalono can't identify a file, the file may simply contain a license it doesn't know. However, if the file is actually source code with a license in a file header (or footer, or anywhere in between), askalono may be able to dig deeper. To try this, pass the --optimize flag:

askalono id --optimize <filename>
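One way to "dig deeper" into a file that mixes code and license text is to look for the contiguous run of lines that scores best against a known license. The sketch below does this by brute force. It is an assumption about the general idea, not a description of how --optimize is actually implemented, and both `best_line_range` and the word-overlap scorer in `main` are hypothetical stand-ins.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: try every contiguous range of lines and keep the one
/// whose text scores highest. Returns a half-open line range (start, end)
/// plus its score, or None for empty input. Illustration only; this is not
/// askalono's --optimize implementation.
fn best_line_range<F>(text: &str, score: F) -> Option<(usize, usize, f64)>
where
    F: Fn(&str) -> f64,
{
    let lines: Vec<&str> = text.lines().collect();
    let mut best: Option<(usize, usize, f64)> = None;
    for start in 0..lines.len() {
        for end in start + 1..=lines.len() {
            let s = score(&lines[start..end].join("\n"));
            // Strict `>` keeps the first range found on ties
            // (lowest start, then shortest).
            if best.map_or(true, |(_, _, b)| s > b) {
                best = Some((start, end, s));
            }
        }
    }
    best
}

fn main() {
    let license = "permission is hereby granted free of charge";
    let source = "// permission is hereby granted free of charge\nfn main() {}\nlet x = 1;";
    let reference: HashSet<&str> = license.split_whitespace().collect();

    // Stand-in scorer (hypothetical): Sørensen–Dice over sets of purely
    // alphabetic words, so comment markers and code punctuation are ignored.
    let scorer = |candidate: &str| {
        let words: HashSet<&str> = candidate
            .split_whitespace()
            .filter(|w| w.chars().all(char::is_alphabetic))
            .collect();
        if words.is_empty() {
            return 0.0;
        }
        let shared = words.iter().filter(|w| reference.contains(**w)).count();
        2.0 * shared as f64 / (words.len() + reference.len()) as f64
    };

    if let Some((start, end, score)) = best_line_range(source, scorer) {
        println!("lines {start}..{end} score {score:.2}");
    }
}
```

This prints `lines 0..1 score 1.00`: only the comment line is kept. Because every range is rescored, the search is quadratic in the number of lines, so it only suits short files; a real tool would need a cheaper search.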

If you'd like to discover license files within a directory tree, askalono offers a crawl action:

askalono crawl <directory>
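A directory crawl can be approximated with a recursive walk that flags files whose names look like license files. The name heuristic (a case-insensitive LICENSE, LICENCE, or COPYING prefix) and the `crawl` function are assumptions for illustration. askalono's crawl action may select and analyze files differently.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical heuristic: file names starting with LICENSE, LICENCE, or
/// COPYING (case-insensitive) are treated as license candidates.
fn looks_like_license(path: &Path) -> bool {
    path.file_name()
        .and_then(|n| n.to_str())
        .map(|n| {
            let upper = n.to_uppercase();
            ["LICENSE", "LICENCE", "COPYING"]
                .iter()
                .any(|p| upper.starts_with(*p))
        })
        .unwrap_or(false)
}

/// Recursively walks `dir`, collecting candidate license files into `found`.
fn crawl(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // DirEntry::file_type does not follow symlinks, so symlinked
        // directories are not descended into (avoiding cycles).
        let kind = entry.file_type()?;
        if kind.is_dir() {
            crawl(&path, found)?;
        } else if kind.is_file() && looks_like_license(&path) {
            found.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut found = Vec::new();
    crawl(Path::new("."), &mut found)?;
    for path in &found {
        println!("{}", path.display());
    }
    Ok(())
}
```

Each path found this way would then be passed through the same identification step as askalono id.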

As a library

At the moment, Store and LicenseContent are exposed for use.

The best way to get an idea of how to use askalono as a library in its early state is to look at the example. Some examples are also available in the documentation.

Details

Implementation

tl;dr: Sørensen–Dice scoring, multi-threading, compressed cache file

At its core, askalono builds up bigrams (word pairs) of input text and compares them with other license texts it knows about to see how similar they are. It scores each match with a Sørensen–Dice coefficient and looks for the highest result. Some minimal preprocessing happens before matching, but there are no hand-maintained regular expressions or curations used to determine a match.
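As a rough illustration of the scoring idea (not askalono's actual code), the sketch below builds word bigrams by splitting on whitespace and computes the Sørensen–Dice coefficient 2|A ∩ B| / (|A| + |B|) between two bigram sets. The function names `bigrams` and `dice` are hypothetical.

```rust
use std::collections::HashSet;

/// Set of word bigrams (adjacent word pairs). Splitting on whitespace is a
/// simplification; the input is assumed to be normalized already.
fn bigrams(text: &str) -> HashSet<(String, String)> {
    let words: Vec<&str> = text.split_whitespace().collect();
    words
        .windows(2)
        .map(|pair| (pair[0].to_string(), pair[1].to_string()))
        .collect()
}

/// Sørensen–Dice coefficient 2|A ∩ B| / (|A| + |B|), ranging from 0.0 to 1.0.
/// Two empty sets are treated as a non-match (0.0) by choice.
fn dice(a: &HashSet<(String, String)>, b: &HashSet<(String, String)>) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 0.0;
    }
    let shared = a.intersection(b).count();
    2.0 * shared as f64 / (a.len() + b.len()) as f64
}

fn main() {
    let known = bigrams("permission is hereby granted free of charge");
    let input = bigrams("permission is hereby granted free of charge to any person");
    // 6 shared bigrams out of 6 + 9 in total: 2 * 6 / 15 = 0.8
    println!("score = {:.3}", dice(&known, &input));
}
```

This prints `score = 0.800`. Extra trailing text lowers the score without zeroing it, which is why the highest-scoring candidate, rather than an exact match, is reported.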

In detail, the matching process:

  1. Reads in input text
  2. Normalizes everything it reasonably can: Unicode characters, whitespace, quoting styles, etc. are all reduced to a common form.
    • Lines that tend to change a lot in licenses, like "Copyright 20XX Some Person", are additionally removed.
  3. Tokenizes normalized text into a set of bigrams.
  4. In parallel, the bigram set is compared with all of the other sets askalono knows about.
  5. The resulting list is sorted, the top match is identified, and the result is returned.
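The five steps above can be strung together in a minimal sketch. It assumes a much simpler normalization than askalono applies (dropping lines that begin with "copyright", straightening curly quotes, lowercasing, collapsing whitespace) and uses `std::thread::scope` to split the comparison across worker threads. The names `normalize`, `best_match`, and the two-entry license list are hypothetical.

```rust
use std::collections::HashSet;
use std::thread;

type Bigrams = HashSet<(String, String)>;

/// Simplified normalization (an assumption, far less thorough than askalono's):
/// drop lines starting with "copyright", straighten curly quotes,
/// lowercase everything, and collapse runs of whitespace.
fn normalize(text: &str) -> String {
    let kept: Vec<String> = text
        .lines()
        .filter(|line| !line.trim_start().to_lowercase().starts_with("copyright"))
        .map(|line| {
            line.chars()
                .map(|c| match c {
                    '\u{2018}' | '\u{2019}' => '\'',
                    '\u{201C}' | '\u{201D}' => '"',
                    other => other,
                })
                .collect::<String>()
                .to_lowercase()
        })
        .collect();
    kept.join(" ").split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Word bigrams of already-normalized text.
fn bigrams(text: &str) -> Bigrams {
    let words: Vec<&str> = text.split_whitespace().collect();
    words.windows(2).map(|w| (w[0].to_string(), w[1].to_string())).collect()
}

/// Sørensen–Dice coefficient between two bigram sets.
fn dice(a: &Bigrams, b: &Bigrams) -> f64 {
    if a.is_empty() && b.is_empty() {
        return 0.0;
    }
    2.0 * a.intersection(b).count() as f64 / (a.len() + b.len()) as f64
}

/// Scores `input` against every known license in parallel; returns the best.
fn best_match<'a>(input: &str, known: &'a [(&'a str, Bigrams)]) -> Option<(&'a str, f64)> {
    let target = bigrams(&normalize(input));
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    // Split the candidates into roughly one chunk per worker thread.
    let chunk = ((known.len() + workers - 1) / workers).max(1);
    let mut scores: Vec<(&'a str, f64)> = thread::scope(|s| {
        let handles: Vec<_> = known
            .chunks(chunk)
            .map(|part| {
                let target = &target;
                s.spawn(move || {
                    part.iter()
                        .map(|(name, set)| (*name, dice(target, set)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    // Highest score first; the top entry is the match.
    scores.sort_by(|a, b| b.1.total_cmp(&a.1));
    scores.into_iter().next()
}

fn main() {
    let mit = "Permission is hereby granted, free of charge, to any person obtaining a copy";
    let bsd = "Redistribution and use in source and binary forms, with or without modification, are permitted";
    let known = vec![
        ("MIT", bigrams(&normalize(mit))),
        ("BSD", bigrams(&normalize(bsd))),
    ];
    let input = "Copyright 2024 Some Person\nPermission is hereby granted, free of charge, to any person obtaining a copy";
    if let Some((name, score)) = best_match(input, &known) {
        println!("{name}: {score:.2}");
    }
}
```

This prints `MIT: 1.00`: the copyright line is discarded during normalization, so the remaining text matches the MIT excerpt exactly. In a real tool the known bigram sets would be computed once and loaded from a cache rather than rebuilt on every run.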

To optimize startup, askalono builds a database of license texts (applying the same normalization techniques described above) and persists this data to a cache file that is serialized with MessagePack and compressed with zstd. This cache is loaded at startup and can optionally be embedded in the binary itself.

Name

It means "shallot" in Esperanto. You could try to derive a hidden meaning from it, but the real reason is simply that, in the author's opinion, onions are delicious and Esperanto is an interesting language. (Sed la verkisto ne estas bonega Esperantisto, do bonvolu konversacii en la angla sur ĉi tiu projekto. That is: "But the author is not a great Esperantist, so please converse in English on this project.")

How is this different from other solutions?

There are several other excellent projects in this space, including licensee, LiD, and ScanCode. These projects attempt to get a larger picture of a project's licensing, and can look at other sources of metadata to try to find answers. These projects inspired the creation of askalono, first as a curiosity, then as a serious project.

askalono focuses on the problem of matching text itself, which is often the piece that is hardest to optimize for speed and accuracy. askalono could be seen as a piece of plumbing in a larger system. The askalono command-line application includes other goodies, such as a directory crawler, but these are largely for quick one-off use before diving in with more systematic solutions. (If you're looking for such a solution, take a look at the projects I just mentioned!)

Where do the licenses come from?

License data is sourced directly from SPDX: https://github.com/spdx/license-list-data

askalono can parse the "json" format included in that repository to generate its cache.

At this time, askalono is not taking requests for additional licenses in its default dataset -- its dataset is SPDX's own.

Contributing

Contributions are very welcome! See CONTRIBUTING for more info.

License

This library is licensed under the Apache 2.0 License.
