pesterhazy / cljs-corpus

Licence: other
A greppable archive of ClojureScript code

Projects that are alternatives to or similar to cljs-corpus

open2ch-dialogue-corpus
A dialogue corpus created by crawling Open2ch
Stars: ✭ 65 (+75.68%)
Mutual labels:  corpus
text-classification-cn
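Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pretrained models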
Stars: ✭ 81 (+118.92%)
Mutual labels:  corpus
pdf-corpus
Python script to quickly create hand-crafted PDF files
Stars: ✭ 17 (-54.05%)
Mutual labels:  corpus
textbox
Text collections made available by the CLiGS group.
Stars: ✭ 19 (-48.65%)
Mutual labels:  corpus
thaigov-corpus
A project to collect news from Thai government websites
Stars: ✭ 19 (-48.65%)
Mutual labels:  corpus
TV4Dialog
No description or website provided.
Stars: ✭ 33 (-10.81%)
Mutual labels:  corpus
nytwit
New York Times Word Innovation Types dataset
Stars: ✭ 21 (-43.24%)
Mutual labels:  corpus
bible-corpus
A multilingual parallel corpus created from translations of the Bible.
Stars: ✭ 115 (+210.81%)
Mutual labels:  corpus
kanji-frequency
Kanji usage frequency data collected from various sources
Stars: ✭ 92 (+148.65%)
Mutual labels:  corpus
CBLUE
CBLUE, a Chinese medical information processing benchmark: A Chinese Biomedical Language Understanding Evaluation Benchmark
Stars: ✭ 379 (+924.32%)
Mutual labels:  corpus
When-in-Rome
A meta-corpus of functional harmonic analysis.
Stars: ✭ 35 (-5.41%)
Mutual labels:  corpus
mev-corpus
MEV Data Corpus
Stars: ✭ 77 (+108.11%)
Mutual labels:  corpus
jrte-corpus
Japanese Realistic Textual Entailment Corpus (NLP 2020, LREC 2020)
Stars: ✭ 66 (+78.38%)
Mutual labels:  corpus
malay-dataset
Text corpus for Bahasa Malaysia, https://malaya.readthedocs.io/en/latest/Dataset.html
Stars: ✭ 189 (+410.81%)
Mutual labels:  corpus
PoetryCorpus
Poetic corpus of the Russian language
Stars: ✭ 40 (+8.11%)
Mutual labels:  corpus
gum
Repository for the Georgetown University Multilayer Corpus (GUM)
Stars: ✭ 71 (+91.89%)
Mutual labels:  corpus
LanguageCodes
We present a list of languages with their codes, families, regions and etc. We also present a list of multi-lingual corpora (with urls).
Stars: ✭ 70 (+89.19%)
Mutual labels:  corpus
KWDLC
Kyoto University Web Document Leads Corpus
Stars: ✭ 64 (+72.97%)
Mutual labels:  corpus
CLUEmotionAnalysis2020
CLUE Emotion Analysis Dataset (fine-grained sentiment analysis dataset)
Stars: ✭ 3 (-91.89%)
Mutual labels:  corpus
egret-wenda-corpus
A Public Corpus for Machine Learning
Stars: ✭ 41 (+10.81%)
Mutual labels:  corpus

ClojureScript corpus

In linguistics, a text corpus is a set of texts written in a language. Its purpose is to be analyzed to test hypotheses about the actual usage of the language.

Similarly, the aim of cljs-corpus is to provide a searchable local archive of ClojureScript as it is used in the wild.

Why would you use a local archive over tools like GitHub Code Search?

  • Curation: The projects have been selected to exhibit high standards of quality.
  • It's All Text: Because the full text is present on your machine, you can use text search tools like grep or ripgrep to refine your search.
  • Reliability: Compared to GitHub's fuzzy search, grep is predictable and allows exact string matches as well as regular expression queries.
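
To make the difference between exact string matches and regular expression queries concrete, here is a minimal sketch using plain grep. The sample file and its contents are made up for illustration and are not part of the corpus; the same patterns should work against a real checkout.

```shell
# Build a throwaway sample so the commands run anywhere (hypothetical file and contents).
demo_dir=$(mktemp -d)
cat > "$demo_dir/sample.cljs" <<'EOF'
(ns demo.core
  (:require [clojure.string :as str]))

(defn greet [names]
  (str/join ", " names))

(defn- helper [x]
  (.-value x))
EOF

# Exact string match: -F treats the pattern literally, so "(" and "." are not regex syntax.
exact_hits=$(grep -rnF '(.-value' "$demo_dir")

# Regular expression match: -E enables extended regexes; this finds both defn and defn- forms.
regex_hits=$(grep -rnE '\(defn-? ' "$demo_dir")

printf '%s\n' "$exact_hits" "$regex_hits"
```

ripgrep accepts the same two modes: -F for fixed strings, and regular expressions by default.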

This corpus contains popular OSS libraries. But because application code differs from library code, the archive also puts a focus on real-world applications written in ClojureScript.

Usage

Clone this repository and update its submodules:

git clone https://github.com/pesterhazy/cljs-corpus.git
cd cljs-corpus
git submodule update --init

Each time you git pull, you will need to update the submodules again:

git submodule update --init

The corpus currently weighs in at more than 300 MB, so give the clone and submodule update some time to complete.
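
Since the pull and the submodule update always go together, they can be wrapped in a small shell function. This is only a sketch: the name update_corpus is hypothetical and nothing like it ships with the repository.

```shell
# Hypothetical helper (not part of the repository): pull and refresh submodules together.
# Usage: update_corpus [path-to-checkout]   (defaults to the current directory)
update_corpus() {
  corpus_dir=${1:-.}
  # Only update the submodules if the pull succeeded.
  git -C "$corpus_dir" pull &&
    git -C "$corpus_dir" submodule update --init
}
```

Put the function in your shell profile and run update_corpus from inside the checkout, or pass the path to the checkout as the first argument.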

Example queries

For the best search experience, install ripgrep, a fast and reliable replacement for grep.

  1. Find creative use of the Google Closure Library

    rg -w goog
    
  2. How do people use LocalStorage from ClojureScript?

    rg -i -w localstorage
    
  3. I can't remember the order of arguments for clojure.string/join

    rg -C3 -g '*.clj[sc]' -w join
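
The flags used above (-w for whole words, -C3 for three lines of context, plus -i for case-insensitive matching) can be combined into one reusable function. The sketch below assumes a hypothetical name, corpus_search, and falls back to plain grep when ripgrep is not installed.

```shell
# Hypothetical helper: whole-word, case-insensitive search with 3 lines of context.
# Usage: corpus_search PATTERN [DIRECTORY]   (DIRECTORY defaults to the current one)
corpus_search() {
  if command -v rg >/dev/null 2>&1; then
    # ripgrep searches directories recursively by default.
    rg -i -w -C3 -- "$1" "${2:-.}"
  else
    # grep needs -r for recursion; -n adds line numbers.
    grep -rniw -C3 -- "$1" "${2:-.}"
  fi
}
```

For example, corpus_search localstorage run from the root of the checkout finds js/localStorage call sites regardless of how the name is capitalized.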
    

Related work

Author

Paulus Esterhazy
