harvard-lil / Capstone

Licence: MIT
CAP database scripts.

Programming Languages

python

Projects that are alternatives of or similar to Capstone

Crawler illegal cases in china
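A collection of legal cases in China involving web crawlers. The project gathers news, materials, laws, and regulations about lawsuits and violations involving crawler developers in mainland China, to help people working in the crawler industry there understand the relevant laws and avoid crossing data-compliance red lines.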
Stars: ✭ 2,448 (+2085.71%)
Mutual labels:  law
esaj
Scrapers for many e-SAJ systems
Stars: ✭ 35 (-68.75%)
Mutual labels:  law
Blackstone
⚫️ A spaCy pipeline and model for NLP on unstructured legal text.
Stars: ✭ 465 (+315.18%)
Mutual labels:  law
Choosealicense.com
A site to provide non-judgmental guidance on choosing a license for your open source project
Stars: ✭ 2,648 (+2264.29%)
Mutual labels:  law
BillSum
US Bill Summarization Corpus
Stars: ✭ 31 (-72.32%)
Mutual labels:  law
dre
The project now resides on GitLab
Stars: ✭ 20 (-82.14%)
Mutual labels:  law
Parselawdocuments
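Runs a series of analyses on collected legal documents, including rule-based automatic segmentation, case similarity calculation, case clustering, and recommendation of legal provisions (experiments are currently based on marriage cases and can be extended to other fields).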
Stars: ✭ 138 (+23.21%)
Mutual labels:  law
Simple Cookie Choices
A simple cookie choices thought to the GDPR rules 🔒🍪
Stars: ✭ 12 (-89.29%)
Mutual labels:  law
Laosheng.top
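Laosheng.top: the same old talk, saving you search time. Chinese news cloud media, central external-publicity and Belt and Road cloud media, with newspapers, television, and news agencies from five continents (The Belt and Road Cloud Media). PLA Weibo accounts and celebrity Weibo follower rankings. A directory of central government departments, the government, the CPPCC, the National People's Congress, and the two top judicial bodies. A place-name map of China's thousand counties and relevant United Nations bodies. Da Meng Wanghai Lou: find the law without worry, with an overview of China's legal system and the Da Meng law readers. Laosheng rankings of good websites that are hard to find by search. LSIP: large-scale integrated web pages. 😤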
Stars: ✭ 21 (-81.25%)
Mutual labels:  law
Tosdr.org
ARCHIVED Source code for tosdr.org
Stars: ✭ 460 (+310.71%)
Mutual labels:  law
Ai law
All kinds of baseline models for long text classification (text categorization)
Stars: ✭ 243 (+116.96%)
Mutual labels:  law
urteile-gesetze-web
Web frontend of the legal information system urteile-gesetze.de
Stars: ✭ 16 (-85.71%)
Mutual labels:  law
constitucion-mexicana
Political Constitution of the United Mexican States in ReST format
Stars: ✭ 47 (-58.04%)
Mutual labels:  law
Tax Calculator
USA Federal Individual Income and Payroll Tax Microsimulation Model
Stars: ✭ 186 (+66.07%)
Mutual labels:  law
Licensee
A Ruby Gem to detect under what license a project is distributed.
Stars: ✭ 476 (+325%)
Mutual labels:  law
Kor Law For Dev
A collection of Korean laws that developers should be familiar with.
Stars: ✭ 140 (+25%)
Mutual labels:  law
SwitHak.github.io
SwitHak' Security Place for my Opinions and Work
Stars: ✭ 30 (-73.21%)
Mutual labels:  law
Mycail
China "Fayan Cup" judicial artificial intelligence challenge
Stars: ✭ 60 (-46.43%)
Mutual labels:  law
Site Policy
Collaborative development on GitHub's site policies, procedures, and guidelines
Stars: ✭ 797 (+611.61%)
Mutual labels:  law
Lexpredict Lexnlp
LexNLP by LexPredict
Stars: ✭ 439 (+291.96%)
Mutual labels:  law

Capstone

This is the source code for case.law, a website written by the Harvard Law School Library Innovation Lab to manage and serve court opinions. Other than several cases used for our automated testing, this repository does not contain case data. Case data may be obtained through the website.

Project Background

The Caselaw Access Project is a large-scale digitization project hosted by the Harvard Law School Library Innovation Lab. Visit case.law for more details.

The Data

  1. Format Documentation and Samples
  2. Obtaining Real Data
  3. Reporting Data Errors
  4. Errata

Format Documentation and Samples

The output of the project consists of page images, marked up case XML files, ALTO XML files, and METS XML files. This repository has a more detailed explanation of the format, and two volumes worth of sample data:

CAP Samples and Format Documentation

Obtaining Real Data

This data, with some temporary restrictions, is available to all. Please see our project site with more information about how to access the API, or get bulk access to the data:

https://case.law/

Reporting Data Errors

This is a living, breathing corpus of data. While we've taken great pains to ensure its accuracy and integrity, two large components of this project, namely OCR and human review, are utterly fallible. When we were designing Capstone, we knew that one of its primary functions would be to facilitate safe, accountable updates. If you find any errors in the data, we would be extraordinarily grateful for your taking a moment to create an issue in this GitHub repository's issue tracker to report it. If you notice a large pattern of problems that would be better fixed programmatically, or have a very large number of modifications, describe it in an issue. If we need more information, we'll ask. We'll close the issue when the issue has been corrected.

Errata

These are known issues — there's no need to file an issue if you come across one of these.

  • Missing Judges Tag: In many volumes, elements which should have the tag name <judges> instead have the tag name <p>. We're working on this one.
  • Nominative Case Citations: In many cases that come from nominative volumes, the citation format is wrong. We hope to have this corrected soon.
  • Jurisdictions: Though the jurisdiction values in our API metadata entries are normalized, we have not propagated those changes to the XML.
  • Court Name: We've seen some inconsistencies in the court name. We're trying to get this normalized in the data, and we'll also publish a complete court name list when we're done.
  • OCR errors: There will be OCR errors on nearly every page. We're still trying to figure out how best to address this. If you've got some killer OCR correction strategies, get at us.

The Capstone Application

Capstone is a Django application with a PostgreSQL database which stores and manages the non-image data output of the CAP project. This includes:

  • Original XML data
  • Normalized metadata extracted from the XML
  • External metadata, such as the Reporter database
  • Changelog data, tracking changes and corrections

CAPAPI

CAPAPI is the API with which users can access CAP data.

Installing Capstone and CAPAPI

Hosts Setup

Add the following to /etc/hosts:

127.0.0.1       case.test
127.0.0.1       api.case.test
127.0.0.1       cite.case.test
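
To confirm the entries above are in place, a small check along these lines may help. This is a minimal sketch, not part of Capstone: the function name `missing_hosts` is hypothetical, and it only understands the plain "address name [alias ...]" hosts format.

```python
REQUIRED_HOSTS = ("case.test", "api.case.test", "cite.case.test")


def missing_hosts(hosts_text, required=REQUIRED_HOSTS, address="127.0.0.1"):
    """Return the required hostnames that hosts_text does not map to address."""
    mapped = set()
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into "address name [alias ...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == address:
            mapped.update(fields[1:])
    return [name for name in required if name not in mapped]
```

Calling `missing_hosts(open("/etc/hosts").read())` should return an empty list once all three names are mapped.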

Manual Local Setup

  1. Install global system requirements
  2. Clone the repository
  3. Set up python virtualenv
  4. Install application requirements
  5. Set up the postgres database and load test data
  6. Running the capstone server

1. Install global system requirements

  • Python 3.7— While there shouldn't be any issues with using a more recent version, we will only accept PRs that are fully compatible with 3.7.
  • MySQL— On Macs with homebrew, the version installed with brew install mysql works fine. On Linux, apt-get does the job
  • Redis— (Instructions)
  • Postgres > 9.5— (Instructions) For Mac developers, Postgres.app is a nice, simple way to get an instant postgres dev installation.
  • Git— (Instructions)

2. Clone the repository

$ git clone https://github.com/harvard-lil/capstone.git

3. Set up Python virtualenv (optional)

$ cd capstone/capstone  # move to Django subdirectory
$ mkvirtualenv -p python3 capstone

This will make a virtualenv named "capstone". You can tell that you're inside the virtualenv because your shell prompt will now include the string (capstone).

4. Install application requirements

(capstone)$ pip install -r requirements.txt

5. Set up the postgres database and load test data

(capstone)$ psql -c "CREATE DATABASE capdb;"
(capstone)$ psql -c "CREATE DATABASE capapi;"
(capstone)$ psql -c "CREATE DATABASE cap_user_data;"
(capstone)$ fab init_dev_db  # one time -- set up database tables and development Django admin user, migrate databases
(capstone)$ fab ingest_fixtures  # load in our test data
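
The CREATE DATABASE commands above fail if a database already exists, which can happen when setup is re-run. The sketch below shows one way to skip databases that are already there. It is an illustration rather than a Capstone script: the helper names are hypothetical, and it assumes psql can connect with your default credentials, as the commands above do.

```python
import subprocess

DATABASES = ("capdb", "capapi", "cap_user_data")


def psql(sql):
    """Run one SQL statement through the psql CLI and return its stripped output."""
    result = subprocess.run(
        ["psql", "-tAc", sql], check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def create_missing_databases(names=DATABASES, run_sql=psql):
    """Create each database in names unless pg_database already lists it."""
    created = []
    for name in names:
        # -t and -A make psql print just "1" when a matching row exists.
        exists = run_sql(f"SELECT 1 FROM pg_database WHERE datname = '{name}'")
        if exists != "1":
            run_sql(f'CREATE DATABASE "{name}";')
            created.append(name)
    return created
```

`create_missing_databases()` returns the names it actually created, so a second run should return an empty list. The `run_sql` parameter exists only so the logic can be exercised without a running Postgres.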

6. Running the capstone server

(capstone)$ fab run      # start up Django server

Capstone should now be running at 127.0.0.1:8000.

Docker Setup

We support local development via docker compose. Docker setup looks like this:

$ docker-compose up -d
$ docker-compose exec db psql --user=postgres -c "CREATE DATABASE capdb;"
$ docker-compose exec db psql --user=postgres -c "CREATE DATABASE capapi;"
$ docker-compose exec db psql --user=postgres -c "CREATE DATABASE cap_user_data;"
$ docker-compose exec web fab init_dev_db
$ docker-compose exec web fab ingest_fixtures
$ docker-compose exec web fab import_web_volumes
$ docker-compose exec web fab run

Capstone should now be running at 127.0.0.1:8000.

If you are working on the frontend, you probably want to run yarn as well. In a new shell:

$ docker-compose exec web yarn serve

Tip— these commands can be shortened by adding something like this to .bash_profile:

alias d="docker-compose exec"
alias dfab="d web fab"
alias dyarn="d web yarn"

Or:

alias d="docker-compose exec web"

And then:

$ d fab 
$ d yarn serve

Tip- If docker-compose up -d takes too long to run, you might consider the following:

$ cp docker-compose.override.yml.example docker-compose.override.yml

This override file will point the elasticsearch service to a hello-world image instead of its real settings. Use this override file to override more settings for your own development environment.

Administering and Developing Capstone

Testing

We use pytest for tests. Some notable flags:

Run all tests:

(capstone)$ pytest

Run one test:

(capstone)$ pytest -k test_name

Run tests without capturing stdout, to allow debugging with pdb:

(capstone)$ pytest -s

Run tests in parallel for speed:

(capstone)$ pytest -n <number of processes>
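
To show how the -k flag selects tests, here is a tiny self-contained test module. The file name and the `normalize_cite` helper are made up for illustration and do not exist in Capstone.

```python
# test_citations.py (hypothetical file name, not part of Capstone)


def normalize_cite(cite):
    """Collapse runs of whitespace in a citation string."""
    return " ".join(cite.split())


def test_normalize_cite_collapses_spaces():
    assert normalize_cite("1  Ill.   2") == "1 Ill. 2"


def test_normalize_cite_strips_ends():
    assert normalize_cite("  1 Ill. 2\n") == "1 Ill. 2"
```

With this file, `pytest -k strips_ends` would run only the second test, because -k matches against test names, while `pytest -k normalize_cite` would run both.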

Requirements

Top-level requirements are stored in requirements.in. After updating that file, you should run

(capstone)$ fab pip-compile

to freeze all subdependencies into requirements.txt.

To ensure that your environment matches requirements.txt, you can run

(capstone)$ pip-sync

This will add any missing packages and remove any extra ones.

Applying model changes

Use Django to apply migrations. After you change models.py:

(capstone)$ ./manage.py makemigrations

This will write a migration script to cap/migrations. Then apply:

(capstone)$ fab migrate

Stored Postgres functions

Some Capstone features depend on stored functions that allow Postgres to deal with XML and JSON fields. See set_up_postgres.py for documentation.

Running Command Line Scripts

Command line scripts are defined in fabfile.py. You can list all available commands using fab -l, and run a command with fab command_name.

Local debugging tools

django-extensions is enabled by default, including the very handy ./manage.py shell_plus command.

django-debug-toolbar is not automatically enabled, but if you run pip install django-debug-toolbar it will be detected and enabled by settings_dev.py.

Model versioning

For database versioning we use the Postgres temporal tables approach inspired by SQL:2011's temporal databases.

See this blog post for an explanation of temporal tables and how to use them in Postgres.

We use django-simple-history to manage creation, migration, and querying of the historical tables.

Data is kept in sync through the temporal_tables Postgres extension and the triggers created in our scripts/set_up_postgres.py file.

Installing the temporal_tables extension is recommended for performance. If not installed, a pure postgres version will be installed by set_up_postgres.py; this is handy for development.
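
The idea behind trigger-maintained history tables can be sketched in a few lines. The example below uses SQLite from the Python standard library as a stand-in so that it runs anywhere. It is not the temporal_tables extension and not the triggers from set_up_postgres.py, and the table and column names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE case_metadata (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE case_metadata_history (
    id INTEGER NOT NULL,
    name TEXT NOT NULL,
    replaced_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- Before each update, copy the outgoing row version into the history table.
CREATE TRIGGER case_metadata_versioning
BEFORE UPDATE ON case_metadata
BEGIN
    INSERT INTO case_metadata_history (id, name) VALUES (OLD.id, OLD.name);
END;
"""


def demo():
    """Insert a row, correct it, and return (current name, prior names)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO case_metadata (id, name) VALUES (1, 'Smith v. Jonse')")
    conn.execute("UPDATE case_metadata SET name = 'Smith v. Jones' WHERE id = 1")
    current = conn.execute(
        "SELECT name FROM case_metadata WHERE id = 1"
    ).fetchone()[0]
    history = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM case_metadata_history WHERE id = 1 ORDER BY rowid"
        )
    ]
    conn.close()
    return current, history
```

The application code only ever issues the UPDATE. The database itself preserves the superseded version, which is what makes corrections accountable.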

Download real data locally

We store complete fixtures for about 1,000 cases in the case.law downloads section.

You can download and ingest all volume fixtures from that section with the command fab import_web_volumes, or ingest a single volume downloaded from that section with the command fab import_volume:some.zip.

Working with javascript

We use Vue CLI 3 to compile javascript files, so you can use modern javascript and it will be transpiled to support the browsers listed in package.json. New javascript entrypoints can be added to vue.config.js and included in templates with {% render_bundle %}.

If you want to edit javascript files, you will need to install node and the package.json javascript packages:

$ brew install node
$ npm install -g yarn
$ yarn install --frozen-lockfile

You can then run the local javascript development server in a separate terminal window, or in the background:

$ yarn serve

This will cause javascript files to be loaded live from http://127.0.0.1:8080/ and recompiled on save in the background. Your changes should be present at http://127.0.0.1:8000.

Installing node and running yarn serve is not necessary unless you are editing javascript. On a clean checkout, or after shutting down yarn serve and running yarn build, the local dev server will use the compiled production assets. Under the hood, use of the local dev server vs. production assets is controlled by the contents of webpack-stats.json.
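
If you are unsure which mode you are in, you can inspect webpack-stats.json yourself. The sketch below assumes the file follows the webpack-bundle-tracker layout, in which asset entries carry a "publicPath" URL. That layout, the function names, and the default dev server address are assumptions for illustration, not something Capstone documents.

```python
import json


def public_paths(node):
    """Yield every string stored under a "publicPath" key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "publicPath" and isinstance(value, str):
                yield value
            else:
                yield from public_paths(value)
    elif isinstance(node, list):
        for item in node:
            yield from public_paths(item)


def uses_dev_server(stats_text, dev_origin="http://127.0.0.1:8080/"):
    """Return True if any asset URL in the stats JSON points at dev_origin."""
    stats = json.loads(stats_text)
    return any(path.startswith(dev_origin) for path in public_paths(stats))
```

For example, `uses_dev_server(open("webpack-stats.json").read())` would report whether assets are being served live by yarn serve. The search is recursive so that it does not depend on exactly where the tracker version in use places the URLs.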

Installing packages: You can install new packages with:

$ yarn add --dev package

After changing package.json or yarn.lock, you should run fab update_docker_image_version to ensure that docker users get the updates, but note that Circle CI will take care of building JS assets and updating the Docker image version.

Yarn and docker: yarn will also work inside docker-compose:

$ docker-compose run web yarn build

yarn packages inside docker are stored in /node_modules. The /app/capstone/node_modules folder is just an empty mount to block out any node_modules folder that might exist on the host.

Elasticsearch

For local dev, Elasticsearch will automatically be started by docker-compose up -d. You can then run fab refresh_case_body_cache to populate CaseBodyCache for all cases, and fab rebuild_search_index to populate the search index.

For debugging, see settings.py.example for an example of how to log all requests to and from Elasticsearch.

It may also be useful to run Kibana to directly query Elasticsearch from a browser GUI:

$ brew install kibana
$ kibana -e http://127.0.0.1:9200

You can then go to Kibana -> Dev Tools to run any of the logged queries, or GET /_mapping to see the search indexes.
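
If you would rather not install Kibana, the same GET /_mapping request can be made from Python. This is a rough sketch: it assumes Elasticsearch is listening on 127.0.0.1:9200 as in the Kibana command above, and that the response uses the typeless shape (index name, then "mappings", then "properties") returned by recent Elasticsearch versions. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def index_fields(base_url="http://127.0.0.1:9200", fetch=fetch_json):
    """Map each index name to the sorted top-level field names in its mapping."""
    mappings = fetch(base_url.rstrip("/") + "/_mapping")
    return {
        index: sorted(body.get("mappings", {}).get("properties", {}))
        for index, body in mappings.items()
    }
```

Calling `index_fields()` against a populated local instance should list the search indexes and their fields. An index with no mapped fields, or an older response shape, simply yields an empty list.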

Documentation

This readme, code comments, and the API usage docs are the only docs we have. If you want something documented more thoroughly, file an issue and we'll get back to you.

Examples

See the CAP examples repo for some ideas about getting started with this data.
