FengBli / CAIL2018-toy

Licence: other
Final team project for a data mining course, built for the CAIL-2018 competition. NOTE: this is quite simple, trivial code.

Programming languages: Jupyter Notebook, Python

CAIL2018: 2018 China "Fa Yan Bei" (法研杯) Legal AI Challenge

1. Official Website

2018 China "Fa Yan Bei" Legal AI Challenge (2018中国‘法研杯’法律智能挑战赛)

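2. Timeline

  • Stage 1 (2018.05.15-2018.07.14):
    • By June 5: submission deadline for models built on the Small data. The Large data is released to teams whose evaluation results beat the baseline algorithm.
    • By June 12: earlier models are re-evaluated on the Large-test data and the leaderboard is refreshed.
    • By July 14: final model submission deadline.
  • Stage 2 (2018.07.14-2018.08.14):
    • The organizers run a closed evaluation of the final models on one month of newly collected data.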

3. Notice

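3.1. Necessary adjustments

After cloning or downloading this project to run it locally, make the following small changes:

  • Create a model/ directory inside ./predictor (GitHub cannot store empty folders)
  • In ./utils/util.py, change DATA_DIR on line 9 to the directory that holds your local data files
  • Before running ./test.py, change line 11 to the directory containing the test files and line 12 to the directory where test output should be written
  • Before running ./score.py, change line 187 to the same test file directory and line 188 to the same test output directory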
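
The first two adjustments can be scripted. The following is a minimal sketch, not part of the repository: the helper name `prepare_project` is hypothetical, and it only creates the missing `predictor/model/` directory and reports whether a data directory exists. The hard-coded paths in util.py, test.py and score.py still have to be edited by hand.

```python
from pathlib import Path


def prepare_project(project_root, data_dir):
    """Create predictor/model/ under project_root and check data_dir.

    Hypothetical helper: returns the model directory path and a flag
    telling whether data_dir exists on disk.
    """
    root = Path(project_root)
    model_dir = root / "predictor" / "model"
    # exist_ok=True makes the call safe to repeat on an existing checkout.
    model_dir.mkdir(parents=True, exist_ok=True)
    return model_dir, Path(data_dir).is_dir()
```

Called from the repository root as `prepare_project(".", "../data/CAIL2018-small-data")`, this would create `./predictor/model/`; a `False` flag suggests that DATA_DIR does not yet point at a real data directory.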

3.2. Requirements

  • Language Environment

    • Python 3.5
  • Packages

    • jieba
    • pandas
    • scikit-learn (imported as sklearn)
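
A quick way to confirm the packages are importable before training is sketched below. The helper name `missing_packages` is an assumption for illustration; it relies only on the standard library and does not check versions.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the three packages listed above.
    print(missing_packages(["jieba", "pandas", "sklearn"]))
```

An empty list means all three packages were found in the current environment.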

3.3. Unfinished Parts

  • ./preprocess/* (data pre-analysis, not yet complete)

4. Updates

2018-05-18 [feng]

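  • The data files are too large, so the data folder was removed from the project
  • The default data directory is ../data/CAIL2018-small-data; see the DATA_DIR constant in util.py
  • Tried thulac-python, Tsinghua's Chinese word segmentation tool
  • thulac is too slow, so jieba is used for now; C++ versions of the various segmentation tools could be considered later
  • Notice: in law article prediction, some cases correspond to multiple articles
  • Added util.py
  • Added preprocess.py, which performs Chinese word segmentation on the data and integrates the json2csv function
  • Added stopwords.txt, source: GitHub · stopwords-iso/stopwords-zh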
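
The json2csv step and the multi-article notice can be illustrated together. The sketch below is not the repository's code: it assumes each input line is a JSON object with a `fact` string and a `meta` object holding `accusation` and `relevant_articles` lists, and it joins multiple labels with ";" so that a case with several articles still fits in one CSV row. Both the field names and the separator are assumptions.

```python
import csv
import json


def json2csv(json_path, csv_path):
    """Convert a JSON-lines case file to CSV; return the number of cases."""
    rows = 0
    with open(json_path, encoding="utf-8") as src, \
            open(csv_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["fact", "accusation", "relevant_articles"])
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            case = json.loads(line)
            meta = case.get("meta", {})
            # A case may carry several charges or articles: keep them all.
            accusation = ";".join(meta.get("accusation", []))
            articles = ";".join(str(a) for a in meta.get("relevant_articles", []))
            writer.writerow([case.get("fact", ""), accusation, articles])
            rows += 1
    return rows
```

Keeping every label in the row, rather than only the first, leaves the choice between single-label and multi-label training open for later steps.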

2018-05-26 [feng]

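  • Switched to jieba's parallel segmentation
  • Imported a legal dictionary downloaded from the Sogou lexicon library
  • Deleted CODE_OF_CONDUCT.md
  • Added the dictionary/ folder, which contains the user dictionary and the code that decodes .scel files (Sogou's user dictionary format)
  • Fixed a bug on line 24 of util.py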
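
Once legal terms have been extracted from a dictionary source, they need to be written in the plain-text layout jieba expects for user dictionaries: one entry per line, with the word followed by an optional frequency and an optional part-of-speech tag, separated by spaces. The sketch below assumes the terms are already available as a list of strings; the helper name `write_userdict` is hypothetical and the .scel decoding itself is not shown.

```python
def write_userdict(words, path, freq=None, tag=None):
    """Write words as jieba user-dictionary lines; return the entry count."""
    seen = set()
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for word in words:
            word = word.strip()
            if not word or word in seen:
                continue  # drop empty strings and duplicates
            seen.add(word)
            fields = [word]
            if freq is not None:
                fields.append(str(freq))  # optional frequency
            if tag is not None:
                fields.append(tag)        # optional part-of-speech tag
            out.write(" ".join(fields) + "\n")
            count += 1
    return count
```

The resulting file can be loaded with `jieba.load_userdict(path)`, and `jieba.enable_parallel(4)` turns on parallel segmentation with four worker processes. jieba's parallel mode is built on Python's multiprocessing module and is not supported on Windows.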

2018-05-28 [feng]

  • Reorganized the code structure to follow the official svm_baseline code
  • Deleted preprocess.py
  • Added train.py, the ./predictor/ directory, and related files

2018-06-01 [feng]

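  • Reorganized the code structure:
    • util.py, law.txt, accu.txt, userdict.txt and similar files now all live under ./utils/
    • Once the model has been trained, the existing ./predictor/ directory can be packaged and submitted as-is
    • Added files for local testing and scoring: ./test.py and ./score.py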

5. TODOs

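  • Consider moving stop-word handling inside the TF-IDF model
  • Manually correct the word segmentation results where appropriate
  • Run a preliminary analysis of the data, i.e. the work under ./preprocess/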

6. Scores

0 SVM baseline on small-data

task-1 task-2 task-3 total-score
71.83 68.79 47.83 188.45

1st upload using LinearSVC

succeeded after 8 stupid attempts by @FengBli

date: 05-31

task-1 task-2 task-3 total-score
72.92 69.43 52.56 194.92

2nd upload using RandomForestClassifier

date: 06-01

task-1 task-2 task-3 total-score
62.20 59.99 48.73 170.92

7. Members

Team Members:
