
cverluise / PatCit

Licence: MIT license
Making Patent Citations Uncool Again

Programming Languages: Jupyter Notebook, Python

Projects that are alternatives of or similar to PatCit

old nesta daps
[archived]
Stars: ✭ 16 (-80.95%)
Mutual labels:  innovation, economics
mxfactorial
a payment application intended for deployment by the united states treasury
Stars: ✭ 36 (-57.14%)
Mutual labels:  science, economics
Lab.js
Online research made easy
Stars: ✭ 140 (+66.67%)
Mutual labels:  science, economics
PH5
Library of PH5 clients, apis, and utilities
Stars: ✭ 14 (-83.33%)
Mutual labels:  science
HackTheStacks
The 3rd Annual American Museum of Natural History Hackathon produced by the BridgeUP: STEM program
Stars: ✭ 32 (-61.9%)
Mutual labels:  science
SHARE
SHARE is building a free, open, data set about research and scholarly activities across their life cycle.
Stars: ✭ 93 (+10.71%)
Mutual labels:  science
typhon
Tools for atmospheric research
Stars: ✭ 47 (-44.05%)
Mutual labels:  science
hacker-laws-tr
💻📖 Laws, theories, principles, and patterns that programmers will find useful. #hackerlaws
Stars: ✭ 810 (+864.29%)
Mutual labels:  science
scitizen
Scitizen - Help scientific research for the benefit of mankind and humanity 🔬
Stars: ✭ 21 (-75%)
Mutual labels:  science
Benzina
Benzina is an image-loader package that greatly accelerates image loading onto GPUs using their built-in hardware codecs.
Stars: ✭ 36 (-57.14%)
Mutual labels:  science
edgar-crawler
Download financial reports from SEC's EDGAR. Extract clean textual data from specific item sections and bootstrap your financial NLP research. Software from the research paper published in ECONLP 2021.
Stars: ✭ 71 (-15.48%)
Mutual labels:  economics
libmol
Single Page Web Application for displaying and studying molecular models
Stars: ✭ 29 (-65.48%)
Mutual labels:  science
peakutils
PeakUtils mirror from bitbucket.
Stars: ✭ 25 (-70.24%)
Mutual labels:  science
brian2cuda
A brian2 extension to simulate spiking neural networks on GPUs
Stars: ✭ 46 (-45.24%)
Mutual labels:  science
BeaData.jl
A Julia interface for retrieving data from the Bureau of Economic Analysis (BEA).
Stars: ✭ 17 (-79.76%)
Mutual labels:  economics
galaksio
An easy-to-use way for running Galaxy workflows.
Stars: ✭ 19 (-77.38%)
Mutual labels:  science
Git-for-bio-scientists
Presentation about digital lab journalling with Git
Stars: ✭ 30 (-64.29%)
Mutual labels:  science
dualnback
In the n-back task, you need to remember the n previous spatial or auditory stimuli. N-back is a memory test where n refers to how many previous stimuli must be remembered. Dual means that a verbal auditory stimulus and a spatial visual stimulus are presented at the same time and must be remembered separately.
Stars: ✭ 22 (-73.81%)
Mutual labels:  science
wikirepo
Python based Wikidata framework for easy dataframe extraction
Stars: ✭ 33 (-60.71%)
Mutual labels:  economics
bac-genomics-scripts
Collection of scripts for bacterial genomics
Stars: ✭ 39 (-53.57%)
Mutual labels:  science

patCit


Building a comprehensive dataset of patent citations

👩‍🔬 Exploring the universe of patent citations has never been easier. No more complicated data setup, memory issues or queries running forever: we host patCit on BigQuery for you.

🤗 patCit is community driven and benefits from the support of a reactive team that is eager to help and to tackle your next request. This is where academics and industry practitioners meet.

🔮 patCit is based on state-of-the-art open-source projects and libraries such as grobid/biblio-glutton and spaCy. Even better, patCit keeps improving along with the rest of its ecosystem.

🎓 Want to know more? Read the patCit academic presentation or dive into the usage and technical guides on the patCit documentation website.

💌 To receive project updates by email or in your GitHub feed, join the patCit newsletter and star the repository on GitHub.

What will you find in patCit?

Patents are at the crossroads of many innovation nodes: science, open knowledge, products, competition, etc. At patCit, we are building a comprehensive dataset of patent citations to help the community explore this terra incognita. patCit offers:

  • 🌎 worldwide coverage
  • 📄 & 📚 front-page and in-text citations
  • 🌈 all sorts of documents, not just scientific articles

💡 How do we do it? We use recent progress in Natural Language Processing (NLP) to extract citations and structure them into actionable pieces of information.

Front-page

patCit builds on DOCDB, the largest database of Non-Patent Literature (NPL) citations. First, we deduplicate this corpus and organize it into 10 categories. Then, we design and apply category-specific information extraction models using spaCy. Finally, when possible, we enrich the data using external, domain-specific, high-quality databases.
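To make the first two front-page steps more concrete, here is a minimal Python sketch of deduplication followed by category assignment. It is a toy under explicit assumptions: the normalization rule, the category labels and the keyword lists are hypothetical, and a plain keyword lookup stands in for the spaCy models patCit actually relies on.

```python
import re

# Hypothetical keyword rules, for illustration only: patCit uses trained
# spaCy models, not hand-written keyword lists like these.
CATEGORY_KEYWORDS = {
    "search_report": ("search report",),
    "office_action": ("office action",),
    "webpage": ("http://", "https://", "www."),
    "norm_standard": ("ieee std", "3gpp", "etsi"),
    "wiki": ("wikipedia",),
}


def normalize(citation: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings compare equal."""
    return re.sub(r"\s+", " ", citation).strip().lower()


def deduplicate(citations: list[str]) -> list[str]:
    """Keep the first raw string seen for each normalized form, preserving input order."""
    unique: dict[str, str] = {}
    for raw in citations:
        unique.setdefault(normalize(raw), raw)
    return list(unique.values())


def classify(citation: str) -> str:
    """Return the first category whose keyword occurs in the normalized citation."""
    text = normalize(citation)
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "unclassified"  # hypothetical fallback label


raw_citations = [
    "Smith J., Nature 2001",
    "smith j.,  Nature 2001",
    "International Search Report for PCT/EP2010/000001",
    "https://www.example.com/product",
]
unique_citations = deduplicate(raw_citations)
print([classify(c) for c in unique_citations])
# ['unclassified', 'search_report', 'webpage']
```

Deduplicating before classification means each distinct string only has to go through the (much more expensive) downstream models once.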

Category Classification Information extraction Enrichment BigQuery table Colab notebook
Bibliographical reference

🔜

Office action

Patent

Search report

Product documentation

Norm & standard

Open In Colab
Webpage

Database

🔜

Litigation

Wiki

Open In Colab
All

NR

Open In Colab

In-text

patCit builds on the Google Patents corpus of USPTO full-text patents. First, we extract patent and bibliographical reference citations. Then, we parse the detected in-text citations into a series of category-dependent attributes using grobid. Patent citations are matched with a standard publication number using the Google Patents matching API, and bibliographical references are matched with a DOI using biblio-glutton. Finally, when possible, we enrich the data using external, domain-specific, high-quality databases.
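The in-text pipeline relies on NLP models and on the Google Patents matching API, neither of which is reproduced here. As a rough illustration of what detecting an in-text patent citation and standardizing its number involves, the Python sketch below uses a hypothetical regular expression for a few common US citation styles and emits keys of the form US-<number>. Both the pattern and the key format are assumptions; resolving the kind code, and handling non-US citations, is left to a proper matching service.

```python
import re

# Hypothetical pattern for a few common US styles, e.g. "U.S. Pat. No. 5,123,456",
# "U.S. Patent No. 6,000,001" or "US Pat. No. 7,654,321". Real citations are far
# more varied, which is why a regex is only a stand-in for trained models.
US_PATENT_PATTERN = re.compile(
    r"U\.?\s?S\.?\s+Pat(?:ent|\.)?\s+No\.?\s*([\d,]{7,11})",
    re.IGNORECASE,
)


def extract_us_patent_citations(text: str) -> list[str]:
    """Return normalized keys such as 'US-5123456' for each detected citation."""
    keys = []
    for match in US_PATENT_PATTERN.finditer(text):
        # Drop thousands separators (and any trailing comma captured by the pattern).
        digits = match.group(1).replace(",", "")
        keys.append(f"US-{digits}")  # kind code (A, B1, ...) left unresolved
    return keys


sentence = "As described in U.S. Pat. No. 5,123,456 and U.S. Patent No. 6,000,001, the device..."
print(extract_us_patent_citations(sentence))
# ['US-5123456', 'US-6000001']
```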

Category Citation extraction Information extraction Enrichment BigQuery table Colab notebook
Bibliographical reference

🔜

Patents

🔜

FAIR

📍 Find - The patCit dataset is available on BigQuery in an interactive environment. For those who have a smattering of SQL, this is the perfect place to explore the data. It can also be downloaded from Zenodo.

👨‍🎓 If you are new to BigQuery and want to learn the basics of Google BigQuery (GBQ), you can take the GBQ Quickstart. It should take no more than 2 minutes and might help a lot!

📖 Access - We maintain detailed documentation on how to access the data once you have found it on BigQuery or Zenodo. See the usage notes on the patCit documentation website.

🔀 Interoperate - Interoperability is at the core of patCit's ambition. We take care to extract unique identifiers whenever possible to enable data enrichment from domain-specific, high-quality databases. This includes the DOI, PMID and PMCID for bibliographical references, the Technical Doc Number for standards, the Accession Number for genetic databases, the publication number for PATSTAT and Claims, etc. See the specific tables for more details.
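Unique identifiers are what make joins with external databases possible. The Python sketch below shows a simple regex-based way to pull DOI, PMID and PMCID candidates out of a raw citation string. The patterns are illustrative assumptions rather than patCit's extraction models, and the sample citation is made up.

```python
import re

# Illustrative patterns only (not patCit's extraction models).
# DOIs follow the "10.<registrant>/<suffix>" shape; real-world DOIs can be messier.
DOI_PATTERN = re.compile(r'\b10\.\d{4,9}/[^\s"<>]+')
PMID_PATTERN = re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE)
PMCID_PATTERN = re.compile(r"\bPMC\d+\b")


def extract_identifiers(citation: str) -> dict[str, list[str]]:
    """Collect DOI, PMID and PMCID candidates found in a raw citation string."""
    # Heuristic: trailing punctuation is usually sentence punctuation, not part of the DOI.
    dois = [doi.rstrip(".,;)") for doi in DOI_PATTERN.findall(citation)]
    return {
        "doi": dois,
        "pmid": PMID_PATTERN.findall(citation),  # returns the captured digits only
        "pmcid": PMCID_PATTERN.findall(citation),
    }


# A made-up citation string.
sample = "Doe J., Fake Journal 2020, doi:10.1234/example.5678. PMID: 12345678; PMC1234567."
print(extract_identifiers(sample))
# {'doi': ['10.1234/example.5678'], 'pmid': ['12345678'], 'pmcid': ['PMC1234567']}
```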

🔂 Reproduce - You are in the right place. This GitHub repository is the project factory. You can learn more about data recipes and models on the patCit documentation website.

Contributing

There are many ways to contribute to patCit, and many of them do not involve coding.

Give feedback - We want to make patCit truly useful to the community, so we are always happy to receive feedback.

Share your thoughts - We believe that discussions are much more valuable when they are shared publicly, so that everyone can benefit from them. Hence, we strongly encourage you to post your issues and requests in the issues section of the patCit GitHub repository.

Feel like coding today? - We will be more than happy to receive contributions from you and the community. We have already started tagging some issues with "good first issue" and "help wanted".

Team

This project was initiated by Gaétan de Rassenfosse (EPFL) and Cyril Verluise (Collège de France) in 2019.

Since then, it has benefited from the contributions of Gabriele Cristelli (EPFL), Francesco Gerotto (Sciences Po), Kyle Higham (Hitotsubashi University) and Lucas Violon (HEC Paris).

We are also thankful to Domenico Golzio for constant support and to @leflix311, @kermitt2, Tim Simcoe (Boston University), @SuperMayo and @wetherbeei for helpful comments.

Contribution details are available in CRediT.
