RuDaS: Synthetic dataset generation code and evaluation tools for ILP

RuDaS (Synthetic Datasets for Rule Learning), is a tool for generating synthetic datasets containing both facts and rules, and for evaluating rule learning systems, that overcomes the shortcomings of existing datasets and proper evaluation methods.

RuDaS is highly parameterizable; for instance, number of constants, predicates, facts, consequences of rules (i.e., completeness) amount of noise (e.g., wrong or missing facts) and kinds of dependencies between rules can be selected.

Moreover, RuDaS allows for assessing the performance of rule learning systems by computing classical and more recent metrics, including a new one that we introduce.

In this repository there is also the code (see experiments/README) we used to evaluate representatives of different types of rule learning systems on our datasets demonstrating the necessity of having a diversified portfolio of datasets to help revealing the variety in the capabilities of the systems and thus also to support and help researchers in developing and optimizing new/existing approaches.

Paper & Slides:

ILP publication
Slides presented at ILP @ IJCLR 2021 here
arXiv preprint

How to cite:

@inproceedings{cornelio_thost_rudas,
  author={Cristina Cornelio and Veronika Thost},
  Booktitle = {Proceedings of the {30th} International Conference on Inductive Logic Programming, ILP2020-21 @ IJCLR},
  title={Synthetic Datasets and Evaluation Tools for Inductive Neural Reasoning},
  Year = {2021}}

Requirements:

Python 3

Experiments:

See experiments/README for additional requirements for running the experiments

Available Datasets Description

Example of data

Rules.

p3(X0,X1) :- p7(X1,X0).
p7(X0,X2) :- p6(X0,X1), p6(X1,X2).
p7(X1,X0) :- p9(X3,X1), p9(X1,X0).

Facts.

p9(c127,c381).
p6(c324,c291).
p3(c363,c354).
p7(c61,c96).
...

RuDaS.v0

The datasets described below (see paper for more details) can be found here: dataset1 and dataset2.

#	type	Size	Depth	#Rules	#Rules	#Rules	#Facts	#Facts	#Facts	#Pred	#Pred	#Pred	#Const	#Const	#Const
				min	avg	max	min	avg	max	min	avg	max	min	avg	max
10	CHAIN	S	2	2	2	2	51	74	95	5	7	9	31	47	71
10	CHAIN	S	3	3	3	3	49	70	97	7	8	9	31	43	64
10	CHAIN	M	2	2	2	2	168	447	908	9	10	11	97	259	460
10	CHAIN	M	3	3	3	3	120	508	958	8	10	11	52	230	374
22	RDG	S	2	3	3	3	49	84	122	6	9	11	28	50	84
12	RDG	S	3	4	5	6	56	104	172	8	10	11	41	55	75
22	RDG	M	2	3	3	3	200	646	1065	6	11	11	71	370	648
22	RDG	M	3	4	5	7	280	613	1107	10	11	11	149	297	612
22	DRDG	S	2	3	4	5	60	100	181	6	9	11	29	55	82
12	DRDG	S	3	4	7	11	58	144	573	8	10	11	34	58	89
22	DRDG	M	2	3	4	5	149	564	1027	10	11	11	88	327	621
22	DRDG	M	3	4	7	12	111	540	1126	10	11	11	70	284	680

Dataset generation code

rules can be arbitrary long and with n-ary predicates
anonymous constants and predicates = constants c1 c2 .. , predicates p1 p2 ..
format: prolog standard, 2 different files type: one for facts one for rules
parameters:
- number of constants
- number of predicates
- min/max arity of predicates
- number of rules
- maximal length of rules
- number of reasoning steps (depth of the tree or number of total steps)
- connected components rules category
- min/max number of connected components
- maximal depth of rule graphs
- dataset size: S, M, L, XL
- open-world degree n_OW in [0, 1]
- amount of noise in the data nNoise+ , nNoise- in [0, 1]
Noise:
- adding fact that are not necessary to prove the goal: nNoise+
- removing support facts: nNoise-
- removing consequences facts: n_OW
categories to show capabilities of the ILP system:
- Chain -> h:-b1,b2. b1:-a1,a2. a1:-c1,c2
- Rooted Directed Graph (DG) -> h:-b1,b2. b1:-a1,a2. b2:-c1,c2. a1:-d1,d2. a2:-d3,d4. ...
- Disjunctive Rooted DG -> different rules same head: h:-b1,b2. h:-a1,a2. h:-c1,c2.
- Mixed -> mix of the above.
- All of them can have recursion -> h(X):-h(Y),b1(X,Y). b1(X,Y):-b1(Z,Y),b2(X,Z)

Evaluation Tools for ILP systems

Evaluation tool to compute distances between logic programs:

Herbrand distance: the traditional distance between Herbrand models; two normalized versions of the Herbrand distance
Herbrand accuracy: (H-accuracy), Herbrand distance normalized on the Herbrand base
Herbrand score: (H-score), a new metric we propose in this paper;
Accuracy
Precision (or standard confidence)
Recall
F1-score
Rule-score: a new computationally efficient measure that consider only the induced rules and not the grounded atoms.

Predicate invention is not penalized in the evaluation.

Future extensions

Probabilistic dataset: generate probabilistic facts and/or rules
More expressive logic: for example full first order or higher order

Cheap and reliable Node.js hosting starts at $3/month, and $1/month static HTML hosting

IBM / RuDaS

Programming Languages