Subgroup Discovery


This package was developed to assist in discovering interesting subgroups in multi-dimensional data.


The PRIM implementation is based on the 1998 paper “Bump hunting in high-dimensional data” by Jerome H. Friedman and Nicholas I. Fisher. PRIM involves finding a set of “rules” which combined imply unusually large (or small) values of some other target variable. Specifially one tries to find a set of subregions in which the target variable is substantially larger than overall mean.

The objective of bump hunting in general is to find regions in the input (attribute/feature) space with relatively high (low) values for the target variable. The regions are described by simple rules of the type if: {condition-1 & … & condition-n} then: estimated target value. Given the data (or a subset of the data), the goal is to produce a box B within which the target mean is as large as possible. There are many problems where finding such regions is of considerable practical interest.

Often these are problems where a decision maker can in a sense choose or select the values of the input variables so as to optimize the value of the target variable. In bump hunting it is customary to follow a so-called covering strategy. This means that the same box construction (rule induction) algorithm is applied sequentially to subsets of the data.


Developed by Jurian Baas, part of master thesis in Artificial Intelligence.

Special thanks for contributions such as suggestions and bug-fixing to Dr. A.J. Feelders, Utrecht University, Department of Information and Computing Sciences


  1. Friedman, Jerome H., and Nicholas I. Fisher. “Bump hunting in high-dimensional data.” Statistics and Computing 9.2 (1999): 123-143.


To cite this package, use citation(“subgroup.discovery”) in R


This package is licenced under GPL-3

Copyright (C) 2017  Jurian Baas

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see