Overview

Introduction
PatternExplorer is an open-source graphical interface for visual discovery of patterns in data. The software was developed by Caleb Sotelo under the direction of Charles Elkan at the University of California, San Diego. PatternExplorer is implemented as an operator plug-in for the popular data mining suite RapidMiner, which is freely available from Rapid-I.

The goal of PatternExplorer is to let users discover knowledge in data quickly and visually. It does this by highlighting statistically significant correlations and by providing several interactive features.

Features
PatternExplorer takes any data set as input and produces a matrix of histograms. There is one row for each value of a target attribute, and one column for each other attribute. The matrix (the dark blue and red histograms) shows how the values of every attribute are distributed within each value of the target:

The PatternExplorer visualization running in RapidMiner.
Light blue histograms represent the overall distributions of attribute values (top) and target values (left). Histogram bars that differ significantly from their parent distribution at the top of the column are painted red. Strong differences suggest that the column's attribute is correlated with the target outcome in that histogram's row. This highlighting allows users to quickly identify attributes that are potentially predictive of the target.
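As a rough illustration of the highlighting described above, the following Python sketch builds the histogram matrix and flags bars by z-score. The page does not say which statistic PatternExplorer uses, so the normal approximation to a binomial proportion is an assumption here, as are the function names, the default threshold of 2.0, and the representation of the data set as a list of dictionaries with no missing values. This is not the plug-in's actual code.

```python
from collections import Counter
from math import sqrt


def histogram_matrix(rows, target):
    """Return {target_value: {attribute: Counter of values}}.

    One outer entry per target value (a matrix row), one inner entry
    per non-target attribute (a matrix column).
    """
    matrix = {}
    for row in rows:
        cells = matrix.setdefault(row[target], {})
        for attr, value in row.items():
            if attr != target:
                cells.setdefault(attr, Counter())[value] += 1
    return matrix


def bar_z_score(bar_count, row_total, parent_share):
    """z-score of one bar against its parent distribution.

    Assumption: the bar count is compared with the count expected if
    the row followed the overall (parent) share, using the binomial
    standard deviation.
    """
    expected = row_total * parent_share
    std = sqrt(row_total * parent_share * (1 - parent_share))
    return 0.0 if std == 0 else (bar_count - expected) / std


def significant_bars(rows, target, threshold=2.0):
    """Return the set of (target_value, attribute, value) bars to paint red."""
    # Parent distributions: overall value counts for each attribute.
    overall = {}
    for row in rows:
        for attr, value in row.items():
            if attr != target:
                overall.setdefault(attr, Counter())[value] += 1

    n = len(rows)
    flagged = set()
    for target_value, cells in histogram_matrix(rows, target).items():
        for attr, counts in cells.items():
            row_total = sum(counts.values())
            for value, parent_count in overall[attr].items():
                # counts[value] is 0 when the value never occurs in this row.
                z = bar_z_score(counts[value], row_total, parent_count / n)
                if abs(z) >= threshold:
                    flagged.add((target_value, attr, value))
    return flagged
```

Raising or lowering `threshold` in this sketch plays the role of the z-score slider: a higher value leaves fewer bars flagged.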

User Interactions:
  1. Change target - Change the target to any attribute in the data set and the matrix is redisplayed accordingly.
  2. Drill-down - Remove a target value and all examples for which the target takes on that value will be excluded from the view. Focus on a target value and only examples for which the target takes on that value will be included.
  3. Change z-score threshold - Slide the z-score bar to change the threshold at which a bar is considered significant.
  4. Expand/Contract Columns - Show/hide values for attributes with many values. The compact view shows only the most frequent values.
  5. Zoom/pan - Zoom from 10% to 1000%.
  6. Detailed view - Select any histogram from the view and a detailed version is displayed in the side panel.
  7. Browse history - Navigate all changes to the view in web-browser style. Select history browsing options that economize on either memory or time.
Additional features include built-in missing value replenishment, numeric attribute discretization, and multi-core support. See the paper (draft) for full details about the software.
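The drill-down and compact-view interactions listed above amount to simple filters over the examples. The Python sketch below shows one way to express them, assuming the data set is a list of dictionaries. The function names and the `top_k` cutoff are hypothetical and are not taken from PatternExplorer.

```python
from collections import Counter


def remove_target_value(rows, target, value):
    """Drill-down "remove": exclude examples whose target equals value."""
    return [row for row in rows if row[target] != value]


def focus_target_value(rows, target, value):
    """Drill-down "focus": keep only examples whose target equals value."""
    return [row for row in rows if row[target] == value]


def compact_values(rows, attribute, top_k=5):
    """Compact view: the top_k most frequent values of an attribute.

    top_k is a hypothetical cutoff; the page does not say how many
    values the compact view keeps.
    """
    counts = Counter(row[attribute] for row in rows)
    return [value for value, _ in counts.most_common(top_k)]
```

After either drill-down filter, the histogram matrix would be recomputed from the remaining examples.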