This toolbox is aimed at people working on discrete datasets for classification; all functions expect discrete inputs. It provides implementations of Shannon's information theory functions, along with Renyi's entropy and alpha divergence. Version 2.0 adds weighted information theory functions based upon the work of S. Guiasu in "Information Theory with Applications" (1977). The toolbox was developed to support our research into feature selection algorithms, and includes some sample feature selection algorithms from the literature to illustrate its use. Updated versions of these demonstration algorithms are provided (with many others) in our FEAST toolbox. A Java port of MIToolbox, translated from the C code, is also available.

Note: all functions are calculated in log base 2, so their results are in units of "bits".
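For example, a uniformly distributed binary variable carries exactly one bit (a quick check using the h function demonstrated below):

```
$ h([0 1 0 1]') %% -(0.5*log2(0.5) + 0.5*log2(0.5)) = 1 bit
ans = 1
```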

- Calculating Entropy, H(X)
- Calculating Conditional Entropy, H(X|Y)
- Calculating Mutual Information, I(X;Y)
- Calculating Conditional Mutual Information, I(X;Y|Z)
- Generating a joint random variable
- Calculating Renyi's Alpha Entropy, H_{\alpha}(X)
- Calculating Renyi's Alpha Mutual Information, I_{\alpha}(X;Y)
- Calculating the Weighted Entropy, H_w(X)
- Calculating the Weighted Conditional Entropy, H_w(X|Y)
- Calculating the Weighted Mutual Information, I_w(X;Y)
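These are the standard definitions from the literature, computed in bits: Renyi's alpha entropy, H_{\alpha}(X) = (1/(1-\alpha)) log2( \sum_x p(x)^{\alpha} ), recovers Shannon's entropy as \alpha -> 1, and Guiasu's weighted entropy attaches a non-negative weight w(x) to each outcome, H_w(X) = -\sum_x w(x) p(x) log2 p(x).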

```
$ y = [1 1 1 0 0]';
$ x = [1 0 1 1 0]';

$ mi(x,y) %% mutual information I(X;Y)
ans = 0.0200

$ h(x) %% entropy H(X)
ans = 0.9710

$ condh(x,y) %% conditional entropy H(X|Y)
ans = 0.9510

$ h( [x,y] ) %% joint entropy H(X,Y)
ans = 1.9219

$ joint([x,y]) %% joint random variable XY
ans = [1,2,1,3,4]';
```
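These values obey the standard Shannon identities, which makes for a quick sanity check (reusing x and y from the session above):

```
$ h(x) + h(y) - h([x,y]) %% I(X;Y) = H(X) + H(Y) - H(X,Y)
ans = 0.0200

$ h([x,y]) - h(y) %% H(X|Y) = H(X,Y) - H(Y)
ans = 0.9510

$ h(joint([x,y])) %% the joint variable carries the joint entropy
ans = 1.9219
```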

Also provided are example implementations of three feature selection algorithms (CMIM, DISR, and mRMR-D), which use the functions provided by MIToolbox. Each example algorithm comes in two forms: one coded in MATLAB and one coded in C using the MATLAB mex interface. The library itself is written in ANSI C for compatibility with the MATLAB mex compiler.
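To give a flavour of how these functions compose into a selection algorithm, here is a minimal sketch of the CMIM criterion, J(X_f) = min_{X_j in S} I(X_f;Y|X_j), applied greedily. It assumes a conditional mutual information wrapper named cmi(x,y,z), following the naming pattern of the functions above; the bundled CMIM.m is the reference implementation.

```
function selected = cmim_sketch(data, labels, k)
% Greedy CMIM sketch: pick the feature with the best worst-case
% conditional relevance, J(X_f) = min over selected X_j of I(X_f;Y|X_j).
% data is an n-by-d discrete matrix, labels is an n-by-1 vector.
  d = size(data, 2);
  score = zeros(1, d);
  for f = 1:d
    score(f) = mi(data(:, f), labels); % unconditioned relevance I(X_f;Y)
  end
  selected = zeros(1, k);
  [~, selected(1)] = max(score);
  for i = 2:k
    prev = data(:, selected(i - 1)); % most recently selected feature
    for f = 1:d
      % tighten each running score by conditioning on the newest selection
      score(f) = min(score(f), cmi(data(:, f), labels, prev));
    end
    score(selected(1:i - 1)) = -inf; % never reselect a feature
    [~, selected(i)] = max(score);
  end
end
```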

All MIToolbox code is licensed under the 3-clause BSD license, except the feature selection algorithms, which are provided as-is, with no warranty, for demonstration purposes.

If you use this toolbox for academic research, please cite:

Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection

Gavin Brown, Adam Pocock, Ming-Jie Zhao, and Mikel Luján

*Journal of Machine Learning Research (JMLR)*. Volume 13, Pages 27-66, 2012.

- MATLAB/Octave: run CompileMIToolbox.m
- Linux C shared library: use the included makefile
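For example, from the directory containing CompileMIToolbox.m (a minimal sketch; the expected output uses the session values above):

```
% in MATLAB or Octave, from the MIToolbox directory:
CompileMIToolbox                % builds the mex files
mi([1 0 1 1 0]', [1 1 1 0 0]')  % smoke test; should return 0.0200
```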

I've hosted the source on GitHub for anyone who wishes to browse through it; it will be updated at approximately the same time as this page. The toolbox also has a project page on MLOSS.

- 10/01/2016 - v2.1.2 - Relicense from LGPL to BSD. Added checks to ensure input MATLAB types are doubles.
- 02/02/2015 - v2.1.1 - Fixed up the Makefile so it installs the headers too.
- 22/02/2014 - v2.1 - Fixed a couple of bugs related to memory handling. Added a make install for compatibility with PyFeast.
- 30/08/2012 - v2.00 - Released the weighted information theory functions.
- 08/11/2011 - v1.03 - Minor documentation changes to accompany the JMLR publication.
- 15/10/2010 - v1.02 - Fixed a bug where MIToolbox would cause a segmentation fault if an x-by-0 empty matrix was passed in; it now prints an error message and returns gracefully.
- 02/09/2010 - v1.01 - Updated CMIM.m in demonstration_algorithms to fix a bug where the last feature would not be selected first if it had the highest MI.
- 07/07/2010 - v1.00 - Initial Release