# MIToolbox for C and MATLAB

This toolbox is aimed at people working on discrete datasets for classification. All functions expect discrete inputs. It provides implementations of Shannon's information theory functions and implementations of Renyi's entropy and alpha divergence. Versions 2.0 and later include weighted information theory functions based upon the work of S. Guiasu in "Information Theory with Applications" (1977). The toolbox was developed to support our research into feature selection algorithms and includes some sample feature selection algorithms from the literature to illustrate its use. Updated versions of the demonstration algorithms are provided (with many others) in the FEAST toolbox we developed to support our research. A Java version of MIToolbox, ported from the C code, is also available.

MIToolbox works on discrete inputs, and all continuous values must be discretised before use with MIToolbox. Real-valued inputs are discretised with x = floor(x) to ensure compatibility, but MIToolbox produces unreliable results when used with continuous inputs, runs slowly and uses much more memory than usual. This limitation is due to the difficulty of estimating information theoretic functions of continuous variables. The discrete inputs should also have small cardinality: MIToolbox treats the values {1,10,100} the same way it treats {1,2,3}, but the latter is both faster and uses less memory.

Note: all functions are calculated in log base 2, so return units of "bits".
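
As a worked example, the vector x = [1 0 1 1 0]' used in the MATLAB examples takes the value 1 with probability 3/5 and the value 0 with probability 2/5, so its entropy in bits is:

$$H(X) = -\sum_{x} p(x)\log_2 p(x) = -\tfrac{3}{5}\log_2\tfrac{3}{5} - \tfrac{2}{5}\log_2\tfrac{2}{5} \approx 0.4422 + 0.5288 = 0.9710$$

This matches the value returned by h(x). To convert a result from bits to nats, multiply it by ln 2.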

#### Contains functions for:

• Calculating Entropy, H(X)
• Calculating Conditional Entropy, H(X|Y)
• Calculating Mutual Information, I(X;Y)
• Calculating Conditional Mutual Information, I(X;Y|Z)
• Generating a joint random variable
• Calculating Renyi's Alpha Entropy, H_{\alpha}(X)
• Calculating Renyi's Alpha Mutual Information, I_{\alpha}(X;Y)
• Calculating the Weighted Entropy, H_w(X)
• Calculating the Weighted Conditional Entropy, H_w(X|Y)
• Calculating the Weighted Mutual Information, I_w(X;Y)
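
For reference, the quantities in the list above have the following standard textbook definitions (all logarithms base 2). These are the usual forms rather than a transcription of the toolbox source; in particular, which variant of Renyi's alpha mutual information is implemented, and how per-sample weights are turned into the per-state weights w(x), should be checked against the code.

$$H(X|Y) = H(X,Y) - H(Y)$$

$$I(X;Y) = H(X) + H(Y) - H(X,Y)$$

$$I(X;Y|Z) = H(X|Z) - H(X|Y,Z)$$

$$H_{\alpha}(X) = \frac{1}{1-\alpha}\log_2 \sum_{x} p(x)^{\alpha}, \qquad \alpha > 0,\ \alpha \neq 1$$

$$H_w(X) = -\sum_{x} w(x)\,p(x)\log_2 p(x), \qquad w(x) \ge 0$$

H_{\alpha}(X) tends to H(X) as \alpha \to 1, and Guiasu's weighted entropy H_w(X) reduces to H(X) when every w(x) = 1.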

#### MATLAB Examples:

```matlab
>> y = [1 1 1 0 0]';
>> x = [1 0 1 1 0]';
>> mi(x,y)        %% mutual information I(X;Y)
ans = 0.0200
>> h(x)           %% entropy H(X)
ans = 0.9710
>> condh(x,y)     %% conditional entropy H(X|Y)
ans = 0.9510
>> h( [x,y] )     %% joint entropy H(X,Y)
ans = 1.9219
>> joint([x,y])   %% joint random variable XY
ans = [1,2,1,3,4]';
```

Also provided are example implementations of 3 feature selection algorithms (CMIM, DISR, mRMR-D) which use the functions provided by MIToolbox. These example algorithms are provided in two forms, one coded in MATLAB and one coded in C using the MATLAB mex interface. The library is written in ANSI C for compatibility with the MATLAB mex compiler.

Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection
Gavin Brown, Adam Pocock, Ming-Jie Zhao, and Mikel Luján
Journal of Machine Learning Research (JMLR). Volume 13, Pages 27-66, 2012.

#### Compilation instructions:

• MATLAB/Octave - run matlab/CompileMIToolbox.m
• Linux/Mac OSX C shared library - use the included makefile


#### Repository:

I've hosted the source on GitHub for anyone who wishes to browse through it. It is updated at approximately the same time as this page.

#### Update History

• 07/01/2017 - v3.0.0 - Refactored internals to expose integer information theoretic calculations.
• 10/01/2016 - v2.1.2 - Relicensed from LGPL to BSD. Added checks to ensure input MATLAB types are doubles.
• 02/02/2015 - v2.1.1 - Fixed up the Makefile so it installs the headers too.
• 22/02/2014 - v2.1 - Fixed a couple of bugs related to memory handling. Added a make install for compatibility with PyFeast.
• 30/08/2012 - v2.00 - Released the weighted information theory functions.
• 08/11/2011 - v1.03 - Minor documentation changes to accompany the JMLR publication.
• 15/10/2010 - v1.02 - Fixed a bug where MIToolbox would cause a segmentation fault if an x-by-0 empty matrix was passed in. It now prints an error message and returns gracefully.
• 02/09/2010 - v1.01 - Updated CMIM.m in demonstration_algorithms to fix a bug where the last feature would not be selected first even if it had the highest MI.
• 07/07/2010 - v1.00 - Initial release.