ML-Plan is an AILibs library
ML-Plan is a free software library for automated machine learning. It can be used to optimize machine learning pipelines in WEKA or scikit-learn.
When publishing articles in which you mention ML-Plan, please cite the following paper:
Felix Mohr, Marcel Wever, and Eyke Hüllermeier. “ML-Plan: Automated machine learning via hierarchical planning”, Machine Learning, 2018.
@article{DBLP:journals/ml/MohrWH18,
author = {Felix Mohr and Marcel Wever and Eyke H{\"{u}}llermeier},
title = {ML-Plan: Automated machine learning via hierarchical planning},
journal = {Machine Learning},
volume = {107},
number = {8-10},
pages = {1495--1515},
year = {2018},
url = {https://doi.org/10.1007/s10994-018-5735-z},
doi = {10.1007/s10994-018-5735-z},
timestamp = {Wed, 01 Aug 2018 13:10:15 +0200}
}
You can bind in ML-Plan via a Maven dependency (using Maven central as repository).
<dependency>
<groupId>ai.libs</groupId>
<artifactId>mlplan-full</artifactId>
<version>0.2.5</version>
</dependency>
dependencies {
implementation 'ai.libs:mlplan-full:0.2.5'
}
If you only need specific projects, you can also install only mlplan-weka
, mlplan-sklearn
, or mlplan-meka
.
The full distribution comes with an CLI and some GUI plugins.
ML-Plan works with the api4.org dataset specification, which has native support for ARFF files and openml:
ILabeledDataset d1 = OpenMLDatasetReader.deserializeDataset(3);
ILabeledDataset d2 = ArffDatasetAdapter.readDataset(new File("path/to/your/arff"));
If you have your data loaded in an object called data
, the shortest way to obtain an optimized classifier via ML-Plan for your data is to run
IClassifier c = new MLPlanWekaBuilder().withDataset(data).build().call()
Analogously, to create a classifier based on sklearn, you can call
ScikitLearnWrapper c = MLPlanScikitLearnBuilder.forClassification().withDataset(data).build().call();
Here, several default parameters apply that you may usually want to customize.
This is just a quick overview of the most important configurations of ML-Plan.
The general process is to make the configurations in the builder, build ML-Plan, and invoke its call
method:
AMLPlanBuilder builder = ... // see below how to get your builder
/* configure the builder */
...
MLPlan mlplan = builder.build();
ISupervisedLearner learner = mlplan.call();
Depending on the concrete context (WEKA, sklearn, classification/regression, etc.), the involved types are more specific than the above general types.
Depending on the library you want to work with, you then can construct a WEKA or scikit-learn related builder for ML-Plan. Both builders have the same basic capacities (and only these are needed for the simple example below). For library-specific aspects, there may be additional methods for the respective builders.
Note that ML-Plan for scikit-learn is also Java-based, i.e. we do not have a Python version of ML-Plan only for being able to cope with scikit-learn. Instead, ML-Plan can be configured to work with scikit-learn as the library to be used.
If you are interested in standard classification tasks such as binary or multinomial classification, create an ML-Plan builder as follows.
MLPlanWekaBuilder builder = MLPlanWekaBuilder.forClassification();
If you have a regression problem instead you may be get an appropriate ML-Plan builder like this:
MLPlanWekaBuilder builder = MLPlanWekaBuilder.forRegression();
ML-Plan for scikit-learn can also be instantiated for both classification, and regression. The way the builders are obtained are analogous to how we create these for WEKA:
Use
MLPlanScikitLearnBuilder builder = MLPlanScikitLearnBuilder.forClassification();
for obtaining an ML-Plan builder to work with scikit-learn to tackle standard classification problems and
MLPlanScikitLearnBuilder builder = MLPlanScikitLearnBuilder.forRegression();
to obtain a builder which pre-configures the builder ready for tackling regression problems with scikit-learn.
Note: If you want to use ML-Plan for scikit-learn, then ML-Plan assumes Python 3.5 or higher to be active (invoked when calling python
on the command line), and the following packages must be installed:
liac-arff
,
numpy
,
json
,
pickle
,
os
,
sys
,
warnings
,
scipy
,
scikit-learn
,
tpot
,
pandas
,
xgboost
.
Please make sure that you really have liac-arff
installed, and not the arff
package.
If you want to be sure that ML-Plan uses the correct python installation on your machine, you can create a file conf/python.properties
in the ML-Plan folder with the following layout:
path = <folder where your python3 binary resides>
pythonCmd = <optional: name of your python3 binary>
MLPlanMEKABuilder builder = AbstractMLPlanBuilder.forMEKA();
Note: Datasets, i.e. Instances objects, have to be loaded according to MEKA’s conventions. More specifically, in order to use Instances for multi-label classification the labels have to appear in the first columns and the class index marks the number existing labels (starting to count from the first column). The dataset preparation can be conveniently achieved as follows.
Instances myDataset = new Instances(new FileReader(new File("my-dataset-file.arff")));
MLUtils.prepareData(myDataset);
/* set the number of CPUs allowed for search to 4 */
builder.withNumCpus(4);
With the builder
variable being configured as above, you can specify timeouts for ML-Plan as a whole, as well as timeouts for the evaluation of a single solution candidate or nodes in the search.
By default, all these timeouts are set to 60 seconds.
/* set the global timeout of ML-Plan to 1 hour: */
builder.withTimeOut(new Timeout(3600, TimeUnit.SECONDS));
/* set the timeout of a node in the search graph (evaluation of all random completions of a node): */
builder.withNodeEvaluationTimeOut(new Timeout(300, TimeUnit.SECONDS));
/* set the timeout of a single solution candidate */
builder.withCandidateEvaluationTimeOut(new Timeout(300, TimeUnit.SECONDS));
builder.withPortionOfDataReservedForSelection(.3); // use 30% of the data for selection
builder.withPortionOfDataReservedForSelection(.0); // disable selection phase
builder.withMCCVBasedCandidateEvaluationInSearchPhase(3, .8); // use 3 repetitions with 80%/20% splits each
If you want to use ML-Plan with the typical types of the WEKA library such as Instances
and Classifier
, you can use the WekaInstances
wrapper class, and obtain the classifier object directly from ML-Plan:
Instances dataset = new Instances(new FileReader("path/to.arff")); // set the class index appropriately
Classifier c = new MLPlanWekaBuilder().withDataset(new WekaInstances(dataset)).build().call().getClassifier();
JavaDoc is available here.
ML-Plan is currently developed in the softwareconfiguration folder of AILibs on github.com.
We welcome contributions to ML-Plan given that the code quality meets our standards. If you would like to add changes to ML-Plan, feel free to create a pull request on the `dev` branch.
Please consider the following: