Intelligent Discovery Assistant
In order to improve the IDA Extensions in future versions, we kindly ask you to participate in our usability survey.
The Intelligent Discovery Assistant (IDA) helps you create data mining processes. Given a specification of the input data and a modelling task, it automatically creates processes tailored to this data. Based on the analysis of hundreds of processes (meta mining), it selects operators that are particularly well suited to the problem and data set at hand. For example, it chooses operators that have achieved good accuracy on similar data sets in the past. Furthermore, it takes care of any preprocessing that certain algorithms require. For example, it will perform an appropriate normalization, discretization, or missing value replacement when the learning algorithm requires it. Here, too, appropriate preprocessing operators are selected based on their projected impact on the overall performance of the process.
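To make the preprocessing steps mentioned above more concrete, here is a minimal Python sketch of two of them: missing value replacement followed by normalization. It is an illustration only and not how the IDA or RapidMiner implements these operators. The function names, the choice of mean replacement, and the choice of min-max scaling are all assumptions made for this example.

```python
from statistics import mean


def replace_missing(values):
    """Replace None entries with the mean of the observed values.

    Mean replacement is an assumption for this sketch; other
    strategies (median, most frequent value) are equally common.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


if __name__ == "__main__":
    column = [2.0, None, 4.0, 6.0]          # hypothetical numeric attribute
    filled = replace_missing(column)         # missing value replacement first
    print(min_max_normalize(filled))         # then normalization
```

The order matters here: normalization needs a complete column, so the missing values are replaced first. Arranging steps in this way is the kind of dependency that the IDA is described as handling automatically.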
To use the extension, switch to the IDA perspective, which is identified by its icon. If you are using it for the first time, read the installation instructions below.
Creating a data mining process using the Intelligent Discovery Assistant can be done by following four simple steps. Step 4 contains a small hint indicating what to do next.
- Import data. If your data is already saved in your RapidMiner repository or on RapidAnalytics, there is nothing to do. Otherwise, you can now import a CSV or Excel file into your repository. If you want to execute a prediction task, make sure to mark your target variable as "label".
- Select data. You can choose up to two data sets by dragging them into the respective area. Once data sets are dropped here, meta data will be displayed. The IDA will automatically identify appropriate roles: training data, test data, or application data.
- Select goal. You can select the goal or task you want to execute. A short informational text describing the individual options is displayed on the right-hand side.
- Evaluate workflows. Once the above settings are made, the instructions in Step 4 should vanish and you should be able to click on "Fetch plans". The actual planning can take a while. A number of plans will be generated and displayed in a table. You can select those you consider promising and click "Evaluate" to execute these processes and see their respective performances. Finally, you can choose to open one of the processes in RapidMiner, which will bring you back to the process design perspective.
For debugging purposes, e.g. to use the planner in eProPlan, you can export the meta data of example sets in OWL format to a file. The IDA extension adds another action to the context menu that opens when you right-click an entry in a repository. This action, "Export Meta Data", lets you create such an OWL file, which can be opened, e.g., in Protégé and interpreted by eProPlan.
The following video demonstrates the usage of the IDA Extension.
This video demonstrates the wizard-style version of the IDA Extension.
The IDA Extension can be downloaded from the Rapid-I Marketplace from within RapidMiner. Details are available here.
Once the extension is available in RapidMiner, go to the IDA perspective, which can be identified by its icon.
Before the IDA extension can be used for the first time, you must install its dependencies: XSB Prolog and Flora 2. This can be done from within RapidMiner. When you start the planner for the first time (by clicking the "Play" symbol in the toolbar of the IDA view), an installation dialog opens. You can reopen this dialog later using the icon on the far right of the IDA view's toolbar. Depending on your operating system, the dialog offers various options for installing XSB and Flora 2:
- Binary installation. Downloads XSB and Flora 2 binaries for your system and installs them to a fixed folder. This path cannot be changed. Make sure RapidMiner is running with write permissions for this folder.
- Source installation. Downloads the XSB and Flora 2 sources and compiles them on your system. The executables are installed in a directory that you can specify. This option requires a C compiler to be installed.
- Use existing installation. Uses a pre-existing installation (manual or from a previous IDA installation). Flora 2 and XSB must be installed in subdirectories named "flora2" and "XSB" of a directory that you can specify.
- Client-server installation. Connects to an existing server installation using the hostname and port specified below.
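Before choosing "Use existing installation", it can help to confirm that the directory you plan to specify has the expected layout. The following Python sketch checks only for the two subdirectory names given above ("flora2" and "XSB"). The function name and the example path are hypothetical, and the check does not verify that the installations themselves work.

```python
from pathlib import Path

# Subdirectory names taken from the installation option described above.
REQUIRED_SUBDIRS = ("flora2", "XSB")


def check_existing_installation(base_dir):
    """Return a list of layout problems; an empty list means none were found.

    This only checks that the expected subdirectories exist. It does not
    test whether XSB or Flora 2 are complete or runnable.
    """
    base = Path(base_dir)
    if not base.is_dir():
        return [f"{base} is not a directory"]
    problems = []
    for name in REQUIRED_SUBDIRS:
        if not (base / name).is_dir():
            problems.append(f"missing subdirectory: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical path; replace it with the directory you intend to specify.
    for problem in check_existing_installation("/opt/ida-deps"):
        print(problem)
```

Note that on case-insensitive file systems a directory named "xsb" would also pass this check, so on such systems it is worth confirming the exact spelling by hand.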