Non-functional workflow assessment aims at providing an additional set of measures that describe individual data-mining workflows. These measures, in addition to the measures of performance used by the probabilistic ranker, provide additional dimensions for choosing or differentiating between workflows that solve the same task. For example, given a group of workflows estimated to be similarly accurate, it might be helpful to also consider their estimated time consumption or understandability in order to differentiate among them and to select one that also fits such parameters well. The main motivation for developing this functionality is to assess workflows that are automatically generated by the workflow planners developed in e-LICO.
The assessment of non-functional characteristics of data-mining workflows is based on two stand-alone components for qualitative and quantitative assessment. A third software component handles the integration between the qualitative and quantitative components, as well as their integration with other workflow-ranking tools.
The description of the software for the assessment of non-functional characteristics is therefore divided into separate parts for the:
- qualitative component
- quantitative component
- integrational component
A qualitative multi-attribute model was created for the assessment of non-functional workflow characteristics. Based on the description of a data-mining workflow, the model evaluates its qualitative characteristics and also provides an overall combined assessment, or score, of the given workflow. Overall performance is the topmost concept in the model; it depends on the assessments of lower-level concepts (characteristics). The model's hierarchical structure below reveals these dependencies:
Methodological and technical details of the modeling methodology and the assessment approaches we used are described in the "How it Works" tab.
A prerequisite for the model's use is its hosting platform DEXi, where the model can be previewed, changed and used manually for the assessment of workflows. This allows for a user-friendly exploration of the model's structure and rule-based functions and comparison of workflows.
DEXi is available freely at: http://kt.ijs.si/MarkoBohanec/dexi.html
Manual use of this model is, however, not the primary goal of its implementation. The model was made primarily for the automatic assessment of large numbers of workflows. Given a known workflow-description format, an intermediate component parses the workflow descriptions to gather all the necessary information, calculates all the mentioned transformations and triggers evaluation according to the qualitative model. Specific tools of this kind were developed for the automatic evaluation of workflows in connection with the e-LICO components for workflow ranking. They are presented in the third part, the section named Integrational component.
The assessment of quantitative workflow characteristics aims at estimating the resources demanded by a particular workflow. Knowing the approximate resource requirements of workflows (execution-time and memory requirements) in advance can be an important aspect in some applications. Execution time can be important even in a scientific setting, where excessively long execution times could prevent real usability. We have devised a simple and robust experimental scheme for measuring the resource consumption of individual operators over a set of datasets with diverse characteristics. In deployment, the execution-time estimate for a particular workflow is based on the meta-data of the input dataset and on the individual operators used in the workflow. For each operator, the runtime is estimated by a robust k-NN approach, which uses the dataset meta-data and the operator execution-times determined over a set of pre-designed experimental datasets. The result of this work is a script for the estimation of execution times, together with the experimental scheme for its adaptation to new situations/datasets.
The integrational component is a script written to act as an intermediator among the outside ranking components and the two (qualitative and quantitative) components for non-functional workflow characteristics assessment.
The script supports the process with the following steps:
- the outside ranking component for workflow assessment calls the script for non-functional workflow assessment with a batch of workflows to assess,
- the script for non-functional workflow assessment goes through the batch, parses the workflow descriptions and calls quantitative and qualitative assessment on each item,
- internally, the qualitative part calls the quantitative assessment once more for its purposes (time estimation),
- the assessments are reported back to the outside ranking component.
The data interchange between the outside ranking component for workflow assessment and the script for non-functional workflow assessment is done in batches of variable size containing workflow descriptions. Batches are given as raw text files with each workflow description on its own line. A workflow description is given in the format used in the interchange between the IDA planner and the ranking component.
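For illustration, a minimal sketch of reading such a batch file. Only the one-description-per-line convention is assumed here; the inner description format follows the IDA planner / ranking component interchange and is not parsed at this stage:

```python
def read_batch(path):
    """Read a batch file with one workflow description per non-empty line.

    The content of each line follows the IDA planner / ranking component
    interchange format; it is returned verbatim, and parsing is left to
    the assessment components.
    """
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```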
- DEXi (for manual GUI use)
DEXi is a multi-attribute modeling tool that is freely available from its home page.
- Download the workflow assessment model file overallWFassessment5.dxi.
- Open this file with DEXi.
The execution-time estimation procedure is available as an executable jar file. You can download it, along with a couple of example workflows, from here.
- Python (2.6 or later) with networkx package.
The entire set of components for evaluation and integration is provided in the BatchEval.zip archive, which is available here.
In order to use the components, extract BatchEval.zip into a directory where it can be called by Python and
Java. File runtimeEstimation.zip must also be extracted into the same directory.
The usage instructions for the three components of non-functional workflow assessment are given separately and are quite lengthy. To jump to the usage description of the component of interest, click on a component in this list:
Manual use of the qualitative component (multi-attribute model) is described here. For automated use see usage instructions of the integrational component.
For manual use, the model file overallWFassessment5.dxi must be opened in DEXi, where it appears like this:
The tree structure on the left side shows the hierarchical structure of the model. The icons to the right of the structure enable modifications of the model's structure (adding/deleting/copying nodes). Further to the right, there are controls for naming nodes, defining the nodes' value scales and defining aggregation (or utility) functions. The naming part is trivial. The scales are defined for each node and therefore for each concept we are modeling. The scales are qualitative (see the example for the overallWFassessment node in the picture). The aggregation functions transform lower-level node values into the values of higher-level nodes. As the scales are qualitative, the functions are qualitative as well and can be seen as collections of if-then rules defined for every higher-level node. An example function (shown in DEXi's function-definition GUI) for ControlFlow is here:
The picture shows how a qualitative value from the value scale of ControlFlow is prescribed for each possible combination of the values of Interconnectedness, SplitsJoins and Modularity. In this fashion, qualitative aggregation functions are defined for each higher-level node.
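Conceptually, such an aggregation function is just a rule-table lookup. A sketch with a few invented ControlFlow rules (the actual rules live in overallWFassessment5.dxi; these entries and the fallback value are illustrative only):

```python
# Hypothetical excerpt of the ControlFlow aggregation function as an
# if-then rule table: (Interconnectedness, SplitsJoins, Modularity) -> value.
CONTROL_FLOW_RULES = {
    ("low", "low", "good"): "good",
    ("low", "medium", "good"): "good",
    ("medium", "medium", "good"): "acceptable",
    ("high", "high", "poor"): "poor",
}

def control_flow(interconnectedness, splits_joins, modularity):
    # Combinations not listed above fall back to a neutral value.
    return CONTROL_FLOW_RULES.get(
        (interconnectedness, splits_joins, modularity), "acceptable")
```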
Given a defined and finished model, we can define a number of options or candidates for assessment. This is done in DEXi's Options tab. Each option in our case represents a workflow and is defined by the values of the lowest-level nodes, which do not depend on aggregation functions but are direct inputs into the model. An example situation with three workflows as options is given here:
The essential task of the model is evaluation or assessment. The options that we describe get evaluated and the results are presented under the Evaluation tab. Evaluation of our example options with the model would look like this:
We can see that the first two options are both evaluated as appropriate, although they are quite different in their characteristics, as can be seen from the evaluations of the intermediate concepts. The third option is similar to the first, but has some additional drawbacks, which cause it to be evaluated somewhat worse.
DEXi allows various analyses and visualizations that are explained in detail in the DEXi's user manual which can be downloaded from here.
The execution-time estimation procedure is a command-line Java program that takes the following input:
java -jar runtime_estimation.jar <workflow_file> <calibration_value> <verbosity_parameter>
The workflow file contains the names of the operators, the number of operator executions in the workflow, the meta-data they operate on, and optionally a list of parameter-value pairs. If no parameter-value pairs are given, default values are assumed in modeling the execution time. An example input file representing one particular workflow:
The two other, optional inputs to the program are the calibration parameter obtained from running the RapidMiner calibration workflow (available here) and a verbosity tag. The calibration parameter is optional; if it is not set (or set to zero), no calibration is performed.
An example run looks like this:
java -jar runtime_estimation.jar test_workflow 0 verbose
And the output is:
Read calibration data from data/calibration_1.csv
Calibration factor is 1.0
Read operator data from data/operators3.csv
Read workflow data from test_workflow
"RM_XValidation" not found (using RM Default Model).
RM_XValidation 1.619 ms
RM_Weight_by_Chi_Squared_Statistic (with 10 repetitions) 2397.751 ms
"RM_Select_by_Weights" not found (using RM Default Model).
RM_Select_by_Weights (with 10 repetitions) 16.186 ms
RM_k_NN_MixedMeasures (with 10 repetitions) 209.992 ms
"RM_Apply_Model" not found (using RM Default Model).
RM_Apply_Model (with 10 repetitions) 16.186 ms
"RM_Performance" not found (using RM Default Model).
RM_Performance (with 10 repetitions) 16.186 ms
Total runtime is 2657.921 ms
After all files are extracted into a folder as explained in the Installation tab, the script to call is PPparserBatch.py. It can be called without arguments, which runs a test on a hard-coded example and should produce something like this:
/$ python PPparserBatch.py
overallWFassessment of this WF is:
good on a scale from bad to excellent.
its topological complexity is low.
its familiarity is high.
its interpretability is low.
its generality is high.
its noise sensitivity is high.
its scalability is medium.
its time performance (speed) is very high.
For regular batch use, an argument has to be added: a text file with one workflow description per line. The call and result on the example batch file are shown below. Result slots contain an overall qualitative assessment and an execution-time assessment in a separate line. Slots are provided in the same order as the workflow descriptions in the file.
/$ python PPparserBatch.py batch.txt
NOTE: The final version of this script reports the assessment values transformed into numbers from the interval [0, 1] for easier integration with the probabilistic ranker. In the above example, for easier comprehension, non-transformed qualitative values and milliseconds are shown.
Explanations of the methodological approaches and various technicalities for the three components of non-functional workflow assessment are given separately here and are quite lengthy. To jump to the description of the component of interest, click on a component in this list:
- Qualitative component
- Quantitative component
- Workflow representation
- Dealing with the dynamic changes in algorithm parameters or dataset characteristics
- Datasets for execution-time modelling
- Dataset metafeatures
- Performing execution-time experiments
- Calibration procedure
- Regression modelling
- Modelled operators
- Default behaviour
- Advice for memory estimation
- Integrational component
The qualitative component is a multi-attribute qualitative model of DEX methodology. These models are typically used for decision analysis tasks and various kinds of assessments where influences of multiple criteria are to be taken into account. The models of DEX methodology are qualitative, hence most suited to situations where the concepts that are to be modeled are of qualitative nature and difficult to represent numerically.
These models are based on a hierarchical decomposition of the problem, where the target concept is hierarchically decomposed into subconcepts (or aggregate attributes) and finally to a finite set of (measurable) basic attributes. Values of aggregate attributes are evaluated with functions, which depend on the corresponding attributes located on the lower levels of the hierarchy. The values of basic attributes are inputs to the system and must be provided by the user.
For each attribute, DEX requires a definition of a set of corresponding qualitative values. These are usually descriptive and take qualitative values such as: very low, low, medium and high. Values of aggregate attributes are estimated using qualitative aggregation functions which are represented with if-then rules. Those are usually presented in tabular form. Using such functions, the values of higher-level (aggregate) attributes are obtained based on the values of lower-level attributes. The attributes at the lowest level are basic descriptors of the options we are assessing and represent inputs to the model that must be provided by the user.
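The evaluation itself is a single bottom-up pass over the hierarchy. A minimal sketch of such DEX-style evaluation (the attribute names and rules in the example below are invented for illustration, not taken from the actual model):

```python
def evaluate(node, tree, rules, inputs):
    """Evaluate a DEX-style attribute bottom-up.

    tree maps each attribute to its children (basic attributes map to an
    empty list); rules maps each aggregate attribute to an if-then table
    keyed by tuples of child values; inputs holds the user-supplied
    values of the basic attributes.
    """
    children = tree[node]
    if not children:                       # basic attribute: direct input
        return inputs[node]
    key = tuple(evaluate(c, tree, rules, inputs) for c in children)
    return rules[node][key]                # aggregate attribute: rule lookup
```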
For more information on DEX methodology see the manual of its principal software tool DEXi. In what follows, the internals of the e-LICO qualitative workflow assessment model are presented in detail.
There are two main groups of concepts in the model. The group named UNDERSTANDABILITY aggregates concepts such as Interpretability, Familiarity and (Structural) Complexity, which express how easy it is for an average user to comprehend the workflow at hand. The other group, ROBUSTNESS, aggregates concepts that describe the workflow's (in essence, its algorithms') ability to cope with difficulties and changes in data and problem characteristics. There is also a quantitative assessment of TIME consumption added to the overall model. It is treated as an input on the highest level, since the overall qualitative evaluation model also covers the relation with the numerical non-functional assessment.
Let us now describe the components of the model, together with proposed transformation of inputs into qualitative values.
The UNDERSTANDABILITY subtree of the model is composed of three concepts: Complexity, Familiarity and Interpretability.
The COMPLEXITY concept is used to assess the topological features of a workflow and their influence on understandability. A somewhat lengthier, but more precise, name would be STRUCTURAL COMPLEXITY, since it is the structural aspect of complexity that we cover with this concept. Important indicators in this respect are the number of operators (boxes), the number of connections, graph degree, connectedness, etc. In the following, all input attributes (lowest-level concepts) are explained.
NumNodes: Simply the number of boxes, called "operators" in RapidMiner. The actual numerical value is
categorized into the values [high, medium, low] of this input attribute like this:
[0, 7] is low
(7, 14] is medium
> 14 is high
NumEdges: Number of connections among boxes. The actual numerical value is categorized into the values
[high, medium, low] of this input attribute like this:
[0, 10] is low
(10, 20] is medium
> 20 is high
NumPaths: Number of paths through the workflow. Double connections among operators are not counted
twice as in NumEdges. The actual numerical value is categorized into [many, some, few] like this:
[0, 2] is few
(2, 7] is some
> 7 is many
Density: Graph density of the workflow. Measured on a scale from 0 to 1, where 0 denotes an empty graph
and 1 a full graph. Numerical values from this interval are categorized like this:
[0, 0.20] is low
(0.20, 0.40] is medium
[0.40, 1] is high
SplitsJoins: Number of splits and joins. The actual numerical value is categorized into the values [high,
medium, low] of this input attribute like this:
[0, 3] is low
(3, 8] is medium
> 8 is high
Modularity: The fan-in/fan-out principle is used here in the sense that a module containing submodules makes sense in most cases, but not when fan-in and fan-out are high, especially at the same time. This would mean the modularization is poor. The proposed scale is [poor, good], with the sum of fan-in and fan-out categorized like this:
[0, 4] is good
> 4 is poor
AverageGD: Average graph degree of the workflow's graph representation. The actual numerical value is
categorized into the values [high, medium, low] of this input attribute like this:
[0, 2) is low
[2, 3) is medium
≥ 3 is high
MaxGD: Maximum graph degree of the workflow's graph representation. The actual numerical value is cate-
gorized into the values [high, medium, low] of this input attribute like this:
[0, 3) is low
[3, 5) is medium
≥ 5 is high
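The structural indicators above can be computed from a graph representation of the workflow (the installed tooling uses the networkx package for this). Below is a dependency-free sketch over a plain node/edge list, using the thresholds given above; SplitsJoins, NumPaths and Modularity need additional workflow knowledge and are omitted:

```python
def cat_closed(value, lo, mid):
    """[0, lo] -> low, (lo, mid] -> medium, > mid -> high."""
    return "low" if value <= lo else ("medium" if value <= mid else "high")

def cat_open(value, lo, mid):
    """[0, lo) -> low, [lo, mid) -> medium, >= mid -> high."""
    return "low" if value < lo else ("medium" if value < mid else "high")

def structural_inputs(nodes, edges):
    """Qualitative structural inputs for a workflow given as a list of
    operator names and a list of directed (source, target) connections."""
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n, e = len(nodes), len(edges)
    density = e / float(n * (n - 1)) if n > 1 else 0.0  # directed graph
    return {
        "NumNodes": cat_closed(n, 7, 14),
        "NumEdges": cat_closed(e, 10, 20),
        "Density": cat_closed(density, 0.20, 0.40),
        "AverageGD": cat_open(sum(degree.values()) / float(n), 2, 3),
        "MaxGD": cat_open(max(degree.values()), 3, 5),
    }
```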
Familiarity is intended to assess how familiar the workflow components are to the user. This feature is intrinsically subjective, and in any real-world application the transformation from workflow components into this qualitative value should be allowed to be defined by the user. In the simplest case, this can be achieved with a look-up table where component (operator) names are listed, accompanied by qualitative familiarity tags from the user. In order to simulate such a scenario and to provide an approximation of a default look-up table of this kind, we have produced one according to the number of search-engine hits per name. In this way, we have tried to approximate familiarity in general, that is, a kind of default familiarity. The familiarity approximation used in the integrational script for calculating inputs for the qualitative model was obtained using Google search on 8 November 2011. A default familiarity value of medium is used for unknown operators.
How interpretable are the results of the data-mining process, the models built from data? The aim of this attribute is to reward machine learning algorithms that build easily interpretable (white-box) models, providing insight into the modeled domain, and to penalize black-box models, which cannot be used for that additional purpose. The basic solution is a look-up table of machine learning algorithms and their models' interpretability as provided by data-mining experts. A web-based questionnaire was used to gather votes for a number of selected machine learning algorithms. Its results were discussed at a dedicated brainstorming meeting and were used in the integrational component for determining inputs to the model. All non-assessed algorithms are given a default value of medium.
The ROBUSTNESS concept deals with handling the variability of datasets that can be used in the workflow regarding the structure, size and quality of data. Robustness has a simple decomposition into three lower level concepts that are also basic attributes or inputs: NoiseSensitivity, Scalability and Generality.
Sensitivity of the workflow to noise in the data. In the current solution, this aspect is covered by the sensitivity of the main data-mining algorithm (learner or data descriptor) to noise in the data. An assessment of this feature was made for a selection of algorithms. Results from the literature and our own experiments (on various datasets we measured the percentage drop in accuracy caused by adding 10%, 20%, 30% and 40% noise) were used to create a look-up table. All non-assessed algorithms are given a default value of medium.
The ability of the workflow to deal with increasing amounts of data. Similarly to NoiseSensitivity, this attribute is focused on the learning algorithm. A series of experiments was done for the purpose of scalability assessment, which resulted in a look-up table used in the overall assessment model. The measure for scalability assessment is the increase in time consumption caused by enlarging a dataset with additional items and additional attributes. Time consumption per se was not taken into account here. For example, imagine that an algorithm A takes 2 seconds to analyze a dataset, and 10 seconds if we double the dataset's size. Imagine another algorithm B that takes 20 seconds for the dataset and 40 for the doubled one. Because of the smaller time-increase ratio, algorithm B would be considered more scalable in our assessment, even though it takes more time than algorithm A. Such an example, according to our experiments, is rule induction, which takes a lot of time, but given a larger dataset, the relative increase is quite low. The actual time consumption is taken into account elsewhere, in the "Time" component. All non-assessed algorithms are given a default value of medium.
Generality is used to express what kind of variability in the purpose, use and qualitative characteristics of data is allowed by the workflow. For example, a learning algorithm in the workflow might only work with data of specific characteristics (e.g. only with a binomial label). While this is perfectly fine when the given dataset is appropriate, it would still be better to use a more general algorithm, given that everything else stays the same. A more general algorithm can be left in the workflow even though the dataset and its characteristics might change over time. For our prototype solution we limited the assessment of generality to learning algorithms. For each of the selected learning algorithms, a value of generality is prescribed according to the number of capabilities denoted in RapidMiner's operator-information overview. Algorithms covering up to 3 (inclusive) listed capabilities are given "low" generality, those with 4 to 6 (inclusive) listed capabilities have "medium" generality, and the generality of operators with 7 or more capabilities (out of the 10 considered in predictive modelling) is "high". All non-assessed algorithms are given a default value of medium.
The run-time performance is used in the overall qualitative model in a categorized form. For this purpose, the run-time component is called and the resulting time (in seconds) is categorized into a qualitative value in this way:
[0, 5) is very high (very high performance, which means very low requirements)
[5, 60) is high
[60, 300) is medium
[300, 3600) is low
≥ 3600 is very low
Our approach to execution-time estimation is to use robust k-NN regression to model algorithm behaviour on various types of datasets. We deduce the information on operator and dataset types from the workflow representation and use it to estimate workflow execution-time offline. Similarities between algorithms allow us to use identical models for multiple algorithms, often belonging to the same class (e.g. tree induction). We aimed to design a system that satisfies the following criteria:
- Robustness and usability: The procedure should be robust and usable for algorithms with different characteristics with as little modifications as possible, including both non-parametric and parametric algorithms, and it should be fast enough in deployment mode.
- Reasonable estimates: Estimation models should provide reasonable estimates and be easily upgraded or extended, even in cases when data set complexity parameters are outside the limits used for learning the models (extrapolation), and should make smaller relative errors as the execution-time gets longer (for very short time even larger relative errors are tolerable).
- Portability/adaptability: The estimation models must be easily calibrated for the particular hardware system but also easily determined or adapted for another data mining platform.
The figure below shows an overview of the execution-time estimation procedure. First, workflows are preprocessed into a representation that exposes the information necessary for estimation; see section "Workflow representation" for details. This information is then used to estimate the execution-time of the workflow by summing the individual estimates for each operator.
This approach assumes all operators are executed sequentially, which is not always true. The simplest case is when some dominating operator has the ability to process its multiple iterations in parallel. This case is easily identified from the operator parameters and can be accounted for by adding only the longest iteration to the total time estimate. More complicated cases would include parallelism at the level of multiple operators and would require nontrivial preprocessing of the input workflow.
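Under these assumptions, the total estimate reduces to a sum of per-operator, per-iteration estimates, with parallel-capable dominating operators charged only for their longest iteration. A sketch (the data layout and operator names are invented for illustration):

```python
def workflow_estimate(operator_estimates, parallel_ops=()):
    """Sum per-operator execution-time estimates for a workflow.

    operator_estimates maps an operator name to a list of per-iteration
    estimates (ms). Operators named in parallel_ops are assumed to run
    their iterations in parallel, so only the longest iteration counts;
    all other operators contribute the sum of their iterations.
    """
    total = 0.0
    for op, times in operator_estimates.items():
        total += max(times) if op in parallel_ops else sum(times)
    return total
```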
The following factors influence the runtime of a workflow and are expected as an input:
- Number and type of operators.
- Dataset characteristics on which the operations are performed.
- Parameters of operators.
Modelling the influence of operator parameters requires more extensive experimentation, so we decided to constrain our estimates to default parameters only. We used the input format described in the "Usage" section, although any representation that exposes the above-mentioned information could be used (for example, the original RapidMiner XML representation).
The workflow representation has to account for all relevant factors influencing execution-time. This includes dynamic changes in algorithm parameters or dataset characteristics during workflow execution. Unfortunately, it is often the case that dataset characteristics or algorithm parameters are not known beforehand, and reliable prediction can be achieved only by dynamically updating the relevant information during runtime. This includes cases where:
- Parameters depend dynamically on some variable in the workflow (e.g. macros in RapidMiner).
- Parameters depend dynamically on some characteristic of the dataset that is not represented in the metadata. For example, we cannot predict the outcome of selecting all examples with a value greater than some specific threshold without access to the whole dataset.
However, our approach covers the following cases, which can easily be accounted for before workflow execution:
- Dominating operators that have predefined number of iterations.
- Sampling and feature selection operators with explicitly defined parameters.
This method allows us to estimate the execution-time of a potentially very large number of workflows without the need to actually execute any of them.
We used the uniform design approach to model the basic dataset characteristics that influence execution-time performance: number of examples N, number of attributes M, percentage of categorical attributes L and class distribution (percentage of lesser class) C. In comparison with factorial design where the number of experiments rapidly rises, uniform design tries to uniformly cover the space of factors while performing the limited number of experiments. Uniform design tables for different number of factors and their levels can be found at http://www.math.hkbu.edu.hk/UniformDesign/.
We also included one additional factor: problem complexity Pc. This encapsulates class separability in the case of simple two-class classification: random (inseparable), sum classification (linearly separable) and handwritten-digit classification (nonlinearly separable, based on the Gisette dataset). The three model datasets are further processed to obtain enough distinct datasets for the experimental design. The preprocessing operations include binominalization (converting multi-class datasets to two-class), fitting data to specifications (random selection of N examples and M attributes, ensuring class imbalance C and introducing L nominal attributes by random discretization into 2 to 5 bins) and class-label standardization ("positive" and "negative" labels for every dataset).
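A rough sketch of the "fitting data to specifications" step, assuming the source dataset is a list of numeric feature rows. Class-imbalance handling (factor C) is omitted for brevity, and all names are illustrative:

```python
import random

def fit_to_spec(rows, n_examples, n_attributes, nominal_frac, rng=None):
    """Fit a source dataset to one experimental-design point (sketch).

    rows: list of equal-length numeric feature lists. Randomly keeps
    n_examples rows and n_attributes columns, then discretizes a
    nominal_frac share of the kept columns into 2-5 bins to introduce
    nominal attributes.
    """
    rng = rng or random.Random(0)
    rows = rng.sample(rows, n_examples)
    cols = rng.sample(range(len(rows[0])), n_attributes)
    rows = [[r[c] for c in cols] for r in rows]
    for j in rng.sample(range(n_attributes), round(nominal_frac * n_attributes)):
        vals = sorted(r[j] for r in rows)
        bins = rng.randint(2, 5)
        cuts = [vals[i * len(vals) // bins] for i in range(1, bins)]
        for r in rows:
            r[j] = "bin%d" % sum(r[j] >= c for c in cuts)
    return rows
```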
Each dataset can be described by an arbitrary number of metafeatures that characterize its influence on the behaviour (e.g. execution-time) of the various algorithms operating on it. For our estimates we chose those metafeatures whose number is constant, i.e. the ones that do not depend on a certain attribute but on the dataset as a whole. With efficiency in mind, we rely on metafeatures that are either readily available (through the RapidMiner meta-feature set for datasets) or fast enough to calculate for fast deployment. These dataset metafeatures represent the independent variables (together with the measured execution-time as the target) which we use as attributes in the k-NN estimation of execution-time.
The experimental datasets obtained through the process described in the previous subsection were used with a RapidMiner workflow that extracts the metafeatures of the input dataset, loops over the experimental datasets applying the predefined operator and measures execution time. It repeats the run for each operator-parameter setup several times (5-10) in order to assess the variance in the measurements and obtain a smooth estimate.
The workflow for execution-time measurement and experimental dataset construction, as well as the results of measurements are available [here]. All modelled operators are listed in section "Modelled operators".
When applied on a different machine or computing environment, the reference execution-times used for training the model need to be scaled (or calibrated) if we want to avoid systematic error caused by the difference in computing power. We provided a simple calibration procedure (RapidMiner workflow) that loops a number of times over the single operator (Naive Bayes) and logs average execution-time of this operator on the user's environment. This average execution-time represents the scaling constant to be supplied to the execution-time estimation procedure, prior to usage in a particular computing environment.
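The scaling itself is a single multiplicative factor. A sketch (the reference time constant here is a made-up placeholder, not the value shipped with the tool):

```python
REFERENCE_NB_TIME_MS = 50.0  # placeholder: average Naive Bayes loop time on
                             # the machine the training measurements came from

def calibrate(estimate_ms, local_nb_time_ms):
    """Scale a reference-machine estimate to the local machine.

    local_nb_time_ms is the average Naive Bayes execution time logged by
    the calibration workflow; zero or None means no calibration (factor 1).
    """
    if not local_nb_time_ms:
        return estimate_ms
    return estimate_ms * (local_nb_time_ms / REFERENCE_NB_TIME_MS)
```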
For runtime estimation we use a robust k-NN-based regression model. In comparison to standard k-NN, there is no need to set a definite k as a parameter. Instead, k is determined automatically for each individual data point and depends roughly on the density of points in its vicinity. The model also has built-in feature weighting, where each attribute is weighted according to its relative importance in estimating the dependent value y.
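A simplified sketch of the estimator: here k is fixed and neighbours are inverse-distance weighted, whereas the deployed model also adapts k to the local density of points; the explicit feature weights stand in for the built-in attribute weighting:

```python
import math

def weighted_knn_estimate(query, training, weights, k=5):
    """Feature-weighted k-NN regression (simplified sketch).

    training: list of (metafeature_vector, runtime_ms) pairs; weights:
    per-metafeature importances. Returns the inverse-distance-weighted
    average runtime of the k nearest neighbours.
    """
    def dist(a, b):
        return math.sqrt(sum(w * (x - y) ** 2
                             for w, x, y in zip(weights, a, b)))
    nearest = sorted(training, key=lambda p: dist(query, p[0]))[:k]
    num = den = 0.0
    for x, y in nearest:
        w = 1.0 / (dist(query, x) + 1e-9)  # avoid division by zero
        num += w * y
        den += w
    return num / den
```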
Of course, other regression-modelling methods are also possible and probably worth considering. We experimented with polynomial models in a simplified setting with randomly generated polynomial classification datasets, using only two metafeatures for prediction (the number of examples N and the number of attributes M in the dataset), and achieved reasonable execution-time estimates for individual operators. The figure below shows execution-time experiments and estimates for the "Weight by Gain" operator on 500 randomly generated test datasets.
We decided to model a representative number of operators from the two main categories that carry most of the execution-time load in the majority of workflows: classification operators and feature-weighting operators. Here is the complete list:
Data to Weights
Linear Discriminant Analysis
*Logistic Regression Evolutionary
Regularized Discriminant Analysis
Single Rule Induction Single Attribute
Support Vector Machine LibSVM C SVC linear
Support Vector Machine LibSVM C SVCrbf
Weight by Chi Squared Statistic
Weight by Deviation
Weight by Gini Index
Weight by Information Gain
Weight by Information Gain Ratio
Weight by PCA
Weight by Relief
Weight by Rule
Weight by SVM
Weight by Uncertainty
Weight by User Specification
Weight by Value Average
The ones with the prefix "W-" are from the Weka extension. Operators marked with a star are modelled by default values. The methodology for defaults is explained in the next subsection.
In case no explicit model is available for a particular operator, we are forced to assume some default behaviour. Such operators are treated using the same procedure as modelled operators, but with experiment tables crafted in one of the following ways:
- Default model table (based on experiments using the "Default Model" operator), applied to all non-modelled operators for which no experimental measurements are available.
- Default model tables for specific operator types, based on averages of the experiment tables of modelled operators of the same type. For example, we currently model the "Logistic Regression" and "Logistic Regression Evolutionary" operators as generic logistic regression operators that behave identically to the "W-SimpleLogistic" operator.
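The type-level defaults amount to cell-wise averaging of the modelled operators' experiment tables. A minimal sketch, assuming all tables of a type share the same row order and columns:

```python
def default_table(tables):
    """Build a default experiment table for an operator type by
    averaging, cell by cell, the experiment tables of the modelled
    operators of that type."""
    n = len(tables)
    n_rows, n_cols = len(tables[0]), len(tables[0][0])
    return [
        [sum(tab[i][j] for tab in tables) / n for j in range(n_cols)]
        for i in range(n_rows)
    ]
```

A non-modelled operator of that type is then estimated exactly like a modelled one, just from the averaged table.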
In contrast to execution-time estimation, memory estimation tries to predict the largest memory space allocated at any time for the operator's function: the block of memory whose parts cannot be deallocated because they are currently in use. Unfortunately, the main problem with memory estimation is the uncontrollable intervention of garbage-collection processes. In Java this is exemplified by the use of soft references, which are managed in a nondeterministic way by the garbage collector. Our experiments show that the estimates are too stochastic for reliable prediction, and that a workable solution would probably have to take into account the inner workings of the memory-management system, as well as the context in which the process is executed. A pure machine-learning approach, where predictions are based solely on experiments with datasets and algorithms of various characteristics, is probably not enough to achieve reasonable estimates.
The integrational component is made to:
- receive input describing a batch of workflows to assess (from the probabilistic ranker)
- parse this input and generate input appropriate for the qualitative and the quantitative component
- call the two components for each individual workflow in the batch and collect their assessments
- calculate the necessary batch statistics (for ranking) and report the results
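The four steps above can be sketched as a single batch loop. The function is a simplified stand-in: `qualitative` and `quantitative` are placeholders for calls into WFmodel.py and runtime_estimation.jar, and the statistics shown are only examples of what a ranker might need.

```python
def assess_batch(workflow_lines, qualitative, quantitative):
    """Take workflow descriptions (one per line), run both assessment
    components on each, and return per-workflow assessments plus
    simple batch statistics for ranking."""
    workflows = [line.strip() for line in workflow_lines if line.strip()]
    results = [{"workflow": wf,
                "quality": qualitative(wf),
                "runtime": quantitative(wf)}
               for wf in workflows]
    runtimes = [r["runtime"] for r in results]
    stats = {"min": min(runtimes), "max": max(runtimes),
             "mean": sum(runtimes) / len(runtimes)}
    return results, stats

# Usage with dummy components: quality is a fixed label, runtime is
# approximated by the description length (purely illustrative).
results, stats = assess_batch(
    ["short_wf", "a_much_longer_workflow"],
    qualitative=lambda wf: "acceptable",
    quantitative=lambda wf: float(len(wf)))
```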
The components to connect are largely independent and written in Prolog, Python and Java. To give an interested reader some hints on which parts of the code must be adapted to enable the assessment of arbitrary workflows in arbitrary formats, here are some instructions and comments on specific pieces of code.
Let us first take a look at the files contained in the BatchEval.zip file:
- PPparserBatch.py: main script for non-functional workflow assessment
- WFmodel.py: overall qualitative model encoded in python
- proDEX.py: helper script (library) needed by the main script
- batch.txt: an example of a batch of workflow descriptions
- runtime_estimation.jar: main program for execution-time estimation
- README_runtime: file with instructions and explanations for the execution-time estimation component
- data: directory containing files needed by the execution-time estimation component
- workflows: test workflows for execution-time estimation component
Use (manual, or automated by the probabilistic ranker) starts by calling the PPparserBatch.py script with a file that contains an arbitrary number of workflow descriptions. Such a set of workflow descriptions is called a batch. Batches are given as raw text files with each workflow description on its own line. An example of such a file is batch.txt, which is included in the BatchEval.zip archive. The workflow description is given in a format that follows the interchange between the IDA planner and the probabilistic ranker. An example workflow description:
[Unit_Main_Process, RM_Group_Models_empty([produces - i1436]), -1, RM_Discretize_by_User_Specification_all([producesData - i1598, producesPrePropModel - i1599, simpleParameter_attribute_filter_type - value("all",string), simpleParameter_include_special_attributes - value("false",string), uses - dataset_srbct_khan]), -1, RM_Group_Models_append([produces - i1603, usesFirstModel - i1436, usesSecondModel - i1599]), -1, RM_Group_Models_empty([produces - i1605]), -1, RM_Group_Models_append([produces - i1607, usesFirstModel - i1603, usesSecondModel - i1605]), -1, RM_XValidation([producesPV - i1664, producesPredictionModel - i1646, simpleParameter_number_of_validations - value(10,integer), uses - i1598]), RM_k_NN_NominalMeasures([produces - i1646, simpleParameter_measure_types - value("NominalMeasures",string), uses - i1610]), -1, RM_Apply_Model([produces - i1660, usesData - i1611, usesModel - i1646]), -1, RM_Performance_Classification([produces - i1664, uses - i1660]), -1, -1]
For each item in the batch (each workflow description), this input gets parsed and processed into a format suitable for the qualitative and quantitative components. For this purpose, the qualitative component was transformed into Python code (the WFmodel.py file), which can be called directly by the main script. Another thing to note regarding the qualitative component is the calculation of its inputs from workflow descriptions. Some of the inputs depend only on the learning algorithm used and are expert-assessed; these assessments are encoded in lines 260-264 and can be updated and/or extended there. Transformations of all inputs (both expert-based and calculated numeric ones) into the probability distributions that the assessment component takes are done in lines 270-372, where the interval boundaries for the transformation into qualitative values can be inspected and/or changed.
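Such an interval-based transformation into qualitative values can be sketched as follows. This is a crisp, one-hot simplification: the boundary values, the labels, and the function name are illustrative, while the actual boundaries (and possibly softer distributions) live in lines 270-372 of PPparserBatch.py.

```python
def to_qualitative(value, boundaries, labels):
    """Map a numeric input to a probability distribution over
    qualitative labels using interval boundaries: the interval that
    contains the value receives probability 1.0.  `boundaries` must
    be sorted ascending and len(labels) == len(boundaries) + 1."""
    dist = {label: 0.0 for label in labels}
    idx = sum(value >= b for b in boundaries)  # boundaries passed
    dist[labels[idx]] = 1.0
    return dist
```

For example, with boundaries at 10 and 60 seconds, an estimated runtime of 45 seconds falls entirely into the "medium" class.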
Extending the integrational component to work in other situations:
- if different workflow descriptions are to be used - change the parsing part of the PPparserBatch.py script
- if changes to the qualitative model would be needed - change the WFmodel.py qualitative model description
- if different transformations of inputs are desired - change the transformations in lines 260 - 372
- if new execution-time experiments are performed - add the corresponding entries to the operators.csv file
- if changes to the quantitative model would be needed - change the RuntimeEstimationWithMFPropagation.class in runtime_estimation.jar