DMKB.owl: the ABox of DMOP. Describes the data mining algorithms and operators that form part of a data miner's operational domain knowledge. See DMOP.owl for version information.
Primary Reference: J. R. Quinlan. C4.5, Programs for Machine Learning, Morgan Kaufmann, 1993.
Default parameter settings: -g false, -s false, -m 2, -c 0.25.
CARTc: Breiman et al., Classification and Regression Trees, Chapman and Hall, 1984.
$M_{S}=\frac{k\overline{r_{cf}}}{\sqrt{k+k(k-1)\overline{r_{ff}}}}$
where $M_S$ is the heuristic merit of a subset $S$ containing $k$ features, $\overline{r_{cf}}$ is the mean feature-class correlation $(f\in S)$, and $\overline{r_{ff}}$ is the average feature-feature intercorrelation. The numerator of this equation represents the features' correlation with the class, the denominator the correlation/redundancy among the predictive features. CFS estimates the degree of correlation between features $X$ and $Y$ using symmetrical uncertainty
$SU=2.0\left(\frac{H(X)+H(Y)-H(X,Y)}{H(X)+H(Y)}\right)$.
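The two formulas above can be sketched in pure Python (an illustrative reading of the CFS definitions, not DMOP's normative specification; `cfs_merit` takes the already-averaged correlations directly):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU = 2 * (H(X) + H(Y) - H(X, Y)) / (H(X) + H(Y))."""
    hx, hy = entropy(x), entropy(y)
    hxy = entropy(list(zip(x, y)))  # joint entropy via value pairs
    return 2.0 * (hx + hy - hxy) / (hx + hy)

def cfs_merit(k, mean_r_cf, mean_r_ff):
    """Heuristic merit M_S of a k-feature subset, from the mean
    feature-class and feature-feature correlations."""
    return k * mean_r_cf / math.sqrt(k + k * (k - 1) * mean_r_ff)

x = [0, 0, 1, 1]
print(symmetrical_uncertainty(x, x))             # identical features -> 1.0
print(symmetrical_uncertainty(x, [0, 1, 0, 1]))  # no shared information -> 0.0
```

SU is bounded in [0, 1], which is why CFS can use it as a normalized substitute for correlation on discrete features.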
CHAID stands for Chi-Squared Automatic Interaction Detection.
CHAID: G. Kass. An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29:2, 119-127, 1980.
CenterAndReduce (aka Standardize): rescale all values of a continuous feature by centering them around the mean and normalizing them by the standard deviation.
CenterToMean (aka MeanSubtraction): the process of transforming a vector of variable values so that its mean equals 0: $\mathbf{x}=\mathbf{x}_{o}-E[\mathbf{x}_{o}]$.
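Both transformations can be sketched in a few lines of Python (illustrative helper names, not DMOP operator implementations):

```python
import math

def center_to_mean(xs):
    """MeanSubtraction: shift values so the mean becomes 0."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def center_and_reduce(xs):
    """Standardize: center on the mean, then divide by the
    (population) standard deviation."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]

print(center_to_mean([2.0, 4.0, 6.0, 8.0]))  # [-3.0, -1.0, 1.0, 3.0]
```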
Chi2TestBasedFSAlgorithm: Add: implementedBy value RM-WeightByChiSquaredStatistic and implementedBy value Weka-ChiSquaredAttributeEval
Primary Reference:
J. Rennie, L. Shih, J. Teevan, D. Karger. Tackling the Poor Assumptions of Naive Bayes Text Classifiers, Proceedings of the Twentieth International Conference on Machine Learning (ICML-2003).
FishersLinearDiscriminantAlgorithm: given by Bishop (2006) as an example of discriminant functions as he defines them. Similarly, Hand et al. (2001), p. 331, cite it as a discriminative model (they merge discriminative and discriminant-function models).
ID3: J. R. Quinlan. Induction of Decision Trees, Machine Learning 1: 81-106, 1986.
ID3 uses a probability-based technique of distributing instances with missing values over the different branches [Quinlan, 1986].
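That fractional distribution can be sketched as follows (hypothetical helper of our own; instances are (value, weight) pairs with None marking a missing value):

```python
def distribute_missing(value_weight_pairs):
    """Send each instance with a known attribute value down its branch;
    split an instance with a missing value (None) across all branches,
    with fractional weights proportional to each branch's observed share
    of the known values (in the spirit of Quinlan, 1986)."""
    known = [(v, w) for v, w in value_weight_pairs if v is not None]
    total = sum(w for _, w in known)
    branches = {}
    for v, w in known:
        branches.setdefault(v, []).append(w)
    shares = {v: sum(ws) / total for v, ws in branches.items()}
    for v, w in value_weight_pairs:
        if v is None:
            for branch, share in shares.items():
                branches[branch].append(w * share)
    return branches

# 2/3 of the known values go down branch 'a', so a missing-value
# instance contributes weight 2/3 to 'a' and 1/3 to 'b'.
print(distribute_missing([("a", 1.0), ("a", 1.0), ("b", 1.0), (None, 1.0)]))
```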
IREP: Fürnkranz and Widmer's incremental reduced error pruning algorithm. See primary reference.
IREP: J. Fürnkranz, G. Widmer. Incremental Reduced Error Pruning. 11th International Conference on Machine Learning, 1994.
IREP*: W. Cohen. Fast Effective Rule Induction. 12th International Conference on Machine Learning, 1995.
IREP*: the improved version of Fürnkranz and Widmer's incremental reduced error pruning algorithm developed by Cohen for RIPPER. See reference under RIPPER-Algorithm.
Add: implementedBy value RM-WeightByInformationGain and implementedBy value Weka-InfoGainAttributeEval
InfoGainRatioFWA: Add: implementedBy value RM-WeightByInformationGainRatio and implementedBy value Weka-GainRatioAttributeEval
LinearCombinationCoefficient is the coefficient or weight of a given feature in a linear model, e.g. a linear SVM model.
Niels Landwehr, Mark Hall, Eibe Frank (2005). Logistic Model Trees. Machine Learning, 59(1-2):161-205.
Marc Sumner, Eibe Frank, Mark Hall: Speeding up Logistic Model Tree Induction. In: 9th European Conference on Principles and Practice of Knowledge Discovery in Databases, 675-683, 2005.
NBTree hasBiasVarianceProfile {HighVariance}: A decision tree (with the exception of a decision stump) typically has high variance, whereas Naive Bayes is known to have high bias. An extensive experimental study is needed to determine with reasonable certainty whether combining decision trees and Naive Bayes results in lower variance for NBTree.
NaiveBayesDiscrete is the original NaiveBayes algorithm which handles only discrete variables. It is the common core of all Naive Bayes algorithms, which vary essentially in the way they handle continuous variables.
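That common core can be sketched in pure Python (illustrative, Laplace-smoothed; not the DMOP-modelled operator itself):

```python
import math
from collections import Counter, defaultdict

def train(data):
    """data: list of (feature_tuple, label). Collect the counts
    that discrete Naive Bayes needs."""
    class_counts = Counter(y for _, y in data)
    value_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    domains = defaultdict(set)           # feature index -> observed values
    for x, y in data:
        for i, v in enumerate(x):
            value_counts[(i, y)][v] += 1
            domains[i].add(v)
    return class_counts, value_counts, domains

def predict(x, class_counts, value_counts, domains):
    """Argmax over classes of log P(c) + sum_i log P(x_i | c),
    with Laplace smoothing."""
    n = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for c, cc in class_counts.items():
        lp = math.log(cc / n)
        for i, v in enumerate(x):
            lp += math.log((value_counts[(i, c)][v] + 1) / (cc + len(domains[i])))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rain", "mild"), "yes"),
    (("rain", "cool"), "yes"),
    (("overcast", "hot"), "yes"),
]
model = train(data)
print(predict(("rain", "mild"), *model))  # -> "yes"
```

Variants such as NaiveBayesNormal or NaiveBayesKernel keep this structure and replace only the per-feature conditional estimate for continuous variables.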
NaiveBayesKernel: G. John and P. Langley. Estimating Continuous Distributions in Bayesian Classifiers, Eleventh Conference on Uncertainty in Artificial Intelligence, 1995.
NaiveBayesKernel: The high-bias profile is true of NB in general but to be taken with caution for NB kernel. Need to measure that empirically because NBKernel has no bandwidth parameter and everything depends on how this choice is hardcoded in Weka. [mh 2010-01-02]
NaiveBayesKernel: Even though the model structure is a Joint Probability Distribution, no model is stored, since the training set is used to fit a Gaussian around each example in order to estimate the density at the query point.
NaiveBayesMultinomial: A. McCallum and K. Nigam. A Comparison of Event Models for Naive Bayes Text Classification, AAAI-98 Workshop on Learning for Text Categorization, 1998.
NormalizeByStandardDeviation: divide all values of a continuous feature by their standard deviation.
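As a contrast with standardization, this rescales the variance to 1 but does not remove the mean (illustrative sketch):

```python
import math

def normalize_by_sd(xs):
    """Divide each value by the (population) standard deviation;
    unlike CenterAndReduce, the mean is not subtracted."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [x / sd for x in xs]

print(normalize_by_sd([2.0, 4.0, 6.0, 8.0]))  # std becomes 1, mean stays positive
```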
PRISM: J. Cendrowska. PRISM: An algorithm for inducing modular rules. International Journal of Man-Machine Studies. Vol.27, No.4, pp.349-370, 1987.
ReliefF-Margin: TODO: add latexFormula using Kononenko, 1994 or Robnik-Šikonja, 2003.
RipperAlgorithm: William W. Cohen. Fast Effective Rule Induction. 12th International Conference on Machine Learning, 1995.
Ripper hasQuality ToleratesMissingValues: Ripper handles missing values as follows: all tests involving the attribute A are defined to fail on instances for which the value of A is missing. This encourages the learner to separate out the positive examples using tests that are known to succeed.
Ripper hasQuality toleratesNoise: See Cohen, 1995.
Ripper hasHypothesisComplexityControlStrategy {PerformanceBasedEarlyStopping}: Ripper stops adding a rule to a rule set when a rule is learned that has error rate greater than 50%. [Cohen, 1995]
SingleConditionRuleBasedFWAlgorithm: Add to its instance SingleConditionRuleBasedFWA: implementedBy value RM-WeightByRule
Add: implementedBy value Weka-OneRAttributeEval.
SumOfSquaredLinearCoefficients is shorthand for the sum of the squared weights or coefficients of the different features in a linear model, e.g. one built by a linear SVM.
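For a weight vector $w$ this is simply $\sum_i w_i^2 = \|w\|^2$ (illustrative one-liner):

```python
def sum_of_squared_coefficients(weights):
    """Squared Euclidean norm of a linear model's weight vector."""
    return sum(w * w for w in weights)

print(sum_of_squared_coefficients([0.5, -2.0, 1.0]))  # 0.25 + 4.0 + 1.0 = 5.25
```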
TODO
Add: implementedBy value RM-WeightByUncertainty
Add: implementedBy value Weka-SymmetricalUncertAttributeEval
Whitening is a preprocessing method that consists in linearly transforming the observed (and centered, i.e. mean-subtracted) vector $\mathbf{x}$ so that the transformed vector $\mathbf{\tilde{x}}$ is white, i.e. its components are uncorrelated and their variances equal unity (the covariance matrix of $\mathbf{\tilde{x}}$ equals the identity matrix: $E\{\mathbf{\tilde{x}}\mathbf{\tilde{x}}^{T}\}=\mathbf{I}$).
Ref.: A. Hyvärinen, E. Oja (2000). Independent Component Analysis: Algorithms and Applications. Neural Networks, 13(4-5): 411-430.
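A pure-Python sketch for the two-feature case (closed-form 2x2 eigendecomposition; illustrative helper names of our own, not from the reference):

```python
import math

def whiten_2d(points):
    """Whiten 2-feature data: center, rotate onto the covariance
    eigenbasis, and scale each axis by 1/sqrt(eigenvalue), so the
    output covariance matrix is the identity."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]
    a = sum(x * x for x, _ in centered) / n  # covariance [[a, b], [b, c]]
    c = sum(y * y for _, y in centered) / n
    b = sum(x * y for x, y in centered) / n
    disc = math.sqrt((a - c) ** 2 + 4 * b * b)
    l1, l2 = (a + c + disc) / 2, (a + c - disc) / 2  # eigenvalues
    if abs(b) > 1e-12:
        e1 = (b, l1 - a)                  # eigenvector for l1
    else:
        e1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(e1[0], e1[1])
    e1 = (e1[0] / norm, e1[1] / norm)
    e2 = (-e1[1], e1[0])                  # orthogonal eigenvector for l2
    return [((x * e1[0] + y * e1[1]) / math.sqrt(l1),
             (x * e2[0] + y * e2[1]) / math.sqrt(l2))
            for x, y in centered]

whitened = whiten_2d([(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.2), (5.0, 9.3)])
```

For more than two features one would use a general eigen- or SVD-based decomposition of the covariance matrix, as in the ICA preprocessing described by Hyvärinen and Oja.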