public class CaseStatisticsAnalyzer
extends java.lang.Object
Dit-Yan Yeung and C. Chow. Parzen-Window Network Intrusion Detectors. In Proceedings of the 16th International Conference on Pattern Recognition, volume 4, pages 385–388, 2002.
Extensions are made to exploit dependencies in business processes.
Constructor and Description |
---|
CaseStatisticsAnalyzer() |
CaseStatisticsAnalyzer(StochasticNet stochasticNet, org.processmining.models.semantics.petrinet.Marking initialMarking, CaseStatisticsList statistics) |
Modifier and Type | Method and Description |
---|---|
double | computePValueByApproximateIntegration(org.apache.commons.math3.distribution.RealDistribution dist, double x) |
double | computePValueByApproximateIntegration(ReplayStep step) |
CaseStatisticsList | getCaseStatistics() |
java.util.List<ReplayStep> | getIndividualOutlierSteps(CaseStatistics selectedCaseStatistics) |
org.processmining.models.semantics.petrinet.Marking | getInitialMarking() |
java.lang.Double | getLogLikelihoodCutoff(TimedTransition tt) |
org.apache.commons.math3.distribution.RealDistribution | getLogLikelihoodDistribution(TimedTransition transition) |
int | getMaxActivityCount() |
double[] | getModelDensities(ReplayStep x, org.apache.commons.math3.distribution.RealDistribution assumedErrorDistribution, double assumedErrorRate) Returns the likelihood ratio of the ReplayStep x stemming from an error distribution, or from the original distribution. |
CaseStatisticsList | getOrderedList() |
int | getOutlierCount(CaseStatistics cs) |
double | getOutlierRate() |
double | getPValueOfStepIntegral(ReplayStep step) |
java.util.List<ReplayStep> | getRegularSteps(CaseStatistics selectedCaseStatistics) |
StochasticNet | getStochasticNet() |
boolean | isOutlierLikelyToBeAnError(ReplayStep step) Let X be this node's random duration variable having the value x. |
void | setCaseStatistics(CaseStatisticsList caseStatistics) |
void | setOutlierRate(double outlierRate) |
void | updateLikelihoodCutoffs() |
void | updateStatistics(double outlierRate) |
public CaseStatisticsAnalyzer()
public CaseStatisticsAnalyzer(StochasticNet stochasticNet, org.processmining.models.semantics.petrinet.Marking initialMarking, CaseStatisticsList statistics)
public CaseStatisticsList getOrderedList()
public double getOutlierRate()
public void setOutlierRate(double outlierRate)
public CaseStatisticsList getCaseStatistics()
public void setCaseStatistics(CaseStatisticsList caseStatistics)
public StochasticNet getStochasticNet()
public int getMaxActivityCount()
public int getOutlierCount(CaseStatistics cs)
public double[] getModelDensities(ReplayStep x, org.apache.commons.math3.distribution.RealDistribution assumedErrorDistribution, double assumedErrorRate)
Returns the likelihood ratio of the ReplayStep x stemming from an error distribution, or from the original distribution.
Assume that there is but one child (could use weighted average of scores for multiple children).
We assume an error distribution that can shift the duration of this step and also affect the duration of the next step. We compare the joint probability of the two durations x and y (y is the duration of the activity that follows x) in the original model, which we learned from historical observations, with the distribution that results when we add an error along the y = -x line. The latter is correct because, if x is a measurement error, it affects the duration of the child conversely. For example, when the end of x is mistakenly measured later, then the duration of y is also affected (it is shorter than expected).
x - ReplayStep to compute the error score for
assumedErrorDistribution - the RealDistribution that is assumed as noise in the data for measurement errors
assumedErrorRate - the rate of error occurrence (must be between 0 inclusive and 1 exclusive)
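The comparison described above can be illustrated with a minimal sketch. It is not the implementation behind getModelDensities: it assumes that x and y are independent in the original model, replaces RealDistribution with plain density functions so that it runs without Apache Commons Math, and integrates the error numerically with a midpoint rule. The class ErrorDensitySketch, its methods, and the integration bounds are hypothetical names chosen for illustration.

```java
import java.util.function.DoubleUnaryOperator;

class ErrorDensitySketch {

    /** Hypothetical helper: density function of a normal distribution. */
    public static DoubleUnaryOperator normalPdf(double mean, double sd) {
        return v -> {
            double z = (v - mean) / sd;
            return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
        };
    }

    /**
     * Returns { weighted density without error, weighted density with error }
     * for the observed durations x (this step) and y (the following step).
     * Assumption: x and y are independent in the original model.
     */
    public static double[] modelDensities(double x, double y,
            DoubleUnaryOperator densityX, DoubleUnaryOperator densityY,
            DoubleUnaryOperator errorDensity, double errorRate,
            double errorLow, double errorHigh, int steps) {
        if (errorRate < 0 || errorRate >= 1) {
            throw new IllegalArgumentException("error rate must be in [0, 1)");
        }
        // Joint density of (x, y) in the original model.
        double regular = densityX.applyAsDouble(x) * densityY.applyAsDouble(y);

        // Joint density with an error e along the y = -x line:
        // x was measured e too long, so y was measured e too short.
        double h = (errorHigh - errorLow) / steps;
        double withError = 0.0;
        for (int i = 0; i < steps; i++) {
            double e = errorLow + (i + 0.5) * h; // midpoint of the i-th interval
            withError += densityX.applyAsDouble(x - e)
                    * densityY.applyAsDouble(y + e)
                    * errorDensity.applyAsDouble(e) * h;
        }
        return new double[] { (1 - errorRate) * regular, errorRate * withError };
    }

    public static void main(String[] args) {
        DoubleUnaryOperator fx = normalPdf(10, 1);
        DoubleUnaryOperator fy = normalPdf(10, 1);
        DoubleUnaryOperator err = normalPdf(0, 5);
        // x is 6 too long and y is 6 too short: consistent with a shifted timestamp.
        double[] d = modelDensities(16, 4, fx, fy, err, 0.01, -30, 30, 6000);
        System.out.println("regular: " + d[0] + ", error: " + d[1]);
    }
}
```

In this sketch, a step whose surplus duration is mirrored by a deficit in the following step receives a higher weighted density under the error model than under the original model, even for a small assumed error rate.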
public boolean isOutlierLikelyToBeAnError(ReplayStep step)
We compare the probability P(children | X = x) with the marginal probability P(children | parents(X)). If the marginal probability is higher than the one given X = x, we assume that x is a single (measurement) error in the log. Otherwise, we assume that X fits with the following events and is just a regular outlier.
step - ReplayStep
Example:
U   V    <- parents (if there is more than one, it was a parallel split)
 \ /
  X      <- variable
 / \
Y   Z    <- children (if there is more than one, the process forked into multiple parallel branches)

Here, we compute P(Y=y, Z=z | X=x) and compare it with the integral over X of P(Y=y, Z=z, X | U=u, V=v). That is, we compare

u   v
 \ /
  x     with     X    <- and integrate over all the values of X
 / \            / \
y   z          y   z
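A minimal sketch of this comparison for a single child Y follows. It is not the code behind isOutlierLikelyToBeAnError: the class OutlierErrorSketch, the demo densities, and the integration bounds are hypothetical, the observed parents are folded into the density of X, and the integral over X is approximated with a midpoint rule.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

class OutlierErrorSketch {

    /** Hypothetical helper: normal density at v. */
    public static double normalPdf(double v, double mean, double sd) {
        double z = (v - mean) / sd;
        return Math.exp(-0.5 * z * z) / (sd * Math.sqrt(2 * Math.PI));
    }

    /** Demo model (assumption): Y | X = x is normal around 5 + 0.5 * (x - 10). */
    public static double demoChildGivenX(double y, double x) {
        return normalPdf(y, 5 + 0.5 * (x - 10), 1);
    }

    /** Demo model (assumption): X given its observed parents is normal(10, 2). */
    public static double demoXGivenParents(double x) {
        return normalPdf(x, 10, 2);
    }

    /**
     * Returns true if the child duration y is better explained by integrating
     * X out (given the parents) than by the observed value x.
     * childGivenX takes (y, x) and returns the density of Y = y given X = x.
     */
    public static boolean isLikelyError(double x, double y,
            DoubleBinaryOperator childGivenX, DoubleUnaryOperator xGivenParents,
            double xLow, double xHigh, int steps) {
        double conditional = childGivenX.applyAsDouble(y, x);

        // Midpoint rule for the integral over X of P(Y = y | X = t) * P(X = t | parents).
        double h = (xHigh - xLow) / steps;
        double marginal = 0.0;
        for (int i = 0; i < steps; i++) {
            double t = xLow + (i + 0.5) * h;
            marginal += childGivenX.applyAsDouble(y, t) * xGivenParents.applyAsDouble(t) * h;
        }
        return marginal > conditional;
    }

    public static void main(String[] args) {
        // x = 20 is an outlier for X; the child either ignores it or follows it.
        System.out.println(isLikelyError(20, 5,
                OutlierErrorSketch::demoChildGivenX, OutlierErrorSketch::demoXGivenParents, 0, 40, 4000));
        System.out.println(isLikelyError(20, 10,
                OutlierErrorSketch::demoChildGivenX, OutlierErrorSketch::demoXGivenParents, 0, 40, 4000));
    }
}
```

With several children, the child densities would have to be combined into a joint density before the comparison; the sketch leaves that out.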
public double getPValueOfStepIntegral(ReplayStep step)
public double computePValueByApproximateIntegration(ReplayStep step)
public double computePValueByApproximateIntegration(org.apache.commons.math3.distribution.RealDistribution dist, double x)
public java.util.List<ReplayStep> getIndividualOutlierSteps(CaseStatistics selectedCaseStatistics)
public java.util.List<ReplayStep> getRegularSteps(CaseStatistics selectedCaseStatistics)
public org.processmining.models.semantics.petrinet.Marking getInitialMarking()
public java.lang.Double getLogLikelihoodCutoff(TimedTransition tt)
public org.apache.commons.math3.distribution.RealDistribution getLogLikelihoodDistribution(TimedTransition transition)
public void updateStatistics(double outlierRate)
public void updateLikelihoodCutoffs()