Monday, July 29, 2013

Anomaly Detection using Oracle R Enterprise (ORE) | SVM for Anomaly Detection

R is an open-source scripting language and environment for statistical computing, data analysis, and graphics. It provides an integrated suite of software facilities for data manipulation, calculation, and graphical display. Around 2 million people worldwide use R, especially corporate analysts and data scientists.

Anomaly/Outlier detection has wide applications. It can be used in fraud detection, for example, by detecting unusual usage of credit cards or telecommunication services. In addition, it is useful in customized marketing for identifying the spending behavior of customers with extremely low or extremely high incomes, or in medical analysis for finding unusual responses to various medical treatments.

Oracle R Enterprise (ORE) extends open-source R so that data in an Oracle database can be analyzed and manipulated transparently from R. It also makes the in-database predictive analytics algorithms available seamlessly through R.

One of the algorithms implemented in ORE is the Support Vector Machine (SVM). In ORE, the interface function ore.odmSVM trains a support vector machine using Oracle Data Mining. It can be used to carry out general regression, classification, and anomaly detection. The SVM algorithm automatically handles missing value treatment and the transformation of categorical data, but normalization and outlier detection must be handled manually, so we need to normalize the features before feeding the training data to the model. The function predict computes predictions based on the input data and the model.
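The manual normalization step mentioned above can be sketched in plain R before the data is pushed to the database. This is only a minimal illustration using z-scores; the helper name normalize_numeric is hypothetical and not part of ORE, and other scalings (for example min-max) may suit your data better.

```r
# Z-score normalization of the numeric columns of a data frame.
# Factor (categorical) columns are left untouched.
# normalize_numeric is a hypothetical helper, not an ORE function.
normalize_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], function(x) {
    s <- sd(x)
    # Constant columns cannot be scaled; only center them
    if (is.na(s) || s == 0) x - mean(x) else (x - mean(x)) / s
  })
  df
}

m <- mtcars
m$cyl <- as.factor(m$cyl)          # categorical column, skipped by the helper
m_scaled <- normalize_numeric(m)
summary(m_scaled[c("mpg", "hp", "wt")])
```

After this step each numeric column has mean 0 and standard deviation 1, so attributes measured on large scales (such as disp or hp) do not dominate the others.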

An example of anomaly detection is shown below, based on the 'mtcars' dataset that ships with R:

DATA SET: Motor Trend Car Road Tests
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
[1] mpg Miles/(US) gallon
[2] cyl Number of cylinders
[3] disp Displacement (cu. in.)
[4] hp Gross horsepower
[5] drat Rear axle ratio
[6] wt Weight (lb/1000)
[7] qsec 1/4 mile time
[8] vs V/S
[9] am Transmission (0 = automatic, 1 = manual)
[10] gear Number of forward gears
[11] carb Number of carburetors
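
The description above can be checked quickly in plain R, with no database connection needed. The snippet only uses base R and the built-in dataset:

```r
data(mtcars)
dim(mtcars)     # 32 rows (cars) and 11 columns
names(mtcars)   # the 11 variables listed above

# Number of distinct values in the columns that are later treated as categorical
distinct_counts <- sapply(mtcars[c("cyl", "vs", "gear")],
                          function(x) length(unique(x)))
distinct_counts # cyl: 3, vs: 2, gear: 3
```

The small number of distinct values in cyl, vs, and gear is why it is reasonable to convert them to factors before modelling.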

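For anomaly detection, we first build a classification model based on a support vector machine and then build an anomaly detection model on the same data. In the script, gear, the number of cylinders (cyl), and V/S (vs) are converted to factors so that they are treated as categorical attributes. The classification model uses gear as the target, and both models exclude the ID column from the features.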
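
To see which columns a formula such as gear ~ . - ID selects, the expansion can be inspected in base R. This sketch shows how standard R formula handling treats the dot and the minus sign; it assumes that ore.odmSVM follows the same conventions, which is not verified here.

```r
m <- mtcars
m$ID <- 1:nrow(m)

# Two-sided formula: gear is the target, "." expands to every other column,
# and "- ID" drops the row identifier
clf_terms <- attr(terms(gear ~ . - ID, data = m), "term.labels")
clf_terms

# One-sided formula (no target), as used for anomaly detection:
# every column except ID is a feature, including gear
ad_terms <- attr(terms(~ . - ID, data = m), "term.labels")
ad_terms
```

The classification formula yields 10 predictors and the one-sided formula yields 11, because gear is a feature rather than a target in the second case.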

R Script:
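  library(ORE)  # Oracle R Enterprise client packages
  # Connect only if no session is open; positional arguments are user, SID, host, password (placeholders)
  if (!ore.is.connected())
    ore.connect("Database_name", "SID", "", "password", port = 1521, all = TRUE)
  m <- mtcars
  m$gear <- as.factor(m$gear)  # treat as categorical
  m$cyl <- as.factor(m$cyl)
  m$vs  <- as.factor(m$vs)
  m$ID  <- 1:nrow(m)           # row identifier, excluded from the models
  MTCARS <- ore.push(m)        # copy the local data frame into the database
  # Classification: predict gear from all other columns except ID
  svm.mod <- ore.odmSVM(gear ~ . - ID, MTCARS, "classification")
  svm.res <- predict(svm.mod, MTCARS, "gear")
  with(svm.res, table(gear, PREDICTION))  # generate confusion matrix
  # Anomaly detection: no target, all columns except ID are features
  svm.mod <- ore.odmSVM(~ . - ID, MTCARS, "anomaly.detection")
  svm.res <- predict(svm.mod, MTCARS, "ID")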

Based on the models above, we can identify cars with an anomalous design. The table function returns the confusion matrix of the classification model, from which we can assess the accuracy of the algorithm on this data set. After the final predict call, svm.res holds the anomaly detection prediction for each car, keyed by ID. To present these results in Oracle Business Intelligence Enterprise Edition (OBIEE), we can push them into the Oracle database and build the metadata repository (RPD) accordingly.
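
Once the anomaly predictions are available locally, the flagged cars can be looked up by ID. The sketch below is an illustration under stated assumptions: it assumes the result has columns ID and PREDICTION, and that PREDICTION is 0 for anomalous rows and 1 for typical ones (the usual Oracle Data Mining one-class SVM convention, worth verifying on your own output). The helper flag_anomalies is hypothetical, and mock_res is a made-up stand-in for the real result so that the code runs without a database.

```r
# res:  local data frame with columns ID and PREDICTION (assumed layout)
# data: local data frame carrying the same ID column
# flag_anomalies is a hypothetical helper, not an ORE function.
flag_anomalies <- function(res, data, anomaly_label = 0) {
  ids <- res$ID[res$PREDICTION == anomaly_label]
  data[data$ID %in% ids, , drop = FALSE]
}

m <- mtcars
m$ID <- 1:nrow(m)

# Made-up predictions for illustration only; not output from a real model
mock_res <- data.frame(ID = m$ID,
                       PREDICTION = ifelse(m$ID %in% c(15, 31), 0, 1))

table(mock_res$PREDICTION)            # counts of typical vs. flagged rows
flagged <- flag_anomalies(mock_res, m)
flagged[c("mpg", "cyl", "hp", "gear")]
```

With a live ORE session, the same helper could be applied to ore.pull(svm.res), and the result could be saved as a database table with ore.create (for example under a table name of your choosing) so that OBIEE can read it.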

(Queries are always welcome!)

