Jan 23, 2017

Machine Learning in Cyber Security Domain - 3: Fraud Detection

Fraud is one of the oldest problems in human history. As long as there have been fraudsters, there have been people who are defrauded. Money, and credit cards in particular, is a well-known target of fraudulent activity. With the development of the e-marketing sector, the number of fraudulent activities is rising day by day. Users’ credit card information is stored in the databases of companies such as banks, online shopping companies and online service providers. As internet use spreads, we see more and more fraud in online transactions. As a consequence, a need has emerged for automatic systems that are able to detect and fight fraudsters.

Fraud detection is a notably challenging problem because:
  • Fraud strategies change over time, and customers’ spending habits evolve as well.
  • Few examples of fraud are available, so it is hard to build a model of fraudulent behaviour.
  • Not all frauds are reported, and some are reported only after a long delay.
  • Only a few transactions can be investigated in a timely manner.
With the large number of transactions we see every day:
  • We cannot ask a human analyst to check every transaction one by one.
  • We want to automate the detection of fraudulent transactions.
  • We want accurate predictions, i.e. to minimise both missed frauds and false alarms.
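The two kinds of error named above, missed frauds and false alarms, can be counted directly. The following is a minimal sketch, assuming binary labels where 1 means fraud and 0 means legitimate; the function name `fraud_metrics` and the sample data are made up for illustration. It also shows why plain accuracy is a poor guide when fraud examples are rare.

```python
def fraud_metrics(y_true, y_pred):
    """Summarise predictions for binary labels (1 = fraud, 0 = legitimate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # frauds caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed frauds
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate, left alone
    return {
        "missed_frauds": fn,
        "false_alarms": fp,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical data: 95 legitimate transactions and 5 frauds.
y_true = [0] * 95 + [1] * 5
# A useless detector that never raises an alarm.
never_alarm = [0] * 100
print(fraud_metrics(y_true, never_alarm))
```

On this made-up data the detector that never raises an alarm reaches 95% accuracy while missing all 5 frauds (recall 0), so missed frauds and false alarms are better tracked separately than folded into one accuracy figure.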
These problems can be overcome with systems built on machine learning techniques. Such systems can learn complex fraud patterns by examining large volumes of data, and they can build suitable models even for fraudulent activities with complex shapes. As a result, new types of fraud can be predicted successfully, and the system can adapt itself to a distribution that changes over time (fraud evolution). However, these systems need enough samples to learn successfully.
Basically, a profile is created for every user in the detection system, and this profile must be kept up to date. Once the system has been trained with enough samples, it has detailed information about each user’s monthly, weekly or daily spending habits. For example, suppose that a student spends about $100 a week, while a businessman spends about $1000 a week. A single $400 purchase then has a high probability of being fraud for the student, but a very low probability for the businessman. Of course, special days such as New Year, birthdays or weekends must be taken into account when designing a fraud detection algorithm, because students may also spend much more than usual on these days. Generally, a fraudster does not know the victim’s spending habits, so a fraudulent activity will presumably match the user profile poorly. But if the fraudulent activity fits the user profile, it may be hard to detect.
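The student and businessman example can be sketched in a few lines of code. This is only an illustration under strong assumptions: the profile holds nothing but past weekly totals, a single purchase is compared against those totals with a z-score, and the class name `SpendingProfile`, the threshold of 3.0 and the spending histories are all invented for the example.

```python
from statistics import mean, pstdev


class SpendingProfile:
    """Hypothetical per-user profile built from past weekly spending totals."""

    def __init__(self, weekly_totals):
        # At least one past weekly total is required.
        self.weekly_totals = list(weekly_totals)

    def update(self, weekly_total):
        """Keep the profile current by adding the latest weekly total."""
        self.weekly_totals.append(weekly_total)

    def score(self, amount):
        """Number of standard deviations the amount lies above the usual week."""
        mu = mean(self.weekly_totals)
        sigma = pstdev(self.weekly_totals) or 1.0  # avoid division by zero
        return (amount - mu) / sigma

    def is_suspicious(self, amount, threshold=3.0):
        """Flag only unusually HIGH spending; threshold is an assumed value."""
        return self.score(amount) > threshold


# Assumed histories: about $100 a week and about $1000 a week.
student = SpendingProfile([90, 100, 110, 100])
businessman = SpendingProfile([900, 1000, 1100, 1000])

print(student.is_suspicious(400))      # far above the student's usual week
print(businessman.is_suspicious(400))  # below the businessman's usual week
```

One possible way to handle special days in this sketch would be to pass a larger `threshold` on those dates, so that a higher than usual spend is tolerated.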
In the machine learning literature, fraud detection systems can be built with supervised, unsupervised or mixed approaches. Each approach works in a slightly different way. The approaches and their working logic are shown in the figure below.
[Figure: supervised, unsupervised and mixed approaches to fraud detection and their working logic]

In supervised learning, labeled historical fraud data is used to create the user profile in the training phase. These approaches are similar to signature-based IDS/IPS: they can detect fraudulent activities that are well known, but they cannot detect new types of fraud. Systems trained with unsupervised algorithms can detect unknown fraudulent activities. These approaches are similar to anomaly-based IDS/IPS: although they can detect unknown fraudulent activities, in some cases they may miss well-known ones. To overcome the disadvantages of both techniques, mixed approaches have been developed. In these approaches, supervised and unsupervised algorithms work together, so both well-known and unknown fraudulent activities can be detected efficiently.
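The idea of a mixed approach can be sketched as two small detectors combined with a logical OR. Everything here is an assumption made for illustration: the supervised part is just a per-category fraud rate learned from labels, the unsupervised part is a z-score on transaction amounts that ignores labels, and the merchant categories, cut-off values and data are invented. A real system would use far richer features and models.

```python
from collections import defaultdict
from statistics import mean, pstdev


def fit_supervised(labeled):
    """Learn the fraud rate per merchant category from labeled history."""
    counts = defaultdict(lambda: [0, 0])  # category -> [fraud count, total]
    for category, _amount, is_fraud in labeled:
        counts[category][0] += int(is_fraud)
        counts[category][1] += 1
    return {c: frauds / total for c, (frauds, total) in counts.items()}


def fit_unsupervised(amounts):
    """Learn what a 'normal' amount looks like, without using any labels."""
    return mean(amounts), pstdev(amounts) or 1.0


def is_fraud(tx, fraud_rate, amount_stats, rate_cut=0.5, z_cut=3.0):
    """Mixed decision: flag if EITHER detector fires (cut-offs are assumed)."""
    category, amount = tx
    mu, sigma = amount_stats
    known_pattern = fraud_rate.get(category, 0.0) > rate_cut  # supervised part
    anomalous = abs(amount - mu) / sigma > z_cut               # unsupervised part
    return known_pattern or anomalous


# Hypothetical labeled history: (merchant category, amount, is_fraud).
history = [
    ("grocery", 20, False), ("grocery", 35, False), ("fuel", 40, False),
    ("gift_card", 50, True), ("gift_card", 45, True),
]
fraud_rate = fit_supervised(history)
amount_stats = fit_unsupervised([amount for _c, amount, _f in history])

print(is_fraud(("gift_card", 40), fraud_rate, amount_stats))     # known pattern
print(is_fraud(("electronics", 900), fraud_rate, amount_stats))  # unseen, anomalous
print(is_fraud(("grocery", 30), fraud_rate, amount_stats))       # ordinary purchase
```

In this sketch the supervised part alone would miss the "electronics" purchase because that category never appeared in the labels, and the unsupervised part alone would miss the "gift_card" purchase because its amount looks normal; combining them covers both cases.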