Jan 10, 2017

Machine Learning in Cyber Security Domain - 1: Fundamentals

In recent years, attackers have been developing more sophisticated ways to attack systems. Thus, recognizing these attacks is getting more complicated over time. Most of the time, network administrators are not able to recognize these attacks effectively or respond to them quickly.

Therefore, a lot of software has been developed to support humans in managing and protecting their systems effectively. Initially, this software was developed to handle operations, such as complex mathematical calculations, that are very hard for human beings. Then we needed more: the next step was extending the abilities of software using artificial intelligence and machine learning techniques. As technology advances, a huge amount of data is produced to be processed every day and every hour. Eventually, the concept of "Big Data" was born, and people began to need more intelligent systems for processing and making sense of this data. For this purpose, many algorithms have been developed to date. These algorithms are used in many research areas, such as image processing, speech recognition, the biomedical field, and of course the cyber security domain.

Besides all of this, the main purpose of machine learning techniques is to provide software with a decision mechanism, as people have. The cyber security domain is one of the most important research areas being worked on. In 2014, the Center for Strategic and International Studies estimated the annual cost of cybercrime to the global economy at between $375 billion and $575 billion. Although sources differ, the average cost of a data breach incident to a large company is over $3 million. Researchers have developed intelligent systems for the cyber security domain with the purpose of reducing this cost.


MACHINE LEARNING


In the beginning, Artificial Intelligence (AI) focused on giving computers the ability to act like humans. For this purpose, researchers tried to develop AI applications that could not be distinguished from humans by real users. So, the first generation of AI applications tried to pass the Turing Test, a test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human. After that, researchers discovered that it is not so easy to create an AI that works entirely like the human brain. Because of this, AI started to be applied to more specific domains, such as face recognition and object recognition.

Machine learning is a type of artificial intelligence (AI) that provides computers with the ability to learn without being explicitly programmed. It focuses on the development of computer programs that can change when exposed to new data. Although it has gained high momentum in recent years, machine learning is actually almost as old as computing itself: data produced by computers and sensors has been processed to derive meaning since the first computers were used. So why is machine learning so popular in recent years? Because we have more data than ever before, and we need to make sense of it. This is why it is called "Big Data".

Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it; systems, sensors and mobile devices transmit it. Big data is often characterized by the 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. Although big data doesn't equate to any specific volume of data, the term is often used to describe terabytes, petabytes and even exabytes of data captured over time. With the widespread adoption of IoT technology, the data to be processed will grow even larger in the future.

It is impossible for humans to analyze big data directly, so intelligent systems have been developed using machine learning to analyze it more easily. Big Data and Machine Learning are two components that complement each other: if we want to analyze Big Data, we have to use Machine Learning techniques, and if we want to create an intelligent system using machine learning, we have to use a large amount of data.

Deep Learning is one of the most trending topics in machine learning, because this technique achieves high accuracy rates for intelligent systems with the power of big data. A representative figure of artificial intelligence, machine learning and deep learning, and the chronological development of these concepts, is given below. (Source of image: https://blogs.nvidia.com/blog/2016/07/29/whats-difference-artificial-intelligence-machine-learning-deep-learning-ai/ ).

Machine Learning techniques are used in a wide range of application areas around the globe, so that everybody uses intelligent systems developed with machine learning countless times in a single day. When using a mobile phone, surfing the internet or buying something online, we encounter many intelligent systems. Technology companies have spent huge amounts of money on developing more intelligent systems. Almost all machines will be intelligent in the future, because intelligent systems make life easier. And of course, people love applications that make life easier.

Gartner publishes emerging technology trends every year in hype cycle format, a representative graph of trending topics. This format assumes that before a technology is used worldwide, it passes through 5 phases: (1) Innovation Trigger, (2) Peak of Inflated Expectations, (3) Trough of Disillusionment, (4) Slope of Enlightenment, (5) Plateau of Productivity. When a technology reaches the plateau, it starts to be used across the entire world. While reaching the plateau takes a long time for some technologies, others reach it quickly; the hype cycle also represents the time required to reach the plateau. The Gartner Hype Cycle for Emerging Technologies figure for 2016 is given below.


As shown in the figure, Machine Learning is at the Peak of Inflated Expectations, and reaching the plateau will take about 2 to 5 years. Compared to other technologies, this time is very short. Big companies such as Google, Facebook and Apple are already spending huge amounts of money on the improvement of artificial intelligence and machine learning, and most of them use deep learning in some of their projects. Detailed examples are given in the next section.

With the widespread use of the internet, a huge amount of data flows over it, and detecting events with malicious behavior is becoming more difficult as the data flow increases. Like other application areas, the cyber security domain needs to strengthen its structures using machine learning techniques.

In the remainder of this chapter, examples of machine learning application areas are given for clear understanding. Then a brief introduction to deep learning and examples of its applications around the globe are presented. Finally, a technical review of machine learning techniques closes this section.

Application Areas in Daily Life


Deep Blue is one of the most important milestones in AI history. Deep Blue was a chess-playing computer developed by IBM, known as the first computer chess-playing system to win both a chess game and a chess match against a reigning world champion. Deep Blue won its first game against world champion Garry Kasparov on February 10, 1996. However, Kasparov won three and drew two of the following five games, defeating Deep Blue by a score of 4–2. Deep Blue was then heavily upgraded, and played Kasparov again in May 1997. Deep Blue won game six, thereby winning the six-game rematch 3½–2½ and becoming the first computer system to defeat a reigning world champion in a match under standard chess tournament time controls. (Deep Blue is shown on the right.)

Chess was thought to be a game of intelligence, and playing chess well is a very hard task even for humans. Because of this, the first chess match won by a computer against a world champion was talked about a great deal in those years.

How could this be possible? Let's take a deeper look. The Shannon number, 10^120, is a conservative lower bound (not an estimate) of the game-tree complexity of chess, based on an average of about 10^3 possibilities for a pair of moves consisting of a move for White followed by one for Black, and a typical game lasting about 40 such pairs of moves. Shannon calculated it to demonstrate the impracticality of solving chess by brute force, in his 1950 paper "Programming a Computer for Playing Chess". 10^120 is a very large number; for comparison, the total number of atoms in the universe is estimated at between 10^79 and 10^81. If calculating one variation took 1 microsecond, calculating every variation would take on the order of 10^106 years. Achieving this would require enormous processor capacity. Since the technology of those days did not allow it, the depth of the tree that could be calculated was limited.
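To make the arithmetic concrete, here is a small Python sketch of that back-of-the-envelope calculation. It uses only the figures quoted above, which are Shannon's illustrative assumptions rather than measured values:

```python
# A rough sketch of Shannon's estimate, using the figures quoted above:
# about 10^3 possibilities per pair of moves and ~40 pairs per game.
choices_per_move_pair = 10 ** 3      # a White move followed by a Black reply
move_pairs_per_game = 40

variations = choices_per_move_pair ** move_pairs_per_game
print(f"game-tree lower bound: 1e{len(str(variations)) - 1}")    # 1e120

# If evaluating one variation took 1 microsecond:
seconds_per_year = 60 * 60 * 24 * 365
years = (variations * 1e-6) / seconds_per_year
print(f"brute-force time: about {years:.0e} years")              # ~3e+106 years
```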

As explained before, AI focuses on creating systems that work similarly to the human brain, but for practical reasons it is applied to specific application areas. Chess has been one of the areas where artificial intelligence was successfully applied in practice.

There is another mind game that challenges artificial intelligence: Go. Go, which was invented in China, is an abstract strategy board game for two players, in which the aim is to surround more territory than the opponent. In the years when Deep Blue defeated Kasparov, some people believed that humans could not be defeated by a computer at Go, or that it would take a very, very long time. Almost twenty years after 1996, AI defeated humans at Go. Let's take a look at the numbers. The board size is variable, so people can play Go on a 7x7, 9x9, 19x19 or 21x21 board. Suppose we want to play against a computer on a 19x19 board, and assume the average move count is about 200 in expert games (as research suggests), with roughly 250 plausible choices per move. Counting one choice for each of the 361 intersections of the 19x19 board gives an upper bound of about 3×10^511 variations that the computer would have to calculate for a 200-move game. This number is far larger than that of chess. (Do not forget that the total count of all atoms in the universe is between 10^79 and 10^81.) In professional games, the overall move count can reach 350; the total number of variations for this move count is about 1.3×10^895. You get the idea of why this problem is so hard to solve. (If you want to analyze more numbers on this topic, check this link.)
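The quoted totals can be reproduced with a few lines of Python. Note that 3×10^511 and 1.3×10^895 correspond exactly to picking one of the board's 361 intersections at every move, a deliberately crude upper bound shown here only to illustrate the scale:

```python
# A rough illustration of the Go numbers quoted above. The totals match
# 361^moves, i.e. one choice per intersection of a 19x19 board per move.
intersections = 19 * 19                    # 361 points on a full-sized board

for moves in (200, 350):                   # expert average vs. long pro games
    variations = intersections ** moves
    digits = len(str(variations)) - 1      # order of magnitude
    mantissa = variations / 10 ** digits   # leading factor
    print(f"{moves} moves: about {mantissa:.1f} x 10^{digits} variations")
# 200 moves: about 3.2 x 10^511 variations
# 350 moves: about 1.3 x 10^895 variations
```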

Humans were defeated at Go by an artificial intelligence application named AlphaGo, a computer program developed by Google DeepMind in London to play the board game Go. In October 2015, it became the first computer Go program to beat a professional human Go player without handicaps on a full-sized 19×19 board. In March 2016, it beat Lee Sedol in a five-game match, the first time a computer Go program had beaten a 9-dan professional without handicaps. This is another of the most important milestones in AI history. AlphaGo was trained with deep learning; how this success was achieved using a program trained with deep learning is explained in the following sections.

The two examples explained above focus on defeating humans in areas that require intelligence, but most AI applications focus on supporting humans instead of defeating them. This type of application uses machine learning techniques to learn a specific problem in order to support people. Every person in the world uses many applications developed with machine learning in their daily life, consciously or unconsciously. Some examples of this type of application are given below.

Recommendation systems are one of the best-known machine learning subjects in the literature and the business sector. Recommender systems are software tools and techniques that provide suggestions for items likely to be of use to a user. The suggestions are aimed at supporting users in various decision-making processes, such as what items to buy, what music to listen to, what movie to watch or what news to read. Recommender systems have proven to be a valuable means for online users to cope with information overload and have become one of the most powerful and popular tools in electronic commerce. Correspondingly, various techniques for recommendation generation have been proposed, and during the last decade many of them have also been successfully deployed in commercial environments.


The interesting thing is that the system processes information about items and estimates approximately how a user will rate an item they have not seen before. Amazon recommends books or other products that the user will probably like. Facebook shows advertisements and recommends friendships or events that the user will probably like. Youtube recommends videos and Spotify recommends music. There are countless examples on this subject.
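As an illustration of how such an estimate can be computed, here is a minimal sketch of one classic approach, user-based collaborative filtering, on a tiny made-up rating matrix (real recommender systems are far more sophisticated):

```python
import numpy as np

# Made-up user x item ratings; 0 means "not rated yet".
ratings = np.array([
    [5, 4, 0, 1],   # user 0
    [4, 5, 4, 1],   # user 1
    [1, 2, 1, 5],   # user 2
], dtype=float)

def predict(user, item):
    """Estimate a missing rating as a similarity-weighted average."""
    score, weight = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue
        a, b = ratings[user], ratings[other]
        # cosine similarity between the two users' rating vectors
        sim = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        score += sim * ratings[other, item]
        weight += sim
    return score / weight if weight else 0.0

# User 0 has not rated item 2; blend the other users' ratings of it,
# weighted by how similar their tastes are to user 0's.
print(round(predict(user=0, item=2), 2))   # -> about 2.87
```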

Recommendation systems are in widespread use across the globe. According to a report published by Netflix in 2014, two-thirds of the movies watched on Netflix are watched as a result of a recommendation. Recommendations generate 38% more click-through for Google News. Similarly, 35% of Amazon sales are made through its suggestion systems. Youtube and other firms use recommendation systems heavily. Recently, recommendation systems have also been developed using deep learning, and big companies such as Youtube and Facebook benefit greatly from its power.

Another well-known machine learning application area is activity recognition. The main purpose of this type of application is to detect which activity a user is performing at a given time. This can be done on a mobile phone or on external devices such as a smartwatch, and big mobile phone producers research this topic heavily: companies such as Apple and Samsung ship an activity recognition app as one of the default applications on their phones. To develop an intelligent system for activity recognition, information produced by sensors is needed; the accelerometer, gyroscope and GPS are the most commonly used sensors in this area. Machine learning techniques are used to detect which activity the user performed. This type of application can give us information about calories burned, how many kilometers were walked, or how healthy the user's daily life is.
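Here is a minimal sketch of the idea using synthetic accelerometer windows and scikit-learn. The sensor statistics and class means below are invented for illustration; real systems work on real sensor streams with richer features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_windows(mean, std, label, n=100, length=50):
    """Simulate n windows of accelerometer magnitude readings."""
    raw = rng.normal(mean, std, size=(n, length))
    # classic per-window features: mean, standard deviation, min, max
    feats = np.c_[raw.mean(1), raw.std(1), raw.min(1), raw.max(1)]
    return feats, np.full(n, label)

X_walk, y_walk = make_windows(mean=1.2, std=0.5, label="walking")
X_sit, y_sit = make_windows(mean=1.0, std=0.05, label="sitting")

X = np.vstack([X_walk, X_sit])
y = np.concatenate([y_walk, y_sit])

clf = RandomForestClassifier(random_state=0).fit(X, y)
print(clf.predict(X[:1]))   # -> ['walking']
```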

Machine learning can also be used for predictions about the future. For example, in weather forecasting applications, current and past weather data are processed to derive information about future weather conditions. Another example of prediction is ATM cash optimization. The money sitting in an ATM is not useful for a bank while it is not being used by customers; in this situation the money is useful neither for the customer nor for the bank. If an intelligent system is developed to predict the optimal amount of money for an ATM weekly or monthly, banks can use the remaining money for other purposes. In a recent study, banks could double the number of ATMs without changing the total amount of money across all ATMs by using an intelligent system that estimates the optimal amount of money in each ATM. Another example is house price prediction. In this type of problem, the system tries to predict the actual value of a house using information such as its features, its location, nearby transportation and land value. There are many other examples of forecasting the future.
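For the house price case, a minimal regression sketch might look like this (the areas, distances and prices below are made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training data: [area in m^2, distance to transport in km].
X = np.array([
    [120, 0.5],
    [80,  2.0],
    [150, 1.0],
    [60,  3.5],
    [100, 0.8],
])
y = np.array([300_000, 180_000, 340_000, 120_000, 260_000])  # sale prices

model = LinearRegression().fit(X, y)
print(model.predict([[110, 1.5]]))   # estimated price for an unseen house
```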
Image processing is one of the fields where machine learning techniques are most frequently used. In imaging science, image processing is the processing of images using mathematical operations, applying any form of signal processing for which the input is an image, a series of images, or a video, such as a photograph or video frame; the output may be either an image or a set of characteristics or parameters related to the image. Some examples of image processing with machine learning techniques are face recognition, fingerprint recognition, moving object recognition, information retrieval from images, and medical applications. Moving object recognition is widely used in applications such as military systems and traffic intensity detection. Many studies in this field have achieved high accuracy rates using deep learning techniques. Machine learning can also be used for text-based applications, such as real-time language translation or detecting the main idea of an article.

Another trending topic in the machine learning area is the autonomous car. An autonomous car (driverless car, self-driving car, robotic car) is a vehicle that is capable of sensing its environment and navigating without human input. Autonomous cars detect their surroundings using a variety of techniques such as radar, lidar, GPS, odometry, and computer vision. Google's self-driving car is one such autonomous car project. To create an autonomous car, the system must be equipped with strong artificial intelligence. (Image source: https://waymo.com/ )
The project started in 2009 and in 2015 completed its first driverless ride on public roads; it is now being tested in Austin, Texas. In December 2016, Google transitioned the project into a new company called Waymo, housed under Google's parent company Alphabet. Alphabet describes Waymo as "a self-driving tech company with a mission to make it safe and easy for people and things to move around." The new company plans to make self-driving cars available to the public in 2020 (image source: McKinsey & Company). Google's self-driving car is designed for fully autonomous driving, so it has no pedals or steering wheel; everything is done with sensor input.
There is no direct input from a human; the sensor inputs are processed by machine learning techniques. Google is not the only autonomous car producer in the sector: many of the big companies in the automobile industry are doing research on driverless cars.

What is Deep Learning?

Deep learning (also known as deep structured learning, hierarchical learning or deep machine learning) is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data. It is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks.
It was developed following the early perceptron learning algorithm, a single-layer network that cannot represent functions that are not linearly separable; the classic example is "exclusive or" (XOR). To resolve this problem, learning algorithms with several layers had to be developed (a minimal sketch is given below). A deep network may have many layers, depending on the complexity of the problem, and we can use a large amount of data to train the system. Processing large amounts of data with a large number of neurons and layers requires high processing capacity, and CPUs alone are now inadequate for this job; systems that run deep learning need much more computing power. This is where GPUs came into play. GPU-accelerated computing is the use of a graphics processing unit (GPU) together with a CPU to accelerate deep learning, analytics, and engineering applications. GPUs play a huge role in accelerating applications in platforms ranging from artificial intelligence to cars, drones, and robots. (See more at: http://www.nvidia.com/object/what-is-gpu-computing.html).
A simple way to understand the difference between a GPU and a CPU is to compare how they process tasks. A CPU consists of a few cores optimized for sequential serial processing while a GPU has a massively parallel architecture consisting of thousands of smaller, more efficient cores designed for handling multiple tasks simultaneously.
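Coming back to the multi-layer point, here is a minimal, self-contained numpy sketch showing that a single hidden layer is enough to learn XOR. The hidden-layer size, learning rate and iteration count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)       # XOR truth table

W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))   # hidden layer
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))   # output layer
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10_000):
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backpropagate the squared error through both layers
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= h.T @ d_out; b2 -= d_out.sum(0, keepdims=True)
    W1 -= X.T @ d_h;   b1 -= d_h.sum(0, keepdims=True)

print(out.round(2).ravel())   # approaches [0, 1, 1, 0]
```

A single-layer perceptron, by contrast, can only draw one line through this 2-D input space, and no single line separates {(0,1), (1,0)} from {(0,0), (1,1)}.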
The core of deep learning is that we now have fast enough computers and enough data to actually train large neural networks: as we construct larger neural networks and train them with more and more data, their performance continues to increase. This is generally different from other machine learning techniques, which reach a plateau in performance, and it is the key reason why deep learning has become such a trending topic today. A representative figure is given below. (Source of image: Andrew Ng)
Deep learning methods aim at learning feature hierarchies, with features at higher levels of the hierarchy formed by the composition of lower-level features. Automatically learning features at different levels of abstraction permits a system to learn complex functions mapping the input to the output directly from data, without depending completely on human-crafted features. An example of the working mechanism of deep learning is given below. (Source of image: http://fortune.com/ai-artificial-intelligence-deep-machine-learning/ ).

Many companies have already started to use deep learning. Explanations for the most famous ones are given in the continuation of this section. (Source on how four tech giants get serious about deep learning: http://fortune.com/ai-artificial-intelligence-deep-machine-learning/ ).
Startup Deep Genomics, which is backed by Bloomberg Beta and True Ventures among others, has fed deep learning machines tons of existing cellular information in order to teach machines to predict outcomes from alterations to the genome, whether naturally occurring or resulting from medical treatment. The technology could provide the most precise understanding of an individual's specific disease or abnormality and how that person's well-being can best be advanced.
As more devices become internet-enabled, hackers have an increasing number of entry points to infiltrate systems and cloud infrastructure. The best cybersecurity practices not only create more secure systems but can predict where the next attack will come from. This is critical since hackers are always on the hunt for the next vulnerable endpoint, so protecting against cyber attack requires "thinking" like a hacker. Companies like the Israel-based, Blumberg Capital-backed Deep Instinct aim to use deep learning to recognize new threats that have never been detected before and thus keep organizations one step ahead of cyber criminals.
There are already plenty of cars on the road with driver-assistance capabilities, but these cars still rely on users to take over when an unforeseen event occurs that the car isn't programmed to respond to. As Sameep Tandon of startup Drive.ai notes, the challenge with self-driving cars is handling the "edge cases," such as weather. This is why, using deep learning, Drive.ai plans to help the car build up experience through simulations of many kinds of driving conditions. Nvidia is also working on self-driving car technology; it says it has used deep learning to train a car to drive on marked and unmarked roads and along the highway in various weather conditions, without the need to program every possible "if, then, else" statement. In this sector, Google and many of the big companies in the automobile industry are doing research on driverless cars.
Since deep learning has already seen widespread experimentation and refinement for textual analysis, it’s no surprise that Google, the leader in search, has made widespread deep learning-based updates to its search technology. Google’s deep learning-based RankBrain technology was added to how Google manages and fills search queries back in 2015. The technology helps handle queries that have not been seen before.
Apple moved Siri's voice recognition to a neural-net-based system for US users in late July 2014 (it went worldwide on August 15, 2014). Some of the previous techniques remained operational, but the system now leverages machine learning techniques, including types of deep learning. When users made the upgrade, Siri still looked the same, but it was now supercharged with deep learning.
(Some of the examples are taken from this link. If you want to read more, you may also check this link.)

Technical Review

In the continuation of this document, we explain the subjects of cyber security that are made more powerful with machine learning. Briefly, these subjects are spam filters, IDS/IPS systems, false alarm rate reduction, fraud detection, cyber security rating, incident forecasting and secure user authentication systems. Finally, there is one main section about bypassing security mechanisms, which is developed for offensive purposes.
Before starting to explain these cyber security subjects, we want to give a brief introduction to the technical background of machine learning, with which the following topics can be understood more easily. To begin, let's take a quick look at the general structure of machine learning in the figure below. (Source of image: http://www.isaziconsulting.co.za/machinelearning.html)


Machine learning problems can be divided into three main categories according to the characteristics of the problem: supervised learning, unsupervised learning and reinforcement learning. Supervised learning techniques are divided into two subcategories, classification and regression. In a classification problem, we have completely separated classes, and the main task is to assign a test sample to the class it actually belongs to. When the target values are not separate classes but continuous data, the problem is called a regression problem. A minimal sketch of both is given below.
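The following scikit-learn sketch illustrates both supervised settings; the iris dataset and the choice of petal width as a regression target are just illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression

X, y = load_iris(return_X_y=True)

# Classification: predict a discrete class (one of three iris species).
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("predicted class:", clf.predict(X[:1]))

# Regression: predict a continuous value (petal width from the
# other three measurements, purely for illustration).
reg = LinearRegression().fit(X[:, :3], X[:, 3])
print("predicted value:", reg.predict(X[:1, :3]))
```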
Unsupervised learning techniques are divided into two subcategories, clustering and dimensionality reduction. A clustering problem basically groups samples according to their similarities, regardless of class information, as in the sketch below. Recommendation systems are another technique often built with unsupervised learning; they are used to recommend something to users, whether a movie, music or something sold in a marketplace.
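A minimal clustering sketch with k-means, where the synthetic blobs stand in for any unlabeled dataset:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Generate unlabeled points; the true labels are deliberately discarded.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])   # cluster assignment of the first 10 samples
```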
Basically, there is one main difference between supervised and unsupervised approaches: supervised learning techniques use labeled events in the training phase, whereas unsupervised techniques use unlabeled events. In general, the quality of the dataset used in the training phase is one of the most important factors for a high accuracy rate. When our model is completely finished, test samples will come from real-time data.
Reinforcement learning is an area of machine learning concerned with how software agents ought to take actions in an environment so as to maximize some notion of cumulative reward. Generally, these techniques are used in robotics; a toy example is sketched below.
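Here is a tiny Q-learning sketch: an agent on a five-cell corridor learns that always moving right reaches the reward fastest. The corridor, reward and hyperparameters are all invented for this example:

```python
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # learned action values
alpha, gamma, epsilon = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

for _ in range(500):                  # training episodes
    s = 0
    while s != n_states - 1:          # last cell is the rewarding terminal
        # epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        s_next = min(max(s + (1 if a == 1 else -1), 0), n_states - 1)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # standard Q-learning update
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))   # learned policy: 1 (right) in every non-terminal state
```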
To use machine learning techniques, we must implement two phases: the training phase and the test phase. In the training phase, the system learns a model with the chosen algorithm; this model defines the solution of the problem we want to solve. In the test phase, the model from the first step is tested, so we can analyze how successful it is.
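In scikit-learn terms, the two phases might look like this minimal sketch, where the dataset and classifier are arbitrary placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = KNeighborsClassifier().fit(X_train, y_train)   # training phase
print("test accuracy:", model.score(X_test, y_test))   # test phase
```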



(Figure: classification of the subjects of the cyber security domain with machine learning.)
Finally, there is one more thing we want to explain about learning. In the figure on the side, you can see a classification of the subjects of the cyber security domain with machine learning; it's highly recommended to take a look at it before you start reading the topics below. Learning itself can be done all at once (batch learning) or continuously (incremental learning). Which one you use is entirely up to the definition of the problem: incremental learning can be considered a version of batch learning that is updated over time, as the sketch below illustrates.
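A minimal contrast between the two modes using scikit-learn's partial_fit interface; the synthetic dataset and the chunking into 10 pieces are arbitrary:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Batch learning: train once on all available data.
batch_model = SGDClassifier(random_state=0).fit(X, y)

# Incremental learning: update the model as data arrives in chunks.
inc_model = SGDClassifier(random_state=0)
for X_chunk, y_chunk in zip(np.array_split(X, 10), np.array_split(y, 10)):
    inc_model.partial_fit(X_chunk, y_chunk, classes=np.unique(y))

print(batch_model.score(X, y), inc_model.score(X, y))
```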