Jan 16, 2017

Machine Learning in Cyber Security Domain - 2: Cyber Security Rating and Incident Forecasting

Before explaining how the rating and forecasting mechanisms work and which machine learning algorithms can be used in them, we want to give a brief introduction to why cyber security ratings are needed, where they can be used in the real world, and how this information can be useful for companies.

Purpose of the Cyber Security Rating

According to a report published by “The Centre for Strategic and International Studies” in 2014, the annual cost of cyber security breaches is nearly $500 billion, and the average cost for large companies is $3 million. Because of that, cyber security infrastructure is vitally important for all companies, especially those that store valuable information in virtual environments and on the internet. Companies have spent huge amounts of money to improve their cyber security infrastructure, but a company's CEO or IT directors cannot effectively evaluate how much improvement that expenditure has bought. In business terminology, this is the question of return on investment. To solve this problem, metrics are needed that show how strong the cyber security infrastructure really is. Cyber security ratings stand right in the middle of this calculation. With such a score, everyone can easily understand how well a company’s cyber security infrastructure is designed. So, understanding return on investment is the first main purpose of the cyber security rating mechanism.

This score must be meaningful on a global scale, so that companies can evaluate their own infrastructure in comparison with other companies in the same sector, or get a good understanding of the general situation worldwide. Making cyber security infrastructure comparable with that of others is the second main purpose of cyber security ratings.
Managers such as CEOs make tactical decisions about the future of the company from time to time. Risk is a vitally important factor in these decisions. Cyber security ratings support managers in making tactical decisions for the future. Managers can also set policies for their vendors using those vendors' cyber security ratings. So, supporting tactical decision making is the third main purpose of cyber security ratings.
Cyber security ratings can also be used for Vendor Risk Management (VRM). Large companies work with many third-party suppliers (also known as vendors) and share valuable company information with them. Following the principle “you are as strong as your weakest point”, incidents that hit vendors can easily affect the firms they work with. To have a really strong infrastructure, a company must work with vendors that have strong cyber security infrastructure themselves. In addition, large companies can watch the ratings of their vendors change over time and take precautions if a vendor has low ratings. So, analysing risk and supporting Vendor Risk Management is the fourth main purpose of the cyber security rating.
Insurance firms insure companies against many kinds of risk. Recently, managers have started to insure their companies against the risk of cyber security breaches. Insurers need information about a client’s security infrastructure before insuring that company. Previously, insurers sent the client a list of questions about its security infrastructure. These question lists are too long and too hard to answer quickly and accurately to produce reliable feedback. Insurers also ask for a penetration test report on the client. But none of this information is continuous, so it can become outdated over time. A cyber security rating built from continuously collected data is very useful for this problem. Finally, helping cyber insurers see dynamic changes in the security infrastructure of firms is the fifth main purpose of the cyber security rating.
In summary, the main purposes of cyber security ratings are listed below.
  1. Understanding return on investment
  2. Making cyber security infrastructure comparable
  3. Supporting tactical decision making
  4. Analysing risks and supporting VRM
  5. Helping cyber insurers see dynamic changes

Calculating Cyber Security Rating

To create a global standard for cyber security ratings, the information used in the rating calculation must be collected online and passively; in other words, the data must be collected without connecting directly to the target systems. Because rating systems must work continuously, continuous active scanning could exhaust or even crash the targeted systems. Besides, active scanning can only be done with permission; otherwise it is illegal. Because of these requirements, data is collected from the internet using public databases, reputation sites, blacklists, and similar sources. More information about the data is given in the table below. (Note that the source of this table is Bitsight Tech.)

Cyber security infrastructure depends on many criteria, so it is more reasonable to have several scores than a single one. The scoring mechanism can be divided into subcategories, and the score of each subcategory is calculated using entirely different criteria. Examples of these subcategories are DNS Health, SSL Strength, Asset Reputation, Leaked Email, SMTP Controls, Hacktivist Shares, etc. The score of a subcategory is calculated from the data associated with that category. After the score of every subcategory has been calculated, an overall score is calculated according to the importance of each subcategory.
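
As a rough illustration of combining subcategory scores by importance, here is a minimal Python sketch. It assumes the overall score is a simple weighted average; the 0-100 scale, the importance weights, and the function name overall_score are hypothetical and are not taken from any real rating product.

```python
def overall_score(subcategory_scores, importance):
    """Combine subcategory scores into one overall score.

    Assumption: the overall score is a weighted average, where a larger
    importance value gives a subcategory more influence.
    """
    total_importance = sum(importance[name] for name in subcategory_scores)
    if total_importance <= 0:
        raise ValueError("importance weights must sum to a positive number")
    weighted_sum = sum(
        score * importance[name] for name, score in subcategory_scores.items()
    )
    return weighted_sum / total_importance


# Hypothetical scores on a 0-100 scale and hypothetical importance weights.
scores = {"DNS Health": 80, "SSL Strength": 60, "Leaked Email": 40}
weights = {"DNS Health": 1.0, "SSL Strength": 2.0, "Leaked Email": 1.0}
print(overall_score(scores, weights))  # 60.0
```

Here SSL Strength counts twice as much as the other two subcategories, so the overall score is pulled toward it. A real rating system could use a different combination rule.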

Cyber Incident Forecasting

To calculate a cyber security rating, an algorithm must be implemented. When the algorithm runs, it processes the collected information and derives knowledge such as a rating score or an incident forecast. In this chapter we give general information about “Forecasting Cyber Security Incidents”. If you want to read more about this topic, we recommend taking a close look at this paper. In this part we have benefited greatly from that work.
Predicting an incident before it occurs is very useful for avoiding the costs that incidents cause. In real-world applications, this type of prediction can save money or human lives. For example, predicting an earthquake before it occurs gives people time to take precautions, so deaths due to the earthquake can be greatly reduced. As another example, in the old days coal miners carried caged canaries down into the mine tunnels with them. If dangerous gases such as carbon monoxide collected in the mine, the gases would kill the canary before killing the miners, thus providing a warning to exit the tunnels immediately. Similarly, in the cyber security domain a forecasting mechanism can save money, reputation, or valuable information such as the source code of an important application or the chemical formula of a product.
To predict a hacking incident before it occurs using machine learning algorithms, we need a dataset for the training phase that includes incident reports and externally observable features of the firms.
The referenced paper defines two main categories of data that describe the security posture of a company. The first is Mismanagement Symptoms and the second is Malicious Activity Data. Mismanagement Symptoms consists of five features, each of which indicates a misconfiguration on a network. These features do not directly show whether a system is vulnerable, but there is a correlation between them and hacking incidents. The features are (1) Open Recursive Resolver, (2) DNS Source Port Randomization, (3) BGP Misconfiguration, (4) Untrusted HTTPS Certificates, and (5) Open SMTP Mail Relays.
Malicious Activity Data is separated into three types: (1) Spam Activities, (2) Phishing and Malware Activities, and (3) Scanning Activities. This data is collected over time, for the most recent 14 days and the most recent 60 days. A final dataset used in the paper contains incident reports from three different sources: (1) VERIS Community Database, (2) Hackmageddon, and (3) Web Hacking Incident Reports. This dataset is used for labeling the security posture data of the companies.
The mismanagement symptoms and malicious activity data are mapped to the companies that appear in the public datasets used in the paper. In this way, a training dataset is created from externally observed data, with labels such as ‘hacked’ or ‘not hacked’ added according to the incident reports.
All datasets used in the paper are shown below.
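
The labeling step can be pictured with a small Python sketch. This is not the paper's code: the company names, the feature names, and the function build_training_rows are hypothetical, and the only idea taken from the text is that a company gets the label ‘hacked’ when it appears in the incident reports.

```python
def build_training_rows(posture_by_company, incident_companies):
    """Attach a 'hacked' / 'not hacked' label to each company's features.

    posture_by_company: dict mapping company name -> dict of externally
        observed features (hypothetical names below).
    incident_companies: set of company names found in incident reports.
    """
    rows = []
    for company in sorted(posture_by_company):
        label = "hacked" if company in incident_companies else "not hacked"
        rows.append(
            {
                "company": company,
                "features": posture_by_company[company],
                "label": label,
            }
        )
    return rows


# Hypothetical posture data: two mismanagement symptoms plus spam activity
# counted over the recent 14-day and 60-day windows.
posture = {
    "example-bank": {"untrusted_https_certs": 0, "open_smtp_relays": 0,
                     "spam_14d": 1, "spam_60d": 3},
    "example-shop": {"untrusted_https_certs": 2, "open_smtp_relays": 1,
                     "spam_14d": 9, "spam_60d": 40},
}
incident_reports = {"example-shop"}  # companies named in incident reports

for row in build_training_rows(posture, incident_reports):
    print(row["company"], row["label"])
```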
[Figure: the datasets used in the referenced paper]
Finally, the dataset created by combining the security posture data and the incident reports is separated into two parts. One part is used in the training phase and the other in the test phase. Random Forest and Support Vector Machine algorithms were implemented. As a result, a 90% true positive rate, a 10% false positive rate, and 90% overall accuracy were achieved with the Random Forest algorithm. (If you want to know in detail how they achieved this success rate, we recommend reading the paper linked above.)
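
For readers unfamiliar with these metrics, the following Python sketch shows how true positive rate, false positive rate, and accuracy are computed from the predictions on the test part. The labels and predictions are made-up toy values, not results from the paper, and the function name evaluate is our own.

```python
def evaluate(y_true, y_pred):
    """Return (true positive rate, false positive rate, accuracy).

    Labels are 1 for 'hacked' and 0 for 'not hacked'.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # hacked, predicted hacked
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # hacked, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarm
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return tpr, fpr, accuracy


# Toy test set: 4 hacked and 6 not hacked companies (made-up values).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tpr, fpr, accuracy = evaluate(y_true, y_pred)
print(tpr, accuracy)  # 0.75 0.8
```

In this toy example one hacked company is missed and one clean company is flagged, so the false positive rate is 1 out of 6.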
There is one more very important thing that the referenced paper does not take into consideration in its incident forecasting mechanism: the POPULARITY of COMPANIES. People generally think that if a company was hacked, the reason is a weak cyber security infrastructure. In fact, most of the time this is wrong. For example, research shows that financial companies have stronger cyber security infrastructure than companies in other sectors (health, education, etc.), yet financial companies have faced many more hacking incidents than companies in other sectors.
Let me explain why this is so.
Attackers need a motivation to hack a company. The motivation can be getting attention, stealing valuable information, hacktivism, etc.
Now, consider a company with a very weak cyber security infrastructure and the lowest score in almost all cyber security rating subcategories. This company is very small, is not known by many people, and has no information in its network that is worth money. In the real world, this company may survive for a long time without being hacked, although it could be hacked easily. The reason is that hackers have no motivation to hack it.
On the other hand, consider a very large company with a very strong cyber security infrastructure and the highest score in almost all cyber security rating subcategories. This company may be hacked within a short time, because hackers have a very good motivation to hack it.
Companies are hacked even though they have spent huge amounts of money to strengthen their infrastructure. Banks, governments, and the largest tech companies in the world (Apple, Adobe, LinkedIn, Yahoo...) have all been targeted by hackers. The reason is the strong motivation that the popularity of these companies gives hackers. Because of this, popularity is a very important feature for cyber incident forecasting and must be added to the forecasting mechanism.
How can we define popularity? There is actually a lot of data available for defining popularity, such as number of employees, sector, annual income, company value on the stock market, number of customers, value of the data stored in the company’s databases, location (country, state...), number of times the company name appears in the daily news, etc.
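
One possible way to feed popularity into a forecasting model is to turn a few of these signals into numeric features and append them to the security posture features. The Python sketch below is only an assumption about how this could be done: the choice of three signals, the log scaling, and the function name popularity_features are ours.

```python
import math


def popularity_features(employee_count, annual_income_usd, daily_news_mentions):
    """Turn raw popularity signals into numeric model features.

    Assumption: log10 scaling is used so that very large companies do not
    dominate; adding 1 keeps a count of zero valid.
    """
    return [
        math.log10(1 + employee_count),
        math.log10(1 + annual_income_usd),
        math.log10(1 + daily_news_mentions),
    ]


# Hypothetical security posture features for one company (for example,
# subcategory scores), extended with the popularity features.
posture_features = [80.0, 60.0, 40.0]
company_vector = posture_features + popularity_features(999, 9_999_999, 99)
print(len(company_vector))  # 6
```

Categorical signals such as sector or location would need their own encoding and are left out of this sketch.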
We want to give you some examples of this situation.
  1. The 2014 JPMorgan Chase data breach was a cyber-attack against American bank JPMorgan Chase that is believed to have compromised data associated with over 83 million accounts – 76 million households (approximately two out of three households in the country) and 7 million small businesses. The data breach is considered one of the most serious intrusions into an American corporation's information system and one of the largest data breaches in history. (This paragraph is copied from Wikipedia.)
  2. Dropbox Data Breach: A huge cache of personal data from Dropbox that contains the usernames and passwords of nearly 70 million account holders has been discovered online. The information, believed to have been stolen in a hack that occurred several years ago, includes the passwords and email addresses of 68.7 million users of the cloud storage service. (This paragraph is copied from this link.)
  3. Yahoo Says 1 Billion User Accounts Were Hacked: Yahoo, already reeling from its September disclosure that 500 million user accounts had been hacked in 2014, disclosed Wednesday that a different attack in 2013 compromised more than 1 billion accounts. The two attacks are the largest known security breaches of one company’s computer network. The newly disclosed 2013 attack involved sensitive user information, including names, telephone numbers, dates of birth, encrypted passwords and unencrypted security questions that could be used to reset a password. (This paragraph is copied from this link.)
(Click this link if you want to read more about very famous companies that have been hacked.)

The Difference Between Rating and Forecasting

Actually, we cannot talk about the similarity or dissimilarity of rating and forecasting, because these two topics are not in the same category. The two topics complement each other. The main purpose of the rating mechanism is to evaluate cyber security infrastructure with metrics based on data collected passively from the internet. The main purpose of cyber incident forecasting, on the other hand, is to detect hacking incidents before they occur. A forecasting mechanism must use passively collected information to characterize the cyber security infrastructure. So, the rating mechanism can be considered a step in the forecasting mechanism. In other words, a forecasting mechanism must evaluate metrics that determine how strong the cyber security infrastructure is before it predicts incidents.
In the paper discussed above, this evaluation happens inside the machine learning algorithm, so we cannot see the resulting rating: the algorithm goes straight to the prediction, learning internally how strong the infrastructure is. After the algorithm has run, we can rank the features by importance, looking at how much each one contributes to cyber incidents. So the rating mechanism is effectively dispersed inside the algorithm. This is the approach used in the academic literature.
However, a rating score can give us a lot of valuable information about cyber security infrastructure, so another approach has been developed. In this approach, the rating mechanism is split off into a separate layer from the machine learning algorithm.
In this type of approach, the score calculated by the rating mechanism is an input value for the machine learning algorithm. Representative figures are given below.
[Figure: the two approaches, with the rating inside the machine learning algorithm and with the rating as a separate layer feeding it]

A cyber security rating can give us valuable information about cyber security infrastructure even if it is not used in a forecasting mechanism, because it shows how strong our infrastructure is against cyber threats.