Big data analytics is the large-scale analysis and processing of information. It has been applied in many areas, and in recent years it has attracted attention from the security field for its potential to evaluate and correlate security-critical data at an unprecedented scale. The distinction between ordinary data analysis and big data analytics for security is clear-cut. Information security uses the analysis of network traffic, system logs, and other information sources to understand threats and detect malicious activity.
Data-driven information security dates back to bank fraud detection and anomaly-based intrusion detection systems (IDSs). Although analyzing logs, network flows, and system events for forensics and intrusion detection has been a problem in the information security community for decades, conventional technologies aren’t always adequate to support long-term, large-scale analytics, for several reasons. First, retaining large quantities of data wasn’t economically feasible before. As a result, in traditional infrastructures, most event logs and other recorded computer activities were deleted after a fixed retention period (for instance, 60 days). Second, performing analytics and complex queries on large, unstructured datasets with incomplete and noisy features was inefficient.
For example, several popular security information and event management (SIEM) tools weren’t designed to analyze and manage unstructured data and were rigidly bound to predefined schemas. However, new big data applications are starting to become part of security management software because they can help clean, prepare, and query data in heterogeneous, incomplete, and noisy formats efficiently. Finally, the management of large data warehouses has traditionally been expensive, and their deployment usually requires strong business cases. The Hadoop framework and other big data management tools are now commoditizing the deployment of large-scale, reliable clusters and are therefore enabling new opportunities to process and analyze data.
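To make the contrast with schema-bound tools concrete, the following is a minimal sketch of schema-tolerant ingestion. It assumes two hypothetical input shapes (JSON objects and a simple `host app: message` text line) and a hypothetical three-field target record; real log formats vary widely and this is not how any particular SIEM or big data tool works.

```python
import json
import re

# Hypothetical pattern for a "host app: message" text line (an assumption).
SYSLOG_RE = re.compile(r"^(?P<host>\S+) (?P<app>\w+): (?P<message>.*)$")

def normalize(raw):
    """Map one raw record to a common dict; missing fields stay None."""
    record = {"host": None, "app": None, "message": None}
    try:
        parsed = json.loads(raw)
        if isinstance(parsed, dict):
            # Copy only the fields we know about; absent ones remain None.
            for key in record:
                record[key] = parsed.get(key)
            return record
    except json.JSONDecodeError:
        pass  # not JSON, fall through to the text pattern
    match = SYSLOG_RE.match(raw)
    if match:
        record.update(match.groupdict())
    else:
        # Keep unparsed text instead of rejecting the record.
        record["message"] = raw
    return record

raw_events = [
    '{"host": "web01", "app": "nginx", "message": "GET /login 401"}',
    'db02 sshd: Failed password for root',
    '{"host": "web01"}',
    '???garbled???',
]
events = [normalize(r) for r in raw_events]

# Query across all records, tolerating those with no message at all.
failed = [e for e in events if e["message"] and "Failed" in e["message"]]
```

Incomplete and unparseable records are retained with empty fields rather than rejected, so later queries can still run over the whole dataset.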
Fraud detection is one of the most visible uses for big data analytics: credit card and phone companies have conducted large-scale fraud detection for decades; however, the custom-built infrastructure necessary to mine big data for fraud detection wasn’t economical enough to have wide-scale adoption. One of the main impacts of big data technologies is that they’re enabling a wide variety of industries to build affordable infrastructures for security monitoring.
In particular, new big data technologies, such as the Hadoop ecosystem (including Pig, Hive, Mahout, and RHadoop), stream mining, complex-event processing, and NoSQL databases, are enabling the analysis of large-scale, heterogeneous datasets at unprecedented scales and speeds. These technologies are transforming security analytics by facilitating the storage, maintenance, and analysis of security information. For instance, the WINE platform [1] and Bot-Cloud [2] allow the use of MapReduce to efficiently process data for security analysis. We can identify some of these trends by looking at how reactive security tools have changed in the past decade. When the market for IDS sensors grew, network monitoring sensors and logging tools were deployed in enterprise networks; however, managing the alerts from these diverse data sources became a challenging task. As a result, security vendors started the development of SIEMs, which aimed to aggregate and correlate alarms and other network statistics and present all this information through a dashboard to security analysts.
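As an illustration of the MapReduce style of processing mentioned above, here is a minimal single-process sketch that aggregates alerts from several sensors by source address. The alert fields (`sensor`, `src_ip`, `signature`) and the sample values are assumptions made for the example; this is plain Python that mimics the map, shuffle, and reduce phases, not the Hadoop API and not the method used by WINE or Bot-Cloud.

```python
from collections import defaultdict

def map_alert(alert):
    """Map phase: emit one (source_ip, 1) pair per alert."""
    yield alert["src_ip"], 1

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce phase: sum the values collected for one key."""
    return key, sum(values)

# Hypothetical alerts from different sensors (illustrative data only).
alerts = [
    {"sensor": "ids-a", "src_ip": "10.0.0.5", "signature": "port-scan"},
    {"sensor": "ids-b", "src_ip": "10.0.0.5", "signature": "ssh-brute-force"},
    {"sensor": "fw-1", "src_ip": "10.0.0.9", "signature": "port-scan"},
]

pairs = [pair for alert in alerts for pair in map_alert(alert)]
counts = dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions would run in parallel on partitions of the data; the sketch only shows how per-sensor alerts can be reduced to one summary per source address, the kind of aggregation a SIEM dashboard presents.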
Big data techniques broaden the information available to security experts by correlating, consolidating, and contextualizing diverse data sources over long periods of time.