You can keep security data under GDPR

Artificial intelligence is the topic of so many conversations in the cybersecurity industry these days, and rightly so. As we gather more intelligence on cyber threats, we can use AI to do ever smarter things with that data to prevent attacks. The potential of AI will grow as we collect more data, but businesses need to keep in mind several key issues when implementing AI for cybersecurity.

Companies have never generated as much data as they do today. They are gathering ever more security data: logs, events and “artifacts” – threat data that may indicate an attack. But gathering 1,000 security events in a day or a week isn’t enough.

If you want to use machine learning, data analytics or AI to identify security threats, you need to gather lots of data, but critically, it must be the right kinds of data.

Machine learning – which rapidly finds patterns in data – adds value when it crunches massive data sets and can spot the needle in the haystack. The more data you feed in, the more accurate and effective it becomes.
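To make that concrete, here is a minimal sketch of the kind of pattern-finding involved: a Python example using scikit-learn’s Isolation Forest to flag outliers in a table of security events. The feature names, values and contamination rate are illustrative assumptions, not a production design.

    # Minimal anomaly-detection sketch: find the "needle in the haystack"
    # among security events. Feature names are illustrative assumptions.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(42)

    # Each row is one event: [bytes_sent, duration_secs, failed_logins]
    normal = rng.normal(loc=[5_000, 30, 0], scale=[1_000, 10, 0.5],
                        size=(10_000, 3))
    odd = np.array([[250_000, 2, 40]])          # one suspicious event
    events = np.vstack([normal, odd])

    # contamination = expected fraction of anomalies; a tuning assumption
    model = IsolationForest(contamination=0.001, random_state=0)
    labels = model.fit_predict(events)          # -1 = anomaly, 1 = normal

    print(f"flagged {np.sum(labels == -1)} of {len(events)} events")

The more representative events the model sees, the better its notion of “normal” becomes – which is exactly why retaining enough data matters.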

But if you ask companies how much security data they have, they’ll often say not much, as they discard it after a few weeks or keep only metadata. They invariably cite concerns over GDPR and data protection.

Organizations are understandably nervous that some of their security data may be personally identifiable information, so they could fall foul of GDPR’s strict rules limiting how long personal data can be stored.

However, the legislation makes provision for cybersecurity. Data can be kept for “no longer than you need it,” according to the UK’s Information Commissioner, so data needed for security analysis can be retained for that purpose and discarded once the analysis has taken place.

And a clause in GDPR’s Article 5 says data may be kept longer for research or statistical purposes. The key point is that data may legally be stored for months for analysis by AI algorithms if it is for the purposes of cybersecurity.

This then raises the question of how organizations can physically store such huge amounts of data, and where they will get the computational horsepower to carry out rigorous machine learning analysis of it.

All of this becomes possible today for one reason: the cloud. Businesses can store the petabytes of data that need analysing in remote locations run by cloud providers, whether Amazon Web Services, Microsoft Azure or another. That way, there is no need to use up valuable storage space in an organization’s own data centers.
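As a hedged illustration of that pattern, the sketch below uses Python’s boto3 library to push a day’s log archive into an Amazon S3 bucket. The bucket, path and file names are hypothetical placeholders, and equivalent SDKs exist for Azure and other providers.

    # Sketch: ship a local log archive to cloud object storage (here,
    # AWS S3 via boto3). Bucket and key names are hypothetical.
    import boto3

    s3 = boto3.client("s3")  # credentials come from the environment/IAM role
    s3.upload_file(
        Filename="/var/log/security/events-2018-05-25.json.gz",
        Bucket="example-security-data-lake",   # assumed bucket name
        Key="raw/2018/05/25/events.json.gz",   # date-partitioned layout
    )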

It is vital to make sure that all sorts of data – the right sorts of data – are stored for cybersecurity analysis. That means capturing not just the bad and suspicious data: machine learning relies on comparing the good with the bad, so data on legitimate web traffic is needed as well.
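A toy sketch of why both classes matter: below, a scikit-learn classifier is trained in Python on traffic rows labelled benign or malicious. The features, distributions and labels are invented purely for illustration – without the benign rows, the model would have nothing to compare the bad against.

    # Sketch: supervised learning needs both good and bad examples.
    # Features and labels below are invented for illustration only.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # [requests_per_min, avg_payload_bytes]; 0 = legitimate, 1 = malicious
    benign = rng.normal([20, 800], [5, 200], size=(5_000, 2))
    malicious = rng.normal([300, 150], [50, 50], size=(200, 2))
    X = np.vstack([benign, malicious])
    y = np.array([0] * len(benign) + [1] * len(malicious))

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                              random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
    print(f"held-out accuracy: {clf.score(X_te, y_te):.3f}")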

The data must also be cross-referenceable. It could be cloud security data or what’s known as user behaviour analytics (UBA), which analyses the behaviour of systems and the people using them to identify potential cybersecurity threats. It could require gathering data from security information and event management (SIEM) software and other tools, then analysing it for anomalies and potential threats.
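As a simple, hedged illustration of the UBA idea, this sketch baselines each user’s typical login hour from past events and flags logins that deviate sharply from it. The event format, user names and threshold are assumptions made for the example.

    # Sketch of a UBA-style check: baseline each user's login hours and
    # flag logins far outside their norm. Event format is an assumption.
    from statistics import mean, stdev

    history = {  # hypothetical past login hours per user
        "alice": [9, 9, 10, 8, 9, 10, 9],
        "bob":   [14, 15, 13, 14, 16, 15, 14],
    }

    def is_anomalous(user: str, login_hour: int,
                     threshold: float = 3.0) -> bool:
        """Flag a login whose hour is more than `threshold` standard
        deviations from this user's historical mean."""
        hours = history[user]
        mu, sigma = mean(hours), stdev(hours)
        # The 0.5 floor stops near-constant histories flagging everything
        return abs(login_hour - mu) > threshold * max(sigma, 0.5)

    print(is_anomalous("alice", 3))   # True - 3 a.m. is far from alice's norm
    print(is_anomalous("bob", 15))    # False - within bob's usual window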

An organization can also access immense computational power through the cloud, which runs huge stacks of CPUs – the brains of computing. Again, many businesses are nervous about storing security data in the cloud precisely for security reasons; they worry that the cloud is far from safe, though it is no more prone to attack than other parts of a network. Cloud infrastructure provides the scale for businesses to store data and apply machine learning to identify threats, opening the way for vast improvements in cybersecurity.

So, what’s the takeaway here? Cybersecurity has long relied on humans to spot correlations between events and identify potential threats. There is much discussion of the power of AI and machine learning to cross-check data faster and more effectively than a human could. Yet to achieve this you must have the right data for AI to be effective, and the computational capacity to process it in a timely fashion.

Of course, cybercriminals are themselves looking at how to use AI to sharpen their attacks. The future of cybersecurity will be as much about machines fighting machines as about conflict between humans.

This article was written and contributed by Greg Day, vice president and chief security officer for EMEA at Palo Alto Networks.