
Three Factors That Weaken Data Classification—and Lead to Data Breaches

By Zach Capers


Changing your approach to data management and classification might prevent your next breach.

Data classification is an essential strategy business leaders must leverage to protect their company. But Capterra's 2023 Data Management Survey* finds that two in three (67%) businesses with a data classification program report a data breach within the last two years, and a quarter of those same respondents report multiple breaches.

With this in mind, we set out to determine what the businesses that did not experience a data breach are doing differently. What we discovered is that they're doing more to mitigate specific human errors and behaviors that weaken data classification programs. Let's dive in.

Key survey findings

  • Most companies use three (27%) or four (41%) data classification levels.

  • Companies that use four levels report data breaches at a higher rate than those using three levels.

  • Manual data classification methods are associated with more data breaches compared to automated methods.

  • Security-focused companies fare better than those motivated by regulatory compliance.

  • The most common type of data breach is a database or other data source left unsecured (48%), followed by malicious outsider attacks (38%).

What is data classification?

Data classification is the process of categorizing different types of data according to sensitivity or type. Companies use data classification to improve data security, support data governance, and maintain regulatory compliance.

Data classification schemes begin with three fundamental themes

The number and type of classifications you select for your business depends on its specific needs and requirements. To begin, there are three fundamental classifications: public, internal, and confidential.

Public data: This data poses no risk to the company and is freely available to the public. Examples include public website information, white papers, and marketing collateral.

Internal data: This data is specifically intended for internal use and comprises everything from presentation decks to general internal emails. Some internal data is considered confidential, which we explain next.

Confidential data: This data requires additional protection due to its potential to harm the company legally, financially, or competitively. This data is commonly labeled as confidential or restricted, though various labels abound. Examples of confidential data include personally identifiable information (PII) and financial documents.
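
To make the three baseline levels concrete, here is a minimal sketch of how they might be represented in code. The class name, file names, and helper function are hypothetical and chosen only for illustration; they are not drawn from any particular product or from the survey.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Three baseline levels, ordered from least to most sensitive."""
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3


# Hypothetical assets mapped to a level (illustrative names only).
EXAMPLES = {
    "marketing_white_paper.pdf": Classification.PUBLIC,
    "team_presentation.pptx": Classification.INTERNAL,
    "customer_pii_export.csv": Classification.CONFIDENTIAL,
}


def needs_extra_protection(level: Classification) -> bool:
    """Return True for data that warrants additional safeguards."""
    return level >= Classification.CONFIDENTIAL


for name, level in EXAMPLES.items():
    print(f"{name}: {level.name}, extra protection: {needs_extra_protection(level)}")
```

Using an ordered type means a rule such as "confidential or above" keeps working if a stricter level is added later.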

Many organizations opt to add a fourth level of classification for highly confidential information while others use five, six, or even more to maintain granularity for specific needs (e.g., protecting trade secrets) or to comply with industry, regulatory, or contractual requirements. Our data shows that most companies use three (27%) or four (41%) levels. Another 18% use five levels while 9% use six or more—5% use only two, likely sticking to public and internal.

Less is more—and three is better than four—when selecting data classification levels

As we explained, there are three fundamental types of company data. One of those types, confidential data, can be subdivided into increasingly fine-grained sensitivity levels. But this can cause problems because the difference between labels such as sensitive, restricted, and confidential is not readily apparent and requires nuanced explanation.

Our data shows that companies using four data classification levels are significantly more likely to report data breaches than those using three levels.

Graphic showing companies which use four data classification levels experience more data breaches

Three in four (75%) companies using four classification levels report a data breach in the last two years compared to only 61% of companies that use three levels—a meaningful difference that could be attributed to confusion created by adding a fourth level.

Interestingly, data breaches drop again at five levels or more. As mentioned above, companies that use five or more levels often do so for specific reasons and are likely to have more mature data management programs. Our data shows that these companies tend to be larger in size and have more years in business than companies with three or four classification labels.

What you should do: Most companies should stick to three levels of classification, only moving to four or more if absolutely necessary or specifically required. If you do decide to add four or more, be sure to use unambiguous labels and provide clear explanations for how they are distinct from one another.

Data classification by hand causes data breaches to expand

Before you can classify information, you need to find and identify it using data discovery methods. Then, once you've identified your data, you need to label it. These steps can be performed manually or with the use of software to help automate the process. Our research finds that companies using manual methods are much more likely to experience one or more data breaches.

Graphic showing that data breaches are reported far more often by companies using manual data classification

A full 86% of companies using mostly or fully manual methods to identify and tag data report data breaches compared to only 55% of companies using mostly or fully automated methods. Put another way, companies using mostly automated methods are more than three times as likely to avoid a data breach compared to those using mostly manual processes.
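
The "more than three times" figure comes from comparing the share of companies in each group that avoided a breach, not the share that reported one. Here is a quick check of that arithmetic using only the percentages quoted above; the function and variable names are ours, for illustration.

```python
def avoidance_ratio(breach_pct_a: int, breach_pct_b: int) -> float:
    """How many times more likely group B is to avoid a breach than group A."""
    avoid_a = 100 - breach_pct_a  # share of group A reporting no breach
    avoid_b = 100 - breach_pct_b  # share of group B reporting no breach
    return avoid_b / avoid_a


MANUAL_BREACH_PCT = 86      # mostly or fully manual methods
AUTOMATED_BREACH_PCT = 55   # mostly or fully automated methods

ratio = avoidance_ratio(MANUAL_BREACH_PCT, AUTOMATED_BREACH_PCT)
print(f"{ratio:.1f}x")  # 45 / 14 -> prints "3.2x"
```

In other words, 14% of the manual group avoided a breach versus 45% of the automated group.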

As for companies using an even mix of manual and automated methods, they fall somewhere in between, indicating a clear trend. Our data shows a strong correlation between manual methods and increased data breach incidents, likely stemming from human error during the process. 

What you should do: Use data discovery and data management software to help identify and classify structured and unstructured data throughout your network—and reduce human error in the process.
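
As a rough illustration of what automated tagging means, the sketch below scans text for a few patterns that often indicate PII and suggests a label. It is a toy example under our own assumptions: the patterns, labels, and function name are hypothetical, and commercial data discovery software relies on far more robust detection than a handful of regular expressions.

```python
import re

# Illustrative patterns only; they will miss many cases and can produce false positives.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}


def classify_text(text: str, is_published: bool = False) -> str:
    """Suggest one of three labels for a piece of text."""
    # Any PII-like match is flagged as confidential, even in published material,
    # so that a person can review it.
    if any(pattern.search(text) for pattern in PII_PATTERNS.values()):
        return "confidential"
    return "public" if is_published else "internal"


print(classify_text("Contact jane@example.com for details"))
print(classify_text("Quarterly planning notes"))
print(classify_text("Our new white paper is out", is_published=True))
```

The value of automation here is consistency: the same rules are applied to every document, rather than depending on each employee's judgment.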

Motivation matters—security concentration more effective than regulatory fixation

As mentioned, a prime benefit of data classification is streamlining compliance with industry and governmental regulations. The ability to label specific types of data makes it easier, for example, to identify customer credit card records for PCI compliance. But regulatory compliance is often something you have to do rather than something you're motivated to do.

We asked companies what they view as the primary benefit of data classification. The top answer is to strengthen data security (42%) followed by improving regulatory compliance (34%).

Taking a look at these two groups, 80% of the companies that are motivated by compliance report a data breach in the last two years compared to only 62% of those motivated by security. More dramatically, 25% of the compliance-focused group report multiple breaches compared to only 14% of our security-driven group. Those motivated by security rather than compliance are about twice as likely (39% vs. 20%) to avoid data breaches altogether.

Graphic showing companies that say security is the top data classification benefit report fewer breaches

These findings suggest the motivation behind your data classification program matters. Complying with regulations tends to devolve into box checking, whereas security concerns are more tangible, visceral, and immediate.

What you should do: Design and implement your data classification program through a security lens first and foremost to foster a sense of urgency among staff.

Most data breaches result from human factors—not malicious actors

As shown by our research, data breaches tend to be associated with human error. In fact, of the companies that reported data breaches in the last two years, the most common type of breach was a database or other data source left unsecured (48%) while only 38% report that a hacker or other outsider had maliciously accessed data.

Data classification helps your company ensure that sensitive data is secure and improves access controls that minimize the impacts of malicious actors. But to minimize human error and ensure your efforts are effective, we recommend the following:

  • Stick to three levels of classification, if at all possible

  • Use software to help identify, manage, and classify data

  • Design and promote your data classification program through a security lens

Need help identifying and organizing your data? Check out our catalog of data discovery software and review our shortlist of data management tools.


Methodology

*Capterra's 2023 Data Management Survey was conducted in July 2023 among 298 respondents to learn more about data classification practices at U.S. businesses. All respondents were screened for leadership positions in IT with close involvement in data management and strategies at companies that use a data classification system.




About the Author


Zach Capers is a senior analyst at Capterra, covering IT security, data privacy, and emerging technology trends. A former internal investigator for a Fortune 50 company and researcher for the Association of Certified Fraud Examiners (ACFE), his work has been featured in publications such as Forbes, Business Insider, and Journal of Accountancy.
