
Do You Have Enough Data For Machine Learning?

The fear of not having enough data can stall an enterprise's digital strategy.

When you think you do not have enough data, you stop looking at what is possible with the data you already have. Instead, collecting additional data becomes the singular focus. You invest in product changes to add sensors, or you bring in vendors who coach you on how to collect more data. Doing this without first exploring the value your existing data can deliver is like diversifying your portfolio without knowing your current asset allocation.

Enough Data For Machine Learning

So how much data is needed? Professor Yaser Abu-Mostafa of Caltech answered this question in his online course: as a rule of thumb, you need roughly 10 times as many examples as there are degrees of freedom in your model. The more complex the model, the more prone it is to overfitting, although validation helps you detect and limit that. Depending on the use case, far less data can be enough.
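As a rough illustration of the rule of thumb above, the sketch below estimates a minimum example count. It assumes a simple linear model whose degrees of freedom equal its number of weights plus a bias term; the function names and the sample figures (20 features, 500 examples) are hypothetical and only there to make the arithmetic concrete.

```python
def linear_model_dof(n_features: int, include_bias: bool = True) -> int:
    """Degrees of freedom of a linear model: one weight per feature, plus an optional bias."""
    if n_features < 0:
        raise ValueError("n_features must be non-negative")
    return n_features + (1 if include_bias else 0)


def min_examples(degrees_of_freedom: int, factor: int = 10) -> int:
    """Rule-of-thumb lower bound: roughly `factor` examples per degree of freedom."""
    if degrees_of_freedom < 0:
        raise ValueError("degrees_of_freedom must be non-negative")
    return factor * degrees_of_freedom


# Hypothetical case: 20 input features and 500 labeled examples on hand.
dof = linear_model_dof(20)
needed = min_examples(dof)
available = 500
print(f"degrees of freedom: {dof}, suggested minimum examples: {needed}")
print("enough by the rule of thumb" if available >= needed else "consider a simpler model")
```

This is a starting estimate only. A simpler model lowers the requirement, and validation remains the actual check on whether the model generalizes.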

At Ascendo AI, we provide a field service SaaS application for manufacturers to do service planning. In one of our use cases, a manufacturer had thousands of devices but only a few field service technicians. We provided AI-based automation to help the manufacturer reduce dispatch of field service technicians. Essentially, this allowed the company to use existing data to optimize for the given number of technicians.

The automation steps to reduce dispatch include:

  • Automatically assigning a service rep.

  • Providing the service rep with a potential solution for the problem.

  • Predicting issues before they happen to remotely fix them.

But when a problem does require a field service technician to go on site and fix it, each technician can only handle a few incidents every day. In this case, it is more important to correctly predict the high-priority incidents that need the most attention than to be 100% accurate across all incidents.
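One way to make this concrete is to score a model only on the incidents the technicians can actually visit. The sketch below is an assumption about how that could be measured, not a description of Ascendo AI's method: it ranks incidents by a predicted priority score and computes precision and recall over the top k, where k stands for the daily technician capacity. All names and sample values are hypothetical.

```python
from typing import Sequence


def _top_k_flags(scores: Sequence[float], is_high_priority: Sequence[bool], k: int) -> list:
    """Return the true high-priority flags of the k highest-scored incidents."""
    if len(scores) != len(is_high_priority):
        raise ValueError("scores and labels must have the same length")
    ranked = sorted(zip(scores, is_high_priority), key=lambda pair: pair[0], reverse=True)
    return [flag for _, flag in ranked[:max(k, 0)]]


def precision_at_k(scores: Sequence[float], is_high_priority: Sequence[bool], k: int) -> float:
    """Share of the k dispatched incidents that really were high priority."""
    top = _top_k_flags(scores, is_high_priority, k)
    return sum(top) / len(top) if top else 0.0


def recall_at_k(scores: Sequence[float], is_high_priority: Sequence[bool], k: int) -> float:
    """Share of all high-priority incidents that made it into the top k."""
    total = sum(is_high_priority)
    top = _top_k_flags(scores, is_high_priority, k)
    return sum(top) / total if total else 0.0


# Hypothetical day: six open incidents, capacity for three visits.
scores = [0.9, 0.8, 0.3, 0.7, 0.1, 0.4]
labels = [True, False, False, True, False, True]
print(precision_at_k(scores, labels, k=3))
print(recall_at_k(scores, labels, k=3))
```

A model can score well on these two numbers while still misclassifying many low-priority incidents, which is the trade-off described above.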

This way of thinking removes the need to have every possible piece of data before you even start the digitization journey. Internal surveys that we conducted showed that enterprises use only 1% of the data they collect, while 33% of that data is actually usable. And according to Forbes contributor Bernard Marr, "On average, companies use only a fraction of the data they collect and store." It is critical to work with software that can extract value from the data you already have.

Whether you build (in-house) or buy (vendor), your software should specialize in identifying gaps in your data. To do so, it should cover:

  • Preparing data and partitioning it for training and testing.

  • Choosing which algorithms and heuristics to use with it.

  • Calling out relevant patterns.

  • Defining the actions that a reliable production system can perform.

  • Identifying inefficiencies in the data so you can add processes to clean them.

  • Showcasing the most critical areas where new data should be collected.
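The first item in the list above, partitioning data for training and testing, can be sketched with the standard library alone. This is a minimal example under stated assumptions: the record fields (`device_id`, `status`) and the 90/10 mix of normal and anomalous records are hypothetical. The split is stratified, meaning each label keeps the same share in both partitions so that rare cases are not lost from either side.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Tuple


def stratified_split(
    records: List[dict],
    label_of: Callable[[dict], Hashable],
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[dict], List[dict]]:
    """Split records into (train, test), preserving each label's share in both parts."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    groups = defaultdict(list)
    for record in records:
        groups[label_of(record)].append(record)

    train, test = [], []
    for group in groups.values():
        rng.shuffle(group)
        # Keep at least one test record per label, unless the label has a single record.
        n_test = max(1, round(len(group) * test_fraction)) if len(group) > 1 else 0
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical device log: 90 normal readings and 10 anomalies.
records = [{"device_id": i, "status": "normal"} for i in range(90)]
records += [{"device_id": i, "status": "anomaly"} for i in range(90, 100)]

train, test = stratified_split(records, label_of=lambda r: r["status"])
print(len(train), len(test))
print(sum(r["status"] == "anomaly" for r in test), "anomalies held out for testing")
```

A label with only a handful of records is itself a signal: it points to where collecting new data would matter most.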

How much data you need depends on the variety of that data. For example, to predict a device failure, you need data from the device's normal operation as well as data from when it generates anomalies. The higher the variety, the less data (in terms of time and volume) is required.

Starting the journey of finding out whether you have enough data will expose inconsistencies you likely never noticed, reveal holes in business processes you thought were perfect, deliver cost savings where you thought everything was already optimized and, hopefully, generate additional revenue where you thought the pie could not possibly get any bigger.
