Skip to content

28 September, 2020

Eliminating Excel from business critical processes

image942

Article By

Rógvi Dávid Arge, Senior Data Scientist, Flowtale
Rasmus Hanberg Thomsen, Partner, Flowtale


It is safe to say that Microsoft Excel is an immensely popular application. It is part of Microsoft’s Office, which according to windowscentral has around 1.2 billion users worldwide, and is used in everything from project planning, data storage, cash-flow forecasting, and much more. In this article we’re focusing on what happens when Excel sheets evolve into something the application was never intended to be used for, a general solution approach, and how we solved it for a global engineering consultancy.

Introduction to the Problem

Fig 1: Excel applications most often end up being either a bottleneck, a source of data quality issues or both

Within companies, Excel based applications often start as simple tools developed over time by an employee to simplify their own work, or to standardize some process. If the tool is a success, depending on its initial use-case the company either becomes dependent on it as an integral part of their infrastructure, reporting pipeline, or as a part of company specific processes.

Fig 2: (LEFT) Excel applications often start out as simple stages in more complex workflows, which in the early stages enable an employee to do a complicated process using only their Excel sheet. (RIGHT) As the application gains traction it evolves into a monstrosity that Excel isn’t built to handle, which introduces bottlenecks, data quality issues, concurrency issues, persistence issues, and more

This poses a number of problems ranging from data processing bottlenecks, manually maintained reporting pipelines, to multi-user application maintenance. The paradox here is that as user adoption increases or as the application becomes a more integral part of the business processes, the bottleneck issues and maintenance costs become larger than the gains of using it. Furthermore, in many cases the problem is unavoidable, because the capital investments and development time needed for creating robust solutions that solve the same issue are greater than the perceived value. If this sounds familiar to you, follow along and in the next section we’ll describe a solution approach and how we used it to solve this problem for a client.

A General Solution Approach and How we Implemented It

Before we get into the solution approach, we’ll look into the types of issues that relate to Excel applications. We start by splitting the issues into application independent or dependent ones.

Application independent issues are key issues related to all Excel based applications and relate to:

  • distribution
  • versioning
  • data consistency
  • security
  • persistence

Application dependent issues can be very different depending on the type of application. If we take a multi-user project planning tool as an example, the application specific issues might be data storage, data collection, and concurrency. If on the other hand we’re looking at a data collection sheet for reporting or quota calculations, then the issues might be related to data quality, concurrency, and dependence on manual inputs.

What we can gather from these issue topics is that Excel applications in general have issues related to:

  • versioning
  • distribution
  • data consistency
  • security
  • data persistence
  • code quality
  • data quality
  • concurrency
  • security
  • accessibility
Fig 3: Excel applications often have issues related to these topics

Many organizations have experienced these issues first hand, but in our experience most of these problems can be solved using best practices within software development and software architecture.

Without getting into details about software architecture paradigms or software best-practices, we should mention that we create solutions with DevOps in mind by developing component based applications using the layered architecture approach, always using version control systems, ensuring code reviews and testing, building CI/CD pipelines to ensure standardized and tested code, and set up monitoring of the solutions to ensure that issues are caught and handled quickly. This handles a lot of the issues pertaining to modularity, data flow, versioning, code quality, distribution, and application performance.

Fig 4: DevOps development and operations cycle

General Solution Approach

Let’s say that your business is dependent on a project planning tool that is used by 25 or more people. The tool is being used to continuously update the status of various projects, assign tasks, and update metadata related to the project. To understand how this application can be decomposed we can break it into its constituent parts, and find other alternatives. We’ll do that by looking at the application as-is and figuring out how it and other alternatives fit into the layered architecture.

Fig 5: Layered architecture overview and dependency direction

Presentation Layer

The presentation layer is the visual interface that the users see. When dealing with Excel applications, Excel worksheets are most often the interfaces as well. There are cases when the Excel application has been modified to such a degree that custom made VBA userforms are in fact the user interface, but given that this is still Excel, our decomposition of the application is still relevant. So for the presentation layer we have:

  • As-Is: Excel
  • To-Be:
    • Website with necessary tools
    • PowerBI
    • Tableau
    • Apache Superset

Application Layer

The application layer is the engine running the application, and making sure data is stored and retrieved. For the application layer we have:

  • As-Is: Excel
  • To-Be:
    • Application frontend in programming language; examples:
      • Django or Flask (Python)
      • Shiny (R)
      • React, Vue, Angular (JavaScript)
    • Application backend in programming language; examples:
      • Django, Flask or FastAPI (Python)
      • Shiny (R)
      • Node (JavaScript)
      • ASP.NET Web Application (C#)
    • Web server, reverse proxy, and or load balancer; examples:
      • Nginx
      • Apache HTTP Server
      • Lighttpd
      • Caddy
    • Application server; examples:
      • uWSGI
      • uvicorn
      • Kestrel (C#)
      • Node (JavaScript)
    • Application workers; examples:
      • Celery (Python)
      • Prefect (Python)
      • Hangfire (C#)
    • Message brokers; examples:
      • Redis
      • Amazon SQS
      • RabbitMQ
    • Application database that fits the need of the application; examples:
      • PostgreSQL (Open Source!)
      • Microsoft SQL Server
      • CosmosDB
      • Amazon RDS

Just to be clear, it isn’t always necessary to have all of these services, and sometimes these services aren’t enough. The application architecture is heavily influenced by the problem at hand, and the IT infrastructure of the business.

Domain Layer

The domain layer is where we apply business logic. This is where we affiliate data with business entities and connect the relevant sources. For the domain layer we have:

  • As-Is: Excel
  • To-Be: Data models defined in a programming language, such as django models, and accompanying functionalities. If we had chosen Django as our backend, then the dependencies would still be from application layer to domain layer, as the domain layer would be the corresponding functionalities within the django models, used to apply business logic to the data gathered from the infrastructure layer

Domain layer knowledge can in some cases be handled by the business itself. E.g. if the application always does certain types of transformations to wrangle the data into a specific form, then it might make sense to set up a data mart or a virtualization layer so that when data is extracted from the infrastructure layer, it is extracted either through APIs, or domain specific databases (data marts). In that case the domain layer of the application could be disregarded as it would functionally be considered infrastructure data.

Infrastructure Layer

The infrastructure layer is the data sources that the application depends on. From here we should be gathering all of our data, and in some cases this is also where we should be storing it. For the infrastructure layer we have:

  • As-Is: Excel or business data warehouse
  • To-Be:
    • Business data warehouse
    • Data marts
    • APIs

Solution

The application is now split up into layers that can be built using a variety of smaller components, each of which serves a specific purpose. The huge benefit of decomposing the application as we have is that we’ve highlighted a large number of viable solutions that can be used to structure our final replacement solution. Choosing which specific tools to use for each of these layers now becomes a question of what the business needs. If the business already has competences within JavaScript or C# and they want to maintain the solution themselves, then it might make sense to choose the .NET framework or to use JavaScript. If they don’t want to maintain it themselves, then to reduce the total development cost and the operating costs, it makes most sense to choose a set of tools that the developers and maintainers are most comfortable with and which also are robust. Figuring out what the value proposition is, as well as the various benefits of doing such an Excel replacement is something we’ll go through later in this article.

Our Solution

Now let’s get into how we used this approach to break apart an existing project management tool into its constituent parts, and chose alternatives that enabled concurrency, removed application downtime, increased data consistency, added security, and made the application accessible at all times to all relevant users.

Presentation Layer

For the presentation layer, the company was using Excel and PowerBI. We wanted to use something that was similar to Excel, and which enabled the same workflow. Given that we wanted the application to be an internally hosted web service, we chose to go with Handsontable, as this enabled sheet-like behaviour in the web browser, which meant that the current users of the application would have the least overhead when switching. This meant we didn’t need much change management, which was beneficial to them. Furthermore the presentation layer also had an integration to PowerBI, which meant that the data the users registered in the application could be used to visualize certain numbers of relevance to them.

Application Layer

For the application layer, the company was using VBA, and R scripts to gather and update the data in the worksheets. This posed a number of data consistency, persistence, and uptime issues. Furthermore the business did not want to maintain the application, so given our expertise within Python, we chose to develop it using the web framework Django, a swiss army knife that has stood the test of time. So in our case the application layer consisted of a couple of standard tools, such as:

  • Python (Django) frontend
  • Python (Django) backend
  • uWSGI as an application server
  • Nginx as a web server, reverse proxy, and load balancer
  • Python (Celery) workers to handle tasks
  • redis as a database to handle celery task queue

Decomposing the application into these services meant that they could handle various tasks independently. This meant that data gathering, presentation, and handling input could all be managed concurrently through a variety of functionalities. The services were all spun up in docker containers, which handles service communication and ensures that the application can be run on any operating system as long as docker is present. Application security was handled through the backend by integrating it with the company’s own Active Directory, and the django backend and celery workers also handled scheduled extractions of relevant company data to ensure data consistency.

Domain Layer

The domain layer was also an integral part of the Excel sheet and all business logic was directly applied in the data gathering scripts within the Excel sheet, or in R scripts. Because of the way we designed the application architecture we used Django models to add business logic on top of the infrastructure data.

Infrastructure Layer

For both our solution and their solution the company data warehouse was the infrastructure layer.

Solution

Fig 6: Final application architecture

The final solution was a dockerized set of services deployed on a Windows virtual machine hosted on Azure, made available on the company intranet as a web service and only accessible to the relevant users, where all security aspects were governed by their internal IT professionals. The application supported and still supports concurrent use, and serves as an integral part of the company’s project management workflow. Because of the choice of tools we’ve also managed to maintain a very low maintenance cost.

In the next section we’ll go through the commercial drivers.

Return-on-Investment (ROI) Drivers

Excel licensing fees of using Excel are arguably modest, at least compared to the value generated through proper usage of the product, because there are plenty of use-cases where Excel is truly the best solution. However, when Excel becomes an integral part of a non-simple process with multiple stakeholders and moving parts, Excel-based solutions can become risky to use.

Due to this we would advise calculating the ROI of Excel-elimination projects based on the combination of eliminated operational risk and efficiency gains, let’s call it automation. These parameters are of course very dependent on the specific use-case, meaning you would need to sit down and do the estimation internally ((or call us at Flowtale, and we can help you out).

These are some of the ROI drivers we typically see where Excel has become an integral part of a complex process:

  1. Less operational risk
  2. Actionability
  3. Higher automation rate
  4. Eliminating information security risks

The above sections explain, in some detail, why each of these drivers is in play. Below I will try to highlight how to calculate the ROI for each of these drivers

Operational Risk

For operational risk, the calculation is a complicated one. The term operational risk is simply too wide for a single formula. However, it all boils down to assessing the costs you have for each of the ‘operational mistakes’ using an Excel-based solution allows a user to make. For each input-cell; what is the cost for the company if this data-input is wrong and inaccurate? In some cases the error is visible, and the only cost is to redo the mistake, or to reconcile with previous versions. In other cases strategic decisions are based on the input and the consequences can be too big to put in numbers.

Actionability

As Excel based solutions are highly manual, not just for inputs but also very often in its data collection, aggregation and output generation, there is a certain latency when going from data input to decision ready output. This latency varies a lot, but typically several days and in some cases several weeks depending on the organization and process. This latency means that decisions often are made on days old data, which in some organizations does not have a significant impact, but in others can have a great impact.

Estimating the cost of delayed information will give you a hint of how big the benefit is in eliminating the information latency. E.g. If you’re collecting production output per factory where the collection, aggregation and output generation takes two days, you could end up with two days of excess production of a certain good, and/or be behind two days worth of production of different goods. The value of this is easy to calculate for the controller.

Automation

For higher automation rate, the calculation is a simple one:

Determining what the automation gains are can then be tricky. Here it’s important to consider not just the input and output processes, but also the time spent reconciling versions, time spent sharing the files with the right stakeholders and similarly requesting the finalized sheets to be sent/uploaded/updated.

Information Security Risk

When it comes to data security CIS, Centre for Internet Security, has described the risk-reduction ROIs formula’s very neatly in this article so we will not go in depth with it here. What we can say is that similar to automation gains, the ROI from reducing information security risks can quite simply be defined as:

Calculating risk reduction is a matter of evaluating the data used, and its “means of transportation”. We far too often see Excel sheets being shared via email, even where they leave the internal information environment. Even in cases where the sheets contain highly confidential or sensitive information. However, most cases involve uploading to an internal file sharing platform, where the major risk is uploading the right files in the right place. Bottom line is that the risk is a multiple of the sensitivity of the content and the “means of transportation”

Conclusion

We’re going to end this article with a small summary. In this article you’ve seen how to decompose an existing Excel based application into smaller components and how these smaller components could be structured in a way that enables concurrency, persistence, data quality, versioning, and distribution. You’ve also been given a glimpse of how we solved it for one of our clients, and we’ve given a short overview of what the ROI drivers are for projects that seek to replace Excel based applications.

Now, what most software savvy people already know, is that what we’ve outlined here isn’t specifically related to Excel based applications, but is often an issue that occurs when solutions aren’t given much thought prior to development. In software solutions the old proverb measure twice and cut once is just as valid, if not more so. The reality is that given Excels popularity, a lot of people already use it, and its extended functionalities, which is why so many good and bad Excel applications see the light of day, and even become integral parts of business operations. We hope that by reading this article, you’re better equipped to evaluate your business’ dependency on Excel or any other bottleneck solution and whether you should consider replacing it.

If you are in doubt of whether replacing Excel or any other such solution, we encourage you to reach out to us at Flowtale and together we can find out what solution suits you best!