Architecting Inputs for Stability
Creating robust input architectures that balance backend stability with frontend usability
Input architecture is a critical part of dynamic array (DA) models. The decisions made at this stage determine whether a model can scale efficiently, remain stable under user interactions, and support the advanced calculation techniques discussed in later chapters.
Because spreadsheets lack a dynamic front end, it is difficult to satisfy the dual objective of a robust backend and a user-friendly front end. This chapter seeks to balance the two so that the compromises neither affect the stability of the model nor become a major deterrent to user adoption.
Scope
The guidance in this chapter focuses specifically on practices where inputs are directly entered into the spreadsheet. While this guide encourages importing data directly from source systems and remains open to use of forms and PowerApps, which enhance user experience and control, these topics fall outside the scope of this version.
Certain standard input practices, such as keeping inputs in separate cells and distinguishing them with different colour codes, remain relevant when building financial models with dynamic arrays.
However, DA models call for certain additional guidelines, which are outlined here.
Definitions and Terminologies
In the context of this guide, input variables are classified into four categories. The terminology used, and its meaning, is as follows:
(i) Scalars:
These variables have only one value and are kept in a single cell. For instance, WACC in most financial models is static across time and is thus an example of a scalar input.
(ii) Vectors or one-dimensional data:
These variables have multiple values, presented as a list either in a single row or a single column. The size of the list can increase (or, less commonly, decrease). For example, when projecting long-term financial data, analysts may apply distinct sales growth rates to each forecasted period.
Exhibit 1: Example of vector data set
| | 2XX5 | 2XX6 | 2XX7 | 2XX8 | 2XX9 |
|---|---|---|---|---|---|
| Sales growth rate | 11% | 10% | 8% | 8% | 8% |
(iii) Multi-dimensional data:
These refer to variables that have two or more dimensions, where the data can grow along any of the dimensions. For example, if an analyst needs to input bill of materials (BOM) data for a multi-product company, the input may spread across multiple rows and columns as shown in Exhibit 2.
Exhibit 2: BOM — an example of a multi-dimensional data
| Product | Chem 1 | Chem 2 | Chem 3 | Chem 4 |
|---|---|---|---|---|
| Product A | 150 | 150 | | |
| Product C | 180 | 190 | 150 | |
| Product D | 120 | 110 | | |
| Product E | 170 | | | |
| Product F | 60 | | | |
(iv) Single group, mixed nature data:
These are variables that belong to the same group or list, but the nature of one or more line items in the group varies from the others. For instance, in a funding schedule, all the different sources of capital can be presented visually as a list.
Exhibit 3: Funding schedule — an example of single-group-mixed-nature data
| Source | K | Weight |
|---|---|---|
| Debt | 9% | 40% |
| Preferred shares | 12% | 20% |
| Equity | 18% | 40% |
| Total | 13% | 100% |
Structured Tables:
A structured table refers to a range that is marked as a table in the spreadsheet. In MS Excel, this is done by marking the area as a “Table” from the Insert menu or by using the Ctrl+T shortcut key combination. These tables are not merely formatted to appear as tables to the human eye; they are recognized by the software as such.
2.1 Architecture for Scalar Input Variables
Prescriptions:
Keep scalar inputs in a separate cell, with proper labels and colour coding
All scalar inputs must be maintained in separate cells with clear labels and consistent colour coding for easy identification.
- Ensures centralized updates: Keeping scalar variables in a separate cell ensures that when their values change, the change is applied consistently across the file without having to be repeated in multiple places. Further, keeping the values in a separate cell makes it possible to create data tables for performing sensitivity analysis.
- Aligns with current practices: This is in complete alignment with commonly recommended best practice documents such as FAST or SMART.
Recommendations:
Protect scalar input sheets with locked cells
All scalar inputs should be kept in a separate sheet that is protected, with all cells other than the input cells locked for editing. The sheet should not allow deletion of rows or columns.
- Reduces the chance of inadvertent deletion of cells: The main technical risk with a scalar input cell is that it can get deleted, throwing a #REF! error in linked cells. Preventing deletion of rows or columns reduces this risk.
Note: Readers should note that preventing deletion alone does not fail-proof against #REF! errors. When a cell that has been cut from elsewhere is pasted into an unlocked cell, it causes the same problem.
Lock unit conversion factors in separate cells
Unit conversion factors, such as the number of months in a year or the number of metres in a kilometre, should be kept in a separate set of cells, and those cells should be locked for editing.
- Reduces ambiguities: Although unit conversion factors themselves do not vary, ambiguities can occur when these values are hard-coded directly into a cell. For instance, multiplying a volume by 12 may not clearly indicate whether monthly figures are being annualised or quantities are being converted from dozens to units. Placing such constants in a separately labelled cell makes the purpose clear and transparent to the reader.
- Prevents unwarranted changes: Keeping unit conversion factors in a separate protected sheet, away from standard inputs, ensures that people do not inadvertently change their values.
An alternative is to store these values directly as names under the Name Manager. Since the names are out of sight, there is less chance of inadvertent changes. However, this reduces the transparency of the model and does not fail-proof against intentional manipulation of the variables.
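For instance (the cell names here are hypothetical, for illustration), if a locked, labelled cell named `_MonthsPerYear` holds the value 12 and a monthly rent input is named `_MonthlyRent`, an annualisation formula becomes self-documenting:

```
=_MonthlyRent * _MonthsPerYear
```

A reader immediately sees that a monthly figure is being annualised, rather than guessing what a bare 12 means.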
Name cells with scalar variables
Cells with scalar variables should be named. Such names should be brief, yet descriptive. For instance, a cell containing the WACC value can be named as _WACC.
- Easy access: Cell names can be easily called, with the help of IntelliSense, from any sheet without the modeler having to navigate to the source sheet.
- Improved readability: When a formula uses a descriptive cell name, users can understand what the cell represents without having to navigate to the source sheet and read the labels. For instance, suppose the WACC is in cell C8 of the input sheet. An end user trying to follow the logic can easily understand what =1/(1+_WACC) is doing without having to navigate to the input sheet and check what C8 represents.
- Reduces vulnerability in programs: When referring to a cell in VBA or Scripts, references to cell names are not vulnerable to rows or columns being inserted or deleted. This improves the integrity of the model.
Note: Naming cells has some drawbacks, too; users must follow clear naming conventions to prevent confusion, especially in models with many variables. This version of the document does not cover naming best practices.
2.2 Architecture for Vector Input Variables
Prescriptions:
Organize vector inputs as structured tables
Vector inputs must be organized as a structured table. In the case of time series data, this means placing the timeline as row headings, flowing top to bottom, while the fields are placed across columns.
- Dynamic Expansion: Financial models are often updated, increasing the number of items in a vector. Inputs in dynamic array formulas must update automatically as new items are added. Using a structured table ensures all related formulas update when the list expands.
Note: Users, who are mostly accustomed to seeing timelines placed horizontally and line items placed vertically, would need to reorient themselves to the table architecture. In Appendix 1, we discuss an alternative arrangement that places timelines horizontally, along with its pros and cons.
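As a sketch of why this matters (the table and cell names are hypothetical), suppose growth rates sit in a structured table `tbl_Growth` with a `Growth rate` column, and the base-year sales sit in a named cell `_SalesBase`. A single spilled formula can then compound sales over however many periods the table holds:

```
=_SalesBase * SCAN(1, tbl_Growth[Growth rate], LAMBDA(acc, g, acc * (1 + g)))
```

When a user adds a new period as a fresh table row, the structured reference expands and the projection spills one period further, with no formula edits.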
Avoid TRIMRANGE function for vector inputs
Avoid storing vector inputs in traditional ranges of cells intended for use with the TRIMRANGE function or trim operators in formulas.
Trim operators introduce a higher level of spreadsheet risk, as detailed in para P17.
Do not protect sheets with input tables
Sheets with input tables must not be protected with “Protect sheet” feature.
- Prevents dynamic expansion of tables: As of the last update of this document, structured tables do not expand in protected sheets. Therefore, the sheets should be left unprotected to ensure the model remains dynamic.
Keep incompatible datasets in separate tables
Data sets that are not compatible across the key dimension should be kept in separate tables. Forceful grouping of incompatible data sets in a single table can lead to confusion, inefficient referencing, and errors in dynamic array calculations.
Exhibit 4: Improper grouping of incompatible data in single table
| Year | Sales | Expected growth rate | Gross Profit | Expected margins |
|---|---|---|---|---|
| 2XX3 | 300 | | 146 | |
| 2XX4 | 327 | | 152 | |
| 2XX5 | | 8% | | 48% |
| 2XX6 | | 7% | | 46% |
- Enhanced Clarity: Each table has a clear purpose, making it easier for users to understand and update specific datasets without navigating unrelated data.
- Improves scalability: Since all columns belong to one group, adding more columns will not disrupt the existing structure or the formulas dependent on the table. For example, if a table is used exclusively to hold historical income statement data and an analyst needs a formula that returns all historical income statement line items for the year 2023, it can be written as follows:

=DROP(FILTER(tbl_IS, tbl_IS[Year]=2023),, 1)

- Reduces spreadsheet risk: When unrelated data are forced together, the result is many blank cells or reference errors, which may lead to incorrect computations. Avoiding such forced grouping reduces the risk of such errors.
Recommendations:
Give structured tables descriptive names
Structured tables should be given brief, descriptive names, replacing the default names assigned by MS Excel.
- Improves readability: The default names given by MS Excel are non-descriptive. A custom, descriptive name therefore makes a formula easier to understand. For instance, a reference such as =IS_History[Gross Margin] clearly tells us that we are extracting the gross margin for the historical period and not the forecasts.
- Easy access: When names are descriptive, it is easy to recall and type the table name directly in a cell, further aided by IntelliSense, without having to manually navigate to each table and select it.
Use validation columns for row-level validation
Modelers are encouraged to use one or more columns to perform row-level validation, checking whether all input columns are correctly filled.
For instance, suppose IS_Table holds historical financials with nine fields: one for the timeline and eight for different line items. We may insert a tenth column that checks whether the first nine columns are filled, with a formula such as this:
=COUNT(IS_Table[@[Year]:[PAT]])=(COLUMNS(IS_Table)-1)
This formula returns TRUE only if all the input columns are duly filled.
Exhibit 5: Input table with a validation column
- Prevents stray rows from affecting calculations: When working with tables, it is common for an end user to add extra rows, either inadvertently or intentionally as placeholders. The formulae referring to the table can filter out such stray rows using the validation column.
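Such filtering can be sketched as follows (the column names are assumed for illustration, with `Valid` being the validation column described above):

```
=FILTER(IS_Table[Sales], IS_Table[Valid])
```

Only the rows whose validation column evaluates to TRUE flow into downstream calculations; half-filled placeholder rows are ignored.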
Use multi-table validation columns
Modelers are encouraged to use additional calculated columns to perform multi-table validations.
For instance, one may add a column in income statement to check whether all columns of balance sheet for the same period have been filled up.
- Allows synchronization of calculations: When a calculation requires data from multiple tables (say, an asset turnover ratio), multi-table validation columns allow us to filter out rows for which the corresponding data is not yet available in the other table.
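As a sketch (the table and column names are hypothetical), a validation column in the income statement table could confirm that the same period exists in a balance sheet table:

```
=COUNTIFS(BS_Table[Year], [@Year]) > 0
```

The column returns TRUE only when the balance sheet holds at least one row for that period, so downstream ratio calculations can filter on it.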
Allow calculated columns in input tables
This guide does not discourage the use of calculated columns that aid end users in the input process. However, such columns should not be used in the core calculation sections.
For instance, a calculated PAT column helps analysts check the accuracy of entered financials by comparing the computed number with the reported number. Likewise, a moving average of sales growth can guide the choice of a projection assumption. Including these calculations alongside input data can be beneficial.
- Balances user experience with rigid structure: Allowing calculated columns in input tables improves the end-user experience while keying in information, despite the otherwise rigid architecture.
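The PAT check mentioned above can be sketched as a calculated column (the line-item column names here are hypothetical):

```
=[@Sales]-[@Expenses]-[@Tax]=[@[Reported PAT]]
```

The column returns TRUE where the computed PAT agrees with the reported figure, flagging keying errors row by row.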
Avoid homogenous data of multiple entities in single table
The guide discourages including homogeneous data of multiple entities in a single input table.
For instance, while it technically creates no issue if all historical data, including income statement, balance sheet, and cash flow items, are kept in one table, modelers should consider keeping them in separate tables.
- Improves user experience during input: When the data of multiple entities are mashed together in a single table, entering input data into it is likely to be overwhelming.
Note: This specific guidance pertains only to tables where input is fed. It does not concern itself with tables that are imported from external sources using any ETL tools.
2.3 Architecture for Input of Multi-dimensional Data
Multi-dimensional data must also be organized as structured tables, and all the prescriptions and recommendations that apply to organizing vectors as structured tables apply to them as well. This section lays out additional guidelines that are exclusively relevant for multi-dimensional data.
Prescriptions:
Organize three or more dimensions in long form
Variables that have three or more dimensions must be organized in the long form in a single table. Avoid creating multiple two-dimensional tables.
For instance, a company may provide a segmental report with revenue for multiple periods, broken down by geography, which is further broken down by business segments. In annual reports, companies may provide a separate table for each geography. In dynamic array models, however, the data should be presented in long form, as shown in Exhibit 6.
Exhibit 6: Illustration of long data format for multi-dimensional data
| Year | Geography | Business | Sales | EBITDA |
|---|---|---|---|---|
| 2XX3 | Americas | Beverage | 300 | 80 |
| 2XX3 | Americas | Snacks | 280 | 75 |
| 2XX3 | APAC | Beverage | 280 | 75 |
| 2XX3 | APAC | Snacks | 220 | 40 |
| 2XX4 | Americas | Beverage | 315 | 78 |
| 2XX4 | Americas | Snacks | 305 | 79 |
| 2XX4 | APAC | Beverage | 240 | 55 |
| 2XX4 | APAC | Snacks | 240 | 55 |
- Scalability: This format can easily accommodate any additional item added to a given dimension without a new table being created. For instance, in the example provided, if the company were to foray into a new region, say Europe, in 2XX5, the segmental data could simply be added as extra rows in the same table for 2XX5.
- Enhanced analytical power: This structure is optimized for data processing and analysis functions in spreadsheets. Numbers from the table can be easily crunched and queried across dimensions using auto filters, SUMIFS, FILTER and many other functions.
- Only viable format for true DA models: In pure DA models, formulae must not be modified once the model is built and published. Since this format requires no additional tables when a new segment is added, such additions integrate seamlessly into the file.
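To illustrate the analytical power (assuming the Exhibit 6 data sits in a structured table named, say, `tbl_Seg`, and with 2XX4 standing in for an actual year as in the exhibit's masking), total APAC sales for 2XX4 can be queried with a single function:

```
=SUMIFS(tbl_Seg[Sales], tbl_Seg[Geography], "APAC", tbl_Seg[Year], 2XX4)
```

For the data shown in Exhibit 6, this returns 480 (240 + 240), and the same pattern extends to any combination of dimensions.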
Note: For data sets that are unlikely to be used in “what-if” analysis, i.e. unlikely to be changed after being entered once, modelers can consider keeping the input in a more human-friendly format and transforming it into long form using Power Query or similar transformation tools. However, parametric estimates should always be entered directly into the long form table.
Use appropriate format for two-dimensional variables
Two-dimensional variables must be in long form if the grid is likely to become too big to eyeball. Cross-tab formats, however, should be preferred for data sets that are likely to remain small.
For instance, the BOM table in Exhibit 2 is in cross-tab format, which is acceptable if the number of output products and input materials is expected to remain constant. However, if the company is likely to add many more products requiring many more materials, the long form must be used.
- Human-friendly structure: The cross-tab format is intuitive when the data size is small but becomes less user-friendly as it grows. For instance, users may find it difficult to identify the correct intersection cell in which to input a value when the grid is large.
- Allows easy eyeballing and in-situ analysis: Long form data can be easily filtered and eyeballed for quick analysis and sanity checks as compared to a large grid.
Recommendations:
Normalize data to reduce redundant input
Normalize data to reduce redundant input by creating reference tables for dimensional data.
For instance, in the segmental data illustration in Exhibit 6, each combination of geography and business segment can be given a unique name. A second table can then record which geography and business segment each unique name represents. This is shown in Exhibit 7, where a “Segment details” table gives each segment a unique name along with its details. In the segmental data, the user enters only the unique segment name, and the geography and business segment are auto-populated using XLOOKUP.
Exhibit 7: Illustration of normalizing data to avoid redundant input
Segmental data
| Year | Segment | Sales | EBITDA | Geography | Business |
|---|---|---|---|---|---|
| 2XX3 | Am_Bev | 300 | 80 | Americas | Beverage |
| 2XX3 | Am_Snacks | 280 | 75 | Americas | Snacks |
| 2XX3 | APAC_Bev | 280 | 75 | APAC | Beverage |
| 2XX3 | APAC_Snacks | 220 | 40 | APAC | Snacks |
Segment details
| Segment | Geography | Business |
|---|---|---|
| Am_Bev | Americas | Beverage |
| Am_Snacks | Americas | Snacks |
| APAC_Bev | APAC | Beverage |
| APAC_Snacks | APAC | Snacks |
Although the segmental data is not “normalized” in the strictest sense, as it still contains the redundant columns, this approach reduces redundancy from an input perspective.
- Balances user experience with rigid structure: Multi-dimensional data necessarily needs to be in long form, which is laborious and requires redundant inputs. Normalizing the structure reduces the number of redundant values that need to be typed.
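The auto-population can be sketched as calculated columns in the segmental data table (assuming the Exhibit 7 tables are set up as structured tables named, say, `tbl_SegData` and `tbl_SegDetails`):

```
=XLOOKUP([@Segment], tbl_SegDetails[Segment], tbl_SegDetails[Geography])
=XLOOKUP([@Segment], tbl_SegDetails[Segment], tbl_SegDetails[Business])
```

The user types only `Am_Bev`; the two lookup columns fill in “Americas” and “Beverage” automatically.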
2.4 Architecture for Single Group of Mixed Nature Data
This situation calls for a complex balancing act between technically strong design principles and a human-friendly approach, and the possible options fall along a broad spectrum. This guide therefore does not prescribe any must-follow practice here, but it does make certain recommendations.
Recommendations:
Use “Tabulated Scalars” for mixed nature data
When a set of variables belong to a single group but the nature or behaviour of one or more items vary from the others, treat all the individual items as scalars but format them to visually appear in tabulated form (“Tabulated Scalars”).
- Flexibility: Structured Tables have rigid structures and mixed nature data may not fit in easily. Tabulated scalars allow flexibility.
- Allows intuitive flow: Other technically sound possibilities, such as organizing each type of data in a separate table, may disrupt the intuitive flow for end users. For instance, in the example in Exhibit 3, the debt and preference share details could have been kept in a structured table, with the weight of equity calculated elsewhere, outside the table. However, such an approach would reduce the intuitiveness of the flow, especially if the analyst is trying to optimize weights to minimize WACC.
- Easy to read: Tabulating these scalars together as a group, rather than treating them as independent scalar variables, makes it easier for readers to understand the variables.
Important Note: Tabulated scalars do not support dynamic referencing as well as a structured table does, and they are more vulnerable to user actions such as inserting or deleting rows. The guide therefore recommends being very selective in choosing a tabulated scalar structure over a structured table, treating it as an exception rather than the norm.
Summary
In this chapter, we discussed how a robust input architecture is a critical foundation for building stable, scalable, and efficient Dynamic Array (DA) models. The architectural choices for inputs directly affect the stability of subsequent calculations.
However, as spreadsheets lack an embedded dynamic front end, certain pragmatic compromises need to be made to balance robust architecture with human-friendliness.
Key Guidelines
- Primacy of structured tables: Input data, other than scalars, must be organized as structured tables.
- Ensure separation of concerns: Data sets that do not share the same dimension (such as historical financials and forecast assumptions) must not be in the same table.
- Fail-proofing with validation columns: Tables should have a calculated validation column to check row-level validity and mark stray rows as invalid.
- Use long form for multi-dimensional variables: Variables with three or more dimensions must be in long form. Two-dimensional inputs should be captured as a cross tab when the grid is expected to stay small, but must be organized in long form if the grid is likely to grow.
- Practices for scalar values are unchanged: Scalar variables should be kept in separate cells, properly colour coded and, preferably, named.
- Mixed-nature data should be treated as scalars: Data that belong to a single group but differ in nature should be treated as scalars, grouped and formatted to appear as a table to the human eye.
In the next chapter, we shall discuss how calculations should be performed and see how the input architectures discussed in this chapter play a key role in ensuring stable, scalable, and efficient calculation sections.