Making Calculations Dynamic ⚡
Leveraging Dynamic Arrays and Lambda Functions for Scalable and Auditable Financial Models
Dynamic arrays and Lambdas fundamentally change how calculations are performed in spreadsheets 📊. The following outlines this change:
Key Insight: The calculation approach moves from individual cell-by-cell formulas that are copy-pasted across a range to dynamic array-based expressions, enabling approaches that were not possible earlier.
- 📊 Dynamic Arrays: Favor in-memory computations where intermediate steps can be held in temporary variables rather than stored in cells.
- 🧠 Custom Functions: Allow creating custom user-defined functions, using the spreadsheet’s own formula language within the workbook.
- 🔁 Iterative Logic: Enables iterative and recursive logic in functions.
- ⚠️ Risk Reduction: Eliminates circular reference problems and reduces the need to enable iterative calculations in the workbook.
However, these enhancements create an overwhelming number of choices for performing the same calculation, including some that are inherently risky. This chapter offers guidelines for choosing the approach that satisfies the BEST principles.
📖 Terminologies and Definitions
In this chapter and elsewhere, certain terms are used with specific meanings in the context of this guide.
🔹 Lambda(s) vs LAMBDA:
In this guide, the term “Lambdas” refers to any custom function created using the LAMBDA function in a spreadsheet. Some parts of the guide may also refer to them as custom functions.
The term “LAMBDA”, written in all upper case, refers to the built-in spreadsheet function used to create such custom functions.
3.1 🎯 Array Referencing Practices
📋 Prescriptions:
🔗 Use dynamic array format for referencing
When referring to a dynamic array range, always use the dynamic array format and avoid fixed range references.
Example: If a dynamic array in cell B5 spills to B9, reference the result as =B5# instead of =B5:B9. This is critical to ensure the model updates dynamically if the formula in B5 produces a larger result.
Rationale:
- 🔄 Dynamic expansion: If the referenced spilled array expands, dynamic references automatically pick up the additional values. Static references break this adaptability.
🎯 Reference array elements, not direct cells
Specific cells or ranges within an array should be referred to as elements of the array, not through direct cell references. Functions such as TAKE, DROP, FILTER, XLOOKUP, and CHOOSECOLS can be used for this purpose.
📈 Example: Say we have the previous 20 days’ stock prices, in ascending order of date, as a single array spilling from cells B3:B22. Here are the right and wrong approaches:
✅ Correct: =DROP(B3#,1)/DROP(B3#,-1) - 1
❌ Wrong: =B4:B22/B3:B21 - 1
Rationale:
- 🧠 Stable under change: When the array expands or shrinks, array-based references adapt automatically, while static references fail.
🔨 Assign cell references to variables in LET functions
Any reference to a cell or range inside a LET function must be assigned to a variable and used through that variable. A particular cell or range must not be referenced more than once inside a given LET function.
Example: Say cells C15, D15, and E15 hold the start date, end date, and current date, and the LET function needs to calculate two variables: (i) Total duration, which is D15 minus C15, and (ii) Lapsed time, which is MIN(E15-C15, D15-C15).
The appropriate way to use these cells is as follows:
=LET(StDt, C15,
EndDt, D15,
CurrentDt, E15,
TotalDur, EndDt - StDt,
LapsedPrd, MIN(CurrentDt - StDt, EndDt - StDt),
...)
Rationale:
- 🎯 Centralizes changes to cell references: If a cell reference needs to change, the change is made only once. This reduces the risk of inconsistent changes.
🔀 Create “sort proof” references
References to tables and arrays must be constructed so that the formula returns the correct answer even if the order of the underlying data changes.
Example: Say we have SalesTbl with annual sales by year, with the sales for 2025 at the bottom. Sorting can move that year’s data from its original position. To reliably get the sales figure for 2025, regardless of sorting, use a formula such as:
=XLOOKUP(LookupYr, SalesTbl[Year], SalesTbl[Amt])
where LookupYr is 2025.
Rationale:
- 🛡️ Increases integrity and stability: User interactions such as sorting the underlying data are very common. Sort proofing ensures output is not affected by such interactions.
🚫 Filter out invalid or stray rows
Any reference to an input table should anticipate invalid or stray rows and filter them out.
Example: Continuing with the previous example, say we need to get the latest year’s sales from SalesTbl, which also has an IsValid column that returns TRUE only when the row meets the validation criteria. A formula such as the one below gives the correct answer even if end users add placeholder rows for future periods.
=LET(LatestYr, MAXIFS(SalesTbl[Year], SalesTbl[IsValid], TRUE),
Return, XLOOKUP(LatestYr, SalesTbl[Year], SalesTbl[Amt]),
Return)
Rationale:
- 🛡️ Avoids unwarranted dynamic expansion: Safeguards models from being affected by common end-user interactions such as adding placeholder rows or filling in partial data sets.
🔗 Handle external references with ETL tools
External references must always be handled using ETL tools, such as Power Query or SQL; directly linking a cell to another workbook must be avoided.
Sometimes a model requires data from another model: a Master Budget may need information from various functional budgets, or an industry analysis may need data from individual files for different companies. In MS Excel, Power Query can integrate such data efficiently.
Rationale:
- 🔄 Avoids risk of stale values: Data brought in through copy-pasting can turn stale when the other models are updated.
- 🎯 Prevents incorrect references: Live workbook links are vulnerable to changes in cell locations. Importing data from structured export sheets reduces both risks.
💡 Recommendations:
📊 Use structured referencing for tables
Use structured references when referring to structured tables.
- 📖 Improves readability: Structured references with clear table and column names are easier to read, and modelers can easily refer to tables with IntelliSense.
- 🎯 Reduces chances of wrong references: Structured references ensure table columns are always referred to in their entirety, so wrong columns are easy to spot.
Note: Structured references are absolute but act as relative references when dragged right. Users should be aware of this behavior.
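As an illustration using the SalesTbl table from the earlier examples (the plain range shown for contrast is hypothetical):

```excel
❌ Avoid:  =SUM(D3:D12)
✅ Prefer: =SUM(SalesTbl[Amt])
```

The structured version stays correct when rows are added to the table, and the table and column names make the formula's intent self-documenting.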
🏷️ Name cells containing spill formulas
Name the cells that contain spill formulas.
- 📖 Improves readability and accessibility: Named references are easier to understand and access than relative references, improving formula readability throughout the model.
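For instance, if the spilled price array anchored in B3 from the earlier example is given the (hypothetical) name Prices, dependent formulas can use the name with the spill operator:

```excel
' Name "Prices" defined in the Name Manager as: =$B$3
=AVERAGE(Prices#)    ' instead of =AVERAGE(B3#)
```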
3.2 🧮 Calculations and Calculation Blocks
📋 Prescriptions:
⏱️ Use model-specific timelines
Timelines in a financial model must be model specific, not schedule or line-item specific.
Say a project has a life of 25 years, while the debt taken for it is repaid over a shorter duration of 20 years. If an analyst is projecting cash flows for the full 25 years, the debt schedule must also run for the entire 25-year period, with a formula that forces all values in the debt schedule to zero for years 21 to 25.
However, this guide does not discourage modelling certain line items at a higher frequency than others in a separate schedule and aggregating them in a common schedule.
For instance, an analyst may choose to model revenue and operating expenses every quarter while modelling interest and taxes at an annual level, and present all of them at the annual level. Such a practice is acceptable provided the quarterly schedules are organized in a separate calculation section.
Rationale:
- 🧠 Prevents fragmentation and errors: Keeping all arrays a uniform length ensures that operations across multiple arrays work seamlessly. Arrays of different lengths may cause DA formula errors.
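The 25-year debt example above can be sketched as follows; the names Yrs, LoanAmt, and RepayTerm are hypothetical named cells, not from the original text:

```excel
Yrs:   =SEQUENCE(1, 25)                          ' model-wide 25-year timeline
Repay: =IF(Yrs# <= RepayTerm, LoanAmt/RepayTerm, 0)
```

Because the repayment vector spans the full model timeline and simply returns zero after year 20, it can be combined with any other 25-year array without length mismatches.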
📊 Use dynamic arrays for vector/matrix outputs
All calculations that produce a vector or matrix output must be modeled as a dynamic array; no such item shall be modeled using the legacy scalar, cell-by-cell approach.
Rationale:
- 🛡️ Ensures integrity of calculations: Since the entire vector/matrix is driven by one formula, there is no risk of inconsistent formulas within a range.
- 🧠 Reduces dependency on end users: Using only dynamic arrays eliminates manual tasks for end users. Legacy methods require copy-pasting and adjusting formulas, increasing error risk.
⚠️ Important: The spreadsheet risk when the legacy approach is mixed with dynamic arrays is far greater than in a pure legacy approach. If modeling an array with the dynamic approach is impossible, this guide recommends a pure legacy approach over a mixed one.
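The contrast between the two approaches can be sketched with a simple revenue projection; Rev0 and Growth are hypothetical named cells:

```excel
❌ Legacy: =C5*(1+$B$2)   ' entered once, then copy-pasted across 25 columns
✅ DA:     =Rev0 * (1 + Growth)^SEQUENCE(1, 25)   ' one formula spills all 25 years
```

In the dynamic array version there is nothing to drag or re-copy when the model changes, which is exactly the integrity benefit described above.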
🚫 Avoid enabling iterative calculations
Modelers must avoid enabling iterative calculations in DA models. They must opt for alternative closed-form solutions, or iterative/recursive Lambdas.
Rationale:
- ⚠️ Avoids unpredictable model behavior: If variables that drive spill array size become circular, they may cause incorrect spilling, leading to unpredictable results.
- 🔍 Extremely difficult to debug: When external iterative calculations affect in-memory DA computations, bugs are far more difficult to trace and debug.
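As a hedged sketch of the Lambda alternative: a circular relationship (for example, a fee that depends on a total which itself includes the fee) can be resolved with a fixed-point REDUCE loop instead of the workbook-level iteration setting. Base, FeeRate, and the pass count of 20 are hypothetical:

```excel
=LET(Base, 1000, FeeRate, 0.02,
     Total, REDUCE(Base, SEQUENCE(20),
                   LAMBDA(acc, i, Base + FeeRate * acc)),
     Total)
```

Each pass recomputes the total using the previous pass's value, converging toward Base/(1-FeeRate); the iteration stays contained inside one auditable formula rather than a hidden workbook setting.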
🚫 Avoid TRIMRANGE functions
Modelers must not use the TRIMRANGE function or trim operators when referencing ranges, as they increase the risk of being affected by stray values.
Rationale:
- ⚠️ Inadvertent stray values should not affect calculations: End users may type stray values into what appear to be blank cells, causing all dependent calculations to extend automatically and produce erroneous output.
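A sketch of the risk, reusing the SalesTbl columns from earlier (the trimmed range shown is hypothetical): a trim reference silently grows to include any stray value typed below the real data, whereas a validated table reference does not.

```excel
❌ Risky: =SUM(B3:.B10000)    ' trim-ref extends down to any stray entry
✅ Safer: =SUM(FILTER(SalesTbl[Amt], SalesTbl[IsValid]))
```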
🚫 Avoid using calculated columns in structured tables
Calculated columns in structured tables must not be used for model calculations except in the following circumstances:
- Validation columns, described in recommendations R6 and R7, can be used for filtering out stray or invalid data.
- When the calculations need to be done iteratively across the rows of a table and it is impossible to do so as a dynamic array with an acceptable level of performance.
Rationale:
- 🚦 Prevents model errors from polluted tables: Although structured tables are also dynamic and expand, they are not fail-proof. For instance, a cell in a calculated column can be overwritten, especially when the sheet is unprotected, making calculated columns vulnerable to unauthorized editing.
📚 Avoid stacking multiple variables in one array
Modelers must not stack multiple variables in one array except in the following cases:
- All the variables are inter-related and computed together iteratively or recursively.
- The stack is used to present the final output of the model for display or export.
- The stacking is required as an intermediate step inside a formula, but the formula does not output such a stack.
❌ Term loan schedule: Even though you could create one function returning opening balances, interest expense, borrowings, repayments, and closing balances as a single schedule, each of these variables should be presented as an individual vector.
✅ Cash sweep modeling: For a revolver, cash flows, interest, taxes and borrowings are inter-related and must be modeled iteratively using REDUCE; stacking is necessary here.
✅ Output sheets: Stacking multiple schedules together is acceptable for output or export sheets if the individual vectors aren’t used elsewhere in the model.
Rationale:
- 🎯 Improves accessibility: When each variable is presented as an independent vector, it is easy to reference in all dependent calculations.
- 🧠 Maintains model stability when new line items are added: When variables are stacked, functions like INDEX or CHOOSEROWS require specific row or column references. If the number of items or their order in the stack changes, these functions may return incorrect results.
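A brief sketch of this fragility, using hypothetical named vectors from a loan schedule: pulling one line item out of a stack needs a positional reference that silently breaks when the stack order changes.

```excel
Stack:      =VSTACK(OpeningBal#, Interest#, Repayment#, ClosingBal#)
❌ Fragile: =CHOOSEROWS(Stack#, 2)   ' returns Interest only while it stays in row 2
✅ Stable:  =Interest#                ' independent named vector, order-proof
```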
🔬 Include validation schedules to support the audit of complex in-memory computations
When major computations are processed in memory and intermediate steps are not directly observable, modelers must include validation utilities to verify the output.
📊 Example: If a fund holds hundreds of debentures, a scalable approach is to model all the balances, redemptions, and incomes for multiple periods together in a single schedule rather than preparing a separate schedule for each debenture and aggregating them separately.
Although scalable, such schedules are extremely difficult to audit or validate.
💡 Solution: The model should have a separate calculation section that allows reviewers to drill down into the calculation at the individual debenture level. This calculation section serves as a validation utility for the main model.
Rationale:
- ⚖️ Balances auditability and scalability: In-memory computations are highly scalable and efficient, but they significantly reduce the transparency of the model. A utility that helps validate the calculations, and that is easy to understand, improves transparency and hence trust in the financial model.
🚧 Ensure that the model layout cannot create spill errors
Calculations and calculation blocks must be positioned so that they cannot cause a spill error if a dynamic array expands.
📋 Positioning Rules:
- ➡️ If a calculation block is expected to expand into more columns, do not place other calculation sections to its right.
- ⬇️ If it is expected to expand across rows, avoid placing anything below it.
- 📌 Totals are best placed above or to the left of such blocks.
Rationale:
- ⚠️ Prevents spill errors: Keeping other cells clear along the array’s direction of growth helps avoid spill errors. While these errors are usually fixed by adding rows or columns, they may affect downstream models importing the data until the error is resolved.
💡 Recommendations:
🔍 Show intermediate steps transparently
Modelers should show intermediate steps transparently in the model unless doing so significantly affects scalability or performance.
Rationale:
- ⚖️ Balances transparency and scalability: Breaking calculations into intermediate steps makes the model easy to follow without significantly compromising performance or scale.
✅ Use approved Lambdas for complex computations
Complex in-memory computations should be done through approved Lambdas rather than as direct formulas in cells. Writing complex logic directly in a cell should be chosen only if the modeler wants to protect the IP.
Rationale:
- 🛡️ Increases reliability: Reviewers and auditors can validate an approved Lambda far more easily than complex multi-line logic written directly in cells.
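For example, a discounting routine can be registered once in the Name Manager as an approved Lambda and then called like a built-in function; the name NPV_Mid, its parameters, and the mid-year convention shown are hypothetical:

```excel
' Name "NPV_Mid" defined in the Name Manager as:
=LAMBDA(rate, cashflows,
        SUM(cashflows / (1 + rate)^(SEQUENCE(COUNT(cashflows)) - 0.5)))

' Used in a cell:
=NPV_Mid(0.08, C5:C24)
```

Reviewers audit the single named definition instead of hunting for copies of the same logic scattered across cells.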
🤖 Create machine-friendly export sheets
Models should have a well-structured, machine-friendly set of export sheets that downstream models can use to import data.
Rationale:
- 🔗 Allows seamless integration: Downstream models can use SQL, Power Query or other methods to easily extract data if the file has well-structured export sheets.
Note: Export sheets are different from output sheets; they are meant for machine reading, not presentation. Appendix 5 suggests best practices for export sheets.
📝 Summary
In this chapter, we discussed how the calculation section should be set up and the considerations that go into choosing a proper calculation approach so that models remain stable, scalable and efficient. 🎯
🎯 Key Guidelines
📏 Strict Array Reference Principles:
- Dynamic arrays must always be referenced using the # operator; static range references must be avoided
- Specific cells or ranges within an array should be referenced as array elements, not through direct cell references
- Cell references inside a LET function must be assigned to variables, and the same cell must not be referenced more than once
- Structured references and named cell references are preferred over relative cell references
🏗️ Calculation Setup Practices:
- Timelines must be model specific, not schedule specific
- Multiple variables must not be stacked together except in certain exceptional cases
🛡️ Fail-proof Formulas Against User Interactions:
- Sort proof financial models
- Prevent stray data and incomplete data updates from causing dynamic expansion
🚫 Avoid Legacy Practices:
- The legacy approach must not be mixed with the dynamic array approach
- The iterative calculation setting must stay disabled
⚖️ Balance Memory vs Sheet Storage:
- Intermediate steps should be stored in sheets if doing so doesn’t cause performance issues
- If performance issues occur, in-memory calculations should be preferred, with approved Lambdas and validation sections
🚀 Next Steps: This chapter advised that complex logic should be handled using approved Lambdas. In the next chapter, we discuss the process and practices that make approved Lambdas more reliable than explicitly written formulae.