Tackling Technical Debt with Generative AI

Background

In previous articles, we’ve written about our experience here at Encora using code synthesis tools like GitHub Copilot. Those articles have covered our overall experience with using these tools and their limitations. In this article, we will go more into depth on their use in paying down technical debt and how they can be used for three of the most important tasks that software engineers perform as part of doing so: reading code, fixing bugs, and refactoring.

The SDLC, Technical Debt, and the Total Cost of Ownership of Enterprise Software

Why care about technical debt? In a survey from McKinsey, CIOs reported that:

“… 10 to 20 percent of the technology budget dedicated to new products is diverted to resolving issues related to tech debt. More troubling still, CIOs estimated that tech debt amounts to 20 to 40 percent of the value of their entire technology estate before depreciation.”

Fortunately, proper handling of technical debt is possible and can have a significant impact, as seen in this quote from the same McKinsey survey:

“Some companies find that actively managing their tech debt frees up engineers to spend up to 50 percent more of their time on work that supports business goals. The CIO of a leading cloud provider told us, ‘By reinventing our debt management, we went from 75 percent of engineer time paying the [tech debt] ‘tax’ to 25 percent. It allowed us to be who we are today.’”

Technical Debt, the Developer’s View

Technical debt must be handled during all the phases of the SDLC. Software architects are heavily involved during the design phase of the SDLC, but the bulk of the work done by software engineers takes place during the implementation, testing, and maintenance phases, so it is in those phases that handling technical debt is most important (and those phases can account for up to 61% of the total cost of ownership of enterprise software).

What does that work look like for software engineers? To start with, it represents a significant amount of their work: an estimated mean of 17.3 hours per week is spent dealing with bad code/errors, debugging, refactoring, and modifying code, of which a mean of 13.5 hours is spent paying down technical debt (see “The Developer Coefficient” by Stripe). How is this work different from other kinds of development? Developing new features usually involves a significant amount of time spent writing brand-new code, and most software engineers enjoy writing brand-new code far more than they enjoy working on existing code (as the old saying goes, “Hell is other people’s code”). This is because working with existing code requires a software engineer to perform a different set of activities from those usually performed while writing new code: reading (and understanding) code, refactoring code, and fixing bugs.

Reading (and Understanding) Code

When developing new features, software engineers go through a process of understanding a particular problem, designing a solution, and then iteratively writing code. When paying down technical debt, the process begins with understanding the problem and then reading the code in order to understand the current solution; only then can a software engineer begin redesigning or fixing that solution so that it can be correctly implemented in code.

Crucially, while in the case of a novel feature even partial success can add value, in the case of paying down technical debt a partial understanding of the current approach can add technical debt instead of paying it down! As a code base grows, the amount of code and the complexity of the system grow as well; this is why proper documentation becomes increasingly valuable as a system is developed, and here is where Generative AI tools can have an important impact.

When writing code for a new feature, one of the aims of a good software engineer is to make the code as readable and understandable as possible. While the idea of self-documenting code (i.e., code that requires no comments or other external documentation) is one to strive for, it is sometimes hard to make the context in which a particular bit of code operates clear in the code itself, and this makes the task of understanding the code later on much harder. Generative AI tools can help with this problem in two ways:

  1. Writing better comments and documentation: Software engineers are known for loathing the task of writing documentation, and poor code commenting practices are common. Tools like GitHub Copilot make the task of generating comments (and keeping them current as the code is modified) much easier, significantly increasing the coverage of well-commented code. Similarly, tools like ChatGPT speed up the task of writing documentation that is external to the code.
  2. Aiding in understanding code: Work on repository-level code synthesis is still ongoing (e.g., Fengji Zhang et al.), so LLMs cannot yet automatically take into account the repository context around the specific bits of code that they generate. Fortunately, current LLMs (thanks to recent increases in context window size) can consume not only entire files but whole sets of them (i.e., whole packages, modules, namespaces, etc.), and with that context they can help explain what sections of the code do, greatly speeding up the process of code understanding (see the sketch after this list). Software engineers should still strive to develop a deeper understanding of the code they read this way, since models sometimes struggle to properly explain some kinds of code (recursion is particularly hard for LLMs, as seen in the work of Shizhuo Dylan Zhang et al.).
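
To make the second workflow concrete, here is a minimal sketch of an “explain this module” helper built on the OpenAI Python client. The model name, file paths, and prompt wording are placeholder assumptions of ours, not a prescribed recipe:

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def explain_module(paths: list[str], question: str) -> str:
    """Bundle a set of source files into one prompt and ask the model about them."""
    # Label each file so the model can refer to it by name in its explanation.
    sources = "\n\n".join(
        f"### File: {p}\n{Path(p).read_text()}" for p in paths
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any large-context model would do
        messages=[
            {"role": "system",
             "content": "You are a senior engineer explaining unfamiliar code."},
            {"role": "user",
             "content": f"{sources}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# Hypothetical usage: ask how two related files cooperate.
print(explain_module(
    ["billing/invoice.py", "billing/tax_rules.py"],
    "How does invoice generation apply the tax rules, step by step?",
))
```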

Bug Fixing

It’s well known that the earlier in the software development life cycle defects are handled, the cheaper they are to fix, as can be seen in the figure below:

Figure 3, Relative Cost of Fixing Defects, from “Integrating Software Assurance into the Software Development Life Cycle (SDLC)” by Maurice Dawson et al.

While software architects are heavily involved during the design phase of the SDLC, the bulk of the defect-fixing work done by software engineers takes place during the implementation, testing, and maintenance phases, where the cost of doing so is significantly higher. That work also takes up a significant share of software engineers’ time, with estimates ranging from 10% to 25% of developers’ time spent solely on fixing bugs.

Generative AI tools can accelerate this process both by speeding up an engineer’s understanding of the problem (as discussed above) and by speeding up the writing of a solution. Workflows where an LLM like ChatGPT is prompted with the failing code and the error output it produces are useful for the easier kinds of bugs. More complex bugs require deeper analysis and usually don’t produce useful output to prompt a model with, demanding more prompt engineering on the engineer’s part. Here, the models’ growing ability to handle bigger context windows with heterogeneous types of data, aided by add-ons like Code Interpreter (which, despite the name, helps with the context around the code rather than the code itself), once again comes to the rescue.
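
To make the first kind of workflow concrete, here is a minimal sketch that runs a failing script, captures its error output, and prompts a model with both; the helper name, script path, and model choice are illustrative assumptions:

```python
import subprocess
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def suggest_fix(script: str) -> str:
    """Run a script, capture its stderr, and ask the model for a diagnosis."""
    result = subprocess.run(["python", script], capture_output=True, text=True)
    if result.returncode == 0:
        return "Script ran cleanly; nothing to fix."
    prompt = (
        "This Python script fails.\n\n"
        f"--- Source ---\n{Path(script).read_text()}\n\n"
        f"--- Error output ---\n{result.stderr}\n\n"
        "Explain the likely cause and propose a minimal fix."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(suggest_fix("scripts/import_orders.py"))  # hypothetical script path
```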

One thing to keep in mind when using Generative AI to help fix bugs is that these tools are still prone to errors and hallucinations, and we should be careful not to introduce new bugs while fixing the old ones, as the following anecdote from one of our engineers relates:

“Excited by the convenience of auto-completions, there was one time when we accepted a function generated by Copilot without proper scrutiny. This unverified adoption led to a bug caused by a minor configuration oversight, which took us a couple of hours to find. This served as a warning, making us more vigilant about reviewing suggestions before integrating them into our code.”

Refactoring

In the second edition of “Refactoring: Improving the Design of Existing Code” Fowler and Beck define refactoring as:

“… the process of changing a software system in a way that does not alter the external behavior of the code yet improves its internal structure. It is a disciplined way to clean up code that minimizes the chances of introducing bugs. In essence, when you refactor, you are improving the design of the code after it has been written.”

Some common improvements made during refactoring tasks include:

  1. Improving design by reorganizing code around key design patterns.
  2. Removing duplication.
  3. Increasing readability by improving names and by breaking large sections of code into smaller, more modular ones.
  4. Enabling new features by decoupling components into separate modules behind clean interfaces, making it easier to add new features without impacting unrelated parts of the code (a small sketch of points 3 and 4 follows this list).
  5. Improving performance by replacing inefficient algorithms, data structures, or unnecessary levels of abstraction.
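
As a small illustration of points 3 and 4, here is a hedged before-and-after sketch; the notification domain and every name in it are hypothetical, chosen only to show extraction of well-named helpers and decoupling behind a clean interface:

```python
# Before: one long function mixing formatting, routing, and delivery details.
def notify(user, event):
    if event["type"] == "invoice":
        body = f"Invoice {event['id']} is due on {event['due']}."
    else:
        body = f"Update: {event['summary']}"
    # Email delivery is hard-wired, so adding SMS means editing this function.
    print(f"EMAIL to {user['email']}: {body}")

# After: small, well-named pieces behind a clean interface.
from typing import Protocol

class Channel(Protocol):
    def send(self, address: str, body: str) -> None: ...

class EmailChannel:
    def send(self, address: str, body: str) -> None:
        print(f"EMAIL to {address}: {body}")

def format_message(event: dict) -> str:
    """Readability: the formatting rule now has a name and a single home."""
    if event["type"] == "invoice":
        return f"Invoice {event['id']} is due on {event['due']}."
    return f"Update: {event['summary']}"

def notify(user: dict, event: dict, channel: Channel) -> None:
    """Decoupling: new channels (SMS, push) plug in without touching this code."""
    channel.send(user["email"], format_message(event))

notify({"email": "ana@example.com"},
       {"type": "invoice", "id": 42, "due": "2023-08-01"},
       EmailChannel())
```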


Refactoring is therefore fundamental to paying down technical debt. Unfortunately, it is also one of the most complex tasks software engineers perform. Is it worth using Generative AI for it? A recent study from McKinsey analyzed the impact of Generative AI on developer productivity based on the complexity of the tasks developers undertook:

Figure 1 from “Unleashing developer productivity with generative AI” by McKinsey.


As can be seen in the graph above, refactoring sees a lower productivity lift than either code documentation or code generation, but the lift is still large enough to have a significant impact on overall productivity.

The main difficulty in using Generative AI for refactoring tasks is that refactoring takes place at multiple levels of granularity in the code base, from files, through modules, namespaces, and packages, up to libraries, services, and system components (changes that might involve almost the entire code base), while current code synthesis tools work mostly at the single-file level. Fortunately, work is ongoing on whole-repository code synthesis (D. Shrivastava, et al., GitHub Copilot View) and benchmarking (T. Liu, et al.), so we can expect the productivity lift for refactoring to increase in the future.

In our experience here at Encora, we’ve seen that current tools work well when refactoring by incorporating design patterns and writing new interfaces to allow for decoupling, while performance improvements and general readability are still best left to more specialized tools like profilers and linters. We’ve also seen that properly documented code is more easily refactored, since the tool has more context to work with.

Conclusion

At Encora we’ve used code synthesis not only to write new code but also to transform existing code to pay down technical debt. In our experience, Generative AI had the biggest impact on facilitating the reading of unfamiliar code and on fixing bugs, with refactoring proving harder due to the current limitations of these tools. Nevertheless, we’ve concluded that Generative AI promises to be an important new tool for paying down technical debt.
