CodeCompose, the AI-driven code writing tool used by thousands of developers at Meta, has recently moved from single-line suggestions to multi-line suggestions. This transition raised distinct challenges for the tool's usability and user experience.
In the early stages, multi-line suggestions disrupted developers' workflow by constantly moving existing code around, which risked lowering productivity and satisfaction. Multi-line suggestions also took longer to generate, prompting the team to invest in reducing perceived latency.
Through model hosting optimization, the latency of multi-line suggestions was improved by a factor of 2.5. Subsequent experiments involving numerous engineers showed that multi-line suggestions accounted for a significant portion of accepted characters and nearly doubled the percentage of keystrokes saved compared with single-line suggestions. Fewer than 1% of Meta engineers opted out after multi-line suggestions were introduced.
CodeCompose was originally designed to provide inline suggestions as software engineers write code, but it was initially limited to predicting and completing the current line. Such single-line suggestions had to be fast, highly accurate, and contextually relevant.
The multi-line algorithm, by contrast, is more complex: it must trigger automatically as the user types, and it must select what to show based on the chosen trigger point and the boundary of the user's current scope. Although generating accurate multi-line suggestions is harder, a range-based algorithm, one bounded by that scope, displays suggestions that match what the user is currently working on, supporting their train of thought without introducing unnecessary distractions.
In the CodeCompose system architecture, the client editor displays suggestions, while the language server acts as an intermediary between the client and the CodeCompose model service host. Multi-line suggestions are requested by passing a "multi-line" flag in the request to the model service.
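The request flow described above can be illustrated with a minimal sketch. The field names (file_path, prefix, suffix, multi_line), the JSON encoding, and the build_request helper are all assumptions made for illustration; the source only states that a "multi-line" flag is passed in the request to the model service.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CompletionRequest:
    """Hypothetical payload sent to the model service (not the real schema)."""
    file_path: str
    prefix: str        # code before the cursor (assumed field)
    suffix: str        # code after the cursor (assumed field)
    multi_line: bool   # the "multi-line" flag mentioned in the text


def build_request(file_path: str, text: str, cursor: int, multi_line: bool) -> str:
    """Split the buffer at the cursor offset and serialize the request as JSON."""
    request = CompletionRequest(
        file_path=file_path,
        prefix=text[:cursor],
        suffix=text[cursor:],
        multi_line=multi_line,
    )
    return json.dumps(asdict(request))


source = "def quicksort(arr):\n"
payload = build_request("sort.py", source, cursor=len(source), multi_line=True)
print(payload)
```

In this sketch the flag is the only difference between a single-line and a multi-line request, which keeps the intermediary role of the language server simple.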
This article addresses the following challenges:
Challenge 1: Jarring Effect: Multi-line suggestions can displace code the developer has already written. To address this, the team designed a range-based algorithm that triggers multi-line suggestions only when the cursor is at the end of a range (the current scope). Suggestions extend only to the end of the current block. After a suggestion is accepted, the cursor automatically moves to the end of the suggested block, reducing the likelihood of interrupting the developer's workflow.
Example of the single-line "jarring" effect: when the user's cursor is between the "def" keyword and the function name "quicksort", an inline suggestion shifts the existing code to the right, creating a jarring effect.
Example of the multi-line "jarring" effect: when the user's cursor is between the line with the function name and the line containing the statement "test1 = 1", a multi-line suggestion may push the existing line down, disrupting the developer's flow. The developer is then forced to review the suggested "quicksort" function and work out where the existing code belongs.
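The trigger and truncation rules of Challenge 1 can be sketched as follows. This is a simplified illustration, not the CodeCompose implementation: it assumes Python source indented with spaces and uses indentation as a stand-in for scope, and the function names (indent_of, should_trigger_multiline, truncate_to_block) are hypothetical.

```python
def indent_of(line: str) -> int:
    """Number of leading spaces; tabs are not handled in this sketch."""
    return len(line) - len(line.lstrip(" "))


def should_trigger_multiline(lines: list[str], row: int, col: int) -> bool:
    """Trigger only when no code follows the cursor inside the current scope."""
    if lines[row][col:].strip():
        return False  # code to the right of the cursor would be shifted
    scope_indent = indent_of(lines[row])
    for later in lines[row + 1:]:
        if not later.strip():
            continue  # blank lines do not decide anything
        # The first non-blank line decides: same or deeper indentation means
        # the scope continues and existing code would be pushed down.
        return indent_of(later) < scope_indent
    return True  # nothing but blank lines follows the cursor


def truncate_to_block(suggestion: str, scope_indent: int) -> str:
    """Keep the suggestion only up to the end of the block it opens."""
    kept: list[str] = []
    for i, line in enumerate(suggestion.split("\n")):
        # Line 0 completes the cursor line; later lines must stay nested.
        if i > 0 and line.strip() and indent_of(line) <= scope_indent:
            break
        kept.append(line)
    while kept and not kept[-1].strip():
        kept.pop()  # drop trailing blank lines
    return "\n".join(kept)


signature = "def quicksort(arr):"
# Existing code follows in the same scope, so no multi-line suggestion.
print(should_trigger_multiline([signature, "test1 = 1"], 0, len(signature)))  # False
# Nothing follows the cursor, so a multi-line suggestion is allowed.
print(should_trigger_multiline([signature], 0, len(signature)))  # True
```

With these rules the "test1 = 1" case from the example above does not trigger a multi-line suggestion, and a generated suggestion that runs past the function body is cut off at the first line that leaves the block.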
Challenge 2: Responsive User Experience: Because multi-line suggestions take longer to generate, the team worked to minimize perceived latency and to improve adoption relative to single-line suggestions. This included adding a user interface indicator that tells users when a multi-line suggestion is being generated, and optimizing the model hosting service with techniques such as Flash Attention and persistent K-V caching.
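The intuition behind persistent K-V caching can be shown with a toy model. In a causal decoder, the keys and values computed for a token prefix depend only on that prefix, so a request that shares a prefix with the previous one only needs fresh computation for the new tokens. The sketch below tracks that bookkeeping with plain token lists instead of tensors; the PrefixCache class and its method are hypothetical and say nothing about how the CodeCompose hosting service is actually built.

```python
class PrefixCache:
    """Toy stand-in for a persistent K-V cache keyed by token prefix."""

    def __init__(self) -> None:
        self._cached: list[str] = []  # tokens whose state is assumed cached

    def tokens_to_process(self, tokens: list[str]) -> int:
        """Return how many tokens need fresh computation, then update the cache."""
        shared = 0
        for old, new in zip(self._cached, tokens):
            if old != new:
                break  # the shared prefix ends at the first mismatch
            shared += 1
        self._cached = list(tokens)
        return len(tokens) - shared


cache = PrefixCache()
first = ["def", "quicksort", "(", "arr", ")", ":"]
second = first + ["if", "len"]
print(cache.tokens_to_process(first))   # 6: cold cache, everything is new
print(cache.tokens_to_process(second))  # 2: only the newly typed tokens
```

Because consecutive requests while a developer types usually differ by only a few tokens, reusing the cached prefix is where most of the saving would come from in a setup like this.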
Challenge 3: Production Release Effect: During the rollout of multi-line suggestions, the team closely monitored various metrics, including acceptance rate, display rate, latency, and throughput. This evaluation helped assess the overall impact of multi-line suggestions compared to single-line suggestions.
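As a rough illustration of how two such metrics might be computed from logged suggestion events, consider the sketch below. The SuggestionEvent record, the metric definitions (acceptance rate as accepted over displayed suggestions, and the share of accepted characters that came from multi-line suggestions), and the sample events are all assumptions for illustration; they are not the team's definitions or measurements.

```python
from dataclasses import dataclass


@dataclass
class SuggestionEvent:
    """Hypothetical log record for one displayed suggestion."""
    multi_line: bool
    accepted: bool
    chars: int  # length of the suggested text


def acceptance_rate(events: list[SuggestionEvent]) -> float:
    """Accepted suggestions divided by displayed suggestions."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)


def multiline_share_of_accepted_chars(events: list[SuggestionEvent]) -> float:
    """Fraction of accepted characters that came from multi-line suggestions."""
    accepted = [e for e in events if e.accepted]
    total = sum(e.chars for e in accepted)
    if total == 0:
        return 0.0
    return sum(e.chars for e in accepted if e.multi_line) / total


# Made-up events for illustration only; these are not measurements.
events = [
    SuggestionEvent(multi_line=False, accepted=True, chars=20),
    SuggestionEvent(multi_line=False, accepted=False, chars=15),
    SuggestionEvent(multi_line=True, accepted=True, chars=60),
    SuggestionEvent(multi_line=True, accepted=False, chars=80),
]
print(acceptance_rate(events))                    # 0.5
print(multiline_share_of_accepted_chars(events))  # 0.75
```

Splitting the same event log by the multi_line field is one simple way to compare multi-line and single-line suggestions side by side.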
Although developers report feeling that they code faster, they typically need more time to review the generated code. Other studies have also shown that generated suggestions help developers discover new APIs. Together, these findings suggest that CodeCompose's multi-line suggestion feature has potential value for improving development efficiency and promoting code quality.