Azure Build Health - Multibuild Comparison
Azure Build Health sought to increase the efficiency of Release Managers with a build comparison feature that gives visibility into incremental quality changes.
Timeline: 3 months
My Role: Solo E2E Designer
Team: 2 Product Managers, 1 Senior Developer, 1 Principal Developer
Product Impacts
Launched an MVP meeting the needs of 80% of onboarded teams
Created a flexible framework for future build comparison features
Saved Release Managers 20-30 minutes per release
Problem
Release Managers at Microsoft are Developer Leads or Dev Managers who manually evaluate the intent and risk of every code change contained in a release to determine its deployability. A key part of this process involves identifying and tracking regressions in their code to ensure it passes Microsoft’s stringent quality gates.
Currently, the Azure Build Health (ABH) test tab allows users to view the tests run against a single build. However, it lacks the functionality to compare test results between multiple builds, a critical capability for identifying regressions. This gap forces Release Managers to rely on cumbersome manual processes, leading to inefficiencies and a potential compromise in release quality.
I needed to enable Release Managers to easily compare test results between builds in the ABH test tab, while accounting for the build pass criteria and test groupings that differ between teams.
Quick Glance:
I created a multibuild comparison for the single build-vs-build scenario, saving Release Managers 20-30 minutes per release, and identified key dev dependencies we needed to resolve before we could address test aggregation and 3+ build comparison scenarios.
The Project Pivot
I joined the Azure Build Health team as the product’s second designer. When I picked up this project, it was scoped as a data visualization experience that would show build regressions separately from the test tab. While onboarding, I needed to learn the current implementation of the test tab while also visualizing and designing the new regression tab.
After talking to customers and speaking with developers about technical feasibility, I identified that a separate regression experience would create a gap by splitting it off from the already existing test tab. With this insight, the project merged into a large cross-team collaboration to redo navigation across Azure Build Health, another project I worked on.
Understanding what Release Managers find valuable
As part of the work on the unified navigation project for all of Azure Build Health (ABH), I had created designs that combined the regression tab’s multi-build scenario with the single-build experience in the test tab. The left nav is part of Azure DevOps, where Azure Build Health lives, and we were not able to change that component.
I needed to validate these designs with ABH’s main users, Release Managers. To understand their needs, I interviewed six different teams across Microsoft.
The team’s open questions
Learn which data points Release Managers find useful when comparing multiple builds
Learn which scenarios RMs support
Understand RMs’ processes for comparing builds
Method
In these meetings I showed RMs a prototype I had created to guide the conversation. Using this prototype, I was able to gather insights about their different use cases and needs and spark deeper technical discussions.
What was learned
We identified three different use cases that RMs had, and which quality insights helped them make the best-informed decisions.
Release Managers’ use cases:
Compare two different builds against each other (1 vs 1)
Compare 3 or more, potentially hundreds, of builds against each other (1 vs many)
Aggregate view of multiple builds
Scoping the solution & Prioritizing 1 vs. 1 comparisons
I originally tried to solve all three scenarios with a one-size-fits-all solution, but we found that the three use cases required showing different types of data, and trying to combine them created a disjointed experience.
1 vs 1 build comparison
1 vs many comparison
Based on user demand and implementation time, we decided to focus our MVP on the 1 vs 1 build comparison use case only. The PM and I consulted our developers on the size of each feature ask and established a prioritized feature list using the MoSCoW method.
Validating the usefulness of the data
With a finalized design concept, I conducted one last round of concept testing with customers, showing them the refined designs. During these interviews we focused on understanding their teams’ requirements for how they group, filter, and examine test data.
Retro - A tested test experience
What I learned:
Between concept validation and synthesizing the feedback to iterate on the designs, there was a two-month gap due to a company-wide initiative to focus on security. Once the fire drill was over and we returned to this project, it took much longer to resume than it would have if I had synthesized the data before pivoting to the security work streams. Being able to pivot is important, but so is prioritizing when you pivot, especially if you will return to the work stream in the near future.
Outcomes:
Designed an MVP meeting the needs of 80% of onboarded teams
Saved Release Managers 20-30 minutes per release