Azure Build Health - Multibuild Comparison

Azure Build Health sought to increase the efficiency of Release Managers by creating a build comparison feature that gives visibility into incremental quality changes. 

Timeline: 3 months  
My Role: UX Designer  
Team: 2 Product Managers, 1 Senior Developer, 1 Principal Developer 

Product Impacts

  • Designed MVP meeting needs of 80% of teams onboarded

  • Saved Release Managers 20-30 minutes per release 

Problem

Release Managers use Azure Build Health’s test tab to identify where regressions were introduced, and they need the ability to compare results between builds to make sure their builds pass Microsoft’s quality gates. To increase a release’s quality, users need to identify & track regressions in their code. Today in Azure Build Health’s test tab, users can see tests run against a single build but have no way of comparing two builds’ tests against each other. This work involved understanding the different test pass criteria and test groupings Release Managers care about, which vary on a team-by-team basis. 

The Project Pivot

I joined the Azure Build Health team as the product’s second designer. When I picked up this project, it was scoped as a data visualization experience in a separate pane to show regressions in builds. While onboarding, I had to learn the current implementation of the test tab while also visualizing & designing the new regression tab. 

After talking to customers and speaking with developers about technical feasibility, I identified that a separate regression experience would create a gap by splitting it off from the already existing test tab. With this insight, the project merged into a large cross-team effort to redo navigation across Azure Build Health, another project I worked on.

Understanding what Release Managers find valuable 

As part of the work on the unified navigation project for all of Azure Build Health (ABH), I had created designs that combined the regression tab’s multi-build scenario with the single-build experience in the test tab. The left nav is part of Azure DevOps, which is where Azure Build Health lives, and we were not able to change that component.

I needed to validate these designs with ABH’s main users, Release Managers.   

RELEASE MANAGERS (RMs) – Developer leads or Developer Managers who manually evaluate the intent and risk of every code change contained in a release and decide whether it should be deployed. 

To understand RMs’ needs, we interviewed six different teams across Microsoft. 

Our team’s open questions 

  1. What data points do Release Managers find useful when comparing multiple builds?

  2. What scenarios do RMs support?

  3. What processes do RMs follow when comparing builds?

Method 

In these meetings I showed RMs a prototype of a design I had created to guide the conversation. Using these prototypes, I was able to gather insights about their different use cases and needs and spark deeper technical discussions. 

What we learned 

We were able to identify 3 different use cases that RMs had and which quality insights helped them make the best-informed decisions. 

Release Managers’ use cases:  

  1. Compare two different builds against each other (1 vs 1)  

  2. Compare 3 or more builds, potentially hundreds, against each other (1 vs many) 

  3. See an aggregate view of multiple builds

Scoping the solution & Prioritizing 1 vs. 1 comparisons

While I originally tried to solve the three scenarios with a one-size-fits-all solution, we found that the 3 different use cases required showing different types of data, and trying to combine them created a disjointed experience. 

1 vs 1 build comparison

1 vs many comparison

Based on user demand and implementation time, we decided to focus our MVP on only the 1 vs 1 build comparison use case. The PM and I consulted our developers about the size of each feature to establish a priority list of features using the MoSCoW method.  

Validating the usefulness of the data

With a finalized design concept, I conducted one last round of concept testing with customers, showing them the refined designs. During these interviews we focused on understanding their teams' requirements for how they group, filter, and examine test data.  

Retro - A tested test experience

What I learned:

Between concept validation and synthesizing the feedback to iterate on the designs, there was a two-month gap due to a company-wide initiative to focus on security. Once the fire drill was over and we were able to return to this project, picking it back up took much longer than it would have if I had synthesized the data before pivoting to the security work streams. Being able to pivot is important, but so is deciding when you pivot, especially if the work stream will be returned to in the near future.

Outcomes:

  • Designed MVP meeting needs of 80% of teams onboarded

  • Saved Release Managers 20-30 minutes per release