Zalando AI-Assistant UX Research

Scaling Zalando Assistant to New Markets | On-site data analysis to uncover customer usage patterns, directly informing an iterative product roadmap

Short summary

Zalando Assistant is a generative-AI-powered conversational experience that proactively guides Zalando’s customers through their lifestyle discovery journeys.

I joined the team as a user researcher guiding future iterations of the Assistant. My primary objective was to understand user behaviour, identify the failure points causing drop-offs, and establish a systematic research framework that could continuously uncover usage patterns and inform ongoing product improvements.

Research activity

I used human-in-the-loop analysis to systematically evaluate user interactions and identify opportunities for improvement. My approach was two-fold:

  • Conducted a thorough analysis of 200+ on-site user conversations with the AI assistant, focusing on identifying user intent and understanding the key failure points that were hampering the user experience.

  • Audited our internal AI Evaluation tool’s capability to correctly identify failures and tag assistant responses. This provided a crucial feedback loop to improve machine evaluations and monitor performance at scale.

Impact

  • Enhanced understanding of user intents and needs

  • Improved tagging mechanism in our internal Machine Evaluation tool to clearly identify different customer intents and scale monitoring efforts

  • A plug-and-play research framework to analyze customer conversations for continuous improvement

Company

Zalando SE

Project Duration

April 2024

Role

Lead User Researcher

Team

Design Manager
Development Team

Project background

Customer Problem | Zalando’s vast product assortment was a source of cognitive overload for customers. The rise of generative AI presented a unique opportunity to solve this problem by bridging the gap between the way customers naturally speak about fashion and rigid, keyword-based search interfaces. Our solution, the Zalando Assistant, was launched as a conversational AI that acts as a personal fashion advisor, helping customers discover fashion using natural language.

My role | Guiding product iterations

As we prepared to scale the Zalando Assistant to new markets, my role was to deepen our understanding of how early users were engaging with the tool. My focus was on uncovering key user behaviors, what jobs the assistant was helping them achieve, and identifying new opportunities to innovate and better meet customer needs.

Understanding the space

Stakeholder alignment | Design Manager, Engineering Manager & Applied Science team

To align on a shared vision, I collaborated with the Design Manager, Engineering Manager, and Applied Science team. I mapped their questions and aspirations for the Zalando Assistant and the internal evaluation tool to understand the key decisions they faced and the technical capabilities available.

This stakeholder alignment process helped me gain a deeper understanding of the team's vision to build a scalable platform capability, not just a product feature.

Existing research review | Early beta tests, search behaviour research

I reviewed existing internal research, including beta tests and search behaviour data, to inform my research approach. This preliminary work was crucial for defining research questions and creating a framework to provide continuous, actionable feedback to the team.

Evaluating the current behaviour | JTBD validation

To evaluate the Zalando Assistant, I proposed using the Jobs-to-be-Done (JTBD) framework, an approach the team was eager to adopt. I drew on my previous research on the inspiration and purchase phases of the fashion discovery journey to quickly give the team a baseline understanding of user needs.

To build a comprehensive view of how well the Zalando Assistant was meeting user needs, I mapped the current user experience. This helped me uncover key experience and model issues and provided the data to guide my subsequent research.

Research design & analysis

Research design & analysis

As the sole researcher on a fast-paced team, I developed a "human-in-the-loop" framework to provide continuous feedback without slowing down development. This two-part approach systematically evaluated user interactions and identified clear opportunities for improvement.

Analysis of user conversations

I conducted a deep-dive analysis of over 200 anonymous user conversations with the Zalando Assistant. I focused on two key areas:

  • Identifying user intent: I analyzed the nuances of user queries across conversations that often ran to three or more back-and-forth exchanges.

  • Uncovering failure points: I meticulously documented breakdowns in the conversation, including instances of misinterpretation, forgotten context, irrelevant suggestions, and "LLM hallucinations," where the assistant recommended non-existent products.

Tooling & machine evaluation

Tooling & machine evaluation

To scale our research and create a continuous feedback loop, I worked with the engineering team to analyse and improve our internal AI tool. This provided a crucial feedback mechanism for monitoring the assistant's performance at scale.

  • Improving the tagging mechanism: I identified and improved the existing tagging system, allowing us to capture the complexity of user interactions more accurately.

  • Building a human-in-the-loop process: I established a repeatable research framework that enabled the team to validate the AI's performance after each model iteration. This process helped us catch bugs, identify production issues, and build shared empathy for the user's experience.
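
In practice, this audit came down to comparing human judgments against the machine evaluator's output, turn by turn. Below is a minimal sketch of that comparison in Python; the record structure, field names, and tag values are illustrative assumptions, not the internal tool's actual schema.

```python
from collections import Counter

# Hypothetical evaluation records: each assistant turn carries a human
# verdict and the machine evaluator's verdict, plus the human-assigned
# failure tag when the turn failed.
evaluations = [
    {"turn_id": 1, "human": "fail", "machine": "pass", "human_tag": "hallucination"},
    {"turn_id": 2, "human": "fail", "machine": "fail", "human_tag": "misinterpretation"},
    {"turn_id": 3, "human": "pass", "machine": "pass", "human_tag": None},
]

# Overall agreement rate between human and machine evaluations.
agreement = sum(e["human"] == e["machine"] for e in evaluations) / len(evaluations)

# Failures the machine missed, grouped by the human-assigned cause --
# exactly the gaps that drove improvements to the tagging mechanism.
missed = Counter(
    e["human_tag"]
    for e in evaluations
    if e["human"] == "fail" and e["machine"] == "pass"
)

print(f"Human-machine agreement: {agreement:.0%}")
print(f"Failures the machine missed: {dict(missed)}")
```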

Data analysis

To quantify the problems beyond anecdotal feedback, I developed an automated Google Sheet tool to query the database, pull 200 conversations, and systematically code them; a sketch of this pipeline follows the list below. I coded each interaction, focusing on:

  • User Intent: I categorized the purpose of each query (e.g., product discovery, occasion-based search, outfit inspiration) to understand how customers used natural language for complex exploration and discovery rather than simple keyword searches.

  • Performance Evaluation: I rated the success of each assistant response and compared it to the machine's evaluation. This helped us quantify common failure points and understand the root cause of these issues.

  • Failure Taxonomy: I used a custom taxonomy to tag the cause of every failed interaction, such as "misinterpretation," "hallucination," or "lack of conversational memory." This provided the engineering team with clear, actionable insights into where the model was breaking down.

This systematic coding process allowed me to present data-backed findings to the team, highlighting precisely where our most critical issues lay.
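
As a concrete illustration of that pipeline, here is a minimal sketch of the sampling query and the coding record, assuming a simplified schema. The table name, column names, intent labels, and failure tags mirror the categories described above but are otherwise hypothetical.

```python
# Illustrative sampling query; the table and columns are assumptions,
# not the real warehouse schema.
SAMPLE_QUERY = """
SELECT conversation_id, turn_index, user_message, assistant_response
FROM assistant_conversations
ORDER BY RANDOM()
LIMIT 200;
"""

# Coding frame mirroring the three dimensions above.
INTENTS = {"product_discovery", "occasion_based_search", "outfit_inspiration"}
FAILURE_TAGS = {"misinterpretation", "hallucination", "lack_of_conversational_memory"}

def code_interaction(conversation_id, turn_index, intent, success, failure_tag=None):
    """Record the human codes for one sampled interaction."""
    if intent not in INTENTS:
        raise ValueError(f"unknown intent: {intent}")
    if failure_tag is not None and failure_tag not in FAILURE_TAGS:
        raise ValueError(f"unknown failure tag: {failure_tag}")
    return {
        "conversation_id": conversation_id,
        "turn_index": turn_index,
        "intent": intent,
        "success": success,          # human performance rating
        "failure_tag": failure_tag,  # root cause when the turn failed
    }

# Example: a failed outfit-inspiration turn caused by a hallucinated product.
coded = code_interaction("c-0042", 3, "outfit_inspiration", False, "hallucination")
```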

Key insights

To ensure the team fully digested the findings, I presented the research insights in a detailed review session, using real user conversation examples to illustrate the key takeaways.

Following the presentation, I led a series of design- and engineering-focused workshops. These sessions allowed stakeholders to brainstorm and identify potential solutions. We then prioritized the changes, categorizing them into immediate actions for the next two iterations and long-term goals for the product roadmap.

Insight 1 | Suggestive prompts were a missed opportunity for a personalized first-time user experience

Generic, one-size-fits-all prompts failed to engage users and often led to simple text outputs, hiding the assistant's dynamic capabilities and preventing meaningful discovery.

Action

I facilitated a workshop to brainstorm how we could personalize the landing experience. We shifted from static prompts to dynamic, personalized “conversation starters” based on users’ behaviour and seasonal trends.

Impact

This strategic change aimed to increase the number of high-value actions (HVAs), directly impacting customer lifetime value and product adoption.

Insight 2 | Failure to acknowledge non-verbal cues

The assistant was not equipped to understand subtle user dissatisfaction, such as repeated requests for "more options" without any clicks. This resulted in an unhelpful, static experience.

Action

I documented specific user behaviors and collaborated with the team to propose a feedback loop. We implemented proactive nudges to prompt users for more details on their preferences (e.g., color, style) after they asked for more options.
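
To make the proposed feedback loop concrete, the sketch below captures the kind of heuristic we discussed: treat repeated “more options” requests with no product clicks as a non-verbal dissatisfaction cue and respond with a preference nudge. The phrases, threshold, and function names are illustrative rather than the production logic.

```python
MORE_OPTIONS_PHRASES = ("more options", "show me more", "anything else")

def wants_more_options(message: str) -> bool:
    """Loose match for 'show me more' style requests."""
    text = message.lower()
    return any(phrase in text for phrase in MORE_OPTIONS_PHRASES)

def should_nudge(recent_turns, threshold=2):
    """Nudge once the user has repeatedly asked for more options without
    clicking any recommendation -- a non-verbal dissatisfaction cue."""
    repeats = sum(
        1
        for turn in recent_turns
        if wants_more_options(turn["user_message"]) and not turn["clicked_products"]
    )
    return repeats >= threshold

# Example: two "more options" turns with zero clicks trigger the nudge.
turns = [
    {"user_message": "Show me more options", "clicked_products": []},
    {"user_message": "More options please", "clicked_products": []},
]
if should_nudge(turns):
    print("Happy to narrow it down: any preferred colours or styles?")
```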

Impact

This created a more engaging and dynamic experience. By actively learning from user actions and responses, the assistant could provide better recommendations. This also allowed us to more accurately calculate the customer satisfaction score, as we were no longer mistaking repeated requests for positive engagement.

Insight 3 | Need for a more robust tagging system

The machine evaluation system used broad, generic tags that failed to capture the nuances of user intent. For example, the "Search" tag alone accounted for 77% of all queries, obscuring a wide range of unique user behaviors.

Action

Based on a thorough analysis of user queries, I designed a more detailed tagging system for our internal AI tool.
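
As a simplified picture of that redesign, the sketch below splits the catch-all “Search” tag into the granular intents that surfaced in the conversation analysis. The hierarchy and the non-search examples are assumptions for illustration.

```python
# Sketch of the more granular taxonomy: the broad "Search" bucket
# (77% of all queries) is split into distinct, analyzable intents.
TAG_TAXONOMY = {
    "search": [
        "product_discovery",      # "black ankle boots under 100 euros"
        "occasion_based_search",  # "what should I wear to a summer wedding?"
        "outfit_inspiration",     # "build an outfit around this blazer"
    ],
    "support": [                  # illustrative non-search intents
        "order_status",
        "sizing_advice",
    ],
}

def granular_tags(broad_tag: str) -> list[str]:
    """Return the fine-grained intents a broad tag should be split into."""
    return TAG_TAXONOMY.get(broad_tag, [broad_tag])

print(granular_tags("search"))
```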

Impact

This change provided a more granular and scalable understanding of user needs, allowing the team to identify specific pain points and create a more focused product roadmap.

Reflections

This project taught me that research is a powerful strategic tool, not just a way to gather insights. It showed me how user research can unblock key product decisions and help a team iterate effectively while keeping technical constraints in mind. Even when the underlying technology could not be changed overnight, we could look into different interaction patterns to bring value to users and help in the adoption of the Zalando Assistant.

This project was a unique opportunity to combine my technical knowledge—working with SQL and on-site data—with my user research skills. It deepened my understanding of the unique challenges of building and scaling AI products, and reinforced my ability to work comfortably and collaboratively with multidisciplinary tech teams.
