Increasing the perceived accuracy of Alerts for a Suicide Prevention Tool

Beacon is a GoGuardian tool that helps K-12 schools detect a risk of suicide, self-harm, or possible harm to others in students and give them the help they need when they need it the most.

What does GoGuardian do?

GoGuardian is an educational technology company based in California. It provides IT administrators with tools to filter and monitor school devices, offers teachers a way to manage and enhance classroom engagement, and includes a student safety tool called Beacon for counselors. The complete suite aims to create effective, engaging, and safe learning experiences for K-12 students.

The Background

Beacon is GoGuardian's student safety tool. It is deployed on all school devices and is used to notify designated staff about online activity that indicates a risk of suicide, self-harm, or possible harm to others.

Beacon works by identifying at-risk students and connecting them with staff members, such as school counselors. It sends an alert to staff members with relevant context and assists them in utilizing this data to facilitate meaningful intervention.

My Role

I was the UX/UI designer on Beacon.

I worked closely with a lead designer, a product manager, and the data science team.

Target Users

Counselors, Mental Health Professionals, and Administrators in schools.

Timeline

3 months.

The Problem

Users who received high volumes of Beacon alerts on a daily basis struggled to determine when an alert was urgent and required immediate action. Without proper insight, they often dismissed the alerts. To improve the perceived accuracy of alerts, it was essential to help users understand the rationale behind each one.

Responding to such a sensitive matter in large volumes also had a psychological and emotional impact on users. Therefore, it was crucial to make this process easily understandable and context-rich so that users could make well-informed decisions more quickly.

The Research

To improve the perceived accuracy of Beacon alerts, I started by gaining a deeper understanding of our users.

User research calls with 10 schools using Beacon across different districts revealed interesting insights. At the time, Beacon had two primary user personas, each with unique preferences for understanding why an alert was triggered.

Mental Health Professionals

Schools have dedicated personnel, often a counselor, responsible for student safety and health.

First to be notified and deal with a very high volume of alerts.

Want information that is visual, easy to understand and break down, and that provides the necessary context on student activity.

School Administrators

A school's administrative staff are also included on Beacon alert lists and are responsible for student safety.

Often, the second point of contact for triggered alerts.

Want information that is analytical and provides an understanding of how the machine-learning model decided to trigger an alert.

I conducted interview calls with both user personas to quantify the problem at hand and understand why they perceived alerts as inaccurate.

Interviews revealed that users receiving Beacon alerts felt that these alerts were accurate only about 27% of the time. This meant that an overwhelming 73% of alerts were dismissed as inaccurate.

“How often do you feel the predicted phase correctly categorized alerts?”

~27%

Understanding “False Positives”

The next step was to understand why users considered alerts to be "false positives."

False positives meant that, according to users, the model was incorrectly categorizing students as at-risk, which also lowered their perception of Beacon's value. After digging in, I found 4 reasons users deemed an alert false:

Incorrect Phase

Beacon categorizes alerts into 6 phases to help users determine the course of action.

Users felt that Beacon flagged an alert as, say ‘Active Planning’, when it should have been ‘Suicide Research.’


No Imminent Threat

Beacon sends alerts for what it considers concerning or at-risk behavior. ‘Self Harm’ or ‘Suicide Ideation’ also trigger alerts.

Users felt that alerts were incorrect or unimportant because they didn’t require urgent intervention.


No Obvious Signs

Beacon sends alerts with some details about the phase and next steps.

On receiving an alert, users often could not easily tell why it was triggered and thus deemed it false.


Genuine False Positive

Beacon runs on a complex machine-learning model; the more data it processes, the more accurate its predictions become.

Sometimes, the model sent genuine false positives that users pointed out.


Users correctly recognized genuine false positives, and these would diminish over time as the model gained more data to train on. This problem was passed on to the Data Science team, which doubled down on producing more accurate results.

Incorrect Phase, No Imminent Threat, and No Obvious Signs were 3 problems we could mitigate. We needed to help users understand how the model works and why an alert gets triggered.

The Process

Here is a look into my process and collaborations at each step:

The Solution

It is important to understand that the ML model Beacon used was a black box that relied on Natural Language Processing and Deep Learning. This meant we knew the inputs it received and the outputs it generated, but we had no visibility into its internal workings.

Here are the various inputs the team fed the model and the components we designed from its output:

Probability Distribution for Phases

After processing the content on a page, the model categorized an alert across 5 phases with varying degrees of confidence.

Instead of selecting only the phase with the highest confidence, our goal was to show users the probability of an alert being classified as each phase. To achieve this, we converted the confidence levels into easily understandable percentages and created a probability distribution component that highlighted the most probable phase.
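
As a rough illustration of this step, here is a minimal sketch that normalizes a set of hypothetical confidence scores into percentages and picks out the most probable phase. The score values and the "General Concern" label are assumptions for illustration; this is not Beacon's production code.

```python
# Minimal sketch (not Beacon's production code): normalize hypothetical model
# confidence scores into percentages and pick out the most probable phase.

def to_percentages(confidences: dict[str, float]) -> dict[str, float]:
    """Convert raw confidence scores into percentages that sum to ~100."""
    total = sum(confidences.values())
    return {phase: round(100 * score / total, 1) for phase, score in confidences.items()}

# Hypothetical scores for a single alert; "General Concern" is an invented label.
raw_scores = {
    "Suicide Research": 0.62,
    "Active Planning": 0.48,
    "Suicide Ideation": 0.21,
    "Self Harm": 0.11,
    "General Concern": 0.05,
}

percentages = to_percentages(raw_scores)
most_probable = max(percentages, key=percentages.get)

print(percentages)    # each phase with its share of the total confidence
print(most_probable)  # "Suicide Research" -> highlighted in the UI component
```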

Words on the Page

To further our goal of providing users with comprehensive context, we added the text displayed on a student's screen as a data component. In doing so, we could highlight words that indicate at-risk behavior, while still presenting entire paragraphs to provide a complete understanding of the context. Additionally, we allowed users to download this text for future analysis or for sharing with students during discussions.
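
As a minimal sketch of the highlighting idea, assuming the model returns a list of flagged words alongside the page text, the snippet below wraps those words in a marker the UI could render as a highlight. The word list and marker syntax are hypothetical.

```python
# Minimal sketch: wrap flagged words in a marker the UI can render as a
# highlight, while keeping the full paragraph intact for context.
# The flagged-word list and marker syntax are hypothetical.
import re

def highlight(text: str, flagged_words: list[str]) -> str:
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, flagged_words)) + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"<mark>{m.group(1)}</mark>", text)

page_text = "I have been reading about how people cope when they feel hopeless."
print(highlight(page_text, ["hopeless", "cope"]))
# I have been reading about how people <mark>cope</mark> when they feel <mark>hopeless</mark>.
```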

Context Diagram

Over several whiteboarding sessions, we tried to find the best way to visually show how different words on a student's screen connect to indicate at-risk behavior according to the ML model.

The model worked by calculating the distance, or similarity, between words and a center point representing at-risk behavior. The shorter the distance, the more strongly a word indicated cause for concern.

Here, the yellow dots represent different words input to the model from a student's screen. The distance between Word A and Word B, measured from the center point S in 3D space, provides similarity data. Using this information, the model can calculate the distance, or similarity, between all the words on a page.
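
To make the similarity calculation concrete, here is an illustrative sketch using cosine similarity between word embeddings and a center-point vector S, where higher similarity corresponds to a shorter distance from S. The embeddings, vector size, and example words are stand-ins; we had no visibility into the production model's internals.

```python
# Minimal sketch: rank words by cosine similarity to a center-point vector S
# that stands in for "at-risk" language. Higher similarity ~ shorter distance
# from S. The embeddings and example words are illustrative stand-ins.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
center_point = rng.normal(size=300)  # S: the at-risk reference vector
words = ["hopeless", "plan", "pills", "homework"]
word_vectors = {w: rng.normal(size=300) for w in words}  # placeholder embeddings

ranked = sorted(
    ((w, cosine_similarity(v, center_point)) for w, v in word_vectors.items()),
    key=lambda pair: pair[1],
    reverse=True,  # most concerning (closest to S) first
)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```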

Below is a sample output from the model with words ranked according to their similarity to the center point.

Our goal was to highlight the most concerning words to users, helping them understand how the connected points can come together to mean at-risk behavior. To achieve this, we decided to break down the above list into multiple tiers, prioritizing the most concerning words. We explored various methods of visually presenting this information to users in the most effective and comprehensible manner. Ultimately, we decided on the following representation.

This context diagram showed all 46 words and their distance from the center point of concern, and it delivered the meaning accurately. The primary critique of this visualization was information overload. After discussions with multiple stakeholders, we decided to reduce the cognitive load by showing a smaller number of top-ranked words, making the diagram quicker for users to digest and draw insights from.
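
The sketch below shows one way such a tiering could work: splitting the 46 ranked words into tiers and surfacing only the top tier in the simplified diagram. The tier sizes are assumptions for illustration.

```python
# Minimal sketch: split the 46 ranked words into tiers so the diagram can show
# only the top tier and reduce cognitive load. Tier sizes are assumptions.

def into_tiers(ranked_words: list[str], tier_sizes: tuple[int, ...] = (5, 10, 31)) -> list[list[str]]:
    tiers, start = [], 0
    for size in tier_sizes:
        tiers.append(ranked_words[start:start + size])
        start += size
    return tiers

ranked = [f"word_{i}" for i in range(1, 47)]  # stand-in for the model's ranked output
tier_1, tier_2, tier_3 = into_tiers(ranked)
print(tier_1)  # only the most concerning words appear in the simplified diagram
```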

The Final Steps

Once we had our components ready and internally validated, I moved on to gathering user feedback on them.

8 users scored each data component on a scale of 0 (not useful) to 5 (very useful). This helped us understand user preferences, refine the components further, and determine their order on an Alert Page.

Finally, I added a new tab to the page called Rationale, which would hold all of the newly designed components.

In my final round of user testing, I tested a prototype of the entire user flow: receiving an alert, opening and reading it, and finally taking action on it. At the end of these calls, I asked our users the same question and was happy to report an increase in perceived accuracy.

“How often do you feel the predicted phase correctly categorized alerts?”

Before: ~27%

After: ~83%

Finally, this feature was handed off to the team, built, and shipped to make Beacon a more helpful tool.