Building fail-safe software systems


By Karthik Pattabiraman, Associate Professor of Electrical and Computer Engineering and head of UBC's Dependable Systems Lab

Software systems are everywhere. And as their prevalence grows across every part of our lives, it’s essential that these systems are both trustworthy and fail-safe. We see this in specific industries, of course, like transportation and health care, where software failure can have deadly implications. But it also increasingly applies to other facets of modern life, where software’s dependability, accessibility and reliability are crucial.

With mechanical systems, we often have some warning before they fail – which can enable us to take action to minimize or prevent the impact. In the virtual space of software, it’s a different story.

When software fails, it often fails spectacularly, with significant impacts. AI and machine learning complicate matters further, because we are often unable to even pinpoint the causes of a failure.

Building resilient systems

Malicious attacks and security breaches pose additional threats to software dependability and resilience. In December 2021, for example, the Canada Revenue Agency went offline as the administrators were worried about a potential security threat that was affecting organizations – including hospitals – around the world. The threat was a flaw in a common open-source logging tool used in cloud servers across industry and government that enables hackers to access data, embed malware and engage in other nefarious activities.

With the increasing use of software systems, there’s a parallel growing demand for software engineers who can design and implement resilient systems that will continue operating in the face of failure or security breach.

The MEL in Dependable Software Systems is a graduate degree for software professionals with several years of work experience who want to upgrade their skills in this critical area.

Three of the required courses enable students to acquire new skills in the field, including software testing, dependable systems design and software security. When I taught the resilient systems class, students had to build a system and then simulate a variety of failures. They competed with each other to see which system would be the last one standing. Similar applied learning projects are required in the other software engineering courses.

Real-world practice

Students in the program also complete a capstone project. In 2021, students developed a Gradle plugin and a GitHub action for fuzz testing, which is emerging as a popular approach to testing software. Fuzzing tools have been developed for a variety of languages, including Java, and a team of two Dependable Software Systems (DSS) students developed additional tools to simplify the use of fuzzing in software development. The Gradle plugin is now available from the Gradle Plugins website, making the students’ work available to the entire software engineering community.

Another team tackled the issue of how to cope with errors in training data used for machine learning. Most machine learning approaches today need a set of input and output pairs that correctly describe the behaviour of the system that we are trying to “learn.” Two students studied ensemble methods as a mechanism for coping with erroneous training data, that is, data in which some of the inputs were paired with incorrect outputs during the training phase. Their study found that combining a collection of machine learning models through approaches such as voting can reduce the impact of errors in training data. This is a promising result because error-free training data sets are expensive to obtain.
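To make the voting idea concrete, here is a minimal Python sketch. It is a toy illustration only and not the students' implementation: the one-dimensional data, the deliberately flipped label, the nearest-neighbour models and the helper names `nearest_neighbour_model` and `majority_vote` are all assumptions made for the example.

```python
from collections import Counter


def nearest_neighbour_model(training_pairs):
    """Return a 1-nearest-neighbour classifier over (x, label) pairs."""
    def predict(x):
        # Pick the training pair whose input is closest to x.
        _, label = min(training_pairs, key=lambda pair: abs(pair[0] - x))
        return label
    return predict


def majority_vote(models, x):
    """Return the label predicted by the most ensemble members."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Toy data (assumed): the true label is 1 when x >= 5, otherwise 0.
training_data = [(x, int(x >= 5)) for x in range(10)]
training_data[2] = (2, 1)  # simulated labelling error: x=2 should be 0

# One model trained on all of the noisy data.
single_model = nearest_neighbour_model(training_data)

# Three ensemble members, each trained on a disjoint slice of the same data,
# so the bad label reaches only one of them.
ensemble = [nearest_neighbour_model(training_data[i::3]) for i in range(3)]

print(single_model(2))             # the single model repeats the bad label
print(majority_vote(ensemble, 2))  # the two unaffected members outvote it
```

In this sketch the mislabelled point ends up in only one member's training slice, so the other two members outvote it. Voting helps in this way only when errors affect a minority of the members for a given input, which is an assumption of the example rather than a finding reported above.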

The need for professionals with the ability to assess and design safety-critical and fail-safe systems will only continue to grow.

Our program meets that need, and the business courses – which make up about half the curriculum – enable students to deepen their skills in communication, business and leadership, positioning them to successfully transition into management and leadership positions.
