Cruise's AV Performance: Is Danger Being Caught?
The well-known phrase “ignorance is bliss” is, according to one dictionary definition, used when “a person who does not know about a problem does not worry about it.”
The “ignorance is bliss” mindset is a massive problem in the autonomous vehicle industry. Many developers and regulators are striving to tackle it.
I’ll go out on a limb and say the biggest risk to autonomous vehicle companies is what they don’t know about their product. Not only can they not fix what they don’t know about; they don’t even know what to look for in the first place.
The uncomfortable solution is to strive to look at everything. That takes a mature, long-term view of the technology: a commitment to constantly searching for the ugly truths about its issues and limitations.
We’re going to look very closely at Cruise’s AV performance in this blog, but the content of this discussion is applicable to all AV developers, as far as I can tell. This is not just about one company, but really the whole industry.
Earlier this month, Cruise posted a video of their autonomous vehicle driving through a dust cloud in Arizona. The vehicle drove autonomously through the dust cloud without swerving or slamming the brakes, and the comment on their video stated that it “doesn't faze a Cruise AV going 45 MPH.”
Looking closely at their video, however, we see two “cyclists” appear briefly inside the dust cloud. These were, of course, false detections.
It is a huge relief they weren’t real cyclists, because one of them was directly within the stopping distance along the planned path, about 35 meters ahead.
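As a rough sanity check on that 35-meter figure (my own back-of-envelope numbers, not Cruise’s): 45 mph is about 20.1 m/s, and a firm braking deceleration of roughly 6 m/s² (about 0.6 g) with zero reaction delay gives

$$ d \;=\; \frac{v^2}{2a} \;=\; \frac{(20.1\,\text{m/s})^2}{2 \times 6\,\text{m/s}^2} \;\approx\; 34\ \text{m}, $$

and any reaction or actuation delay only lengthens that, which puts a detection at 35 meters squarely at or inside the minimum stopping envelope.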
So, What’s the Big Deal?
It is expected that AVs will have many false positive and false negative detections of surrounding objects. But in the region that concerns the AV’s immediate path planning, the distinction between an object and free space must be provably certain.
Why? This event demonstrates the maturity still needed before the technology can responsibly manage the driving task on its own. Specifically, the uncertainty as to what may or may not be on the other side of the dust cloud ends up impacting the immediate path planning.
One second before entering the dust cloud, it thought for sure it could proceed ahead through the free space. The next second, it thought a cyclist was suddenly nearing its path, out of nowhere.
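To make that concrete, here is a minimal sketch of the kind of consistency check a risk monitor could run on the planner’s inputs. This is hypothetical illustration code of my own, not Cruise’s stack: when a region treated as free space one frame ago suddenly contains an object inside the stopping envelope, that flip is itself evidence that perception is operating beyond its limits in this environment.

```python
# Hypothetical sketch: flag frame-to-frame perception "flips" inside the
# stopping envelope as safety-critical uncertainty. All names, thresholds,
# and numbers are illustrative, not from any real AV stack.

def stopping_distance(speed_mps: float, decel_mps2: float = 5.9,
                      reaction_s: float = 0.5) -> float:
    """Distance covered during a reaction delay plus braking to a stop."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2.0 * decel_mps2)

def flag_perception_flips(prev_objects, curr_objects, speed_mps):
    """Return objects that appeared "out of nowhere" inside stopping range.

    prev_objects / curr_objects: lists of (object_id, range_m) tuples
    reported along the planned path in consecutive perception frames.
    """
    envelope = stopping_distance(speed_mps)
    prev_ids = {obj_id for obj_id, _ in prev_objects}
    return [(obj_id, rng) for obj_id, rng in curr_objects
            if rng <= envelope and obj_id not in prev_ids]

# At 45 mph (~20.1 m/s) the envelope above is ~44 m, so a "cyclist" that
# pops into existence 35 m ahead lands well inside it and should end the
# autonomous run for review rather than be shrugged off.
frame_t0 = []                      # free space reported ahead of the AV
frame_t1 = [("cyclist-1", 35.0)]   # sudden detection inside the dust cloud
print(flag_perception_flips(frame_t0, frame_t1, speed_mps=20.1))
```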
Driving beyond one’s ability to manage risk to an acceptable level is a violation of the core driving tenet to, “Always drive within your limits.”
The fact that there was no cyclist is irrelevant. This video has already proven that the next time the AV approaches a similar dust cloud, it cannot tell whether a cyclist is or is not within its stopping range in that environment, at least not until it may be too late.
Now, we were fortunate enough to speak with Cruise briefly about our findings. They were open to gathering feedback and they appreciated our concern.
They mentioned some points to help justify the event in the video, and these points can also be found in GM’s Voluntary Safety Self-Assessment (VSSA) Letter:
1) This was supervised by a safety driver,
2) Their safety drivers follow an “early-and-often” practice of taking over AVs, and
3) Their QA team is constantly reviewing data for misdetections.
These are all great points. The remaining problem is how to reconcile these points with what we see in the video. More frankly, how does the upper management of any AV company reconcile this? Because this is not unique to just Cruise.
The Difficulty of Reconciling This with a Safety Perspective
Consider, first, the one hand: let’s say the developers went to a test track in the desert and created dust clouds for a week to prove their perception works well enough in the Arizona dust, and the only thing left to do was to drive around and encounter a real-life dust cloud to validate that the testing was representative of real life.
If that were the case, then the real-life validation failed: the AV lost track of a forward car about 70 m ahead and briefly detected two bikes, one within the immediate stopping distance of about 35 m.
Furthermore, if the QA team is constantly reviewing perception performance, this event would have been flagged as an issue needing quality improvement, not presented as proof the vehicle was unfazed. So, was the QA team fazed at all by this?
Now, let’s consider the other hand. Let’s say this technology had not been tested in dusty conditions, and this was the first time the AV was going to encounter a real dust cloud.
In this case, the AV was being operated outside of its ODD (operational design domain). This is an environmental condition that the AV has not been designed or tested for.
Our highest recommendation is that safety drivers have clear training on environmental conditions in which the AV can safely operate, and if in doubt, take over.
Developers do the track testing, then train the safety drivers, and that’s how they continuously expand their ODD capabilities. If this drive took place outside the ODD, were their safety driver operations fazed at all by this?
While in Cruise’s case they had a trained safety driver who judged the situation safe to proceed, did the driver really know whether the AV could handle dust clouds? We already saw that its perception briefly struggled.
What if it had been worse? If the AV had detected the bike for longer than a few fractions of a second, could that have led to a sudden deceleration?
Does the chance of a rear-end collision increase if the AV stops or slows suddenly in the road for no reason, with nothing in front of it? Sure. Plus, it is now obscured by the dust cloud, making it hard for the drivers approaching from behind to see the AV.
In general, this can be a hazardous situation because a single failure in the AV can cascade into harm. Specifically, a false object detection could lead to a sudden stop and a rear-end collision, a chain we’ve discussed in detail in another blog.
The safety quandary is this: either the AV was operated autonomously outside its ODD, which is not recommended, or it was operated within its ODD, in which case something failed and needs further rework.
So, again, we have to ask: What is it about Cruise’s technology, safety driver, and QA monitoring all not being “fazed” by this that is supposed to be taken positively?
Don’t Trust Your Eyes
Road testing autonomous vehicles is extremely expensive. It takes a tremendous amount of setup and supporting processes to collect and process so much data. The hope is that all that effort pays off by either:
1) finding issues that need to be fixed, or
2) validating that the system works and all assumptions were correct.
If that’s not being done, then what is any AV company hoping to find by testing on the roads? And is it really worth it?
We hope all stakeholders and upper managers of AV companies take these questions very seriously:
Do you strategically know what your road testing is accomplishing towards your next launch?
Are you maximizing all the hidden value in the data that is generated?
If the answer is “to get more miles,” then is there an auditable system that traces those miles to a particular software release and finally to validated results? (One possible shape for such a system is sketched after these questions.)
If so, would this instance of driving through a dust cloud have made it into the “good” category of miles, even though it clearly shows the AV cannot reliably tell whether an object is inside a dust cloud?
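As a sketch of what “auditable” could mean here, consider the following hypothetical schema (field names and categories are my own illustration, not a description of any company’s tooling). Every logged drive is tied to the software release that produced it and to a review verdict, so miles stop being a raw count and become traceable evidence:

```python
# Hypothetical schema: trace road-test miles to a software release and a
# validation verdict. All names are illustrative.
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    VALIDATED = "validated"      # behavior matched expectations under review
    ISSUE_FOUND = "issue_found"  # flagged for engineering follow-up
    UNREVIEWED = "unreviewed"    # counts toward nothing yet

@dataclass
class MileageRecord:
    drive_id: str
    software_release: str  # build tag the miles are attributed to
    miles: float
    verdict: Verdict
    notes: str = ""

# Under this scheme, the dust-cloud drive lands in ISSUE_FOUND, not in
# the "good miles" total:
record = MileageRecord(
    drive_id="dust-cloud-drive",
    software_release="build-example",
    miles=1.0,
    verdict=Verdict.ISSUE_FOUND,
    notes="Transient false cyclist detections inside the stopping envelope.",
)
```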
There is a danger in public relations of blurring the line between sharing something of interest to customers and overpromising on capabilities, particularly with safety.
In my opinion, the worst offenders are the companies going “no driver” and operating autonomously primarily to generate marketing videos for social media, or to hit year-end goals, goals from which demonstrably safe operation appears to be excluded.
Was Cruise’s intent with this video to convince the public this vehicle was demonstrably safe in dust clouds, since it didn’t appear to be “fazed” on the surface?
This is precisely the problem with surface-level behavior, that is, the behavior we simply observe from the outside. Surface-level behavior is not a sign of safety.
Just because the vehicle didn’t suddenly stop or swerve does not mean it was correct or safe. Many inattentive drivers tend not to be fazed, either, as they drive into potentially dangerous situations.
An outside observer could watch a driver navigate through a scenario and think either, “Wow, that driver must be really good,” or, “Wow, that driver must be an idiot,” and yet the behavior is identical.
It all depends on the rationale behind the driver’s decision to behave that way.
The only solution to AV safety is to become extremely attentive to the driving rationale, especially by scrutinizing the changes in risk in the immediate path-planning area. This is the only viable path forward, and I mean viable in the strictest sense.
AV companies must build internal monitors and dashboards that surface all the bugs and issues up to the highest levels of the company.
The more issues found, the better, and I’m not joking: a monitoring system that keeps finding issues is a monitoring system that is actually working.
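As a minimal sketch of that idea (again hypothetical code of my own, not any company’s tooling): aggregate flagged events per software release and per mile, and treat a discovery rate that falls to zero as a red flag about the monitoring itself rather than a milestone.

```python
# Hypothetical sketch: a dashboard that tracks issues found per mile, per
# software release. A rate that drops to zero should raise suspicion about
# the monitoring, not be celebrated.
from collections import defaultdict

class IssueDashboard:
    def __init__(self):
        self.issue_counts = defaultdict(int)    # release -> issues found
        self.miles_driven = defaultdict(float)  # release -> miles logged

    def log_miles(self, release: str, miles: float) -> None:
        self.miles_driven[release] += miles

    def log_issue(self, release: str) -> None:
        self.issue_counts[release] += 1

    def discovery_rate(self, release: str) -> float:
        """Issues found per mile for a given release."""
        miles = self.miles_driven[release]
        return self.issue_counts[release] / miles if miles else 0.0

dash = IssueDashboard()
dash.log_miles("build-example", 120.0)
dash.log_issue("build-example")  # e.g. the dust-cloud false detections
print(dash.discovery_rate("build-example"))  # one issue per 120 miles
```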
Safety Culture is a Culture of Management Ownership
Ultimately, management should want to own all those issues, and build their product and processes around finding those issues. Sometimes management doesn’t want to know.
In Cruise’s case, they agree 100% with us that executive and management involvement is vital to managing safety, and they have a formal Safety Management System in place that includes management engagement early in the safety process.
There are two opposing philosophies when it comes to finding product issues. The first is “Ignorance is Bliss.” It’s better not to know. Sadly, this is not uncommon.
The second comes from the lean methods originating with Toyota and Deming, which I will summarize with Peter Drucker’s words: “No problem? No manager.”
Drucker tells the story of a Toyota manager who kept asking two new American managers working under him for weekly problem reports. They kept assuring him everything was fine, as that was the custom they were used to. His response was that if everything is fine, then they don’t need a manager: “No problem? No manager.” They quickly started finding problems to report.
It’s the second philosophy, the one in which management wants to know about the problems its employees are dealing with, that is the right approach and leads to success.
The other philosophy, the “ignorance is bliss” approach, leads to overpromised planning, short-term gains, and eventually jumping roles as the ship takes on water. And let’s be blunt: it leads to far worse outcomes, potentially life-ending ones.
We find the correct philosophy in the first two of the 14 principles of the lean Toyota Production System (TPS), as shown in this blog:
• Principle 1: Base your management decisions on a long-term philosophy, even at the expense of short-term financial goals.
• Principle 2: Create continuous process flow to bring problems to the surface.
Bingo. Think long-term. Don’t worry about short-term financials. Create continuous process flow of issues to the surface for everyone to see.
When issues are found, like the issues shown with Cruise’s perception in the dust cloud, these are gold to management. These reveal where the structural weaknesses are most likely to be. They can plan accordingly. They can address the high-risk safety concerns first, launch sooner, and continuously improve functionality.
Systematically scrutinizing autonomous path planning for issues is not only great from a business perspective, it can actually become a part of the developer’s safety argument. This is precisely why we created RiskEngine™ and are continuing to work on quantitative AV monitoring solutions.
It allows developers to claim, “Yes, we likely have certain bugs and issues in our software we’re not aware of. But we can still operate safely by detecting those bugs early.”
They can make this claim so long as the observability, or detectability, of those bugs, which can be tested through fault injection, is many orders of magnitude greater than the likelihood of the environmental circumstances arising in which the bug causes real harm.
In short, any bug will be exposed in operation long before it actually leads to an accident. But developers can only claim that if their management is using systematic risk monitoring of the path planning algorithm.
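One minimal way to formalize that argument, using notation of my own rather than any established standard: model the time until a given bug is first observed in operation and the time until it first causes harm as independent exponential random variables with rates $\lambda_{obs}$ and $\lambda_{harm}$. Then

$$ P(\text{bug observed before harm}) \;=\; \frac{\lambda_{obs}}{\lambda_{obs} + \lambda_{harm}}, $$

so requiring $\lambda_{obs}$ to be, say, three or more orders of magnitude larger than $\lambda_{harm}$ makes this probability at least 0.999. Fault injection is one way to estimate $\lambda_{obs}$ empirically.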
We should stop pretending that AVs will ever be “bug-free,” or that many miles of expensive testing will somehow convince litigators that the AV was bug-free.
The sooner we all admit that software bugs are here to stay in autonomy, and build an effective safety monitoring system around them, the sooner AVs can be fully deployed and provide value to their customers and to their companies.
In closing, all AV company executives need to ask these questions:
Are the roles of the Functional Safety and Operational Safety teams to make safety problems go away, or to make safety problems known?
Are the safety team’s development milestones to show no more risks, or to start showing new risks?
Hopefully this blog has made the right answers to these questions fairly obvious.
Until next time…
We want to thank Cruise for taking the time to speak with us about these safety concerns, even if the conversation wasn’t as positive as we would have liked from our end. We would also really appreciate hearing your thoughts on autonomous safety.
If there is anything you disagree with, or if there was anything new that you learned, please contact us, or leave a comment below.
It helps us to get feedback, and obviously we’d love nothing more than talking about AV risk monitors, but creating a dialogue is how the industry is ultimately going to solve the technical challenges in AV safety. We appreciate your help in spreading the word on these critical discussions.