The software discipline is engaged across all NASA Mission Directorates. Recent discipline focus and development areas are highlighted below, along with a look at the Software Technical Discipline Team’s (TDT) approach to advancing the discipline’s best practices going forward.
Understanding the risks of automation
Software creates automation, and reliance on that automation is increasing the amount of software in NASA’s programs. This year, the software team examined historical aviation software incidents to characterize how, why, and where software or automation is most likely to fail. The goal is to design software that reduces the likelihood of errors, to improve software processes, and to design software that tolerates errors when they do occur.
Some key findings, shown in the diagrams above, indicate that software does the wrong thing more often than it simply crashes. Rebooting was found to be ineffective when software misbehaves. Unexpected behavior was mostly attributed to the code or logic itself, and about half of those cases stemmed from missing software: software that was not present because of unanticipated situations or missing requirements. This means that even fully tested software remains exposed to this significant class of error. Data misconfiguration was another significant factor, one that continues to grow with the advent of more modern data-driven systems. The last, more subjective category assessed was “unknown unknowns”: conditions that could not reasonably have been anticipated. These accounted for 19% of the studied software incidents.
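As a simple illustration of how such a categorization can be tallied, the sketch below computes per-category percentages from a list of labeled incidents. The category names and counts are invented for the example and do not reproduce the data in the cited study.

```python
# Hypothetical tally of incident categories; the labels and counts are invented
# for illustration and do not reproduce the data in NASA/TP-20230012154.

from collections import Counter

incidents = [
    "wrong_behavior", "wrong_behavior", "missing_software",
    "data_misconfiguration", "crash", "unknown_unknown",
]

counts = Counter(incidents)
total = sum(counts.values())

# Print each category with its share of all studied incidents.
for category, count in counts.most_common():
    print(f"{category:22s} {count:3d} ({100.0 * count / total:5.1f}%)")
```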
The software team uses and shares these findings to improve best practices. Greater emphasis is placed on complete requirements, off-nominal test campaigns, and “test like you fly” exercises using real hardware in the loop. When designing a fault-tolerant system, more attention should be paid to detecting and correcting misbehavior than to merely detecting crashes, and less trust should be placed in rebooting as a recovery strategy. Given the historical prevalence of missing software and unknown unknowns, backup strategies for automation should be employed in critical applications. More information can be found in NASA/TP-20230012154, Aviation Software Error Incident Categorizations.
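To make this guidance concrete, the sketch below shows one way a monitor might check an automation output for reasonableness and fall back to a simpler backup strategy, rather than only watching for a crash or relying on a reboot. It is a minimal, hypothetical illustration; the names, limits, and fallback logic are assumptions, not taken from the cited report.

```python
# Hypothetical illustration of monitoring automation for misbehavior, not just crashes.
# Names and thresholds are invented for the example.

from dataclasses import dataclass

@dataclass
class Limits:
    min_value: float      # lowest plausible output
    max_value: float      # highest plausible output
    max_step: float       # largest plausible change between updates

def is_plausible(value: float, previous: float, limits: Limits) -> bool:
    """Reasonableness check on an automation output: range and rate of change."""
    in_range = limits.min_value <= value <= limits.max_value
    smooth = abs(value - previous) <= limits.max_step
    return in_range and smooth

def select_output(primary: float, backup: float, previous: float, limits: Limits) -> float:
    """Prefer the primary automation, but fall back when its output misbehaves."""
    if is_plausible(primary, previous, limits):
        return primary
    # Primary output failed the reasonableness check; use the simpler backup strategy.
    return backup

if __name__ == "__main__":
    limits = Limits(min_value=0.0, max_value=100.0, max_step=5.0)
    # A jump from 40.0 to 90.0 exceeds max_step, so the backup value is selected.
    print(select_output(primary=90.0, backup=42.0, previous=40.0, limits=limits))
```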
Using AI and machine learning techniques
The rise of artificial intelligence (AI) and machine learning (ML) techniques has allowed NASA to examine data in ways not previously possible. While NASA has used autonomy since its inception, AI/ML techniques give teams the ability to expand the use of autonomy beyond its current limits. The agency is developing ethical frameworks for AI and examining standards, procedures, and practices, taking security implications into account. Although AI/ML generally relies on non-deterministic statistical algorithms that currently limit its use in safety-critical flight applications, NASA employs it in more than 400 AI/ML projects supporting research and science. The agency also maintains AI/ML communities of practice to share knowledge among centers. The TDT surveyed AI/ML work across the agency and summarized trends and lessons learned.
Common uses of AI/ML include image recognition and identification. NASA’s Earth science missions use AI/ML to identify marine debris, measure cloud thickness, and recognize wildfire smoke (examples are shown in the satellite images below), reducing the workload on science staff. AI/ML is also applied to predicting atmospheric physics. One example is predicting hurricane tracks and intensities. Another is predicting the thickness of the planetary boundary layer and comparing it with measurements; these predictions are combined with live data to improve performance over previous boundary layer models.
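For readers unfamiliar with how such image-recognition tasks are typically structured, the sketch below applies a generic convolutional classifier to a satellite image tile. It is illustrative only: the backbone choice, class labels, and file name are assumptions, not NASA models or data.

```python
# Illustrative sketch of image classification on satellite tiles.
# The model, class labels, and file path are hypothetical.

import torch
from torchvision import models, transforms
from PIL import Image

# Two example classes for a smoke-detection task (labels are assumptions).
CLASSES = ["no_smoke", "smoke"]

# Standard preprocessing for an ImageNet-style backbone.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def classify_tile(path: str, model: torch.nn.Module) -> str:
    """Return the predicted class for one satellite image tile."""
    image = Image.open(path).convert("RGB")
    batch = preprocess(image).unsqueeze(0)          # shape: (1, 3, 224, 224)
    with torch.no_grad():
        logits = model(batch)                       # shape: (1, len(CLASSES))
    return CLASSES[int(logits.argmax(dim=1))]

if __name__ == "__main__":
    # A ResNet-18 backbone with its final layer resized to two classes;
    # in practice the model would be fine-tuned on labeled satellite imagery.
    model = models.resnet18(weights=None)
    model.fc = torch.nn.Linear(model.fc.in_features, len(CLASSES))
    model.eval()
    print(classify_tile("tile_0001.png", model))    # hypothetical file name
```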
Code Analysis Pipeline: A static analysis tool for IV&V and software quality improvement
The Code Analysis Pipeline (CAP) is an open-source tool architecture that supports software development and assurance activities, improving overall software quality. The Independent Verification and Validation (IV&V) program uses CAP to support software assurance on the Human Landing System, Gateway, Ground Research Systems, Orion, and Roman. CAP supports the configuration and automated execution of multiple static code analysis tools to identify potential code defects, generate code metrics that indicate potential quality problem areas (e.g., cyclomatic complexity), and run any other tool that analyzes or processes source code. The TDT aims to integrate code coverage analysis support, including modified condition/decision coverage testing. Results from the tools are consolidated into a central database and presented in context through a user interface that supports viewing, querying, reporting, and analysis as the code matures.
The tool’s architecture is based on an industry-standard DevOps approach to continuously build source code and run tools. CAP integrates with GitHub for source code control, uses Jenkins to support analysis build automation, and uses Docker to create standard and custom build environments that support unique mission needs and use cases.
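The fragment below is a conceptual sketch of this pipeline pattern, not CAP’s actual implementation: it runs a configurable set of analysis tools against a source checkout and consolidates each tool’s findings into a central results database. The tool list, command lines, and database schema are assumptions made for the example.

```python
# Conceptual sketch of a static-analysis pipeline in the style described above.
# The tool list, commands, and database schema are assumptions, not CAP internals.

import sqlite3
import subprocess

# Each entry names a tool and a command that produces one finding per output line.
TOOLS = {
    "cppcheck": ["cppcheck", "--enable=all",
                 "--template={file}:{line}:{severity}:{message}", "src/"],
}

def run_tool(name: str, command: list[str]) -> list[dict]:
    """Run one analysis tool and convert its output lines into finding records."""
    result = subprocess.run(command, capture_output=True, text=True)
    findings = []
    for line in result.stderr.splitlines():
        parts = line.split(":", 3)
        if len(parts) == 4:
            findings.append({"tool": name, "file": parts[0], "line": parts[1],
                             "severity": parts[2], "message": parts[3]})
    return findings

def store(findings: list[dict], db_path: str = "results.db") -> None:
    """Consolidate findings from all tools into one central database."""
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS findings "
                 "(tool TEXT, file TEXT, line TEXT, severity TEXT, message TEXT)")
    conn.executemany("INSERT INTO findings VALUES "
                     "(:tool, :file, :line, :severity, :message)", findings)
    conn.commit()
    conn.close()

if __name__ == "__main__":
    all_findings = []
    for name, command in TOOLS.items():
        all_findings.extend(run_tool(name, command))
    store(all_findings)
```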
Software process improvement and best practice sharing
The TDT gathered best-practice knowledge from all centers in NPR 7150.2, NASA Software Engineering Requirements, and NASA-HDBK-2203, NASA Software Engineering and Assurance Handbook (https://swehb.nasa.gov). Two APPEL training classes were developed and shared with several organizations to give them a foundation in the NPR and in software engineering management. The TDT has established several program/project assistance sub-teams dealing with software architecture, project management, requirements, cybersecurity, test and verification, and programmable logic controllers. Many of these teams have developed guidelines and best practices, which are documented in NASA-HDBK-2203 and on the NASA Engineering Network.
NPR 7150.2 and the handbook describe best practices across the life cycle for all NASA software, including requirements development, architecture, design, implementation, and verification. Also covered, and equally important, are the supporting activities and functions that improve quality, including software assurance, security, configuration management, reuse, and software procurement. The rationale and guidance for the requirements are set out in the handbook, which is available internally and externally and is updated regularly as new information, tools, and techniques are found and used.
Software TDT deputies train software engineers, systems engineers, principal engineers, and project managers on NPR requirements and their role in ensuring that those requirements are implemented at NASA centers. Additionally, TDT deputies train software technical managers in many advanced aspects of software engineering management, including planning, cost estimating, negotiation, and change management.