7:30 AM – 8:30 AM
Registration / Check-in
8:30 AM – 8:40 AM
Room A+B
Opening Remarks
Norty Schwartz
(Institute for Defense Analyses)
Norton A. Schwartz serves as President of the Institute for Defense Analyses (IDA), a nonprofit corporation operating in the public interest. IDA manages three Federally Funded Research and Development Centers that answer the most challenging U.S. security and science policy questions with objective analysis leveraging extraordinary scientific, technical, and analytic expertise. At IDA, General Schwartz (U.S. Air Force, retired) directs the activities of more than 1,000 scientists and technologists employed by IDA. General Schwartz has a long and prestigious career of service and leadership that spans over 5 decades. He was most recently President and CEO of Business Executives for National Security (BENS). During his 6-year tenure at BENS, he was also a member of IDA’s Board of Trustees. Prior to retiring from the U.S. Air Force, General Schwartz served as the 19th Chief of Staff of the U.S. Air Force from 2008 to 2012. He previously held senior joint positions as Director of the Joint Staff and as the Commander of the U.S. Transportation Command. He began his service as a pilot with the airlift evacuation out of Vietnam in 1975. General Schwartz is a U.S. Air Force Academy graduate and holds a master’s degree in business administration from Central Michigan University. He is also an alumnus of the Armed Forces Staff College and the National War College. He is a member of the Council on Foreign Relations and a 1994 Fellow of Massachusetts Institute of Technology’s Seminar XXI. General Schwartz has been married to Suzie since 1981.
8:40 AM – 9:20 AM
Room A+B
Keynote 1
Shawn N. Bratton
(United States Space Force)
Maj Gen Shawn N. Bratton is Commander, Space Training and Readiness Command, temporarily located at Peterson Space Force Base, Colorado.
Space Training and Readiness Command was established as a Field Command 23 August 2021, and is responsible for preparing the USSF and more than 6,000 Guardians to prevail in competition and conflict through innovative education, training, doctrine, and test activities.
Maj Gen Bratton received his commission from the Academy of Military Science in Knoxville, Tenn. Prior to his commissioning, Maj Gen Bratton served as an enlisted member of the 107th Air Control Squadron, Arizona Air National Guard. He has served in numerous operational and staff positions. Maj Gen Bratton was the first Air National Guardsman to attend the Space Weapons Instructor Course at Nellis Air Force Base. He deployed to the Air Component Coordination Element, Camp Victory, Iraq, for Operation IRAQI FREEDOM; served as the USNORTHCOM Director of Space Forces; and commanded the 175th Cyberspace Operations Group, Maryland Air National Guard. He also served as the Deputy Director of Operations, USSPACECOM.
Prior to his current assignment, Maj Gen Bratton served as the Space Training and Readiness Task Force Lead.
EDUCATION
1993 Bachelor of Arts in Secondary Education, Arizona State University, Tempe, Ariz.
1994 Academy of Military Science, Knoxville, Tenn.
1999 Squadron Officer School, Maxwell AFB, Ala.
2005 Master’s Certificate, Homeland Security Studies, Naval Postgraduate School, Monterey, Calif.
2005 Air Command and Staff College, by correspondence
2010 Air War College, by correspondence
2011 Naval War College, in residence
2011 Master’s degree in National Security Studies, Naval War College, Newport, R.I.
ASSIGNMENTS
1. March – April 1987, Basic Training, Lackland AFB, Texas
2. April – December 1987, Student, Aircraft Control and Early Warning Systems, Keesler AFB, Miss.
3. January 1988 – September 1994, Aircraft Control and Warning Radar Technician, 107th Air Control Squadron, Arizona Air National Guard, Phoenix, Ariz.
4. September – October 1994, Student, Academy of Military Science, Knoxville, Tenn.
5. October 1994 – August 1999, Communications Officer, 107th Air Control Squadron, Phoenix, Ariz.
6. August 1999 – March 2000, Operations Officer, Joint Counter-Narcotics Task Force, Phoenix, Ariz.
7. March 2000 – December 2003, Action Officer, Headquarters Air Force Space Command HQ AFSPC/CG, Peterson AFB, Colo.
8. January 2003 – June 2003, Student, USAF Weapons School, Nellis AFB, Nev.
9. July 2003 – July 2005, Chief, Weapons and Tactics, HQ AFSPC/A3, Peterson AFB, Colo.
10. August 2005 – July 2007, Weapons Officer, Det 2, Arizona ANG, Sky Harbor International Airport, Phoenix, Ariz.
11. August 2007 – July 2010, ANG Advisor to Commander, 14th Air Force, Vandenberg AFB, Calif.
12. August 2010 – June 2011, Student, Naval War College, Newport, Rhode Island
13. June 2011 – February 2014, ANG Advisor to AFSPC/A3 & A6, Peterson AFB, Colo.
14. March 2014 – May 2015, ANG Advisor to the Commander, AFSPC, Peterson AFB, Colo.
15. May 2015 – June 2017, Commander, 175th Cyberspace Operations Group, Warfield ANG Base, Md.
16. July 2017 – Jan 2019, ANG Advisor to the Commander, AFSPC, Peterson AFB, Colo.
17. Jan 2019 – Aug 2019, Deputy Director of Operations, Plans, and Training, Joint Force Space Component Command, Peterson AFB, Colo.
18. Apr 2019 – Aug 2019, Special Assistant to the Chief National Guard Bureau for Space, Schriever AFB, Colo.
19. Aug 2019 – Feb 2021, Deputy Director of Operations, United States Space Command, Schriever AFB, Colo.
20. Feb 2021 – Aug 2021, Space Training and Readiness Task Force Lead, United States Space Force, Peterson AFB, Colo.
21. Aug 2021 – Present, Commander, Space Training and Readiness Command, United States Space Force, Peterson Space Force Base, Colo.
MAJOR AWARDS AND DECORATIONS
Defense Superior Service Medal
Bronze Star
Meritorious Service Medal with six devices
Air Force Commendation Medal with device
Air Force Achievement Medal with four devices
Joint Meritorious Unit Award
Iraq Campaign Medal
Global War on Terrorism Service Medal
USAF NCO PME Graduate Ribbon
EFFECTIVE DATES OF PROMOTION
Second Lieutenant 29 September 1994
First Lieutenant 30 September 1996
Captain 3 October 1998
Major 17 October 2002
Lieutenant Colonel 1 August 2007
Colonel 26 May 2011
Brigadier General 2 April 2019
Major General 17 February 2022
(Current as of March 2022)
9:20 AM – 10:00 AM
Room A+B
Keynote 2
Peter Coen
(NASA)
Peter Coen currently serves as the Mission Integration Manager for NASA’s Quesst Mission. His primary responsibility in this role is to ensure that the X-59 aircraft development, in-flight acoustic validation and community test elements of the Mission stay on track toward delivering on NASA’s Critical Commitment to provide quiet supersonic overflight response data to the FAA and the International Civil Aviation Organization.
Previously, Peter was the manager for the Commercial Supersonic Technology Project in NASA’s Aeronautics Research Mission, where he led a team from the four NASA Aero Research Centers in the development of tools and technologies for a new generation of quiet and efficient supersonic civil transport aircraft.
Peter’s NASA career spans almost 4 decades. During this time, he has studied technology integration in practical designs for many different types of aircraft and has made technical and management contributions to all of NASA’s supersonics-related programs over the past 30 years. As Project Manager, he led these efforts for 12 years.
Peter is a licensed private pilot who has amassed nearly 30 seconds of supersonic flight time.
10:00 AM – 10:20 AM
Break
10:20 AM – 12:20 PM: Parallel Sessions
Room A

Virtual Session
Session 1A: Cybersecurity in Test & Evaluation
Session Chair: Mark Herrera, IDA
Planning for Public Sector Test and Evaluation in the Commercial Cloud
Brian Conway (IDA)
Lee Allison received his Ph.D. in experimental nuclear physics from Old Dominion University in 2017 studying a specialized particle detector system. Lee is now a Research Staff Member at the Institute for Defense Analyses in the cyber operational testing group where he focuses mainly on Naval and Land Warfare platform-level cyber survivability testing. Lee has also helped to build one of IDA’s cyber lab environments that IDA staff members, DOT&E staff, and the OT community can use to better understand cyber survivability test and evaluation. Brian Conway holds a B.S. from the University of Notre Dame and a Ph.D. from Pennsylvania State University, where he studied solvation dynamics in ionic liquid mixtures with conventional solvents. He joined the Institute for Defense Analyses in 2019, and has since supported operational testing in both the Departments of Defense and Homeland Security. There, he focuses on cyber testing of Naval and Net-Centric and Space Systems to evaluate whether adversaries have the ability to exploit those systems and affect the missions executed by warfighters.
As the public sector shifts IT infrastructure toward commercial cloud solutions, the government test community needs to adjust its test and evaluation (T&E) methods to provide useful insights into a cloud-hosted system’s cyber posture. Government entities must protect what they develop in the cloud by enforcing strict access controls and deploying securely configured virtual assets. However, publicly available research shows that doing so effectively is difficult, with accidental misconfigurations leading to the most commonly observed exploitations of cloud-hosted systems. Unique deployment configurations and identity and access management across different cloud service providers increase the burden of knowledge on testers. More care must be taken during the T&E planning process to ensure that test teams are poised to succeed in understanding the cyber posture of cloud-hosted systems and finding any vulnerabilities present in those systems. The T&E community must adapt to this new paradigm of cloud-hosted systems to ensure that vulnerabilities are discovered and mitigated before an adversary has the opportunity to use those vulnerabilities against the system.
Cyber Testing Embedded Systems with Digital Twins
Michael Thompson (Naval Postgraduate School)
Mike Thompson is a Research Associate at the Naval Postgraduate School. Mike is the lead developer of the RESim reverse engineering platform, which grew out of his work as a member of the competition infrastructure development team for DARPA’s Cyber Grand Challenge. He is the lead developer for the Labtainers cyber lab exercise platform and for the CyberCIEGE educational video game. Mike has decades of experience developing products for software vulnerability analysis, cybersecurity education and high assurance trusted platforms.
Dynamic cyber testing and analysis require instrumentation to facilitate measurements, e.g., to determine which portions of code have been executed, or detection of anomalous conditions which might not manifest at the system interface. However, instrumenting software causes execution to diverge from the execution of the deployed binaries, and instrumentation requires mechanisms for storing and retrieving testing artifacts on target systems. RESim is a dynamic testing and analysis platform that does not instrument software. Instead, RESim instruments high fidelity models of target hardware upon which software-under-test executes, providing detailed insight into program behavior. Multiple modeled computer platforms run within a single simulation that can be paused, inspected and run forward or backwards to selected events such as the modification of a specific memory address. Integration of Google’s AFL fuzzer with RESim avoids the need to create fuzzing harnesses because programs are fuzzed in their native execution environment, commencing from selected execution states with data injected directly into simulated memory instead of I/O streams. RESim includes plugins for the IDA Pro and NSA’s Ghidra disassembler/debuggers to facilitate interactive analysis of individual processes and threads, providing the ability to skip to selected execution states (e.g., a reference to an input buffer) and “reverse execution” to reach a breakpoint by appearing to run backwards in time. RESim simulates networks of computers through use of Wind River’s Simics platform of high fidelity models of processors, peripheral devices (e.g., network interface cards), and memory. The networked simulated computers load and run firmware and software from images extracted from the physical systems being tested. Instrumenting the simulated hardware allows RESim to observe software behavior from the other side of the hardware, i.e., without affecting its execution. Simics includes tools to extend and create high fidelity models of processors and devices, providing a clear path to deploying and managing digital twins for use in developmental test and evaluation. The simulations can include optional real-world network and bus interfaces to facilitate integration into networks and test ranges. Simics is a COTS product that runs on commodity hardware and is able to execute several parallel instances of complex multi-component systems on a typical engineering workstation or server. This presentation will describe RESim and strategies for using digital twins for cyber testing of embedded systems. The presentation will also discuss some of the challenges associated with fuzzing non-trivial software systems.
Test and Evaluation of AI Cyber Defense Systems
Shing-hon Lau (Software Engineering Institute, Carnegie Mellon University)
Shing-hon Lau is a Senior Cybersecurity Engineer at the CERT Division of the Software Engineering Institute at Carnegie Mellon University, where he investigates the intersection between cybersecurity, artificial intelligence, and machine learning. His research interests include rigorous testing of artificial intelligence systems, building secure and trustworthy machine learning systems, and understanding the linkage between cybersecurity and adversarial machine learning threats. One research effort concerns the development of a methodology to evaluate the capabilities of AI-powered cybersecurity defensive tools. Prior to joining the CERT Division, Lau obtained his PhD in Machine Learning in 2018 from Carnegie Mellon. His doctoral work focused on the application of keystroke dynamics, or the study of keyboard typing rhythms, for authentication, insider-threat detection, and healthcare applications.
Adoption of Artificial Intelligence and Machine Learning powered cybersecurity defenses (henceforth, AI defenses) has outpaced testing and evaluation (T&E) capabilities. Industrial and governmental organizations around the United States are employing AI defenses to protect their networks in ever increasing numbers, with the commercial market for AI defenses currently estimated at $15 billion and expected to grow to $130 billion by 2030. This adoption of AI defenses is powered by a shortage of over 500,000 cybersecurity staff in the United States, by a need to expeditiously handle routine cybersecurity incidents with minimal human intervention and at machine speeds, and by a need to protect against highly sophisticated attacks. It is paramount to establish, through empirical testing, trust and understanding of the capabilities and risks associated with employing AI defenses. While some academic work exists for performing T&E of individual machine learning models trained using cybersecurity data, we are unaware of any principled method for assessing the capabilities of a given AI defense within an actual network environment. The ability of AI defenses to learn over time poses a significant T&E challenge, above and beyond those faced when considering traditional static cybersecurity defenses. For example, an AI defense may become more (or less) effective at defending against a given cyberattack as it learns over time. Additionally, a sophisticated adversary may attempt to evade the capabilities of an AI defense by obfuscating attacks to maneuver them into its blind spots, by poisoning the training data utilized by the AI defense, or both. Our work provides an initial methodology for performing T&E of on-premises network-based AI defenses on an actual network environment, including the use of a network environment with generated user network behavior, automated cyberattack tools to test the capabilities of AI cyber defenses to detect attacks on that network, and tools for modifying attacks to include obfuscation or data poisoning. Discussion will also center on some of the difficulties with performing T&E on an entire system, instead of just an individual model.
Test and Evaluation of Systems with Embedded Artificial Intelligence Components
Michael R. Smith (Sandia National Laboratories)
Michael R. Smith is a Principal Member at Sandia National Laboratories. He previously earned his PhD at Brigham Young University for his work on instance-level metalearning. In his current role, his research focuses on the explainability, credibility, and validation of machine-learned models in high-consequence applications and their effects on decision making.
As Artificial Intelligence (AI) continues to advance, it is being integrated into more systems. Often, the AI component represents a significant portion of the system that reduces the burden on the end user or significantly improves the performance of a task. The AI component represents an unknown complex phenomenon that is learned from collected data without the need to be explicitly programmed. Despite the improvement in performance, the models are black boxes. Evaluating the credibility and the vulnerabilities of AI models poses a gap in current test and evaluation practice. For high consequence applications, the lack of testing and evaluation procedures represents a significant source of uncertainty and risk. To help reduce that risk, we have developed a red-teaming inspired methodology to evaluate systems embedded with an AI component. This methodology highlights the key expertise and components that are needed beyond what a typical red team generally requires. In contrast to academic evaluations of AI models, we present a system-level evaluation rather than an evaluation of the AI model in isolation. We outline three axes along which to evaluate an AI component: 1) Evaluating the performance of the AI component to ensure that the model functions as intended and is developed based on best practices developed by the AI community. This process entails more than simply evaluating the learned model. As the model operates on data used for training as well as data perceived by the system, peripheral functions such as feature engineering and the data pipeline need to be included. 2) AI components necessitate supporting infrastructure in deployed systems. The support infrastructure may introduce additional vulnerabilities that are overlooked in traditional test and evaluation processes. Further, the AI component may be subverted by modifying key configuration files or data pipeline components. 3) AI models introduce possible vulnerabilities to adversarial attacks. These could be attacks designed to evade detection by the model, poison the model, steal the model or data, or misuse the model to act inappropriately. Within the methodology, we highlight tools that may be applicable as well as gaps that need to be addressed by the community. SNL is managed and operated by NTESS under DOE NNSA contract DE-NA0003525.
Room B

Virtual Session
Session 1B:
Session Chair: Tom Donnelly, JMP Statistical Discovery
On the Validation of Statistical Software
Ryan Lekivetz (JMP Statistical Discovery)
Ryan Lekivetz is the manager of the Design of Experiments (DOE) and Reliability team that develops those platforms in JMP. He earned his doctorate in statistics from Simon Fraser University in Burnaby, BC, Canada, and has publications related to topics in DOE in peer-reviewed journals. He looks for ways to apply DOE in other disciplines and even his everyday life.
Validating statistical software involves a variety of challenges. Of these, the most difficult is the selection of an effective set of test cases, sometimes referred to as the “test case selection problem”. To further complicate matters, for many statistical applications, development and validation are done by individuals who often have limited time to validate their application and may not have formal training in software validation techniques. As a result, it is imperative that the adopted validation method is efficient, as well as effective, and it should also be one that can be easily understood by individuals not trained in software validation techniques. As it turns out, the test case selection problem can be thought of as a design of experiments (DOE) problem. This talk discusses how familiar DOE principles can be applied to validating statistical software.
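To make the analogy concrete, the short R sketch below treats test case selection as a design-selection problem: a full factorial candidate set of software settings is reduced to a D-optimal subset using the skpr package (covered elsewhere in this program). The factor names and the 12-run budget are invented for illustration and are not taken from the talk.

# Illustrative sketch only: test case selection framed as a DOE problem.
# The factors and run budget below are hypothetical.
library(skpr)
candidates <- expand.grid(
  os         = c("Windows", "Linux"),
  precision  = c("single", "double"),
  algorithm  = c("A", "B", "C"),
  samplesize = c("small", "large")
)
# Choose a 12-run D-optimal subset of test cases that supports all main effects
design <- gen_design(candidateset = candidates,
                     model = ~ os + precision + algorithm + samplesize,
                     trials = 12)
# Inspect the selected test cases and their estimation properties
eval_design(design, model = ~ os + precision + algorithm + samplesize, alpha = 0.05)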
Validating the Prediction Profiler with Disallowed Combination: A Case Study
Yeng Saanchi (JMP Statistical Discovery)
Yeng Saanchi is an Analytic Software Tester at JMP Statistical Discovery LLC, a SAS company. Her research interests include stochastic optimization and applications of optimal experimental designs in precision medicine.
The prediction profiler is an interactive display in JMP statistical software that allows a user to explore the relationships between multiple factors and responses. A common use case of the profiler is for exploring the predicted model from a designed experiment. For experiments with a constrained design region defined by disallowed combinations, the profiler was recently enhanced to obey such constraints. In this case study, we show how a DOE based approach to validating statistical software was used to validate this enhancement.
Introducing Self-Validated Ensemble Models (SVEM) – Bringing Machine Learning to DOEs
Chris Gotwalt (JMP Statistical Discovery)
Chris joined JMP in 2001 while obtaining his Ph.D. in Statistics at NCSU. Chris has made many contributions to JMP, mostly in the form of the computational algorithms that fit models or design experiments. He developed JMP’s algorithms for fitting neural networks, mixed models, structural equations models, text analysis, and many more. Chris leads a team of 20 statistical software developers, testers, and technical writers. Chris was the 2020 Chair of the Quality and Productivity Section of the American Statistical Association and has held adjunct professor appointments at Univ. of Nebraska, Univ. of New Hampshire, and NCSU, guiding dissertation research into generalized linear mixed models, extending machine learning techniques to designed experiments, and machine learning-based imputation strategies.
DOE methods have evolved over the years, as have the needs and expectations of experimenters. Historically, the focus emphasized separating effects to reduce bias in effect estimates and maximizing hypothesis-testing power, which are largely a reflection of the methodological and computational tools of their time. Often DOE in industry is done to predict product or process behavior under possible changes. We introduce Self-Validating Ensemble Models (SVEM), an inherently predictive algorithmic approach to the analysis of DOEs, generalizing the fractional bootstrap to make machine learning and bagging possible for the small datasets common in DOE. In many DOE applications the number of rows is small, and the factor layout is carefully structured to maximize information gain in the experiment. Applying machine learning methods to DOE is generally avoided because they begin by partitioning the rows into a training set for model fitting and a holdout set for model selection. This alters the structure of the design in undesirable ways, such as randomly introducing effect aliasing. SVEM avoids this problem by using a variation of the fractionally weighted bootstrap to create training and validation versions of the complete data that differ only in how rows are weighted. The weights are reinitialized, and models refit, multiple times so that our final SVEM model is a model average, much like bagging. We find this allows us to fit models where the number of estimated effects exceeds the number of rows. We will present simulation results showing that in these supersaturated cases SVEM outperforms existing approaches like forward selection as measured by prediction accuracy.
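As a rough illustration of the fractionally weighted bootstrap idea described above, the R sketch below re-weights the same rows into training and validation roles and averages weighted lasso fits; it is a simplified sketch, not the authors' implementation, and the exponential weight scheme and the glmnet lasso base learner are assumptions made for illustration.

# Minimal SVEM-style sketch: anti-correlated fractional weights give every row
# both a training role and a validation role; weighted lasso fits are averaged.
library(glmnet)
svem_sketch <- function(X, y, n_boot = 100) {
  X <- as.matrix(X)
  coef_sum <- 0
  for (b in seq_len(n_boot)) {
    u       <- runif(nrow(X))
    w_train <- -log(u)       # fractional training weights
    w_valid <- -log(1 - u)   # anti-correlated validation weights
    fit  <- glmnet(X, y, weights = w_train, alpha = 1)
    pred <- predict(fit, newx = X)              # predictions along the lambda path
    err  <- colSums(w_valid * (y - pred)^2)     # weighted validation error per lambda
    best <- fit$lambda[which.min(err)]
    coef_sum <- coef_sum + as.matrix(coef(fit, s = best))
  }
  coef_sum / n_boot   # model-averaged coefficients across re-weightings
}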
Effective Application of Self-Validated Ensemble Models in Challenging Test Scenarios
James Wisnowski (Adsurgo)
Dr. James Wisnowski is the co-founder of Adsurgo. He currently provides training and consulting services to industry and government in Reliability Engineering, Design of Experiments (DOE), and Applied Statistics. He retired as an Air Force officer with over 20 years of service as a commander, joint staff officer, Air Force Academy professor, and operational tester. He received his PhD in Industrial Engineering from Arizona State University and is currently a faculty member at the Colorado School of Mines Department of Mechanical Engineering teaching DOE. Some conference presentations and journal articles on applied statistics are shown at https://scholar.google.com/scholar?hl=en&as_sdt=0%2C44&q=james+wisnowski&btnG=
We test the efficacy of SVEM versus alternative variable selection methods in a mixture experiment setting. These designs have built-in dependencies that require modifications of the typical design and analysis methods. The usual design metric of power is not helpful for these tests and analyzing results becomes quite challenging, particularly for factor characterization. We provide some guidance and lessons learned from hypersonic fuel formulation experience. We also show through simulation favorable combinations of design and Generalized Regression analysis options that lead to the best results. Specifically, we quantify the impact of changing run size, including complex design region constraints, using space-filling vs optimal designs, including replicates and/or center runs, and alternative analysis approaches to include full model, backward stepwise, SVEM forward selection, SVEM Lasso, and SVEM neural network.
Room C
Session 1C: Statistical and Systems Engineering Applications in Aerospace
Session Chair: TBD
The Containment Assurance Risk Framework of the Mars Sample Return Program
Giuseppe Cataldo (NASA)
Giuseppe Cataldo is the head of planetary protection of the Mars Sample Return (MSR) Capture, Containment and Return System (CCRS). His expertise is in the design, testing and management of space systems. He has contributed to a variety of NASA missions and projects including the James Webb Space Telescope, where he developed a Bayesian framework for model validation and a multifidelity approach to uncertainty quantification for large-scale, multidisciplinary systems. Giuseppe holds a PhD in Aeronautics and Astronautics from the Massachusetts Institute of Technology (MIT) and several master’s degrees from Italy and France.
The Mars Sample Return campaign aims at bringing rock and atmospheric samples from Mars to Earth through a series of robotic missions. These missions would collect the samples being cached and deposited on Martian soil by the Perseverance rover, place them in a container, and launch them into Martian orbit for subsequent capture by an orbiter that would bring them back. Given there exists a non-zero probability that the samples contain biological material, precautions are being taken to design systems that would break the chain of contact between Mars and Earth. These include techniques such as sterilization of Martian particles, redundant containment vessels, and a robust reentry capsule capable of accurate landings without a parachute. Requirements exist that the probability of containment not assured of Martian-contaminated material into Earth’s biosphere be less than one in a million. To demonstrate compliance with this strict requirement, a statistical framework was developed to assess the likelihood of containment loss during each sample return phase and make a statement about the total combined mission probability of containment not assured. The work presented here describes this framework, which considers failure modes or fault conditions that can initiate failure sequences ultimately leading to containment not assured. Reliability estimates are generated from databases, design heritage, component specifications, or expert opinion in the form of probability density functions or point estimates and provided as inputs to the mathematical models that simulate the different failure sequences. The probabilistic outputs are then combined following the logic of several fault trees to compute the ultimate probability of containment not assured. Given the multidisciplinary nature of the problem and the different types of mathematical models used, the statistical tools needed for analysis are required to be computationally efficient. While standard Monte Carlo approaches are used for fast models, a multi-fidelity approach to rare event probabilities is proposed for expensive models. In this paradigm, inexpensive low-fidelity models are developed for computational acceleration purposes while the expensive high-fidelity model is kept in the loop to retain accuracy in the results. This work presents an example of end-to-end application of this framework highlighting the computational benefits of a multi-fidelity approach. The decision to implement Mars Sample Return will not be finalized until NASA’s completion of the National Environmental Policy Act process. This document is being made available for information purposes only.
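A toy Monte Carlo sketch of this kind of fault-tree roll-up is shown below in R; the failure modes, independence assumptions, and Beta priors are entirely hypothetical and are not the MSR Program's numbers or fault trees.

# Hypothetical illustration: propagate uncertain failure-mode probabilities
# through a simplified AND/OR fault tree to a combined probability of
# containment not assured (all numbers invented).
set.seed(1)
n <- 1e6
p_seal   <- rbeta(n, 1, 1e4)   # containment vessel seal failure
p_steril <- rbeta(n, 1, 1e3)   # sterilization of external particles fails
p_entry  <- rbeta(n, 1, 1e7)   # reentry capsule breach
# AND gate for (seal failure AND sterilization failure), OR gate with capsule breach
p_cna <- p_seal * p_steril + p_entry - p_seal * p_steril * p_entry
quantile(p_cna, c(0.5, 0.95))  # distribution of the combined probability
mean(p_cna < 1e-6)             # fraction of samples meeting a one-in-a-million target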
Large-scale cross-validated Gaussian processes for efficient multi-purpose emulators
Jouni Susiluoto (NASA Jet Propulsion Laboratory, California Institute of Technology)
Dr. Jouni Susiluoto is a Data Scientist at NASA Jet Propulsion Laboratory in Pasadena, California. His main research focus has recently been in inversion algorithms and forward model improvements for current and next-generation hyperspectral imagers, such as AVIRIS-NG, EMIT, and SBG. This research heavily leans on new developments in high-efficiency cross-validated Gaussian process techniques, a research topic that he has closely pursued together with Prof. Houman Owhadi’s group at Caltech. Susiluoto’s previous work includes a wide range of data science, uncertainty quantification and modeling applications in geosciences, such as spatio-temporal data fusion with very large numbers of data, Bayesian model selection, chaotic model analysis and parameter estimation, and climate and carbon cycle modeling. He has a doctorate in mathematics from University of Helsinki, Finland.
We describe recent advances in Gaussian process emulation, which allow us to both save computation time and to apply inference algorithms that previously were too expensive for operational use. Specific examples are given from the Earth-orbiting Orbiting Carbon Observatory and the future Surface Biology and Geology Missions, dynamical systems, and other applications. While Gaussian processes are a well-studied field, there are surprisingly important choices that the community has not paid much attention to thus far, including dimension reduction, kernel parameterization, and objective function selection. This talk will highlight some of those choices and help understand what practical implications they have.
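As a small illustration of the kind of choice the talk refers to, the base-R sketch below scores candidate kernel length-scales for a one-dimensional Gaussian process by its closed-form leave-one-out residuals; the toy data, squared-exponential kernel, and noise level are assumptions, not the speakers' code.

# Toy example: choose a GP length-scale by leave-one-out cross-validation,
# using the closed-form LOO residuals r_i = (K^-1 y)_i / (K^-1)_ii.
set.seed(2)
x <- seq(0, 1, length.out = 40)
y <- sin(6 * x) + rnorm(40, sd = 0.1)
sqexp <- function(x1, x2, ell) exp(-0.5 * outer(x1, x2, "-")^2 / ell^2)
loo_rmse <- function(ell, noise = 0.1^2) {
  K     <- sqexp(x, x, ell) + diag(noise, length(x))
  Kinv  <- solve(K)
  alpha <- Kinv %*% y
  resid <- alpha / diag(Kinv)    # leave-one-out prediction errors
  sqrt(mean(resid^2))
}
sapply(c(0.05, 0.2, 0.5), loo_rmse)  # smaller LOO RMSE indicates a better length-scale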
A Bayesian Optimal Experimental Design for High-dimensional Physics-based Models
James Oreluk (Sandia National Laboratories)
James Oreluk is a postdoctoral researcher at Sandia National Laboratories in Livermore, CA. He earned his Ph.D. in Mechanical Engineering from UC Berkeley with research on developing optimization methods for validating physics-based models. His current research focuses on advancing uncertainty quantification and machine learning methods to efficiently solve complex problems, with recent work on utilizing low-dimensional representation for optimal decision making.
Many scientific and engineering experiments are developed to study specific questions of interest. Unfortunately, time and budget constraints make operating these controlled experiments over wide ranges of conditions intractable, thus limiting the amount of data collected. In this presentation, we discuss a Bayesian approach to identify the most informative conditions, based on the expected information gain. We will present a framework for finding optimal experimental designs that can be applied to physics-based models with high-dimensional inputs and outputs. We will study a real-world example where we aim to infer the parameters of a chemically reacting system, but there are uncertainties in both the model and the parameters. A physics-based model was developed to simulate the gas-phase chemical reactions occurring between highly reactive intermediate species in a high-pressure photolysis reactor coupled to a vacuum-ultraviolet (VUV) photoionization mass spectrometer. This time-of-flight mass spectrum evolves in both kinetic time and VUV energy producing a high-dimensional output at each design condition. The high-dimensional nature of the model output poses a significant challenge for optimal experimental design, as a surrogate model is built for each output. We discuss how accurate low-dimensional representations of the high-dimensional mass spectrum are necessary for computing the expected information gain. Bayesian optimization is employed to maximize the expected information gain by efficiently exploring a constrained design space, taking into account any constraint on the operating range of the experiment. Our results highlight the trade-offs involved in the optimization, the advantage of using optimal designs, and provide a workflow for computing optimal experimental designs for high-dimensional physics-based models.
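For readers unfamiliar with expected information gain (EIG), the base-R sketch below estimates it by nested Monte Carlo for a deliberately tiny one-parameter model y = theta*d + noise; the model, prior, and candidate design values are invented for illustration and bear no relation to the reacting-flow application in the talk.

# Nested Monte Carlo estimate of expected information gain for a toy design d.
eig <- function(d, n_outer = 2000, n_inner = 2000, sd_noise = 0.25) {
  theta_out <- rnorm(n_outer)                        # prior draws
  y         <- rnorm(n_outer, theta_out * d, sd_noise)
  theta_in  <- rnorm(n_inner)
  log_lik   <- function(yy, th) dnorm(yy, th * d, sd_noise, log = TRUE)
  # log marginal p(y | d), estimated by an inner prior sample (log-sum-exp for stability)
  log_marg <- sapply(y, function(yy) {
    ll <- log_lik(yy, theta_in)
    m  <- max(ll)
    m + log(mean(exp(ll - m)))
  })
  mean(log_lik(y, theta_out) - log_marg)             # E[ log p(y|theta,d) - log p(y|d) ]
}
sapply(c(0.25, 0.5, 1, 2), eig)  # larger |d| is more informative in this toy model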
Systems Engineering Applications of UQ in Space Mission Formulation
Kelli McCoy (NASA Jet Propulsion Laboratory)
Kelli McCoy began her career at NASA Kennedy Space Center as an Industrial Engineer in the Launch Services Program, following her graduation from Georgia Tech with an M.S. in Industrial and Systems Engineering. She went on to obtain an M.S. in Applied Math and Statistics at Georgetown University, and subsequently developed probability models to estimate cost and schedule during her tenure with the Office of Evaluation at NASA Headquarters. Now at Jet Propulsion Laboratory, she has found applicability for math and probability models in an engineering environment. She further developed that skillset as the Lead of the Europa Clipper Project Systems Engineering Analysis Team, where she and her team produced 3 probabilistic risk assessments for the mission, using their model-based SE environment. She is currently the modeling lead for a JPL New Frontiers proposal and is a member of JPL’s Quantification of Uncertainty Across Disciplines (QUAD) team, which is promoting Uncertainty Quantification practices across JPL. In parallel, Kelli is pursuing a PhD in UQ at University of Southern California.
It is critical to link the scientific phenomenology under investigation and the operating environment directly to the spacecraft design, mission design, and concept of operations. With many missions of discovery, the large uncertainty in the science phenomenology and the operating environment necessitates mission architecture solutions that are robust and resilient to these unknowns, to maximize the probability of achieving the mission objectives. Feasible mission architectures are assessed against performance, cost, and risk, in the context of large uncertainties. For example, despite Cassini observations of Enceladus, significant uncertainties exist in the moon’s surface properties and the surrounding Enceladus environment. Orbilander or any other mission to Enceladus will need to quantify or bound these uncertainties to formulate a viable design and operations trade space that addresses a range of mission objectives within the imposed technical and programmatic constraints. Uncertainty quantification (UQ) utilizes a portfolio of stochastic, data science, and mathematical methods to characterize uncertainty of a system and inform decision-making. This discussion will focus on a formulation of a workflow and an example of an Enceladus mission development use case.
Room E

Virtual Session
Introduction to Design of Experiments in R: Generating and Evaluating Designs with skpr
Tyler Morgan-Wall (IDA)
Dr. Tyler Morgan-Wall is a Research Staff Member at the Institute for Defense Analyses, and is the developer of the software library skpr: a package developed at IDA for optimal design generation and power evaluation in R. He is also the author of several other R packages for data visualization, mapping, and cartography. He has a PhD in Physics from Johns Hopkins University and lives in Silver Spring, MD.
The Department of Defense requires rigorous testing to support the evaluation of effectiveness and suitability of oversight acquisition programs. These tests are performed in a resource constrained environment and must be carefully designed to efficiently use those resources. The field of Design of Experiments (DOE) provides methods for testers to generate optimal experimental designs taking these constraints into account, and computational tools in DOE can support this process by enabling analysts to create designs tailored specifically for their test program. In this tutorial, I will show how you can run these types of analyses using “skpr”: a free and open source R package developed by researchers at IDA for generating and evaluating optimal experimental designs. This software package allows you to perform DOE analyses entirely in code; rather than using a graphical user interface to generate and evaluate individual designs one-by-one, this tutorial will demonstrate how an analyst can use “skpr” to automate the creation of a variety of different designs using a short and simple R script. Attendees will learn the basics of using the R programming language and how to generate, save, and share their designs. Additionally, “skpr” provides a straightforward interface to calculate statistical power. Attendees will learn how to use built-in parametric and Monte Carlo power evaluation functions to compute power for a variety of models and responses, including linear models, split-plot designs, blocked designs, generalized linear models (including logistic regression), and survival models. Finally, I will demonstrate how you can conduct an end-to-end DOE analysis entirely in R, showing how to generate power versus sample size plots and other design diagnostics to help you design an experiment that meets your program’s needs.
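A minimal end-to-end example of the workflow described in the tutorial might look like the snippet below; the factor names, levels, run budget, and effect sizes are hypothetical placeholders rather than material from the session.

# Hypothetical skpr workflow: generate a design, then evaluate power
library(skpr)
candidates <- expand.grid(Altitude = c("Low", "Medium", "High"),
                          Speed    = c("Subsonic", "Supersonic"),
                          Range    = seq(10, 50, by = 10))
design <- gen_design(candidateset = candidates,
                     model = ~ Altitude + Speed + Range,
                     trials = 24)
# Parametric power for a linear-model analysis
eval_design(design, model = ~ Altitude + Speed + Range, alpha = 0.05)
# Monte Carlo power for a binary response analyzed with logistic regression
eval_design_mc(design, model = ~ Altitude + Speed + Range, alpha = 0.05,
               nsim = 1000, glmfamily = "binomial", effectsize = c(0.3, 0.7))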
12:20 PM – 1:30 PM
Lunch
1:30 PM – 3:00 PM: Parallel Sessions
Room A

Virtual Session
Session 2A: Statistical Engineering
Session Chair: Peter Parker, NASA
An Overview of the NASA Quesst Community Test Campaign with the X-59 Aircraft
Jonathan Rathsam (NASA Langley Research Center)
Jonathan Rathsam is a Senior Research Engineer at NASA’s Langley Research Center in Hampton, Virginia. He conducts laboratory and field research on human perceptions of low noise supersonic overflights. He currently serves as Technical Lead of Survey Design and Analysis for Community Test Planning and Execution within NASA’s Commercial Supersonic Technology Project. He has previously served as NASA co-chair for DATAWorks and as chair for a NASA Source Evaluation Board. He holds a Ph.D. in Engineering from the University of Nebraska, a B.A. in Physics from Grinnell College in Iowa, and completed postdoctoral research in acoustics at Ben-Gurion University in Israel.
In its mission to expand knowledge and improve aviation, NASA conducts research to address sonic boom noise, the prime barrier to overland supersonic flight. For half a century civilian aircraft have been required to fly slower than the speed of sound when over land to prevent sonic boom disturbances to communities under the flight path. However, lower noise levels may be achieved via new aircraft shaping techniques that reduce the merging of shockwaves generated during supersonic flight. As part of its Quesst mission, NASA is building a piloted, experimental aircraft called the X-59 to demonstrate low noise supersonic flight. After initial flight testing to ensure the aircraft performs as designed, NASA will begin a national campaign of community overflight tests to collect data on how people perceive the sounds from this new design. The data collected will support national and international noise regulators’ efforts as they consider new standards that would allow supersonic flight over land at low noise levels. This presentation provides an overview of the community test campaign, including the scope, key objectives, stakeholders, and challenges.
Dose-Response Data Considerations for the NASA Quesst Community Test Campaign
Aaron B. Vaughn (NASA Langley Research Center)
Aaron Vaughn works in the Structural Acoustics Branch at NASA Langley Research Center and is a member of the Community Test and Planning Execution team under the Commercial Supersonic Technology project. Primarily, Aaron researches statistical methods for modeling the dose-response relationship of boom level to community annoyance in preparation for upcoming X-59 community tests.
Key outcomes for NASA’s Quesst mission are noise dose and perceptual response data to inform regulators on their decisions regarding noise certification standards for the future of overland commercial supersonic flight. Dose-response curves are commonly utilized in community noise studies to describe the annoyance of a community to a particular noise source. The X-59 aircraft utilizes shaped-boom technology to demonstrate low noise supersonic flight. For X-59 community studies, the sound level from X-59 overflights constitutes the dose, while the response is an annoyance rating selected from a verbal scale, e.g., “slightly annoyed” and “very annoyed.” Dose-response data will be collected from individual flyovers (single event dose) and an overall response to the accumulation of single events at the end of the day (cumulative dose). There are quantifiable sources of error in the noise dose due to uncertainty in microphone measurements of the sonic thumps and uncertainty in predicted noise levels at survey participant locations. Assessing and accounting for error in the noise dose is essential to obtain an accurate dose-response model. There is also a potential for error in the perceptual response. This error is due to the ability of participants to provide their response in a timely manner and participant fatigue after responding to up to one hundred surveys over the course of a month. This talk outlines various challenges in estimating noise dose and perceptual response and the methods considered in preparation for X-59 community tests.
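As a purely illustrative sketch of a single-event dose-response fit, the R snippet below fits a logistic curve to simulated levels and annoyance responses; the numbers are not Quesst data and the sketch ignores the dose and response errors that the talk focuses on.

# Simulated single-event dose-response example: binary "highly annoyed" vs. level
set.seed(3)
level   <- runif(500, 60, 90)                 # hypothetical sonic-thump levels (dB)
p_annoy <- plogis(-20 + 0.25 * level)         # assumed true dose-response relationship
annoyed <- rbinom(500, 1, p_annoy)            # simulated annoyance indicator
fit <- glm(annoyed ~ level, family = binomial)
newdat <- data.frame(level = 60:90)
cbind(newdat, pct_annoyed = 100 * predict(fit, newdat, type = "response"))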
Infusing Statistical Thinking into the NASA Quesst Community Test Campaign
Nathan B. Cruze (NASA Langley Research Center)
Dr. Nathan Cruze joined NASA Langley Research Center in 2021 as a statistician in the Engineering Integration Branch supporting the planning and execution of community testing during the Quesst mission. Prior to joining NASA, he served as a research mathematical statistician at USDA’s National Agricultural Statistics Service for more than eight years, where his work focused on improving crop and economic estimates programs by combining survey and auxiliary data through statistical modeling. His Ph.D. in Interdisciplinary Programs was co-directed by faculty from the statistics and chemical engineering departments at Ohio State University. He holds bachelor’s degrees in economics and mathematics and master’s degrees in economics and statistics, also from Ohio State University. Dr. Cruze currently co-chairs the Federal Committee on Statistical Methodology interest group on Computational Statistics and the Production of Official Statistics.
Statistical thinking permeates many important decisions as NASA plans its Quesst mission, which will culminate in a series of community overflights using the X-59 aircraft to demonstrate low-noise supersonic flight. Month-long longitudinal surveys will be deployed to assess human perception and annoyance to this new acoustic phenomenon. NASA works with a large contractor team to develop systems and methodologies to estimate noise doses, to test and field socio-acoustic surveys, and to study the relationship between the two quantities, dose and response, through appropriate choices of statistical models. This latter dose-response relationship will serve as an important tool as national and international noise regulators debate whether overland supersonic flights could be permitted once again within permissible noise limits. In this presentation we highlight several areas where statistical thinking has come into play, including issues of sampling, classification and data fusion, and analysis of longitudinal survey data that are subject to rare events and the consequences of measurement error. We note several operational constraints that shape the appeal or feasibility of some decisions on statistical approaches, and we identify several important remaining questions to be addressed.
Room B

Virtual Session
Session 2B: Situation Awareness, Autonomous Systems, and Digital Engineering in . . .
Session Chair: Elizabeth Green, IDA
Towards Scientific Practices for Situation Awareness Evaluation in Operational Testing
Miriam Armstrong (IDA)
Dr. Armstrong is a human factors researcher at IDA where she is involved in operational testing of defense systems. Her expertise includes interactions between humans and autonomous systems and psychometrics. She received her PhD in Human Factors Psychology from Texas Tech University in 2021. Coauthors Elizabeth Green, Brian Vickers, and Janna Mantua also conduct human subjects research at IDA.
Situation Awareness (SA) plays a key role in decision making and human performance; higher operator SA is associated with increased operator performance and decreased operator errors. In the most general terms, SA can be thought of as an individual’s “perception of the elements in the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future.” While “situational awareness” is a common suitability parameter for systems under test, there is no standardized method or metric for quantifying SA in operational testing (OT). This leads to varied and suboptimal treatments of SA across programs and test events. Current measures of SA are exclusively subjective and paint an inadequate picture. Future advances in system connectedness and mission complexity will exacerbate the problem. We believe that technological improvements will necessitate increases in the complexity of the warfighters’ mission, including changes to team structures (e.g., integrating human teams with human-machine teams), command and control (C2) processes (e.g., expanding C2 frameworks toward joint all-domain C2), and battlespaces (e.g., overcoming integration challenges for multi-domain operations). Operational complexity increases the information needed for warfighters to maintain high SA, and assessing SA will become increasingly important and difficult to accomplish. IDA’s Test Science team has proposed a piecewise approach to improve the measurement of situation awareness in operational evaluations. The aim of this presentation is to promote a scientific understanding of what SA is (and is not) and encourage discussion amongst practitioners tackling this challenging problem. We will briefly introduce Endsley’s Model of SA, review the trade-offs involved in some existing measures of SA, and discuss a selection of potential ways in which SA measurement during OT may be improved.
T&E Landscape for Advanced Autonomy
Kathryn Lahman (Johns Hopkins University Applied Physics Laboratory)
I am the Program Manager for the Advanced Autonomy Test & Evaluation (AAT&E) program within the Sea Control Mission Area of the Johns Hopkins Applied Physics Laboratory (JHU/APL). My primary focus is guiding the DoD towards a full-scope understanding of what autonomy T&E truly involves, such as:
– validation of Modeling and Simulation (M&S) environments and models
– development of test tools and technologies to improve T&E within M&S environments
– data collection, analysis, and visualization to make smarter decisions more easily
– improvement and streamlining of the T&E process to optimize continual development, test, and fielding
– understanding and measuring trust of autonomy
– supporting rapid experimentation and a feedback loop from testing via M&S to testing with physical systems
In my previous life as a Human Systems Engineer (HSE), I developed skills in Usability Engineering, Storyboarding, Technical Writing, Human Computer Interaction, and Information Design. I am a strong information technology professional with a Master of Science (M.S.) focused in Human Centered Computing from the University of Maryland, Baltimore County (UMBC). As I moved to JHU/APL, I became focused on HSE involving UxVs (Unmanned Vehicles of multiple domains) and autonomous systems. I further moved into managing projects across autonomous system domains with the Navy as my primary sponsor. As my skillsets and understanding of Autonomy and Planning, Test and Evaluation (PT&E) of those systems grew, I applied a consistent human element to my approach to PT&E.
The DoD is making significant investments in the development of autonomous systems, spanning from basic research, at organizations such as DARPA and ONR, to major acquisition programs, such as PEO USC. In this talk we will discuss advanced autonomous systems as complex, fully autonomous systems and systems of systems, rather than specific subgenres of autonomous functions – i.e., basic path planning autonomy or vessel controllers for moving vessels from point A to B. As a community, we are still trying to understand how to integrate these systems in the field with the warfighter to fully optimize their added capabilities. A major goal of using autonomous systems is to support multi-domain, distributed operations. We have a vision for how this may work, but we don’t know when, or if, these systems will be ready any time soon to implement these visions. We must identify trends, analyze bottlenecks, and find scalable approaches to fielding these capabilities, such as identifying certification criteria or optimizing methods of testing and evaluating (T&E) autonomous systems. Traditional T&E methods are not sufficient for cutting-edge autonomy and artificial intelligence (AI). Not only do we have to test the traditional aspects of system performance (speed, endurance, range, etc.) but also the decision-making capabilities that would have previously been performed by humans. This complexity increases when an autonomous system changes based on how it is applied in the real world. Each domain, environment, and platform an autonomy is run on presents unique autonomy considerations. Complexity is further compounded when we begin to stack these autonomies and integrate them into a fully autonomous system of systems. Currently, there are no standard processes or procedures for testing these nested, complex autonomies; yet there are numerous areas for growth and improvement in this space. We will dive into identified capability gaps in advanced autonomy T&E that we have recognized and provide approaches for how the DoD may begin to tackle these issues. It is important that we make critical contributions towards testing, trusting, and certifying these complex autonomous systems. Primary focus areas that are addressed include:
– Recommending the use of bulk testing through Modeling and Simulation (M&S), while ensuring that the virtual environment is representative of the operational environment.
– Developing intelligent tests and test selection tools to locate and discriminate areas of interest faster than through traditional Monte Carlo sampling methods.
– Building methods for testing black-box autonomies faster than real time, and with fewer computational requirements.
– Providing data analytics that assess autonomous systems in ways that give human decision makers a means for certification.
– Expanding the concept of what trust means, and how to assess and, subsequently, validate trustworthiness of these systems across stakeholders.
– Testing experimental autonomous systems in a safe and structured manner that encourages rapid fielding and iteration on novel autonomy components.
Digital Transformation Enabled by Enterprise Automation
Nathan Pond (Edaptive Computing, Inc.)
Nathan Pond is the Program Manager for Business Enterprise Systems at Edaptive Computing, Inc., where he works to provide integrated technology solutions around a variety of business and engineering processes. He oversees product development teams for core products and services enabling digital transformation, and acts as the principal cloud architect for cloud solutions. Mr. Pond has over 20 years of experience with software engineering and technology, with an emphasis on improving efficiency with digital transformation and process automation.
Digital transformation is a broad term that means a variety of things to people in many different operational domains, but the underlying theme is consistent: using digital technologies to improve business processes, culture, and efficiency. Digital transformation results in streamlining communications, collaboration, and information sharing while reducing errors. Properly implemented digital processes provide oversight and cultivate accountability to ensure compliance with business processes and timelines. A core tenet of effective digital transformation is automation. The elimination or reduction of human intervention in processes provides significant gains to operational speed, accuracy, and efficiency. DOT&E uses automation to streamline the creation of documents and reports which need to include up-to-date information. By using Smart Documentation capabilities, authors can define and automatically populate sections of documents with the most up-to-date data, ensuring that every published document always has the most current information. This session discusses a framework for driving digital transformation to automate nearly any business process.
Room C
Students & Fellows Speed Session 1
Please note many of the Speed Sessions will also include a poster during the Poster Session
Session Chair: Denise Edwards, IDA
Comparing Normal and Binary D-optimal Design of Experiments by Statistical Power
Addison Adams (IDA / Colorado State University)
Addison joined the Institute for Defense Analyses (IDA) during the summer of 2022. Addison is currently a PhD student at Colorado State University where he is studying statistics. Addison’s PhD research is focused on the stochastic inverse problem and its applications to random coefficient models. Before attending graduate school, Addison worked as a health actuary for Blue Cross of Idaho. Addison attended Utah Valley University (UVU) where he earned a BS in mathematics. During his time at UVU, Addison completed internships with both the FBI and AON.
In many Department of Defense (DoD) Test and Evaluation (T&E) applications, binary response variables are unavoidable. Many have considered D-optimal design of experiments (DOEs) for generalized linear models (GLMs). However, little consideration has been given to assessing how these new designs perform in terms of statistical power for a given hypothesis test. Monte Carlo simulations and exact power calculations suggest that normal D-optimal designs generally yield higher power than binary D-optimal designs, despite using logistic regression in the analysis after data have been collected. Results from using statistical power to compare designs contradict traditional DOE comparisons, which employ D-efficiency ratios and fractional design space (FDS) plots. Power calculations suggest that practitioners who are primarily interested in the resulting statistical power of a design should use normal D-optimal designs over binary D-optimal designs when logistic regression is to be used in the data analysis after data collection.
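A sketch of the kind of Monte Carlo power calculation described above is shown below using skpr; the two-level factors, run size, and effect sizes are illustrative, and generating the competing binary (GLM) D-optimal design would require a separate GLM design tool, so it is omitted here.

# Power of a normal D-optimal design when the binary response is analyzed
# with logistic regression (illustrative factors, run size, and effect sizes).
library(skpr)
candidates <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design_norm <- gen_design(candidateset = candidates,
                          model = ~ A + B + C, trials = 60)
eval_design_mc(design_norm, model = ~ A + B + C, alpha = 0.05,
               nsim = 1000, glmfamily = "binomial", effectsize = c(0.2, 0.8))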
Uncertain Text Classification for Proliferation Detection
Andrew Hollis (North Carolina State University)
Andrew Hollis was born and raised in Los Alamos, New Mexico. He attended the University of New Mexico as a Regents’ Scholar and received his bachelor’s degree in statistics with minors in computer science and mathematics in spring 2018. During his undergraduate studies, he also completed four summer internships at Los Alamos National Laboratory in the Principal Associate Directorate for Global Security. He began the PhD program in Statistics at North Carolina State University in August of 2018, and received his Master of Statistics in December of 2020. While at NCSU, he has conducted research in collaboration with the Laboratory for Analytical Sciences, a research lab focused on building analytical tools for the intelligence community, the Consortium for Nonproliferation Enabling Capabilities, and West Point. He has had opportunities to complete two internships with the Department of Defense, including an internship with the Air Force at the Pentagon in the summer of 2022. He plans to graduate with his PhD in May of 2023, and will begin working with the Air Force as an operations research analyst after graduation.
A key global security concern in the nuclear weapons age is the proliferation and development of nuclear weapons technology, and a crucial part of enforcing non-proliferation policy is developing an awareness of the scientific research being pursued by other nations and organizations. Deep, transformer-based text classification models are an important piece of systems designed to monitor scientific research for this purpose. For applications like proliferation detection involving high-stakes decisions, there has been growing interest in ensuring that we can perform well-calibrated, interpretable uncertainty quantification with such classifier models. However, because modern transformer-based text classification models have hundreds of millions of parameters and the computational cost of uncertainty quantification typically scales with the size of the parameter space, it has been difficult to produce computationally tractable uncertainty quantification for these models. We propose a new variational inference framework that is computationally tractable for large models and meets important uncertainty quantification objectives including producing predicted class probabilities that are well-calibrated and reflect our prior conception of how different classes are related.
A data-driven approach of uncertainty quantification on Reynolds stress based on DNS
Zheming Gou (University of Southern California)
Zheming Gou, a Ph.D. student in the Department of Mechanical Engineering at the University of Southern California, is a highly motivated individual with a passion for high-fidelity simulations, uncertainty quantification, and machine learning, especially in high-dimensional and rare-data scenarios. Currently, Zheming Gou is engaged in research to build probabilistic models for multiscale simulations using tools including polynomial chaos expansion (PCE) and state-of-the-art machine learning methods.
High-fidelity simulation capabilities have progressed rapidly over the past decades in computational fluid dynamics (CFD), resulting in plenty of high-resolution flow field data. Uncertainty quantification remains an unsolved problem due to the high-dimensional input space and the intrinsic complexity of turbulence. Here we developed an uncertainty quantification method to model the Reynolds stress based on Karhunen-Loeve Expansion (KLE) and Projection Pursuit basis Adaptation polynomial chaos expansion (PPA). First, different representative volume elements (RVEs) were randomly drawn from the flow field, and KLE was used to reduce them to a moderate dimension. Then, we built polynomial chaos expansions of the Reynolds stress using PPA. Results show that this method can yield a surrogate model with a test accuracy of up to 90%. The PCE coefficients also show that Reynolds stress depends strongly on second-order KLE random variables rather than first-order terms. Regarding data efficiency, we built another surrogate model using a neural network (NN) and found that our method outperforms the NN in limited-data cases.
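The sketch below illustrates only the dimension-reduction step (a discrete Karhunen-Loeve/POD truncation via the SVD) on synthetic stand-ins for RVE snapshots; the PPA polynomial chaos step and the real flow data are not reproduced here.

```python
# Discrete KLE/POD truncation of sampled fields; synthetic low-rank data
# stands in for flow snapshots.
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_grid = 200, 4096                 # hypothetical RVE count and grid size
latent = rng.standard_normal((n_samples, 5))  # hidden low-dimensional structure
modes_true = rng.standard_normal((5, n_grid))
fields = latent @ modes_true + 0.01 * rng.standard_normal((n_samples, n_grid))

centered = fields - fields.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

energy = np.cumsum(s**2) / np.sum(s**2)
k = int(np.searchsorted(energy, 0.99)) + 1    # modes needed for 99% of variance
kle_coords = U[:, :k] * s[:k]                 # reduced coordinates, one row per RVE
print(f"Reduced {n_grid} grid values to {k} KLE coordinates per sample")
```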
I-TREE: a tool for characterizing research using taxonomies
Aayushi Verma (IDA)
Aayushi Verma is a Data Science Fellow at the Institute for Defense Analyses. She supports the Chief Data Officer with the IDA Data Initiative Strategy by leveraging disparate sources of data to create applications and dashboards that help IDA staff. Her data science interests include data analysis, machine learning, artificial intelligence, and extracting stories from data. She has a B.Sc. (Hons.) in Astrophysics from the University of Canterbury, and is currently pursuing her M.S. in Data Science from Pace University.
IDA is developing a Data Strategy to establish solid infrastructure and practices that allow for a rigorous data-centric approach to answering U.S. security and science policy questions. The Data Strategy implements data governance and data architecture strategies to leverage data for trusted insights and establishes a data-centric culture. One key component of the Data Strategy is a set of research taxonomies that describe and characterize the research done at IDA. These research taxonomies, broadly divided into six categories, are a vital tool to help IDA researchers gain insight into the research expertise of staff and divisions, in terms of the research products produced for our sponsors. We have developed an interactive web application that consumes numerous disparate sources of data related to these taxonomies, research products, researchers, and divisions, and unites them to create quantified analytics and visualizations to answer questions about research at IDA. This tool, titled I-TREE (IDA-Taxonomical Research Expertise Explorer), will enable staff to answer questions like ‘Who are the researchers most commonly producing products for a specified research area?’, ‘What is the research profile of a specified author?’, ‘What research topics are most commonly addressed by a specified division?’, ‘Who are the researchers most commonly producing products in a specified division?’, and ‘What divisions are producing products for a specified research topic?’. These are essential questions whose answers allow IDA to identify subject-matter expertise areas, methodologies, and key skills in response to sponsor requests, and to identify common areas of expertise to build research teams with a broad range of skills. I-TREE demonstrates the use of data science and data management techniques that enhance the company’s data strategy while actively enabling researchers and management to make informed decisions.
An Evaluation Of Periodic Developmental Reviews Using Natural Language Processing
Dominic Rudakevych (United States Military Academy)
Cadet Dominic Rudakevych is from Hatboro, Pennsylvania, and is a senior studying mathematics at the United States Military Academy (USMA). Currently, CDT Rudakevych serves as the Society for Industrial and Applied Mathematics president at USMA and the captain of the chess team. He is involved in research within the Mathematics department, using artificial intelligence methods to analyze how written statements about cadets impact overall cadet ratings. He will earn a Bachelor of Science Degree in mathematics upon graduation. CDT Rudakevych will commission as a Military Intelligence officer in May and is excited to serve his country as a human-intelligence platoon leader.
As an institution committed to developing leaders of character, the United States Military Academy (USMA) holds a vested interest in measuring character growth. One such tool, the Periodic Developmental Review (PDR), has been used by the Academy’s Institutional Effectiveness Office for over a decade. PDRs are written counseling statements evaluating how a cadet is developing with respect to his/her peers. The objective of this research was to provide an alternate perspective of the PDR system by using statistical and natural language processing (NLP) based approaches to determine whether certain dimensions of PDR data were predictive of a cadet’s overall rating. This research implemented multiple NLP tasks and techniques, including sentiment analysis, named entity recognition, tokenization, part-of-speech tagging, and word2vec, as well as statistical models such as linear regression and ordinal logistic regression. The ordinal logistic regression model concluded that PDRs with optional written summary statements had more predictable overall scores than those without summary statements. Additionally, the relationship of the PDR writer to the cadet (self, instructor, peer, subordinate) held strong predictive value for the overall rating. When compared to a self-reflecting PDR, instructor-written PDRs were 62.40% more likely to have a higher overall score, while subordinate-written PDRs had a probability of improvement of 61.65%. These values were amplified to 70.85% and 73.12%, respectively, when considering only those PDRs with summary statements. These findings indicate that different writer demographics have a different understanding of the meaning of each rating level. Recommendations for the Academy include implementing a forced distribution or providing a deeper explanation of the overall rating in the instructions. Additionally, no written language facets analyzed demonstrated predictive strength, meaning written statements do not introduce unwanted bias and could be made a required field to provide more meaningful feedback to cadets.
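As a small illustration of the ordinal-regression piece of this kind of analysis (a sketch on synthetic data, not the PDR data or the cadets' actual model), an ordered logit can be fit in Python with statsmodels:

```python
# Ordered logistic regression on simulated 1-5 ratings with two hypothetical
# predictors (writer type and presence of a summary statement).
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "writer_is_instructor": rng.integers(0, 2, n),
    "has_summary": rng.integers(0, 2, n),
})
latent = 0.6 * df["writer_is_instructor"] + 0.4 * df["has_summary"] + rng.normal(size=n)
df["overall"] = pd.cut(latent, bins=[-np.inf, -0.5, 0.2, 0.8, 1.4, np.inf],
                       labels=[1, 2, 3, 4, 5])   # ordered categorical rating

model = OrderedModel(df["overall"], df[["writer_is_instructor", "has_summary"]],
                     distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```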
Optimal Release Policy for Covariate Software Reliability Models.
Ebenezer Yawlui (University of Massachusetts Dartmouth)
Ebenezer Yawlui is a MS student in the Department of Electrical & Computer Engineering at the University of Massachusetts Dartmouth (UMassD). He received his BS (2020) in Electrical Engineering from Regional Maritime University, Ghana.
The optimal time to release software is a problem of broad concern to software engineers, where the goal is to minimize cost by balancing the cost of fixing defects before or after release as well as the cost of testing. However, the vast majority of these models are based on defect discovery models that are a function of time and can therefore only provide guidance on the total amount of additional effort required. To overcome this limitation, this paper presents an optimal software release model based on cost criteria, incorporating a covariate software defect detection model based on the Discrete Cox Proportional Hazards Model. The proposed model provides more detailed guidance, recommending the amount of each distinct test activity to perform in order to discover defects. Our results indicate that the approach can be utilized to allocate effort among alternative test activities in order to minimize cost.
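For context, the sketch below shows the classic time-based release-cost trade-off that this work generalizes; it is not the covariate model itself, and the cost coefficients and Goel-Okumoto parameters are illustrative assumptions.

```python
# Classic cost-based optimal release time under a Goel-Okumoto defect model.
import numpy as np
from scipy.optimize import minimize_scalar

a, b = 120.0, 0.03                            # expected total defects, detection rate
c_test, c_pre, c_post = 50.0, 200.0, 2000.0   # cost per unit test time / defect fixed pre/post release

def expected_defects(t):
    return a * (1.0 - np.exp(-b * t))         # Goel-Okumoto mean value function

def total_cost(t):
    found = expected_defects(t)
    remaining = a - found
    return c_test * t + c_pre * found + c_post * remaining

opt = minimize_scalar(total_cost, bounds=(0.0, 500.0), method="bounded")
print(f"Release near t = {opt.x:.1f} with expected cost {opt.fun:.0f}")
```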
A Stochastic Petri Net Model of Continuous Integration and Continuous Delivery
Sushovan Bhadra (University of Massachusetts Dartmouth)
Sushovan Bhadra is a MS student in the Department of Electrical & Computer Engineering at the University of Massachusetts Dartmouth (UMassD). He received his BS (2016) in Electrical Engineering from Ahsanullah University of Science and Technology (AUST), Bangladesh.
Modern software development organizations rely on continuous integration and continuous delivery (CI/CD), since it allows developers to continuously integrate their code in a single shared repository and automates delivery of the product to the user. While modern software practices improve the performance of the software life cycle, they also increase the complexity of this process. Past studies have improved the performance of the CI/CD pipeline, but there are few formal models to quantitatively guide process and product quality improvement or to characterize how automated and human activities compose and interact asynchronously. Therefore, this talk develops a stochastic Petri net model of a CI/CD pipeline to improve process performance in terms of the probability of successfully delivering new or updated functionality by a specified deadline. The utility of the model is demonstrated through a sensitivity analysis to identify stages of the pipeline where improvements would most significantly improve the probability of timely product delivery. In addition, this research provides an enhanced version of the conventional CI/CD pipeline to examine how it can improve process performance in general. The results indicate that the augmented model outperforms the conventional model, and the sensitivity analysis suggests that failures in later stages are more important and can impact the delivery of the final product.
Neural Networks for Quantitative Resilience Prediction
Karen Alves da Mata (University of Massachusetts Dartmouth)
Karen da Mata is a Master’s Student in the Electrical and Computer Engineering Department at the University of Massachusetts – Dartmouth. She completed her undergraduate studies in the Electrical Engineering Department at the Federal University of Ouro Preto – Brazil – in 2018.
System resilience is the ability of a system to survive and recover from disruptive events, which finds applications in several engineering domains, such as cyber-physical systems and infrastructure. Most studies emphasize resilience metrics to quantify system performance, whereas more recent studies propose resilience models to project system recovery time after degradation using traditional statistical modeling approaches. Moreover, past studies are either performed on data collected after recovery or are limited to idealized trends. Therefore, this talk considers alternative machine learning approaches, including (i) Artificial Neural Networks (ANN), (ii) Recurrent Neural Networks (RNN), and (iii) Long Short-Term Memory (LSTM), to model and predict system performance for trends other than the ones previously considered. These approaches include negative and positive factors driving resilience in order to understand and precisely quantify the impact of disruptive events and restorative activities. A hybrid feature selection approach is also applied to identify the most relevant covariates. Goodness-of-fit measures are calculated to evaluate the models, including (i) mean squared error, (ii) predictive-ratio risk, and (iii) adjusted R squared. The results indicate that LSTM models outperform ANN and RNN models while requiring fewer neurons in the hidden layer in most of the data sets considered. In many cases, ANN models performed better than RNNs but required more time to train. These results suggest that neural network models for predictive resilience are both feasible and accurate relative to traditional statistical methods and may find practical use in many important domains.
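A minimal sketch of one of the model classes compared (an LSTM regressor on windowed covariate data) appears below; the architecture sizes and synthetic data are illustrative assumptions, not the authors' configuration.

```python
# LSTM regression on windowed, multivariate input (e.g., disruption and
# restoration indicators) predicting a performance value.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(3)
timesteps, n_features, n_series = 20, 3, 256

X = rng.standard_normal((n_series, timesteps, n_features)).astype("float32")
y = (0.5 * X[:, -1, 0] + rng.normal(scale=0.1, size=n_series)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(timesteps, n_features)),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print("Training-set MSE:", model.evaluate(X, y, verbose=0))
```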
A generalized influence maximization problem
Sumit Kumar Kar (University of North Carolina at Chapel Hill)
I am currently a Ph.D. candidate and a Graduate Teaching Assistant in the department of Statistics & Operations Research at the University of North Carolina at Chapel Hill (UNC-CH). My Ph.D. advisors are Prof. Nilay Tanik Argon, Prof. Shankar Bhamidi, and Prof. Serhan Ziya. I broadly work in probability, statistics, and operations research. My current research interests include (but are not confined to) working on problems in random networks which find interesting practical applications such as in viral marketing or Word-Of-Mouth (WOM) marketing, efficient immunization during epidemic outbreaks, finding the root of an epidemic, and so on. Apart from research, I have worked on two statistical consulting projects with the UNC Dept. of Medicine and Quantum Governance, L3C, CUES. Also, I am extremely passionate about teaching. I have been the primary instructor as well as a teaching assistant for several courses at UNC-CH STOR. You can find more about me on my website here: https://sites.google.com/view/sumits-space/home.
The influence maximization problem is a popular topic in social networks with several applications in viral marketing and epidemiology. One way to understand the problem is from the perspective of a marketer who wants to achieve the maximum influence on a social network by choosing an optimal set of nodes of a given size as seeds. The marketer actively influences these seeds, followed by a passive viral process based on a certain influence diffusion model, in which influenced nodes influence other nodes without external intervention. Kempe et al. showed that a greedy algorithm-based approach provides a (1-1/e)-approximation guarantee compared to the optimal solution if the influence spreads according to the Triggering model. In our current work, we consider a much more general problem in which the goal is to maximize the total expected reward obtained from the nodes influenced by a given time (which may be finite or infinite). In this setting, the reward obtained by influencing a set of nodes can depend on the set itself (not merely the sum of rewards from individual nodes) as well as the times at which each node is influenced; the seeds may be restricted to a subset of the network; multiple units of budget may be assigned to a single node (with a node-dependent maximum); and a seeded node actually becomes influenced only with a certain probability, which is a non-decreasing function of the number of budget units assigned to that node. We have formulated a greedy algorithm that provides a (1-1/e)-approximation guarantee compared to the optimal solution of this generalized influence maximization problem if the influence spreads according to the Triggering model.
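For orientation, the sketch below shows the standard greedy seed-selection loop with Monte Carlo spread estimation under an independent-cascade process (a special case of the Triggering model); the toy graph and parameters are illustrative, and this is not the authors' generalized algorithm.

```python
# Greedy influence maximization on a toy directed graph (adjacency lists).
import random

def simulate_spread(graph, seeds, p=0.1, rng=random.Random(0)):
    """One cascade: each newly activated node tries each out-edge once."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        node = frontier.pop()
        for nbr in graph.get(node, []):
            if nbr not in active and rng.random() < p:
                active.add(nbr)
                frontier.append(nbr)
    return len(active)

def expected_spread(graph, seeds, n_sims=200):
    return sum(simulate_spread(graph, seeds) for _ in range(n_sims)) / n_sims

def greedy_seeds(graph, k):
    seeds = []
    for _ in range(k):
        best = max((v for v in graph if v not in seeds),
                   key=lambda v: expected_spread(graph, seeds + [v]))
        seeds.append(best)
    return seeds

toy_graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: [4], 4: []}
print(greedy_seeds(toy_graph, 2))
```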
Room D
Contributed Session 1
Session Chair: TBD
Avoiding Pitfalls in AI/ML Packages
Justin Krometis (Virginia Tech National Security Institute)
Justin Krometis is a Research Assistant Professor with the Virginia Tech National Security Institute and holds an adjunct position in the Math Department at Virginia Tech. His research is mostly in the development of theoretical and computational frameworks for Bayesian data analysis. These include approaches to incorporating and balancing data and expert opinion in decision-making, estimating model parameters (including high- or even infinite-dimensional quantities) from noisy data, and designing experiments to maximize the information gained. He also has extensive expertise in high-performance computing and more recently developed skills in Artificial Intelligence/Machine Learning (AI/ML) techniques. Research interests include: Statistical Inverse Problems, High-Performance Computing, Parameter Estimation, Uncertainty Quantification, Artificial Intelligence/Machine Learning (AI/ML), Reinforcement Learning, and Experimental Design. Prior to joining VTNSI, Dr. Krometis spent ten years as a Computational Scientist supporting high-performance computing with Advanced Research Computing at Virginia Tech and seven years in the public and private sectors doing transportation modeling for planning and evacuation applications and hurricane, pandemic, and other emergency preparedness. He holds Ph.D., M.S., and B.S. degrees in Math and a B.S. degree in Physics, all from Virginia Tech.
Recent years have seen an explosion in the application of artificial intelligence and machine learning (AI/ML) to practical problems from computer vision to game playing to algorithm design. This growth has been mirrored and, in many ways, been enabled by the development and maturity of publicly-available software packages such as PyTorch and TensorFlow that make model building, training, and testing easier than ever. While these packages provide tremendous power and flexibility to users, and greatly facilitate learning and deploying AI/ML techniques, they and the models they provide are extremely complicated and as a result can present a number of subtle but serious pitfalls. This talk will present three examples from the presenter’s recent experience where obscure settings or bugs in these packages dramatically changed model behavior or performance – one from a classic deep learning application, one from training of a classifier, and one from reinforcement learning. These examples illustrate the importance of thinking carefully about the results that a model is producing and carefully checking each step in its development before trusting its output.
Reinforcement Learning Approaches to the T&E of AI/ML-based Systems Under Test
Karen O’Brien (Modern Technology Solutions, Inc)
Karen O’Brien has 20 years of service as a Department of the Army Civilian. She has worked as a physical scientist and ORSA in a wide range of mission areas, from ballistics to logistics, and from S&T to T&E. She was a physics and chemistry nerd as an undergrad but uses her Master’s in Predictive Analytics from Northwestern to support DoD agencies in developing artificial intelligence, machine learning, and advanced analytics capabilities. She is currently a principal data scientist at Modern Technology Solutions, Inc.
Designed experiments provide an efficient way to sample the complex interplay of essential factors and conditions during operational testing. Analysis of these designs provides more detailed and rigorous insight into the system under test’s (SUT) performance than top-level summary metrics provide. The introduction of artificial intelligence and machine learning (AI/ML) capabilities in SUTs creates a challenge for test and evaluation because the factors and conditions that constitute the AI SUT’s “feature space” are more complex than those of a mechanical SUT. Executing the equivalent of a full-factorial design quickly becomes infeasible. This presentation will demonstrate an approach to efficient, yet rigorous, exploration of the AI/ML-based SUT’s feature space that achieves many of the benefits of a traditional design of experiments, allowing more operationally meaningful insight into the strengths and limitations of the SUT than top-level AI summary metrics (like ‘accuracy’) provide. The approach uses an algorithmically defined search method within a reinforcement learning-style test harness for AI/ML SUTs. An adversarial AI (or AI critic) efficiently traverses the feature space and maps the resulting performance of the AI/ML SUT. The process identifies interesting areas of performance that would not otherwise be apparent in a roll-up metric. Identifying ‘toxic performance regions’, in which combinations of factors and conditions result in poor model performance, provides critical operational insights for both testers and evaluators. The process also enables T&E to explore the SUT’s sensitivity and robustness to changes in inputs and the boundaries of the SUT’s performance envelope. Feedback from the critic can be used by developers to improve the AI/ML SUT and by evaluators to interpret results in terms of effectiveness, suitability, and survivability. This procedure can be used for white-box, grey-box, and black-box testing.
Under Pressure? Using Unsupervised Machine Learning for Classification May Help
Nelson Walker (United States Air Force)
Dr. Walker is a statistician for the United States Air Force at the 412th Test Wing at Edwards AFB, California. He graduated with a PhD in statistics from Kansas State University in 2021.
Classification of fuel pressure states is a topic in aerial refueling that is open to interpretation from subject matter experts when primarily visual examination is used. Fuel pressures are highly stochastic, so classifications often differ based on the experience level and judgment calls of particular engineers. This hurts reproducibility and defensibility between test efforts, in addition to being highly time-consuming. The Pruned Exact Linear Time (PELT) changepoint detection algorithm is an unsupervised machine learning method that has shown promise for producing consistent and reproducible classifications. Combined with classification rules, this technique shows promise for classifying oscillatory behavior, transient spikes, and steady states, while offering malleable features that can adjust its sensitivity to identify key segments of fuel pressure states across multiple receivers and tankers.
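A minimal sketch of PELT changepoint detection on a synthetic pressure-like signal is shown below, using the ruptures package; the penalty value and signal are illustrative, not the 412th Test Wing's data or settings.

```python
# PELT changepoint detection on a piecewise signal: steady state,
# oscillation, then a shifted steady state.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(5)
steady1 = rng.normal(50, 0.5, 300)
oscill = 50 + 5 * np.sin(np.linspace(0, 20 * np.pi, 300)) + rng.normal(0, 0.5, 300)
steady2 = rng.normal(42, 0.5, 300)
signal = np.concatenate([steady1, oscill, steady2])

algo = rpt.Pelt(model="rbf").fit(signal)
breakpoints = algo.predict(pen=10)   # indices where regimes change
print(breakpoints)
```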
Analysis Opportunities for Missile Trajectories
Kelsey Cannon (Lockheed Martin Space)
Kelsey Cannon is a senior research scientist at Lockheed Martin Space in Denver, CO. She earned a bachelor’s degree from the Colorado School of Mines in Metallurgical and Materials Engineering and a master’s degree in Computer Science and Data Science. In her current role, Kelsey works across aerospace and DoD programs to advise teams on effective use of statistical and uncertainty quantification techniques to save time and budget.
Contractor analysis teams are given 725 Monte Carlo-generated trajectories for thermal heating analysis. The analysis team currently uses a reduced-order model to evaluate all 725 options and identify the worst cases. The worst cases are then run with the full model to hand off predicted temperatures to the design team. Two months later, the customer arrives with yet another set of trajectories updated for the newest mission options. This presentation will question each step in this analysis process for opportunities to improve the cost, schedule, and fidelity of the effort. Using uncertainty quantification and functional data analysis processes, the team should be able to improve the analysis coverage and power with fewer (or at least a similar number of) model runs.
Room E

Virtual Session
A Tour of JMP Reliability Platforms and Bayesian Methods for Reliability Data
Peng Liu (JMP Statistical Discovery)
Peng Liu is a Principal Research Statistician Developer at JMP Statistical Discovery LLC. He holds a Ph.D. in statistics from NCSU. He has been working at JMP since 2007. He specializes in computational statistics, software engineering, reliability data analysis, reliability engineering, time series analysis, and time series forecasting. He is responsible for developing and maintaining all JMP platforms in the above areas. He has broad interest in statistical analysis research and software product development.
JMP is a comprehensive, visual, and interactive statistical discovery software with a carefully curated graphical user interface designed for statistical discovery. The software is packed with traditional and modern statistical analysis capabilities and many unique innovative features. The software hosts several suites of tools that are especially valuable to the DATAWorks audience, including suites for Design of Experiments, Quality Control, Process Analysis, and Reliability Analysis. JMP has been building its reliability suite for the past fifteen years. The reliability suite in JMP is a comprehensive and mature collection of JMP platforms. The suite empowers reliability engineers with tools for analyzing time-to-event data, accelerated life test data, observational reliability data, competing cause data, warranty data, cumulative damage data, repeated measures degradation data, destructive degradation data, and recurrence data. For each type of data, there are numerous models and one or more methodologies that are applicable based on the nature of the data. In addition to reliability data analysis platforms, the suite also provides reliability engineering capabilities for system reliability from two distinct perspectives, one for non-repairable systems and the other for repairable systems. The JMP reliability suite is also at the frontier of advanced research on reliability data analysis. Inspired by the research of Prof. William Meeker at Iowa State University, we have implemented Bayesian inference methodologies for analyzing the three most important types of reliability data. The tutorial will start with an overall introduction to JMP’s reliability platforms. Then the tutorial will focus on analyzing time-to-event data, accelerated life test data, and repeated measures degradation data. The tutorial will present analyses of these types of reliability data using traditional methods, and highlight when, why, and how to analyze them in JMP using Bayesian methods.
3:00 PM – 3:20 PM
Break
3:20 PM – 4:50 PM: Parallel Sessions
Room A

Virtual Session
Session 3A: Methods & Applications of M&S Validation
Session Chair: TBD
Model verification in a digital engineering environment: an operational test perspective
Jo Anna Capp (IDA)
Jo Anna Capp is a Research Staff Member in the Operational Evaluation Division of IDA’s Systems and Analyses Center. She supports the Director, Operational Test and Evaluation (DOT&E) in the test and evaluation oversight of nuclear acquisition programs for the Department of Defense. Jo Anna joined IDA in 2017 and has worked on space and missile systems during her tenure. She is an expert in operational test and evaluation of nuclear weapon systems and in the use of statistical and machine learning techniques to derive insight into the performance of these and other acquisition systems. Jo Anna holds a doctorate in biochemistry from Duke University and a bachelor’s degree in cell and molecular biology from Florida Gulf Coast University.
As the Department of Defense adopts digital engineering strategies for acquisition systems in development, programs are embracing the use of highly federated models to assess the end-to-end performance of weapon systems, to include the threat environment. Often, due to resource limitations or political constraints, there is limited live data with which to validate the end-to-end performance of these models. In these cases, careful verification of the model, including from an operational factor-space perspective, early in model development can help testers prioritize resources for model validation later in system development. This presentation will discuss how using Design of Experiments to assess the operational factor space can shape model verification efforts and provide data for model validation focused on the end-to-end performance of the system.
Recommendations for statistical analysis of modeling and simulation environment outputs
Curtis Miller (IDA)
Dr. Curtis Miller is a research staff member of the Operational Evaluation Division at the Institute for Defense Analyses. In that role, he advises analysts on effective use of statistical techniques, especially pertaining to modeling and simulation activities and U.S. Navy operational test and evaluation efforts, for the division’s primary sponsor, the Director of Operational Test and Evaluation. He obtained a PhD in mathematics from the University of Utah and has several publications on statistical methods and computational data analysis, including an R package, CPAT. In the past, he has done research on topics in economics, including estimating differences in pay between male and female workers in the state of Utah on behalf of Voices for Utah Children, an advocacy group.
Modeling and simulation (M&S) environments feature frequently in test and evaluation (T&E) of Department of Defense (DoD) systems. Testers may generate outputs from M&S environments more easily than collecting live test data, but M&S outputs nevertheless take time to generate, cost money, require training to generate, and are accessed directly only by a select group of individuals. Still, many M&S environments do not suffer many of the resourcing limitations associated with live test. We thus recommend testers apply higher resolution output generation and analysis techniques compared to those used for collecting live test data. Doing so will maximize stakeholders’ understanding of M&S environments’ behavior and help utilize their outputs for activities including M&S verification, validation, and accreditation (VV&A), live test planning, and providing information for non-T&E activities. This presentation provides recommendations for collecting outputs from M&S environments such that a higher resolution analysis can be achieved. Space-filling designs (SFDs) are experimental designs intended to fill the operational space for which M&S predictions are expected. These designs can be coupled with statistical metamodeling techniques that estimate a model that flexibly interpolates or predicts M&S outputs and their distributions at both observed settings and unobserved regions of the operational space. Analysts can use the resulting metamodels as a surrogate for M&S outputs in situations where the M&S environment cannot be deployed. They can also study metamodel properties to decide whether a M&S environment adequately represents the original system. IDA has published papers recommending specific space-filling design and metamodeling techniques; this presentation briefly covers the content of those papers.
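The sketch below illustrates the general workflow described above (generate a space-filling design, run the M&S environment at those points, fit a metamodel); the "simulation" is a stand-in function, and the design and kernel choices are illustrative rather than the specific techniques recommended in IDA's papers.

```python
# Latin hypercube design plus a Gaussian-process metamodel as an M&S surrogate.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def run_simulation(x):
    # Placeholder for an M&S output (e.g., miss distance) at setting x.
    return np.sin(3 * x[0]) + 0.5 * x[1] ** 2

sampler = qmc.LatinHypercube(d=2, seed=11)   # two continuous factors scaled to [0, 1]
design = sampler.random(n=40)
responses = np.array([run_simulation(x) for x in design])

metamodel = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
metamodel.fit(design, responses)

new_points = sampler.random(n=5)
pred_mean, pred_sd = metamodel.predict(new_points, return_std=True)
print(pred_mean, pred_sd)
```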
Back to the Future: Implementing a Time Machine to Improve and Validate Model Predictions
Olivia Gozdz and Kyle Remley (IDA)
Dr. Gozdz received her Bachelor of Science in Physics from Hamilton College in 2016, and she received her Ph.D. in Climate Science from George Mason University in 2022. Dr. Gozdz has been a Research Staff Member with IDA since September 2022. Dr. Remley received his Bachelor of Science in Nuclear and Radiological Engineering from Georgia Tech in 2013, his Master of Science in Nuclear Engineering from Georgia Tech in 2015, and his Ph.D. in Nuclear and Radiological Engineering from Georgia Tech in 2016. He was a senior engineer with the Naval Nuclear Laboratory until 2020. Dr. Remley has been a Research Staff Member with IDA since July 2020.
At a time when supply chain problems are challenging even the most efficient and robust supply ecosystems, the DOD faces the additional hurdles of primarily dealing in low volume orders of highly complex components with multi-year procurement and repair lead times. When combined with perennial budget shortfalls, it is imperative that the DOD spend money efficiently by ordering the “right” components at the “right time” to maximize readiness. What constitutes the “right” components at the “right time” depends on model predictions that are based upon historical demand rates and order lead times. Given that the time scales between decisions and results are often years long, even small modeling errors can lead to months-long supply delays or tens of millions of dollars in budget shortfalls. Additionally, we cannot evaluate the accuracy and efficacy of today’s decisions for some years to come. To address this problem, as well as a wide range of similar problems across our Sustainment analysis, we have built “time machines” to pursue retrospective validation – for a given model, we rewind DOD data sources to some point in the past and compare model predictions, using only data available at the time, against known historical outcomes. This capability allows us to explore different decisions and the alternate realities that would manifest in light of those choices. In some cases, this is relatively straightforward, while in others it is made quite difficult by problems familiar to any time-traveler: changing the past can change the future in unexpected ways.
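As a hedged sketch of the "time machine" idea, the snippet below performs rolling-origin backtesting: rewind to a past cutoff, fit on data available then, and score predictions against what actually happened. The moving-average forecaster and the monthly demand series are illustrative stand-ins for the real models and DOD data sources.

```python
# Rolling-origin retrospective validation on a synthetic demand series.
import numpy as np

rng = np.random.default_rng(9)
monthly_demand = rng.poisson(lam=6, size=60).astype(float)   # 5 years of data

def forecast_next_year(history, window=12):
    # Naive baseline: project the trailing-window average forward 12 months.
    return np.full(12, history[-window:].mean())

errors = []
for cutoff in range(24, 48):                     # candidate "rewind" points
    history = monthly_demand[:cutoff]
    actual = monthly_demand[cutoff:cutoff + 12]
    predicted = forecast_next_year(history)[: len(actual)]
    errors.append(np.abs(predicted - actual).mean())

print("Mean absolute error across rewind points:", np.mean(errors))
```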
Room B

Virtual Session
Session 3B: Software Quality
Session Chair: TBD
Assessing Predictive Capability and Contribution for Binary Classification Models
Mindy Hotchkiss (Aerojet Rocketdyne)
Mindy Hotchkiss is a Technical Specialist and Subject Matter Expert in Statistics for Aerojet Rocketdyne. She holds a BS degree in Mathematics and Statistics and an MBA from the University of Florida, and a Master’s in Statistics from North Carolina State University. She has over 20 years of experience as a statistical consultant between Pratt & Whitney and Aerojet Rocketdyne, including work supporting technology development across the enterprise, such as hypersonics and metals additive manufacturing. She has been a team member and statistics lead on multiple Metals Affordability Initiative projects, working with industry partners and the Air Force Research Laboratory Materials Directorate. Interests include experimentation, risk and reliability, statistical modeling in any form, machine learning, autonomous systems development and systems engineering, digital engineering, and the practical implementation of statistical methods. She is a Past Chair of the ASQ Statistics Division and currently serves on the Board of Directors for RAMS, the Reliability and Maintainability Symposium, and the Governing Board of the Ellis R. Ott Graduate Scholarship program.
Classification models for binary outcomes are in widespread use across a variety of industries. Results are commonly summarized in a misclassification table, also known as an error or confusion matrix, which indicates correct versus incorrect predictions for different circumstances. Models are developed to minimize both false positive and false negative errors, but the optimization process used to train and fit the model necessarily involves cost-benefit trades. However, how to obtain an objective assessment of a given model’s predictive capability or benefit is less well understood, due both to the rich plethora of options described in the literature and to the largely overlooked influence of noise factors, specifically class imbalance. Many popular measures are susceptible to effects due to underlying differences in how the data are allocated by condition, which cannot be easily corrected. This talk considers the wide landscape of possibilities from a statistical robustness perspective. Results are shown from sensitivity analyses across a variety of conditions for several popular metrics, highlighting potential concerns with respect to machine learning or ML-enabled systems. Recommendations are provided for correcting imbalance effects, as well as for conducting a simple statistical comparison that disentangles the beneficial effects of the model itself from those of imbalance. Results are generalizable across model types.
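The sketch below illustrates the imbalance sensitivity at issue (it is not the speaker's analysis): the "model" is a fixed, skill-free classifier, so any apparent accuracy is driven entirely by the class mix, while balanced accuracy and MCC stay near their no-skill values.

```python
# Effect of class imbalance on several popular binary-classification metrics.
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, matthews_corrcoef

rng = np.random.default_rng(21)

def metrics_at_imbalance(positive_rate, n=20000):
    y_true = rng.binomial(1, positive_rate, n)
    # A skill-free classifier that mostly predicts the majority class.
    y_pred = np.where(rng.random(n) < 0.9, 0, 1)
    return (accuracy_score(y_true, y_pred),
            balanced_accuracy_score(y_true, y_pred),
            matthews_corrcoef(y_true, y_pred))

for rate in (0.5, 0.1, 0.01):
    acc, bal, mcc = metrics_at_imbalance(rate)
    print(f"positive rate {rate:>4}: accuracy={acc:.3f} "
          f"balanced={bal:.3f} MCC={mcc:.3f}")
```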
STAR: A Cloud-based Innovative Tool for Software Quality Analysis
Kazu Okumoto (Sakura Software Solutions (3S) LLC)
Kazu is a well-recognized pioneer in software reliability engineering. He invented a world-famous statistical model for software reliability (the “Goel-Okumoto model”). After retiring from Nokia Bell Labs in 2020 as a Distinguished Member of the Technical Staff, Dr. Okumoto founded Sakura Software Solutions, which has developed a cloud-based innovative tool, STAR, for software quality assurance. His co-authored book on software reliability is the most frequently referenced in this field, and he has an impressive list of book chapters, keynote addresses, and numerous technical papers to his credit. Since joining Bell Labs in 1980, Kazu has worked on many exciting projects for the original AT&T, Lucent, Alcatel-Lucent, and Nokia. He has 13 years of management experience, including serving as Bell Labs Field Representative in Japan. He completed his Ph.D. (1979) and MS (1976) at Syracuse University and his BS (1974) at Hiroshima University. He was an Assistant Professor at Rutgers University.
Traditionally, subject matter experts perform software quality analysis using custom spreadsheets, which produce inconsistent output and are challenging to share and maintain across teams. This talk will introduce and demonstrate STAR, a cloud-based, data-driven tool for software quality analysis. The tool is aimed at practitioners who manage software quality and make decisions based on its readiness for delivery. Being web-based and fully automated allows teams to collaborate on software quality analysis across multiple projects and releases. STAR is an integration of SaaS and automated analytics: a digital engineering tool for software quality practice. To use the tool, all users need to do is upload their defect and (optional) development effort data and set a couple of planned release milestones, such as the test start date and the delivery dates for customer trial and deployment. The provided data is then automatically processed and aggregated into a defect growth curve. The core innovation of STAR is its set of statistically sound algorithms used to fit a defect prediction curve to the provided data. This is achieved through the automated identification of inflection points in the original defect data and their use in generating piece-wise exponential models that make up the final prediction curve. Moreover, during the early days of software development, when no defect data is available, STAR can use the development effort plan and learn from previous software releases’ defect and effort data to make predictions for the current release. Finally, the tool implements a range of what-if scenarios that enable practitioners to evaluate several potential actions to correct course. Thanks to the use of an earlier version of STAR by a large software development group at Nokia and the current trial collaboration with NASA, the features and accuracy of the tool have improved to be better than traditional single-curve fitting. In particular, the defect prediction is stable several weeks before the planned software release, and the multiple metrics provided by the tool make the analysis of software quality straightforward, guiding users in making an intelligent decision regarding readiness for high-quality software delivery.
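For context, the sketch below fits the traditional single-curve baseline that STAR improves upon: a Goel-Okumoto style exponential growth curve on cumulative defect counts. The weekly counts are made up, and STAR's piecewise, inflection-point-based algorithms are not reproduced here.

```python
# Single-curve exponential (Goel-Okumoto) fit to cumulative defect counts.
import numpy as np
from scipy.optimize import curve_fit

weeks = np.arange(1, 21)
cumulative_defects = np.array([5, 12, 20, 29, 37, 44, 52, 58, 63, 68,
                               72, 75, 78, 80, 82, 84, 85, 86, 87, 87])

def goel_okumoto(t, a, b):
    return a * (1.0 - np.exp(-b * t))

(a_hat, b_hat), _ = curve_fit(goel_okumooto := goel_okumoto, weeks, cumulative_defects,
                              p0=(100.0, 0.1))
print(f"Estimated total defects: {a_hat:.1f}, detection rate: {b_hat:.3f}")
print("Predicted cumulative defects at week 25:", goel_okumoto(25, a_hat, b_hat))
```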
Covariate Software Vulnerability Discovery Model to Support Cybersecurity T&E
Lance Fiondella (University of Massachusetts)
Lance Fiondella is an Associate Professor in the Department of Electrical & Computer Engineering at the University of Massachusetts Dartmouth and the Founding Director of the University of Massachusetts Dartmouth Cybersecurity Center, an NSA/DHS-designated Center of Academic Excellence in Cyber Research (CAE-R). His research has been funded by DHS, Army, Navy, Air Force, NASA, and the National Science Foundation, including a CAREER award and a CyberCorps Scholarship for Service.
Vulnerability discovery models (VDM) have been proposed as an application of software reliability growth models (SRGM) to software security related defects. VDM characterize the number of vulnerabilities discovered as a function of testing time, enabling quantitative measures of security. Despite their obvious utility, past VDM have been limited to parametric forms that do not consider the multiple activities software testers undertake in order to identify vulnerabilities. In contrast, covariate SRGM characterize the software defect discovery process in terms of one or more test activities. However, data sets documenting multiple security testing activities suitable for application of covariate models are not readily available in the open literature. To demonstrate the applicability of covariate SRGM to vulnerability discovery, this research identified a web application to target as well as multiple tools and techniques to test for vulnerabilities. The time dedicated to each test activity and the corresponding number of unique vulnerabilities discovered were documented and prepared in a format suitable for application of covariate SRGM. Analysis and prediction were then performed and compared with a flexible VDM without covariates, namely the Alhazmi-Malaiya Logistic Model (AML). Our results indicate that covariate VDM significantly outperformed the AML model on predictive and information theoretic measures of goodness of fit, suggesting that covariate VDM are a suitable and effective method to predict the impact of applying specific vulnerability discovery tools and techniques.
Room C
Students & Fellows Speed Session 2
Please note many of the Speed Sessions will also include a poster during the Poster Session
Session Chair: Jim Starling, United States Military Academy at West Point
A Bayesian Approach for Nonparametric Multivariate Process Monitoring using Universal Residuals
Daniel Timme (Florida State University)
Daniel A. Timme is currently a PhD candidate in Statistics at Florida State University. Mr. Timme graduated with a BS in Mathematics from the University of Houston and a BS in Business Management from the University of Houston-Clear Lake. He earned an MS in Systems Engineering with a focus in Reliability and a second MS in Space Systems with focuses in Space Vehicle Design and Astrodynamics, both from the Air Force Institute of Technology. Mr. Timme’s research interest is primarily focused in the areas of reliability engineering, applied mathematics and statistics, optimization, and regression.
In Quality Control, monitoring sequential-functional observations for characteristic changes through change-point detection is a common practice to ensure that a system or process produces high-quality outputs. Existing methods in this field often only focus on identifying when a process is out-of-control without quantifying the uncertainty of the underlying decision-making processes. To address this issue, we propose using universal residuals under a Bayesian paradigm to determine if the process is out-of-control and assess the uncertainty surrounding that decision. The universal residuals are computed by combining two non-parametric techniques: regression trees and kernel density estimation. These residuals have the key feature of being uniformly distributed when the process is in control. To test if the residuals are uniformly distributed across time (i.e., that the process is in-control), we use a Bayesian approach for hypothesis testing, which outputs posterior probabilities for events such as the process being out-of-control at the current time, in the past, or in the future. We perform a simulation study and demonstrate that the proposed methodology has remarkable detection and a low false alarm rate.
Implementing Fast Flexible Space Filling Designs In R
Christopher Dimapasok (IDA / Johns Hopkins University)
I graduated from UCLA in 2020 with a degree in Molecular, Cell, and Developmental Biology. Currently, I am a graduate student at Johns Hopkins University, and I worked as a Summer Associate for IDA. I hope to leverage my multidisciplinary skills to make a long-lasting impact.
Modeling and simulation (M&S) can be a useful tool for testers and evaluators when they need to augment the data collected during a test event. During the planning phase, testers use experimental design techniques to determine how much data to collect and which data to collect. When designing a test that involves M&S, testers can use Space-Filling Designs (SFD) to spread out points across the operational space. Fast Flexible Space-Filling Designs (FFSFD) are a type of SFD that are useful for M&S because they work well in nonrectangular design spaces and allow for the inclusion of categorical factors. Both of these are recurring features in defense testing. Guidance from the Deputy Secretary of Defense and the Director of Operational Test and Evaluation encourages the use of open and interoperable software and recommends the use of SFD. This project aims to address both recommendations. IDA analysts developed a function to create FFSFD using the free statistical software R. To our knowledge, there are no R packages for the creation of an FFSFD that can accommodate a variety of user inputs, such as categorical factors. Moreover, by using this function, users can share their code to make their work reproducible. This presentation starts with background information about M&S and, more specifically, SFD. The briefing uses a notional missile system example to explain FFSFD in more detail and show the FFSFD R function inputs and outputs, and it ends with a summary of future work for this project.
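The IDA function described above is implemented in R; purely to illustrate the underlying idea in a different language, the Python sketch below builds a fast-flexible-style design by sampling many candidate points in a constrained (nonrectangular) region and clustering them, using the cluster centers as design points. The details differ from the authors' implementation.

```python
# Clustering-based space-filling design in a constrained region.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(13)

# Candidate points in [0, 1]^2 subject to a notional constraint x0 + x1 <= 1.2.
candidates = rng.random((20000, 2))
candidates = candidates[candidates.sum(axis=1) <= 1.2]

n_runs = 15
km = KMeans(n_clusters=n_runs, n_init=10, random_state=0).fit(candidates)
design = km.cluster_centers_          # space-filling runs inside the region
print(np.round(design, 3))
```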
Development of a Wald-Type Statistical Test to Compare Live Test Data and M&S Predictions
Carrington Metts (IDA)
Carrington Metts is a Data Science Fellow at IDA. She has a Master of Science in Business Analytics from the College of William and Mary. Her work at IDA encompasses a wide range of topics, including wargaming, modeling and simulation, natural language processing, and statistical analyses.
This work describes the development of a statistical test created in support of ongoing verification, validation, and accreditation (VV&A) efforts for modeling and simulation (M&S) environments. The test decides between a null hypothesis of agreement between the simulation and reality and an alternative hypothesis stating that the simulation and reality do not agree. To do so, it generates a Wald-type statistic that compares the coefficients of two generalized linear models estimated on live test data and analogous simulated data, then determines whether any of the coefficient pairs are statistically different. The test was applied to two logistic regression models that were estimated from live torpedo test data and simulated data from the Naval Undersea Warfare Center’s (NUWC) Environment Centric Weapons Analysis Facility (ECWAF). The test did not show any significant differences between the live and simulated tests for the scenarios modeled by the ECWAF. While more work is needed to fully validate the ECWAF’s performance, this finding suggests that the facility is adequately modeling the various target characteristics and environmental factors that affect in-water torpedo performance. The primary advantage of this test is that it can handle cases where one or more variables are estimable in one model but missing or inestimable in the other. While it is possible to simply create the linear models on the common set of variables, this results in the omission of potentially useful test data. Instead, this approach identifies the mismatched coefficients and combines them with the model’s intercept term, thus allowing the user to consider models created on the entire set of available data. Furthermore, the test was developed in a generalized manner without any references to a specific dataset or system. Therefore, other researchers who are conducting VV&A processes on other operational systems may benefit from using this test for their own purposes.
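The sketch below shows the basic Wald-type comparison of coefficients from two independently fitted logistic regressions (one on "live" data, one on "simulated" data) on synthetic inputs; the handling of mismatched or inestimable coefficients, a key feature of the actual test, is not reproduced here.

```python
# Wald-type comparison of coefficient vectors from two logistic regressions.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(17)

def fit_logit(n, beta):
    X = sm.add_constant(rng.standard_normal((n, 2)))
    p = 1 / (1 + np.exp(-X @ beta))
    y = rng.binomial(1, p)
    return sm.Logit(y, X).fit(disp=0)

live = fit_logit(300, np.array([0.2, 1.0, -0.5]))
simulated = fit_logit(1000, np.array([0.2, 1.0, -0.5]))

diff = live.params - simulated.params
cov = live.cov_params() + simulated.cov_params()   # independent fits
wald = float(diff @ np.linalg.solve(cov, diff))
p_value = chi2.sf(wald, df=len(diff))
print(f"Wald statistic = {wald:.2f}, p-value = {p_value:.3f}")
```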
Energetic Defect Characterizations
Naomi Edegbe (United States Military Academy)
Cadet Naomi Edegbe is a senior attending the United States Military Academy. As an Applied Statistics and Data Science major, she enjoys proving a mathematical concept, wrangling data, and analyzing problems to answer questions. Her short-term professional goals include competing for a fellowship with the National GEM Consortium to obtain a master’s degree in mathematical sciences or data science, after which she plans to serve her six-year active-duty Army commitment in the Quartermaster Corps. Cadet Edegbe’s long-term professional goal is to produce meaningful research in STEM, either in application to relevant Army resourcing needs or as a separate track into the field of epidemiology within social frameworks.
Energetic defect characterization in munitions is a task requiring further refinement in military manufacturing processes. Convolutional neural networks (CNNs) have shown promise in defect localization and segmentation in recent studies. These studies suggest that a CNN architecture can be used to localize casting defects in X-ray images. The U.S. Armament Center has provided munition images for training to develop a system, evaluated against MILSPEC requirements, that identifies and categorizes defective munitions. In our approach, we utilize preprocessed munitions images and transfer learning from prior studies’ model weights to assess localization accuracy on this dataset for application in the field.
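A minimal transfer-learning sketch in PyTorch follows: load pretrained weights and replace the final layer for a defect/no-defect decision. It assumes a recent torchvision version, and the backbone, class count, and training details are illustrative, not the project's actual architecture or MILSPEC evaluation pipeline.

```python
# Transfer learning: frozen pretrained backbone, new classification head.
import torch
import torchvision

model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                 # freeze pretrained features

model.fc = torch.nn.Linear(model.fc.in_features, 2)   # defect vs. no defect

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()

# One illustrative training step on a fake batch of X-ray-like images.
images = torch.rand(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```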
Covariate Resilience Modeling
Priscila Silva (additional authors: Andrew Bajumpaa, Drew Borden, and Christian Taylor) (University of Massachusetts Dartmouth)
Priscila Silva is a Ph.D. student in Electrical and Computer Engineering at University of Massachusetts Dartmouth (UMassD). She received her MS in Computer Engineering from UMassD in 2022, and her BS degree in Electrical Engineering from Federal University of Ouro Preto (UFOP) in 2017. Andrew Bajumpaa is an undergraduate student in Computer Science at University of Massachusetts Dartmouth. Drew Borden is an undergraduate student in Computer Engineering at University of Massachusetts Dartmouth. Christian Taylor is an undergraduate student in Computer Engineering at University of Massachusetts Dartmouth.
Resilience is the ability of a system to respond, absorb, adapt, and recover from a disruptive event. Dozens of metrics to quantify resilience have been proposed in the literature. However, fewer studies have proposed models to predict these metrics or the time at which a system will be restored to its nominal performance level after experiencing degradation. This talk presents three alternative approaches to model and predict performance and resilience metrics with techniques from reliability engineering, including (i) bathtub-shaped hazard functions, (ii) mixture distributions, and (iii) a model incorporating covariates related to the intensity of events that degrade performance as well as efforts to restore performance. Historical data sets on job losses during seven different recessions in the United States are used to assess the predictive accuracy of these approaches, including the recession that began in 2020 due to COVID-19. Goodness-of-fit measures, confidence intervals, and interval-based resilience metrics are computed to assess how well the models perform on the data sets considered. The results suggest that both bathtub-shaped functions and mixture distributions can produce accurate predictions for data sets exhibiting V-, U-, L-, and J-shaped curves, but that W- and K-shaped curves, which respectively experience multiple shocks or a sudden drop in performance and thus deviate from the assumption of a single decrease and subsequent increase, cannot be characterized well by either of those classes of models. In contrast, the model incorporating covariates tracks all of the types of curves noted above very well, including W- and K-shaped curves such as the two successive shocks the U.S. economy experienced in 1980 and the sharp degradation in 2020. Moreover, the covariate models outperform the simpler models on all of the goodness-of-fit measures and interval-based resilience metrics computed for all seven data sets considered. These results suggest that classical reliability modeling techniques such as bathtub-shaped hazard functions and mixture distributions are suitable for modeling and prediction of some resilience curves possessing a single decrease and subsequent recovery, but that covariate models, which explicitly incorporate explanatory factors and domain-specific information, are much more flexible and achieve higher goodness of fit and greater predictive accuracy. Thus, the covariate modeling approach provides a general framework for data collection and predictive modeling for a variety of resilience curves.
Application of Software Reliability and Resilience Models to Machine Learning
Zakaria Faddi (University of Massachusetts Dartmouth)
Zakaria Faddi is a master’s student at the University of Massachusetts Dartmouth in the Electrical and Computer Engineering department. He completed his undergraduate degree in Electrical and Computer Engineering, with a concentration in Cybersecurity, at the same institution in the spring of 2022.
Machine Learning (ML) systems such as Convolutional Neural Networks (CNNs) are susceptible to adversarial scenarios. In these scenarios, an attacker attempts to manipulate or deceive a machine learning model by providing it with malicious input, necessitating quantitative reliability and resilience evaluation of ML algorithms. Such attacks can result in the model making incorrect predictions or decisions, which can have severe consequences in applications such as security, healthcare, and finance. Failure in the ML algorithm can lead not only to failures in the application domain but also to failures in the system to which it provides functionality, which may have a performance requirement; hence the need for the application of software reliability and resilience methods. This talk demonstrates the applicability of software reliability and resilience tools to ML algorithms, providing an objective approach to assess recovery after a degradation from known adversarial attacks. The results indicate that software reliability growth models and tools can be used to monitor the performance and quantify the reliability and resilience of ML models in the many domains in which machine learning algorithms are applied.
Application of Recurrent Neural Network for Software Defect Prediction
Fatemeh Salboukh (University of Massachusetts Dartmouth)
I am a Ph.D. student in the Department of Engineering and Applied Science at the University of Massachusetts Dartmouth. I received my Master’s degree in Mathematical Statistics from the University of Allame Tabataba’i (September 2020) and my Bachelor’s degree in Applied Statistics from Yazd University (July 2018).
Traditional software reliability growth models (SRGM) characterize software defect detection as a function of testing time. Many of those SRGM are modeled by the non-homogeneous Poisson process (NHPP). However, those models are parametric in nature and do not explicitly encode factors driving defect or vulnerability discovery. Moreover, NHPP models are characterized by a mean value function that predicts the average number of defects discovered by a certain point in time during the testing interval, but this may not capture all of the changes and details present in the data. More recent studies proposed SRGM incorporating covariates, where defect discovery is a function of one or more test activities documented and recorded during the testing process. These covariate models introduce an additional parameter per testing activity, which adds a high degree of non-linearity to traditional NHPP models, and parameter estimation becomes complex since it is limited to maximum likelihood estimation or expectation maximization. Therefore, this talk assesses the potential use of neural networks to predict software defects due to their ability to remember trends. Three different neural networks are considered: (i) Recurrent Neural Networks (RNNs), (ii) Long Short-Term Memory (LSTM), and (iii) Gated Recurrent Units (GRU). The neural network approaches are compared with the covariate model to evaluate their predictive ability. Results suggest that GRU and LSTM present better goodness-of-fit measures such as SSE, PSSE, and MAPE compared to RNN and covariate models, indicating more accurate predictions.
Topological Data Analysis’ involvement in Cyber Security
Anthony Salvatore Cappetta and Elie Alhajjar (United States Military Academy)
Anthony “Tony” Cappetta is a native of Yardley, Pennsylvania, and a senior Operations Research major at the United States Military Academy (USMA) at West Point. Upon graduation, Tony will commission in the United States Army as a Field Artillery Officer. Tony serves as the training staff officer of the Scoutmasters’ Council, French Forum, and Center for Enhanced Performance. He was also on the Crew team for three years as a student-athlete. An accomplished pianist, Tony currently serves as the Cadet-in-Charge of the Department of Foreign Language’s Piano and Voice Mentorship program, which he has been a part of since arriving at the Academy. Tony has planned and conducted independent research in statistical concepts at USMA as well as independent studies in mathematics (Topology and Number Theory) with Dr. Andrew Yarmola of Princeton University. His current research focuses on the application of Topological Data Analysis to cyber security. He is currently a semi-finalist in the Fulbright program, where he hopes to model and map disease transmission in pursuit of eradicating disease.
The purpose of this research is to explore the use and application of Topological Data Analysis (TDA) in the realm of cyber security. The methods used in this research include an exploration of different Python libraries and C++ Python interfaces to examine the shape of the data involved using TDA. These methods include, but are not limited to, the GUDHI, GIOTTO, and Scikit-tda libraries. The project’s results will show where the literal (topological) holes in cyber security data lie and will offer methods for better analyzing these holes and breaches.
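For readers unfamiliar with these libraries, the following is a small, hypothetical example of the kind of computation they support: a Vietoris-Rips persistence diagram computed with GUDHI on a random point cloud standing in for featurized network data.

```python
# Illustrative sketch of a TDA computation with GUDHI: a Vietoris-Rips
# persistence diagram on random points standing in for featurized traffic.
import numpy as np
import gudhi

points = np.random.default_rng(0).normal(size=(100, 3))

rips = gudhi.RipsComplex(points=points, max_edge_length=2.0)
simplex_tree = rips.create_simplex_tree(max_dimension=2)
diagram = simplex_tree.persistence()

# Long-lived 1-dimensional features ("holes") suggest structure in the data;
# short-lived ones are typically noise.
h1 = simplex_tree.persistence_intervals_in_dimension(1)
print("number of 1-dimensional features:", len(h1))
```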
Room D
Contributed Session 2
Session Chair: TBD
Empirical Calibration for a Linearly Extrapolated Lower Tolerance Bound
Caleb King (JMP Statistical Discovery)
Dr. Caleb King is a Research Statistician Developer for the Design of Experiments platform in the JMP software. He’s been responsible for developing the Sample Size Explorers suite of power and sample size explorers as well as the new MSA Design platform. Prior to joining JMP, he was a senior statistician at Sandia National Laboratories for 3 years, helping engineers design and analyze their experiments. He received his M.S. and Ph.D. in statistics from Virginia Tech, specializing in design and analysis for reliability.
In many industries, the reliability of a product is often determined by whether a quantile of the distribution of a product characteristic meets a specified requirement. A typical approach is to assume a distribution model and compute a one-sided confidence bound on the quantile. However, this can become difficult if the sample size is too small to reliably estimate a parametric model. Linear interpolation between order statistics is a viable nonparametric alternative if the sample size is sufficiently large. When the sample size is too small for interpolation, linear extrapolation from the extreme order statistics can be used instead, but it can result in inconsistent coverage. In this talk, we’ll present an empirical study from our submitted manuscript used to generate calibrated weights for linear extrapolation that greatly improve the accuracy of the coverage across a feasible range of distribution families with positive support. We’ll demonstrate this calibration technique using two examples from industry.
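For context, a minimal sketch of the standard order-statistic construction that the calibrated extrapolation extends is shown below; the calibrated weights from the manuscript are not reproduced here, and the Weibull sample is an arbitrary illustration.

```python
# Sketch of the standard nonparametric lower tolerance bound that the talk's
# calibrated extrapolation builds on (illustrative only). We seek a bound L
# such that, with confidence gamma, at least proportion p of the population
# exceeds L.
import numpy as np
from scipy.stats import beta

def lower_tolerance_bound(x, p=0.90, gamma=0.95):
    x = np.sort(np.asarray(x))
    n = len(x)
    # Coverage of (x_(r), inf) is 1 - U_(r), with U_(r) ~ Beta(r, n - r + 1).
    # Choose the largest r with P(coverage >= p) >= gamma.
    r_valid = [r for r in range(1, n + 1)
               if beta.cdf(1.0 - p, r, n - r + 1) >= gamma]
    if not r_valid:
        return None   # sample too small: extrapolation (the talk's topic) is needed
    return x[max(r_valid) - 1]

sample = np.random.default_rng(1).weibull(2.0, size=100)
print("90/95 lower tolerance bound:", lower_tolerance_bound(sample))
```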
Analysis of Surrogate Strategies and Regularization with Application to High-Speed Flows
Gregory Hunt (William & Mary)
Greg is an interdisciplinary researcher who helps advance science with statistical and data-analytic tools. He is trained as a statistician, mathematician, and computer scientist, and currently works on a diverse set of problems in engineering, physics, and microbiology.
Surrogate modeling is an important class of techniques used to reduce the burden of resource-intensive computational models by creating fast and accurate approximations. In aerospace engineering, surrogates have been used to great effect in design, optimization, exploration, and uncertainty quantification (UQ) for a range of problems, like combustor design, spacesuit damage assessment, and hypersonic vehicle analysis. Consequently, the development, analysis, and practice of surrogate modeling is of broad interest. In this talk, several widely used surrogate modeling strategies are studied as archetypes in a discussion on parametric/nonparametric surrogate strategies, local/global model forms, complexity regularization, uncertainty quantification, and relative strengths/weaknesses. In particular, we consider several variants of two widely used classes of methods: polynomial chaos and Gaussian process regression. These surrogate models are applied to several synthetic benchmark test problems and examples of real high-speed flow problems, including hypersonic inlet design, thermal protection systems, and shock-wave/boundary-layer interactions. Through analysis of these concrete examples, we analyze the trade-offs that modelers must navigate to create accurate, flexible, and robust surrogates.
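As a concrete illustration of one of the surrogate strategies above, the sketch below fits a Gaussian process regression surrogate (with predictive uncertainty) to a handful of evaluations of a cheap stand-in function; the test function and kernel choices are assumptions, not the flow problems studied in the talk.

```python
# Hedged sketch of a Gaussian process surrogate: a synthetic 1-D function
# stands in for an expensive high-speed flow simulation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def expensive_model(x):                      # placeholder for a CFD solve
    return np.sin(3 * x) + 0.3 * x**2

X_train = np.linspace(0.0, 3.0, 12).reshape(-1, 1)
y_train = expensive_model(X_train).ravel()

kernel = ConstantKernel(1.0) * RBF(length_scale=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X_train, y_train)

X_test = np.linspace(0.0, 3.0, 200).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)   # prediction plus UQ
print("max predictive std dev:", std.max())
```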
Case Study on Test Planning and Data Analysis for Comparing Time Series
Phillip Koshute (Johns Hopkins University Applied Physics Laboratory)
Phillip Koshute is a data scientist and statistical modeler at the Johns Hopkins University Applied Physics Laboratory. He has degrees in mathematics and operations research and is currently pursuing his PhD in applied statistics at the University of Maryland.
Several years ago, the US Army Research Institute of Environmental Medicine developed an algorithm to estimate core temperature in military working dogs (MWDs). This canine thermal model (CTM) is based on thermophysiological principles and incorporates environmental factors and acceleration. The US Army Medical Materiel Development Activity is implementing this algorithm in a collar-worn device that includes computing hardware, environmental sensors, and an accelerometer. Among other roles, Johns Hopkins University Applied Physics Laboratory (JHU/APL) is coordinating the test and evaluation of this device. The device’s validation is ultimately tied to field tests involving MWDs. However, to minimize the burden to MWDs and the interruptions to their training, JHU/APL seeks to leverage non-canine laboratory-based testing to the greatest possible extent. For example, JHU/APL is testing the device’s accelerometers with shaker tables that vertically accelerate the device according to specified sinusoidal acceleration profiles. This test yields time series of acceleration and related metrics, which are compared to ground-truth measurements from a reference accelerometer. Statistically rigorous comparisons between the CTM and reference measurements must account for the potential lack of independence between measurements that are close in time. Potentially relevant techniques include downsampling, paired difference tests, hypothesis tests of absolute difference, hypothesis tests of distributions, functional data analysis, and bootstrapping. These considerations affect both test planning and subsequent data analysis. This talk will describe JHU/APL’s efforts to test and evaluate the CTM accelerometers and will outline a range of possible methods for comparing time series.
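To illustrate two of the candidate techniques listed above, the hypothetical sketch below downsamples the device-minus-reference differences before a paired test and then applies a moving-block bootstrap to the full difference series; the simulated signals and block length are illustrative assumptions, not JHU/APL's data or analysis.

```python
# Illustrative sketch (not JHU/APL's analysis): compare a device time series
# against a reference using downsampling + a paired t-test, and a
# moving-block bootstrap that preserves local serial correlation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 2000)
reference = np.sin(2 * np.pi * 1.5 * t)                     # shaker profile
device = reference + 0.02 + 0.05 * rng.standard_normal(t.size)

# (1) Downsample, then test the thinned differences against zero.
step = 50                                                   # keep every 50th sample
diff = (device - reference)[::step]
print("paired t-test:", stats.ttest_1samp(diff, 0.0))

# (2) Moving-block bootstrap of the mean difference.
full_diff = device - reference
block, n_boot = 100, 2000
n_blocks = full_diff.size // block
boot_means = np.empty(n_boot)
for b in range(n_boot):
    starts = rng.integers(0, full_diff.size - block, size=n_blocks)
    sample = np.concatenate([full_diff[s:s + block] for s in starts])
    boot_means[b] = sample.mean()
print("95% bootstrap CI for mean difference:", np.percentile(boot_means, [2.5, 97.5]))
```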
Model Validation Levels for Model Authority Quantification
Kyle Provost (STAT COE)
Kyle is a STAT Expert (Huntington Ingalls Industries contractor) at the Scientific Test and Analysis Techniques (STAT) Center of Excellence (COE) at the Air Force Institute of Technology (AFIT). The STAT COE provides independent STAT consultation to designated acquisition programs and special projects to improve Test & Evaluation (T&E) rigor, effectiveness, and efficiency. He received his M.S. in Applied Statistics from Wright State University.
Due to the increased use of Modeling & Simulation (M&S) in the development of Department of Defense (DOD) weapon systems, it is critical to assign models appropriate levels of trust. Validation is an assessment process that can help mitigate the risks posed by relying on potentially inaccurate, insufficient, or incorrect models. However, validation criteria are often subjective and inconsistently applied across different models. Current practice also fails to reassess models as requirements change, mission scope is redefined, new data are collected, or models are adapted to a new use. This brief will present Model Validation Levels (MVLs) as a validation paradigm that enables rigorous, objective validation of a model and yields metrics that quantify the amount of trust that can be placed in that model. This validation framework will be demonstrated through a real-world example detailing the construction and interpretation of MVLs.
Room E

Virtual Session
Mini-Tutorial 3
Session Chair: Heather Wojton, IDA
Data Management for Research, Development, Test, and Evaluation
Matthew Avery (IDA)
Matthew Avery is an OED Assistant Director and part of OED’s Sustainment group. He represents OED on IDA’s Data Governance Council and acts as the Deputy to IDA’s Director of Data Strategy and Chief Data Officer, helping craft data-related strategy and policy. Matthew spearheads a Sustainment group effort to develop an end-to-end model to identify ways to improve mission-capable rates for the V-22 fleet. Prior to joining Sustainment, Matthew was on the Test Science team. As the Test Science Data Management lead, he helped develop analytical methods and tools for operational test and evaluation. He also led OED’s project on operational test and evaluation of Army and Marine Corps unmanned aircraft systems. In 2018-19 Matthew served as an embedded analyst in the Pentagon’s Office of Cost Assessment and Program Evaluation, where among other projects he built state-space models in support of the Space Control Strategic Portfolio Review. Matthew earned his PhD in Statistics from North Carolina State University in 2012, his MS in Statistics from North Carolina State in 2009, and a BA from New College of Florida in 2006. He is a member of the American Statistical Association.
It is important to manage data from research, development, test, and evaluation effectively. Well-managed data makes research more efficient and promotes better analysis and decision-making. At present, numerous federal organizations are engaged in large-scale reforms to improve the way they manage their data, and these reforms are already affecting the way research is executed. Data management affects every part of the research process. Thoughtful, early planning sets research projects on the path to success by ensuring that the resources and expertise required to effectively manage data throughout the research process are in place when they are needed. This interactive tutorial will discuss the planning and execution of data management for research projects. Participants will build a data management plan, considering data security, organization, metadata, reproducibility, and archiving. By the conclusion of the tutorial, participants will be able to define data management and understand its importance, understand how the data lifecycle relates to the research process, and be able to build a data management plan.
5:00 PM – 7:00 PM: Parallel Sessions
Café
Comparison of Magnetic Field Line Tracing Methods
Dean Thomas (George Mason University)
In 2022, Dean Thomas joined a NASA Goddard collaboration examining space weather phenomena. His research is examining major solar events that affect the earth. While the earth’s magnetic field protects the earth from solar radiation, solar storms can distort the earth’s magnetic field allowing the storms to damage satellites and electrical grids. Previously, he was Deputy Director for the Operational Evaluation Division (OED) at the Institute for Defense Analyses (IDA), managing a team of 150 researchers. OED supports the Director, Operational Test and Evaluation (DOT&E) within the Pentagon, who is responsible for operational testing of new military systems including aircraft, ships, ground vehicles, sensors, weapons, and information technology systems. His analyses fed into DOT&E’s reports and testimony to Congress and the Secretary of Defense on whether these new systems can successfully complete their missions and protect their crews. He received his PhD in Physics in 1987 from the State University of New York (SUNY), Stony Brook.
At George Mason University, we are developing swmfio, a Python package for processing magnetosphere and ionosphere results from the Space Weather Modeling Framework (SWMF), which is used to study the sun, heliosphere, and magnetosphere. The SWMF framework centers around a high-performance magnetohydrodynamic (MHD) model, the Block Adaptive Tree Solar-wind Roe Upwind Scheme (BATS-R-US). This analysis uses swmfio and other methods to trace magnetic field lines, compare the results, and identify why the methods differ. While the earth’s magnetic field protects the planet from solar radiation, solar storms can distort the earth’s magnetic field, allowing them to damage satellites and electrical grids. Being able to trace magnetic field lines helps us understand space weather. In this analysis, the September 1859 Carrington Event, the most intense geomagnetic storm in recorded history, is examined. We use three methods to trace magnetic field lines in the Carrington Event and compare the field lines generated by the different methods. We consider two factors in the analysis. First, we directly compare methods by measuring the distances between field lines generated by different methods. Second, we consider how sensitive the methods are to initial conditions. We note that swmfio’s linear interpolation, which is customized for the BATS-R-US adaptive mesh, provides expected results: it is insensitive to small changes in initial conditions and terminates field lines at boundaries. We observe that, for any method, results may not be accurate when the mesh size becomes large.
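As background on the underlying computation, here is a generic field-line tracing sketch that integrates dx/ds = B(x)/|B(x)| through an idealized dipole with SciPy; it is not swmfio itself, and the analytic dipole merely stands in for interpolated BATS-R-US output.

```python
# Generic field-line tracing sketch (not swmfio): follow the unit tangent of
# an idealized dipole field with an adaptive ODE solver, terminating at an
# inner boundary as a field-line tracer would at the model boundary.
import numpy as np
from scipy.integrate import solve_ivp

def dipole_B(x):                       # idealized dipole field, arbitrary units
    m = np.array([0.0, 0.0, -1.0])
    r = np.linalg.norm(x)
    return (3.0 * x * np.dot(m, x) / r**5) - m / r**3

def rhs(s, x):
    B = dipole_B(x)
    return B / np.linalg.norm(B)       # unit tangent along the field line

def hit_inner_boundary(s, x):          # terminate at r = 1 (model boundary)
    return np.linalg.norm(x) - 1.0
hit_inner_boundary.terminal = True

start = np.array([3.0, 0.0, 0.5])      # seed point in the magnetosphere
sol = solve_ivp(rhs, (0.0, 100.0), start, events=hit_inner_boundary,
                max_step=0.05, rtol=1e-8)
print("traced", sol.y.shape[1], "points; stopped at r =",
      np.linalg.norm(sol.y[:, -1]))
```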
Framework for Operational Test Design: An Example Application of Design Thinking
Miriam Armstrong (IDA)
Dr. Armstrong is a human factors researcher at IDA where she is involved in operational testing of defense systems. Her expertise includes interactions between humans and autonomous systems and psychometrics. She received her PhD in Human Factors Psychology from Texas Tech University in 2021.
Design thinking is a problem-solving approach that promotes the principles of human-centeredness, iteration, and diversity. The poster provides a five-step framework for incorporating these design principles when building an operational test. In the first step, test designers conduct research on test users and the problems they encounter. In the second step, designers articulate specific user needs to address in the test design. In the third step, designers generate multiple solutions to address user needs. In the fourth step, designers create prototypes of their best solutions. In the fifth step, designers refine the prototypes through user testing.
The Component Damage Vector Method: A statistically rigorous method for validating AJEM us
Tom Johnson (IDA)
Tom works on the LFT&E of Army land-based systems. He has three degrees in Aerospace Engineering and specializes in statistics and experimental design, including the validation of modeling and simulation. Tom has been at IDA for 11 years.
As the Test and Evaluation community increasingly relies on Modeling and Simulation (M&S) to supplement live testing, M&S validation has become critical for ensuring credible weapon system evaluations. System-level evaluations of Armored Fighting Vehicles (AFV) rely on the Advanced Joint Effectiveness Model (AJEM) and Full-Up System Level (FUSL) testing to assess AFV vulnerability. This report reviews one of the primary methods that analysts use to validate AJEM, called the Component Damage Vector (CDV) Method. The CDV method compares components that were damaged in FUSL testing to simulated representations of that damage from AJEM.
Fully Bayesian Data Imputation using Stan Hamiltonian Monte Carlo
Melissa Hooke (NASA Jet Propulsion Laboratory)
Melissa Hooke is a Systems Engineer at the Jet Propulsion Laboratory in the Systems Modeling, Analysis & Architectures group. She is the task manager for NASA’s CubeSat or Microsat Probabilistic and Analogies Cost Tool (COMPACT) and the Analogy Software Cost Tool (ASCoT), and is the primary statistical model developer for the NASA Instrument Cost Model (NICM). Her areas of interest include Bayesian modeling, uncertainty quantification, and data visualization. Melissa was the recipient of the “Rising Star” Award at the NASA Cost Symposium in 2021. Melissa earned her B.A. in Mathematics and Statistics at Pomona College where she developed a Bayesian model for spacecraft safe mode events for her undergraduate thesis.
When doing multivariate data analysis, one common obstacle is the presence of incomplete observations, i.e., observations for which one or more covariates are missing data. Rather than deleting entire observations that contain missing data, which can lead to small sample sizes and biased inferences, data imputation methods can be used to statistically “fill in” missing data. Imputing data can help combat small sample sizes by using the existing information in partially complete observations, with the end goal of producing less biased and higher confidence inferences.
In aerospace applications, imputation of missing data is particularly relevant because sample sizes are small and quantifying uncertainty in the model is of utmost importance. In this paper, we outline the benefits of a fully Bayesian imputation approach which samples simultaneously from the joint posterior distribution of model parameters and the imputed values for the missing data. This approach is preferred over multiple imputation approaches because it performs the imputation and modeling steps in one step rather than two, making it more compatible with complex model forms. An example of this imputation approach is applied to the NASA Instrument Cost Model (NICM), a model used widely across NASA to estimate the cost of future spaceborne instruments. The example models are implemented in Stan, a statistical-modeling tool enabling Hamiltonian Monte Carlo (HMC).
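The following minimal sketch (not the NICM code) shows the general pattern in Stan via cmdstanpy: missing covariate values are declared as parameters so that HMC samples them jointly with the regression parameters; the toy linear model, priors, and synthetic data are assumptions for illustration.

```python
# Hypothetical sketch of fully Bayesian imputation of a missing covariate in
# a linear regression (not the authors' NICM model). Missing x values are
# parameters, so Stan's HMC draws them jointly with the regression parameters.
from pathlib import Path
import numpy as np
from cmdstanpy import CmdStanModel

stan_code = """
data {
  int<lower=0> N_obs;             // rows with x observed
  int<lower=0> N_mis;             // rows with x missing
  vector[N_obs] x_obs;
  vector[N_obs] y_obs;
  vector[N_mis] y_mis;            // y is observed everywhere
}
parameters {
  vector[N_mis] x_mis;            // imputed values, sampled jointly
  real mu_x;
  real<lower=0> sigma_x;
  real alpha;
  real beta;
  real<lower=0> sigma_y;
}
model {
  x_obs ~ normal(mu_x, sigma_x);  // shared covariate model ties observed
  x_mis ~ normal(mu_x, sigma_x);  // and missing x together
  y_obs ~ normal(alpha + beta * x_obs, sigma_y);
  y_mis ~ normal(alpha + beta * x_mis, sigma_y);
}
"""
Path("impute.stan").write_text(stan_code)

rng = np.random.default_rng(1)
x = rng.normal(10, 2, size=30)                 # synthetic "cost driver"
y = 1.5 + 0.8 * x + rng.normal(0, 0.5, size=30)
miss = rng.choice(30, size=6, replace=False)   # drop x for 6 observations
obs = np.setdiff1d(np.arange(30), miss)

data = {"N_obs": len(obs), "N_mis": len(miss),
        "x_obs": x[obs], "y_obs": y[obs], "y_mis": y[miss]}
fit = CmdStanModel(stan_file="impute.stan").sample(data=data, chains=2)
print(fit.summary().loc[["alpha", "beta"], ["Mean", "StdDev"]])
```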
Introducing TestScience.org
Sean Fiorito (IDA / V-Strat, LLC)
Mr. Fiorito has been a contractor for the Institute for Defense Analyses (IDA) since 2015. He was a part of the original team that designed and developed both the DATAWorks and Test Science team websites. He has expertise in application development, integrated systems, cloud architecture, and cloud adoption. Mr. Fiorito started his Federal IT career in 2004 with Booz Allen Hamilton. Since then he has worked with other Federal IT contract firms, both large (Deloitte, Accenture) and small (Fila, Dynamo). He has contributed to projects such as the Coast Guard’s Rescue 21, the Forest Service’s Electronic Management of NEPA (eMNEPA), and Federal Student Aid’s Enterprise Cloud Migration. He holds a BS in Information Systems with a concentration in programming, as well as an Amazon Web Services Cloud Architect certification.
The Test Science Team facilitates data-driven decision-making by disseminating various testing and analysis methodologies. One way they disseminate these methodologies is through the annual workshop, DATAWorks; another is through the website, TestScience.org. The Test Science website includes video training, interactive tools, and a related research library, as well as the DATAWorks Archive. "Introducing TestScience.org", a presentation at DATAWorks, could include a poster and an interactive guided session through the site content. The presentation would inform interested DATAWorks attendees of the additional resources available throughout the year. It could also be used to inform the audience about ways to participate, such as contributing interactive Shiny tools, training content, or research. "Introducing TestScience.org" would highlight the following sections of the website: 1. The DATAWorks Archive, 2. Learn (Video Training), 3. Tools (Interactive Tools), 4. Research (Library), and 5. Team (About and Contact). Incorporating an introduction to TestScience.org into DATAWorks would inform attendees of additional valuable resources available to them and could encourage broader participation in TestScience.org, adding value to both the DATAWorks attendees and the TestScience.org efforts.
Developing a Domain-Specific NLP Topic Modeling Process for Army Experimental Data
Anders Grau (United States Military Academy)
Anders Grau is a United States Military Academy cadet currently studying for a Bachelor of Science in Operations Research. In his time as a cadet, he has had the opportunity to work with the Research Facilitation Laboratory to analyze insider threats in the Army and has conducted an independent study on topic modeling with Twitter data. He is currently writing a thesis on domain-specific topic modeling for Army experimental data. Upon the completion of his studies, he will commission as a Second Lieutenant in the Army’s Air Defense Artillery branch.
Researchers across the U.S. Army are conducting experiments on the implementation of emerging technologies on the battlefield. Key data points from these experiments include text comments on the technologies’ performance. Researchers use a range of Natural Language Processing (NLP) tasks to analyze such comments, including text summarization, sentiment analysis, and topic modeling. Based on successful results from research in other domains, this research aims to yield greater insights by using military-specific language as opposed to a generalized corpus. This research is dedicated to developing a methodology to analyze text comments from Army experiments and field tests using topic models trained on an Army domain-specific corpus. The methodology is tested on experimental data agglomerated in the Forge database, an Army Futures Command (AFC) initiative to provide researchers with a common operating picture of AFC research. As a result, this research offers an improved framework for analysis with domain-specific topic models for researchers across the U.S. Army.
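As a toy illustration of the modeling step, the sketch below trains a small LDA topic model with gensim on a few invented comment strings standing in for Army experimental text; the corpus, topic count, and preprocessing are assumptions, not the Forge data or the thesis pipeline.

```python
# Minimal topic-modeling sketch with gensim's LDA; the tiny "domain" corpus
# below is invented for illustration only.
from gensim import corpora, models

comments = [
    "radio lost connectivity during the convoy movement",
    "battery life on the sensor limited the patrol duration",
    "targeting display froze while tracking multiple contacts",
    "network latency degraded the common operating picture",
]
docs = [c.lower().split() for c in comments]

dictionary = corpora.Dictionary(docs)
bow_corpus = [dictionary.doc2bow(d) for d in docs]

lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=0)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```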
The Application of Semi-Supervised Learning in Image Classification
Elijah Abraham Dabkowski (United States Military Academy)
CDT Elijah Dabkowski is a senior at the United States Military Academy majoring in Applied Statistics and Data Science. He branched Engineers and hopes to pursue a Master of Science in Data Science through a technical scholarship upon graduation. Within the Army, CDT Dabkowski plans to be a combat engineer stationed in either Germany or Italy for the early portion of his career before transitioning to the Operations Research and Systems Analysis career field in order to use his knowledge to help the Army make informed data-driven decisions. His research is centered around the application of semi-supervised learning in image classification to provide a proof-of-concept for the Army in how data science can be integrated with the subject matter expertise of professional analysts to streamline and improve current practices. He enjoys soccer, fishing, and snowboarding and is a member of the club soccer team as well as a snowboard instructor at West Point.
In today’s Army, one of the fastest growing and most important areas in the effectiveness of our military is data science. One aspect of this field is image classification, which has applications such as target identification. However, one drawback within this field is that when an analyst begins to deal with a multitude of images, it becomes infeasible for an individual to examine all the images and classify them accordingly. My research presents a methodology for image classification which can be used in a military context, utilizing a typical unsupervised classification approach involving K-Means to classify a majority of the images while pairing this with user input to determine the label of designated images. The user input comes in the form of manual classification of certain images which are deliberately selected for presentation to the user, allowing this individual to select which group the image belongs in and refine the current image clusters. This shows how a semi-supervised approach to image classification can efficiently improve the accuracy of the results when compared to a traditional unsupervised classification approach.
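A hypothetical sketch of that workflow is shown below: cluster features with K-Means, present only the image nearest each cluster center for manual labeling, and propagate those labels across the clusters; the random features and placeholder labels are stand-ins for real imagery and analyst input.

```python
# Sketch of the semi-supervised workflow described above (illustrative):
# cluster with K-Means, have the analyst label one representative per
# cluster, then propagate those labels to the rest of each cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))        # stand-in for flattened/embedded images

k = 10
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)

# Index of the image closest to each centroid -- these are shown to the analyst.
rep_idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, features)

# The analyst's manual labels for those representative images (placeholders here).
manual_labels = {int(i): f"class_{c}" for c, i in enumerate(rep_idx)}

# Propagate each representative's label to every image in its cluster.
predicted = np.array([manual_labels[int(rep_idx[c])] for c in km.labels_])
print(predicted[:10])
```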
Best Practices for Using Bayesian Reliability Analysis in Developmental Testing
Paul Fanto (IDA)
Paul Fanto is a research staff member at the Institute for Defense Analyses (IDA). His work focuses on the modeling and analysis of space and ISR systems and on statistical methods for reliability. He received a Ph.D. in Physics from Yale University in 2021, where he developed computational models of atomic nuclei.
Traditional methods for reliability analysis are challenged in developmental testing (DT) as systems become increasingly complex and DT programs become shorter and less predictable. Bayesian statistical methods, which can combine data across DT segments and use additional data to inform reliability estimates, can address some of these challenges. However, Bayesian methods are not widely used. I will present the results of a study aimed at identifying effective practices for the use of Bayesian reliability analysis in DT programs. The study consisted of interviews with reliability subject matter experts, together with a review of relevant literature on Bayesian methods. This analysis resulted in a set of best practices that can guide an analyst in deciding whether to apply Bayesian methods, in selecting the appropriate Bayesian approach, and in applying the Bayesian method and communicating the results.
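As one concrete (and deliberately simple) illustration of the general idea of carrying information across DT segments, the sketch below uses a conjugate Gamma prior for an exponential failure-rate model; the prior and failure counts are invented, and this is not a prescription from the study.

```python
# Illustrative conjugate-update example (not the study's recommended method):
# a Gamma prior on an exponential failure rate lets data from an earlier DT
# segment inform the prior for the next segment.
from scipy import stats

# Segment 1: 5 failures over 400 hours of testing.
a0, b0 = 0.5, 100.0                 # weak Gamma(shape, rate) prior, assumed
a1, b1 = a0 + 5, b0 + 400.0         # posterior after segment 1

# Segment 2: 2 failures over 300 hours, using the segment-1 posterior as prior.
a2, b2 = a1 + 2, b1 + 300.0

posterior = stats.gamma(a=a2, scale=1.0 / b2)
rate_mean = posterior.mean()
print("posterior mean MTBF:", 1.0 / rate_mean, "hours")
print("90% credible interval for the failure rate:", posterior.interval(0.90))
```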
Test and Evaluation Tool for Stealthy Communication
Olga Chen (U.S. Naval Research Laboratory)
Dr. Olga Chen has worked as a Computer Scientist at the U.S. Naval Research Laboratory since 1999. For the last three years, she has been the Principal Investigator for the “Stealthy Communications and Situational Awareness” project. Her current research focuses on network protocols and communications, design of security protocols and architectures, and their analysis and verification. She has published peer-reviewed research on approaches to software security and on design and analysis of stealthy communications. She has a Doctorate in Computer Science from the George Washington University.
Stealthy communication allows the transfer of information while hiding not only the content of that information but also the fact that any hidden information was transferred. One way of doing this is embedding information into network covert channels, e.g., timing between packets, header fields, and so forth. We describe our work on an integrated system for the design, analysis, and testing of such communication. The system consists of two main components: the analytical component, the NExtSteP (NRL Extensible Stealthy Protocols) testbed, and the emulation component, consisting of CORE (Common Open Research Emulator), an existing open source network emulator, and EmDec, a new tool for embedding stealthy traffic in CORE and decoding the result. We developed the NExtSteP testbed as a tool to evaluate the performance and stealthiness of embedders and detectors applied to network traffic. NExtSteP includes modules to: generate synthetic traffic data or ingest it from an external source (e.g., emulation or network capture); embed data using an extendible collection of embedding algorithms; classify traffic, using an extendible collection of detectors, as either containing or not containing stealthy communication; and quantify, using multiple metrics, the performance of a detector over multiple traffic samples. This allows us to systematically evaluate the performance of different embedders (and embedder parameters) and detectors against each other. Synthetic data are easy to generate with NExtSteP. We use these data for initial experiments to broadly guide parameter selection and to study asymptotic properties that require numerous long traffic sequences to test. The modular structure of NExtSteP allows us to make our experiments increasingly realistic. We have done this in two ways: by ingesting data from captured traffic and then doing embedding, classification, and detector analysis using NExtSteP, and by using EmDec to produce external traffic data with embedded communication and then using NExtSteP to do the classification and detector analysis. The emulation component was developed to build and evaluate proof-of-concept stealthy communications over existing IP networks. The CORE environment provides a full network, consisting of multiple nodes, with minimal hardware requirements and allows testing and orchestration of real protocols. Our testing environment allows for replay of real traffic and generation of synthetic traffic using the MGEN (Multi-Generator) network testing tool. The EmDec software was created with the existing NRL-developed protolib (protocol library). EmDec, running on CORE networks and orchestrated using a set of scripts, generates sets of data which are then evaluated for effectiveness by NExtSteP. In addition to this evaluation, the development of EmDec allowed us to discover multiple novelties that were not apparent while using theoretical models. We describe the current status of our work, the results so far, and our future plans.
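The snippet below is a generic illustration of the detector-scoring step such a testbed automates (it is not NExtSteP code): score labeled traffic samples, then summarize detector performance with ROC AUC and the detection rate at a fixed false-positive rate, using invented score distributions.

```python
# Generic detector-evaluation sketch: invented detector scores for clean and
# embedded traffic samples, summarized with ROC AUC and TPR at a fixed FPR.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
labels = np.r_[np.zeros(200), np.ones(200)]            # 0 = clean, 1 = embedded
scores = np.r_[rng.normal(0.0, 1.0, 200),              # detector score on clean traffic
               rng.normal(0.8, 1.0, 200)]              # ...and on embedded traffic

print("ROC AUC:", roc_auc_score(labels, scores))

fpr, tpr, _ = roc_curve(labels, scores)
target_fpr = 0.05
print("TPR at 5% FPR:", np.interp(target_fpr, fpr, tpr))
```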
Comparison of Bayesian and Frequentist Methods for Regression
James P Theimer (Homeland Security Community of Best Practices)
Dr. James Theimer is a Scientific Test and Analysis Techniques Expert employed by Huntington Ingalls Industries Technical Solutions and working to support the Homeland Security Center of Best Practices. Dr. Theimer worked for the Air Force Research Laboratory and predecessor organizations for more than 35 years. He worked on modeling and simulation of sensor systems and supporting devices. His doctoral research was on modeling pulse formation in fiber lasers. He worked with a semiconductor reliability team as a reliability statistician and led a team that studied statistical validation of models of automatic sensor exploitation systems. This team also worked with programs to evaluate these systems. Dr. Theimer has a PhD in Electrical Engineering from Rensselaer Polytechnic Institute, an MS in Applied Statistics from Wright State University, an MS in Atmospheric Science from SUNY Albany, and a BS in Physics from the University of Rochester.
Statistical analysis is typically conducted using either a frequentist or Bayesian approach. But what is the impact of choosing one analysis method over another? This presentation will compare the results of both linear and logistic regression using Bayesian and frequentist methods. The data set combines information on simulated diffusion of material and anticipated background signal to imitate sensor output. The sensor is used to estimate the total concentration of material, and a threshold will be set such that the false alarm rate (FAR) due to the background is a constant. The regression methods are used to relate the probability of detection, for a given FAR, to predictor variables, such as the total amount of material released. The presentation concludes with a comparison of the similarities and differences between the two methods given the results.
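A compact sketch of the kind of side-by-side comparison described, on synthetic detection data rather than the talk's sensor simulations, is shown below: a frequentist logistic fit via statsmodels next to a Bayesian fit via PyMC with weakly informative priors (all settings are illustrative assumptions).

```python
# Small sketch of the comparison described above (synthetic data, not the
# sensor-simulation data set): frequentist logistic regression via
# statsmodels next to a Bayesian fit via PyMC.
import numpy as np
import statsmodels.api as sm
import pymc as pm

rng = np.random.default_rng(0)
amount = rng.uniform(0.0, 10.0, 200)                  # e.g., material released
p_true = 1.0 / (1.0 + np.exp(-(-3.0 + 0.8 * amount)))
detected = rng.binomial(1, p_true)

# Frequentist: maximum likelihood.
X = sm.add_constant(amount)
freq_fit = sm.Logit(detected, X).fit(disp=0)
print(freq_fit.params, freq_fit.conf_int())

# Bayesian: weakly informative priors, sampled with NUTS.
with pm.Model():
    b0 = pm.Normal("b0", 0.0, 5.0)
    b1 = pm.Normal("b1", 0.0, 5.0)
    pm.Bernoulli("y", p=pm.math.sigmoid(b0 + b1 * amount), observed=detected)
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)
print("posterior mean slope:", idata.posterior["b1"].mean().item())
```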
Post-hoc UQ of Deep Learning Models Applied to Remote Sensing Image Scene Classification
Alexei Skurikhin (Los Alamos National Laboratory)
Alexei Skurikhin is a scientist in the Remote Sensing and Data Science group at Los Alamos National Laboratory (LANL). He holds a Ph.D. in Computer Science and has been working at LANL since 1997 in the areas of signal and image analysis, evolutionary computation, computer vision, machine learning, and remote sensing applications.
Steadily growing quantities of high-resolution UAV, aerial, and satellite imagery provide an exciting opportunity for global transparency and geographic profiling of activities of interest. Advances in deep learning, such as deep convolutional neural networks (CNNs) and transformer models, offer more efficient ways to exploit remote sensing imagery. Transformers, in particular, are capable of capturing contextual dependencies in the data. Accounting for context is important because activities of interest are often interdependent and reveal themselves in the co-occurrence of related image objects or related signatures. However, while transformers and CNNs are powerful models, their predictions are often taken as point estimates, the pseudo-probabilities computed by the softmax function. They do not provide information about how confident the model is in its predictions, which is important in many mission-critical applications, and this limits their use in this space. Model evaluation metrics can provide information about the predictive model’s performance. We present and discuss results of post-hoc uncertainty quantification (UQ) of deep learning models, i.e., UQ applied to trained models. We consider an application of CNN and transformer models to remote sensing image scene classification using satellite imagery, and compare confidence estimates of the scene classification predictions of these models using evaluation metrics such as expected calibration error, reliability diagrams, and the Brier score, in addition to conventional metrics, e.g., accuracy and F1 score. For validation, we use the publicly available and well-characterized Remote Sensing Image Scene Classification (RESISC45) dataset, which contains 31,500 images covering 45 scene categories with 700 images in each category, with spatial resolution that varies from 30 to 0.2 m per pixel. This dataset was collected over different locations and under different conditions and possesses rich variations in translation, viewpoint, object pose and appearance, spatial resolution, illumination, background, and occlusion.
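For concreteness, the sketch below computes two of the calibration metrics named above, expected calibration error and the Brier score, from a model's top-class probabilities; the random confidences stand in for real softmax output on RESISC45.

```python
# Sketch of post-hoc calibration metrics computed from held-out predictions;
# the random confidences below stand in for real model output.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=15):
    """Weighted gap between confidence and accuracy across confidence bins."""
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 5000)                # top-class softmax probability
correct = (rng.uniform(size=5000) < conf**1.5)    # an over-confident model

print("ECE:", expected_calibration_error(conf, correct))
print("Brier score (top class):", np.mean((conf - correct) ** 2))
```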
Multimodal Data Fusion: Enhancing Image Classification with Text
Jack Perreault (United States Military Academy)
CDT Jack Perreault is a senior at the United States Military Academy majoring in Applied Statistics and Data Science and will commission as a Signal officer upon graduation. He hopes to pursue a Master of Science in Data Science through a technical scholarship. Within the Army, CDT Perreault plans to work in the 528th Special Operations Sustainment Brigade at Fort Bragg, North Carolina before transitioning to the Operations Research and Systems Analysis career field where he can conduct data-driven analysis that affects the operational and strategic decisions of the Army. CDT Perreault hopes to return to the United States Military Academy as an instructor within the Math Department where he can teach and inspire future cadets before transitioning to the civilian sector. His current research is centered around analyzing how a multimodal data fusion algorithm can leverage both images and accompanying text to enhance image classification. His prior research involved predictive modeling analyzing the role of public perception of the Vice President and its impact on presidential elections. CDT Perreault is a member of West Point’s track and field team and enjoys going to the beach while at home in Rhode Island.
Image classification is a critical part of gathering information on high-value targets. To this end, Convolutional Neural Networks (CNNs) have become the standard model for image and facial classification. However, CNNs alone are not entirely effective at image classification, and especially at human classification, because of their limited robustness and susceptibility to bias. Recent advances in CNNs, however, allow for data fusion to help reduce the uncertainty in their predictions. In this project, we describe a multimodal algorithm designed to increase confidence in image classification through a joint fusion model with image and text data. Our work utilizes CNNs for image classification and bag-of-words for text categorization on Wikipedia images and captions relating to the same classes as the CIFAR-100 dataset. Using data fusion, we combine the vectors of the CNN and bag-of-words models and apply a fully connected network to the joined data. We measure improvements by comparing the softmax outputs of the joint fusion model and the image-only CNN.
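An architectural sketch of such a joint fusion model in Keras appears below; the layer sizes, vocabulary size, and class count are placeholders rather than the project's actual settings.

```python
# Architectural sketch of joint fusion (placeholder sizes, not the project's
# settings): a CNN branch for images and a dense branch for bag-of-words
# vectors, concatenated and passed to a fully connected classifier.
from tensorflow import keras
from tensorflow.keras import layers

num_classes, vocab_size = 100, 5000

image_in = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(32, 3, activation="relu")(image_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)

text_in = keras.Input(shape=(vocab_size,))          # bag-of-words counts
t = layers.Dense(128, activation="relu")(text_in)

fused = layers.Concatenate()([x, t])
fused = layers.Dense(256, activation="relu")(fused)
out = layers.Dense(num_classes, activation="softmax")(fused)

model = keras.Model(inputs=[image_in, text_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```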
Predicting Success and Identifying Key Characteristics in Special Forces Selection
Mark Bobinski (United States Military Academy)
I am currently a senior at the United States Military Academy at West Point. My major is Applied Statistics and Data Science and I come from Cleveland, Ohio. This past summer I had the opportunity to work with the Army’s Special Warfare Center and School as an intern where we began the work on this project. I thoroughly enjoy mathematical modeling and look to begin a career in data science upon retiring from the military.
The United States Military possesses special forces units that are entrusted to engage in the most challenging and dangerous missions essential to fighting and winning the nation’s wars. Entry into special forces is based on a series of assessments called Special Forces Assessment and Selection (SFAS), which consists of numerous challenges that test a soldier’s mental toughness, physical fitness, and intelligence. Using logistic regression, random forest classification, and neural network classification, the researchers in this study aim to create a model that accurately predicts whether a candidate passes SFAS and to identify which variables are significant indicators of passing selection. Logistic regression proved to be the most accurate model, while also highlighting physical fitness, military experience, and intellect as the most significant indicators associated with success.
The Calculus of Mixed Meal Tolerance Test Trajectories
Skyler Chauff (United States Military Academy)
Skyler Chauff is a third-year student at the United States Military Academy at West Point. He is studying Operations Research and hopes to pursue a career in data science in the Army. His hobbies include scuba diving, traveling, and tutoring. Skyler is the head of the West Point tutoring program and helps lead the Army Smart nonprofit in providing free tutoring services to enlisted soldiers pursuing higher-level education. Skyler specializes in bioinformatics, given his pre-medical background interwoven with his passion for data science.
BACKGROUND: Post-prandial glucose response resulting from a mixed meal tolerance test is evaluated from trajectory data of measured glucose, insulin, C-peptide, GLP-1, and other measurements of insulin sensitivity and β-cell function. In order to compare responses between populations or different compositions of mixed meals, the trajectories are collapsed into the area under the curve (AUC) or incremental area under the curve (iAUC) for statistical analysis. Both AUC and iAUC are coarse distillations of the post-prandial curves, and important properties of the curve structure are lost. METHODS: Visual Basic for Applications (VBA) code was written to automatically extract seven key calculus-based curve-shape properties of post-prandial trajectories (glucose, insulin, C-peptide, GLP-1) beyond AUC. Through two-sample t-tests, the calculus-based markers were compared between outcomes (reactive hypoglycemia vs. healthy) and against demographic information. RESULTS: Statistically significant differences (p < .01) in multiple curve properties, in addition to AUC, were found between health outcomes for each molecule studied, based on the calculus-based properties of the molecular response curves. A model was created that predicts reactive hypoglycemia based on the individual curve properties most associated with outcomes. CONCLUSIONS: There is predictive power in the response curve properties that is not present using AUC alone. In future studies, the calculus-based response-curve properties will be used for predicting diabetes and other health outcomes. In this sense, response-curve properties can predict an individual's susceptibility to illness prior to its onset using solely mixed meal tolerance test results.
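The sketch below computes AUC, iAUC, and several derivative-based curve-shape features of the kind described, using NumPy on an invented glucose trajectory; it is an illustration of the approach, not the study's VBA implementation.

```python
# Illustrative curve-shape feature extraction (not the study's VBA code);
# the glucose trajectory below is invented.
import numpy as np

time = np.array([0, 15, 30, 45, 60, 90, 120, 150, 180], dtype=float)  # minutes
glucose = np.array([90, 130, 160, 150, 135, 110, 95, 88, 85], dtype=float)

auc = np.trapz(glucose, time)                                   # total area under curve
iauc = np.trapz(np.clip(glucose - glucose[0], 0, None), time)   # incremental AUC

slope = np.gradient(glucose, time)        # first derivative of the trajectory
curvature = np.gradient(slope, time)      # second derivative

features = {
    "AUC": auc,
    "iAUC": iauc,
    "peak": glucose.max(),
    "time_to_peak": time[glucose.argmax()],
    "max_rise_rate": slope.max(),
    "max_fall_rate": slope.min(),
    "max_curvature": curvature.max(),
}
print(features)
```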
Using Multi-Linear Regression to Understand Cloud Properties’ Impact on Solar Radiance
Grant Parker (United States Military Academy)
CDT Grant Parker attends the United States Military Academy and will graduate and commission in May 2023. He is an Applied Statistics and Data Science major and is currently conducting his senior thesis with Lockheed Martin Space. At the academy, he serves as 3rd Regiment’s Operations Officer where he is responsible for planning and coordinating all trainings and events for the regiment. After graduation, CDT Parker hopes to attend graduate school and then start his career as a cyber officer in the US Army.
With solar energy being the most abundant energy source on Earth, it is no surprise that reliance on solar photovoltaics (PV) has grown exponentially in the past decade. The increasing costs of fossil fuels have made solar PV more competitive and renewable energy more attractive, and the International Energy Agency (IEA) forecasts that solar PV’s installed power capacity will surpass that of coal by 2027. Crucial to the management of solar PV power is the accurate forecasting of solar irradiance, which is heavily impacted by different types and distributions of clouds. Many studies have aimed to develop models that accurately predict the global horizontal irradiance (GHI) while accounting for the volatile effects of clouds; in this study, we aim to develop a statistical model that helps explain the relationship between various cloud properties and the solar radiance reflected by the clouds themselves. Using 2020 GOES-16 data from the GOES R-Series Advanced Baseline Imager (ABI), we investigated the effect that cloud optical depth, cloud top temperature, solar zenith angle, and look zenith angle had on cloud solar radiance while accounting for differing longitudes and latitudes. Using these variables as the explanatory variables, we developed a linear model using multi-linear regression that, when tested on untrained data sets from different days (at the same time of day as the training set), results in a coefficient of determination (R^2) between 0.70 and 0.75. Lastly, after analyzing the variables’ degrees of contribution to the cloud solar radiance, we present error maps that highlight areas where the model succeeds and fails in prediction accuracy.
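A minimal sketch of the regression setup is shown below, with synthetic stand-ins for the GOES-16 cloud variables and a held-out split playing the role of a different day; the coefficients and distributions are invented for illustration.

```python
# Minimal multi-linear regression sketch with synthetic stand-ins for the
# GOES-16 cloud variables (variable meanings are assumptions for illustration).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.gamma(2.0, 5.0, n),          # cloud optical depth
    rng.normal(240.0, 20.0, n),      # cloud top temperature (K)
    rng.uniform(0.0, 80.0, n),       # solar zenith angle (deg)
    rng.uniform(0.0, 70.0, n),       # look zenith angle (deg)
])
radiance = (5.0 + 1.2 * X[:, 0] - 0.05 * X[:, 1]
            - 0.3 * X[:, 2] + 0.1 * X[:, 3] + rng.normal(0, 5, n))

model = LinearRegression().fit(X[:1500], radiance[:1500])   # "training day"
print("held-out R^2:", r2_score(radiance[1500:], model.predict(X[1500:])))
```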
Data Fusion: Using Data Science to Facilitate the Fusion of Multiple Streams of Data
Madison McGovern (United States Military Academy)
Madison McGovern is a senior at the United States Military Academy majoring in Applied Statistics and Data Science. Upon graduation, she is headed to Fort Gordon, GA to join the Army’s Cyber branch as a Cyber Electromagnetic Warfare Officer. Her research interests include using machine learning to assist military operations.
Today there are an increasing number of sensors on the battlefield. These sensors collect data that includes, but is not limited to, images, audio files, videos, and text files. With today’s technology, the data collection process is strong, and there is a growing opportunity to leverage multiple streams of data, each coming in different forms. This project aims to take multiple types of data, specifically images and audio files, and combine them to increase our ability to detect and recognize objects. The end state of this project is the creation of an algorithm that utilizes and merges voice recordings and images to allow for easier recognition. Most research tends to focus on one modality or the other, but here we focus on the prospect of simultaneously leveraging both modalities for improved entity resolution. With regard to audio files, the most successful deconstruction and dimension reduction technique is a deep autoencoder. For images, the most successful technique is the use of a convolutional neural network. To combine the two modalities, we focused on two different techniques. The first was running each data source through a neural network and multiplying the resulting class probability vectors to capture the combined result. The second technique focused on running each data source through a neural network, extracting a layer from each network, concatenating the layers for paired image and audio samples, and then running the concatenated object through a fully connected neural network.
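The toy example below illustrates the first fusion technique, multiplying the class-probability vectors from the two networks and renormalizing; the probability values are invented.

```python
# Toy illustration of late fusion by probability multiplication; the softmax
# outputs below are invented, not produced by the project's networks.
import numpy as np

p_image = np.array([0.60, 0.25, 0.15])   # softmax output of the image network
p_audio = np.array([0.45, 0.10, 0.45])   # softmax output of the audio network

p_fused = p_image * p_audio
p_fused /= p_fused.sum()                 # renormalize to a probability vector

print("fused probabilities:", np.round(p_fused, 3))
print("fused prediction: class", int(p_fused.argmax()))
```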
Assessing Risk with Cadet Candidates and USMA Admissions
Daniel Lee (United States Military Academy)
I was born in Harbor City, California at the turn of the millennium to two Korean immigrant parents. For most of my young life, I grew up in various communities in Southern California before my family ultimately settled in Murrieta in the Inland Empire, roughly equidistant from Los Angeles and San Diego. At the beginning of my 5th grade year, my father accepted a job offer with the US Army Corps of Engineers at Yongsan, South Korea. The seven years that I would spend in Korea would be among the most formative and fond years of my life. In Korea, I grew to better understand the diverse nations that composed the world and grew closer to my Korean heritage that I often forgot living in the US. It was in Korea, however, that I made my most impactful realization: I wanted to serve in the military. The military never crossed my mind as a career growing up. Growing up around the Army in Korea, I knew this was a path I wanted. Though the military was entirely out of my character, I spent the next several years working towards my goal of becoming an Army officer. Just before my senior year of high school, my family moved again to the small, rural town of Vidalia, Louisiana. I transitioned from living in a luxury high-rise in the middle of Seoul to a bungalow in one of the poorest regions of the US. Yet, I once again found myself entranced; not only did I once again grow to love my new home, but I also began to open my mind to the struggles, perspectives, and motivations of many rural Americans. To this day, I proudly proclaim my hometown and state of residence as Vidalia, Louisiana. My acceptance into West Point shortly after my move marked the beginning of my great adventure, fulfilling a life-long dream of serving in the Army and becoming an officer.
Though the United States Military Academy (USMA) graduates approximately 1,000 cadets annually, over 100 cadets from the initial cohort fail to graduate and are separated or resign, at great expense to the federal government. Graduation risk among incoming cadet candidates is difficult to measure; based on current research, the strongest predictors of college graduation risk are high school GPA and, to a lesser extent, standardized test scores. Other predictors include socioeconomic factors, demographics, culture, and measures of prolonged and active participation in extra-curricular activities. For USMA specifically, a cadet candidate’s Whole Candidate Score (WCS), which includes measures of leadership and physical fitness, has historically proven to be a promising predictor of a cadet’s performance at USMA. However, predicting graduation rates and identifying risk variables still proves to be difficult. Using data from the USMA Admissions Department, we used logistic regression, k-Nearest Neighbors, random forests, and gradient boosting algorithms to better predict which cadets would be separated or resign, using potential variables that may relate to graduation risk. Using measures such as p-values for statistical significance, correlation coefficients, and Area Under the Curve (AUC) scores to assess true positive rates, we found that supplementing the current admissions criteria with data on participation in certain extra-curricular activities improves predictions of whether a cadet will graduate.
Overarching Tracker of DOT&E Actions
Buck Thome (IDA)
Dr. Thome is a member of the research staff at Institute for Defense Analyses, focusing on test and evaluation of net-centric systems and cybersecurity. He received his PhD in Experimental High Energy Physics from Carnegie Mellon University in 2011. After working with a small business defense contractor developing radio frequency sensor systems, he came to IDA in 2013.
OED’s Overarching Tracker of DOT&E Actions distills information from DOT&E’s operational test reports and memoranda on test plan and test strategy approvals to generate informative metrics on the office’s activities. In FY22, DOT&E actions covered 68 test plans, 28 strategies, and 28 reports, relating to 74 distinct programs. This poster presents data from those documents and highlights findings on DOT&E’s effectiveness, suitability, and survivability determinations and other topics related to the state of T&E.