Threat Modeling with GenAI and LLM: Experience and Feedback
As a cybersecurity researcher, I've always been interested in leveraging cutting-edge technologies to improve security practices. Now that ChatGPT and other large language models (LLMs) are part of everyday work, chances are you'll reach for them in a security exercise sooner or later; whether you do is entirely up to you. As someone focused on security and threat modeling, I decided to give them a try. In this essay, I want to share my personal experience using LLMs for threat modeling, both the positive and the negative aspects, so you can decide for yourself whether it's worth exploring.
After doing some research, I found two main ways to use LLMs for threat modeling. The first treats the LLM as a specialized research assistant, using tools like the GPT-4 Threat Modeling Mentor to guide you through the process and suggest potential threats. The second uses a purpose-built tool like STRIDEGPT, which applies the STRIDE framework to identify threats across its six categories.
GPT-4 Threat Modeling Mentor Approach
The first step in any threat modeling exercise is to define the scope of your analysis. You need to identify the system or application you want to assess and set clear objectives. This is crucial because it helps focus your efforts and ensures the analysis covers the right areas. For example, you might be examining a web application, an IoT device, or an internal network. In my case, I focused on a Mobile Banking Platform, a comprehensive application allowing users to securely manage their financial accounts from their mobile devices. This platform includes features like balance checking, money transfers, bill payments, and even loan applications. With so many sensitive transactions happening in one place, ensuring the platform’s security was critical. Once the scope was clear, I began using GPT-4 to identify potential threats for this complex system, laying the groundwork for the rest of my threat modeling journey.
Identifying critical assets is a key step in threat modeling. ChatGPT helped me brainstorm assets for the mobile banking platform, such as user data, servers, databases, and intellectual property. It quickly identified components like perimeter and application servers, back-end databases holding sensitive user data (PII), and the custom-developed source code. While this made the process efficient, it occasionally missed context-specific assets, meaning I had to manually adjust the list to better fit the platform’s structure.
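To give a sense of what this step can look like in practice, here is a minimal sketch of an asset-brainstorming prompt written against the OpenAI Python client. The prompt wording, model name, and system description are illustrative, not a transcript of my actual session.

```python
# Minimal sketch: asking GPT-4 to brainstorm critical assets for the platform.
# Assumes the official OpenAI Python client (pip install openai) with an
# OPENAI_API_KEY set in the environment; the prompts are illustrative.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a threat modeling mentor. Given a system description, list its "
    "critical assets grouped by category: data, infrastructure, software, people."
)

SYSTEM_DESCRIPTION = (
    "A mobile banking platform: native mobile apps talk to application servers "
    "over RESTful APIs; back-end databases store PII and transaction history; "
    "features include balance checks, transfers, bill payments, and loan applications."
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": SYSTEM_DESCRIPTION},
    ],
)
print(response.choices[0].message.content)
```

The same prompt skeleton carries through the later steps as well: swap the system prompt to ask for threat agents, vulnerabilities, or mitigations against the same description.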
For identifying threat agents, ChatGPT suggested potential attackers like hackers, disgruntled employees, competitors, and even natural disasters. This gave me a broad view of risks, though some suggestions, like natural disasters, were irrelevant to this exercise and had to be filtered out. Even so, brainstorming with the LLM cast a wider net than I would have managed manually, reducing the chance of overlooking a critical threat agent.
When it came to identifying vulnerabilities, ChatGPT was genuinely useful. It flagged common weaknesses tied to the technology stack, software components, and configurations. For example, it highlighted insecure APIs and weak encryption practices, especially around user login sessions, which could be exposed to man-in-the-middle attacks. However, it occasionally missed system-specific vulnerabilities that required further investigation; this is where human validation and a real understanding of the system remain crucial.
For mitigation strategies, ChatGPT suggested practical solutions like stronger encryption, multi-factor authentication, and adjusting the system’s architecture. For example, it recommended isolating user data to minimize the impact of breaches. While these suggestions were helpful, I had to adapt them to fit the system’s specific needs and constraints. This reinforced the notion that AI can speed up processes and provide useful starting points, but human expertise is essential to refine and apply these suggestions effectively.
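To make the multi-factor authentication suggestion concrete, here is a minimal sketch of server-side TOTP (time-based one-time password) verification using the pyotp library. The enrollment flow and secret handling are simplified for illustration; in a real deployment the secret would be stored encrypted per user.

```python
# Minimal sketch of the MFA suggestion: TOTP enrollment and verification with
# the pyotp library (pip install pyotp). Secret storage is simplified here.
import pyotp

# Enrollment: generate a per-user secret and hand it to the user's
# authenticator app, typically rendered as a QR code of this URI.
secret = pyotp.random_base32()
uri = pyotp.totp.TOTP(secret).provisioning_uri(
    name="alice@example.com", issuer_name="MobileBank"
)
print("Provisioning URI:", uri)

# Login: after the password check succeeds, require a valid 6-digit code.
totp = pyotp.TOTP(secret)
code = totp.now()  # in production this comes from the user, not the server
print("Code accepted:", totp.verify(code, valid_window=1))
```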
Finally, ChatGPT can also help map vulnerabilities in your platform to specific techniques in the MITRE ATT&CK framework. This creates a clearer link between weaknesses and real-world adversary behaviors, allowing for better prioritization of mitigations. For example, if your system has a vulnerability related to insecure API usage, ChatGPT can quickly point to relevant MITRE ATT&CK techniques, such as T1071.001 (Application Layer Protocol: Web Protocols), which attackers commonly abuse to blend malicious traffic in with legitimate web traffic.
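One lightweight way to keep that mapping actionable is a small lookup table maintained alongside the threat model. The sketch below shows the idea; the vulnerability-to-technique pairings are illustrative examples, not an authoritative mapping.

```python
# Illustrative sketch: a hand-maintained map from flagged vulnerabilities to
# MITRE ATT&CK technique IDs. The pairings are examples, not authoritative.
ATTACK_MAPPING = {
    "insecure API usage": ["T1071.001"],       # Application Layer Protocol: Web Protocols
    "weak session encryption": ["T1557"],      # Adversary-in-the-Middle
    "credentials in config files": ["T1552"],  # Unsecured Credentials
    "no login rate limiting": ["T1110"],       # Brute Force
}

def techniques_for(vulnerability: str) -> list[str]:
    """Return the ATT&CK technique IDs linked to a flagged vulnerability."""
    return ATTACK_MAPPING.get(vulnerability, [])

print(techniques_for("insecure API usage"))  # ['T1071.001']
```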
STRIDEGPT Approach
Now, let’s explore the STRIDEGPT tool. STRIDEGPT is an AI-powered tool that assists with threat modeling using the STRIDE framework. Users input system details like architecture, technology stack, components, and user flows, then define the scope of the threat modeling exercise by focusing on areas like authentication, data storage, or communication channels. Based on this input, STRIDEGPT generates a list of threats, categorized by the STRIDE model, and provides insights into common vulnerabilities, such as weak encryption or insecure APIs. It also suggests mitigation strategies, like architectural changes or the use of specific security tools, and offers guidance on assessing risks by calculating risk scores based on the impact and likelihood of each threat.
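The risk-scoring idea boils down to combining impact and likelihood for each threat. Below is a minimal sketch of that kind of calculation; the 1-to-5 scales and the simple multiplication are my simplification for illustration, not STRIDEGPT's exact formula.

```python
# Minimal sketch of impact-times-likelihood risk scoring. The 1-5 scales and
# the multiplication are a simplification, not STRIDEGPT's exact formula.
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    category: str    # STRIDE category, e.g. "Spoofing"
    impact: int      # 1 (negligible) to 5 (critical)
    likelihood: int  # 1 (rare) to 5 (almost certain)

    @property
    def risk_score(self) -> int:
        return self.impact * self.likelihood  # ranges from 1 to 25

threats = [
    Threat("Intercepted authentication token", "Spoofing", 5, 3),
    Threat("Tampered API request", "Tampering", 4, 3),
    Threat("PII leak via verbose error messages", "Information Disclosure", 4, 2),
]

# Rank threats so mitigation effort goes to the highest risks first.
for t in sorted(threats, key=lambda t: t.risk_score, reverse=True):
    print(f"{t.risk_score:>2}  {t.category:<22} {t.name}")
```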
Using STRIDEGPT for the mobile banking platform provided a structured, efficient way to identify security risks. It analyzed core functionalities like user authentication, money transfers, and account management, as well as client-server communication through RESTful APIs and encrypted data storage. STRIDEGPT quickly identified threats such as spoofing, tampering, and information disclosure, including scenarios like attackers intercepting authentication tokens or tampering with API requests. The tool also recommended mitigation strategies like strengthening encryption, implementing advanced token validation, and conducting regular penetration testing.
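As an example of what "advanced token validation" can mean in code, here is a minimal sketch of server-side JWT verification with the PyJWT library. The issuer, audience, and key handling are illustrative assumptions, not the platform's actual configuration.

```python
# Minimal sketch of strict JWT validation with PyJWT
# (pip install "pyjwt[crypto]" for RS256 support). Claim values are illustrative.
import jwt
from jwt import InvalidTokenError

def validate_api_token(token: str, public_key: str) -> dict | None:
    """Return the token's claims if it verifies cleanly, otherwise None."""
    try:
        return jwt.decode(
            token,
            public_key,
            algorithms=["RS256"],           # pin the algorithm; never accept "none"
            audience="mobile-banking-api",  # illustrative audience claim
            issuer="https://auth.example.com",
            options={"require": ["exp", "iat", "sub"]},  # mandatory claims
        )
    except InvalidTokenError:
        return None  # reject tampered, expired, or mis-scoped tokens
```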
On the positive side, STRIDEGPT made the process smooth and insightful, offering clear guidance on vulnerabilities and threat agents. It helped prioritize risks and suggested practical mitigation steps, saving time and making the process easier to manage. However, it wasn't without its downsides. The tool occasionally overgeneralized certain threats and provided mitigation suggestions that lacked the depth needed for specific system configurations, which meant additional manual fine-tuning to make the solutions fully applicable to the platform. Moreover, while STRIDEGPT produces outputs based on the STRIDE framework, we had to adapt them to the YACRAF framework we use internally to keep our threat modeling consistent and comprehensive. This mapping step is necessary to make the AI-generated content fit within YACRAF's structured guidelines.
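To illustrate that adaptation, here is a deliberately hypothetical sketch of recasting a STRIDE-categorized threat into a YACRAF-style record. The field names and the category-to-consequence mapping are placeholders for our internal template, not part of any published YACRAF specification.

```python
# Hypothetical sketch of the STRIDE-to-YACRAF adaptation step. Field names
# and the consequence mapping are placeholders for our internal template,
# not a published YACRAF specification.
STRIDE_TO_CONSEQUENCE = {
    "Spoofing": "loss of authenticity",
    "Tampering": "loss of integrity",
    "Repudiation": "loss of accountability",
    "Information Disclosure": "loss of confidentiality",
    "Denial of Service": "loss of availability",
    "Elevation of Privilege": "loss of authorization",
}

def to_yacraf_record(stride_threat: dict) -> dict:
    """Recast a STRIDEGPT threat into our internal YACRAF-style record."""
    return {
        "attack_step": stride_threat["description"],
        "affected_asset": stride_threat.get("component", "unspecified"),
        "consequence": STRIDE_TO_CONSEQUENCE[stride_threat["category"]],
        "needs_review": True,  # every AI-generated entry gets a human pass
    }

example = {
    "category": "Spoofing",
    "component": "Authentication service",
    "description": "Attacker replays an intercepted authentication token",
}
print(to_yacraf_record(example))
```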
Conclusion
In my experience, using LLMs like ChatGPT and tools like STRIDEGPT for threat modeling offers distinct advantages, especially in terms of efficiency and the ability to quickly generate ideas. GPT-4 is great for brainstorming, providing insights, and simplifying the identification of assets, vulnerabilities, and threat agents. STRIDEGPT, on the other hand, provides a more structured and targeted approach by applying the STRIDE framework directly to your system's architecture, offering more focused threat identification and mitigation suggestions. However, it's essential to incorporate these tools within established methodologies like YACRAF to ensure consistency and completeness. While AI tools are incredibly useful, human expertise remains crucial in refining their outputs and ensuring the final threat model addresses all relevant risks.