
AI at CyCognito

By Tim Matthews
Chief Marketing Officer
May 1, 2024

Since our beginning, CyCognito has integrated AI techniques to power robust and accurate attack surface discovery and testing, and we're adding new capabilities all the time. In this post, we'll share some of the tasks that CyCognito's AI helps power or simplify, along with a primer on the various techniques we leverage.

What You Can Do with CyCognito’s AI

Can Humans Keep Up with Discovery?

Assuming a human could find all of the information the CyCognito platform does (up to fifty data points per asset, including IP address, name, type, owner, location, included technologies, related assets, known vulnerabilities, and threats), we estimate it would take at least one hour per asset on average for initial discovery and 30 minutes per asset to keep all of the asset metadata up to date.

So, one hour per asset multiplied by a midsized attack surface of 5,000 assets comes to 5,000 hours, or over two work-years, just for initial discovery. Spending 30 minutes per asset once per year to keep it up to date would add another 2,500 hours, or over one additional work-year. On top of that, company attack surfaces fluctuate by +/- 10 percent monthly, adding even more hours of initial discovery per year for newly added assets. Lastly, the average attack surface size for an enterprise is 50,000 assets, not 5,000. At ten times the size, that's more than 30 work-years.
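To make the arithmetic easy to rerun with your own numbers, here is a minimal sketch in Python. The per-asset hours come from the estimate above; the 2,080-hour work-year (40 hours times 52 weeks) and the function name `discovery_effort` are our assumptions for illustration, and monthly churn is deliberately left out.

```python
# Assumption: one work-year is 40 hours/week * 52 weeks = 2,080 hours.
HOURS_PER_WORK_YEAR = 2080


def discovery_effort(assets, initial_hours_per_asset=1.0, upkeep_hours_per_asset=0.5):
    """Estimate manual effort for initial discovery plus one yearly refresh.

    The defaults mirror the estimate in the text: one hour per asset for
    initial discovery and 30 minutes per asset to keep metadata current.
    Monthly attack surface churn is not modeled here.
    """
    initial_hours = assets * initial_hours_per_asset
    upkeep_hours = assets * upkeep_hours_per_asset
    total_hours = initial_hours + upkeep_hours
    return {
        "initial_hours": initial_hours,
        "upkeep_hours": upkeep_hours,
        "total_work_years": total_hours / HOURS_PER_WORK_YEAR,
    }


midsized = discovery_effort(5_000)     # midsized attack surface
enterprise = discovery_effort(50_000)  # average enterprise attack surface

print(f"Midsized:   {midsized['total_work_years']:.1f} work-years")
print(f"Enterprise: {enterprise['total_work_years']:.1f} work-years")
```

Under that assumption the enterprise case works out to roughly 36 work-years, which is consistent with the figure of 30 quoted above, since that figure rounds "over two" and "over one" work-years down to whole numbers before scaling.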

Discover All of Your Assets

CyCognito’s asset discovery process starts with a single data point – the name of your organization – and then uses AI to crawl and understand financial databases, news sites and hundreds of websites. The result is a complete view of all of the brands, subsidiaries, joint ventures and other entities related to your organization. Once the organizational structure is understood, our platform populates the model with web applications, services, IP ranges, APIs and more, giving you the full picture of your attack surface.

Get Full Asset Context

CyCognito builds context around all of the assets it discovers. Our AI maps each asset to a business unit or brand, and adds details like the type of asset, underlying technology, related applications, and whether customer data is potentially exposed. Each asset's riskiness and attractiveness to attackers are also added automatically. Figure 1 below shows how this is displayed in the UI of the CyCognito platform.

Figure 1. Acme Corporation, its subsidiaries and their assets after being mapped by AI.

Classify Assets by Type or Risk

Assets can be easily organized by type or risk. Our AI classifies assets by examining their API call responses and types of data they store. All of this information is added to the graph.

Ask Any Question

Want to see all of your ecommerce servers not protected by a WAF? Just ask: “Show me all ecommerce servers not protected by a WAF.” Our AI language processing allows you to ask almost any question about what you own and whether it's at risk, without using a specialized query language.

Automatically Configure Tests

No more time-consuming test configuration. Our AI understands the type of assets you own and automatically configures the test engine to run the proper test types with the right payloads. These varying tests run across thousands of assets and take into account business context, technology type, potential threats, issue types and applicable compliance regimes.

AI Techniques CyCognito Uses

CyCognito uses a number of AI techniques and technologies to create high-precision discovery and testing. Let’s look at some of them and how they are applied in the platform.

Bayesian Machine Learning

Bayesian Machine Learning (BML) combines statistical inference with machine learning. BML allows learning to evolve with the accumulation of evidence. While traditional machine learning excels with abundant data, BML shines when the data is sparse, yet the model is complex, as is the case with attack surfaces. CyCognito uses BML to structure graph data models and test hypotheses around asset ownership and type. 
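To give a feel for how belief can evolve as evidence accumulates, here is a minimal sketch of a repeated Bayes' rule update for a single ownership hypothesis. This is a textbook illustration, not CyCognito's actual model: the function name `bayes_update`, the starting prior, and the likelihood values are all hypothetical numbers chosen for the example.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(hypothesis | evidence) using Bayes' rule.

    prior               -- P(hypothesis) before seeing the evidence
    p_evidence_if_true  -- P(evidence | hypothesis is true)
    p_evidence_if_false -- P(evidence | hypothesis is false)
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    return numerator / denominator


# Hypothesis: "this asset belongs to the organization".
# All numbers below are made up for illustration.
belief = 0.10  # weak prior: most assets on the internet belong to someone else

# Each observation is (P(evidence | owned), P(evidence | not owned)).
observations = [
    (0.90, 0.20),  # e.g. the TLS certificate mentions the organization's name
    (0.70, 0.10),  # e.g. the page content references a known brand
]

history = [belief]
for p_true, p_false in observations:
    belief = bayes_update(belief, p_true, p_false)
    history.append(belief)

print([round(b, 3) for b in history])
```

Even with only two pieces of evidence, the belief moves from a weak prior to a fairly strong posterior, which is the property that makes Bayesian methods attractive when data is sparse.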

Generative Pre-trained Transformers

Generative Pre-trained Transformers (GPT) are large-scale machine learning models built on the transformer architecture. They are pre-trained on large volumes of text and then generate new text in response to the input they receive. CyCognito uses GPT to summarize the relationship between organizations or entities and to create answers to user search queries.

Generative AI

Generative AI (GenAI) is a branch of artificial intelligence focused on creating new content, whether it’s text, images, or other media. It employs generative models that learn from existing data to produce novel outputs that mirror the characteristics of the training data. Popular models include Bard, Claude and GPT. CyCognito uses GPT-3.5/4 to summarize the relationship between organizations or entities, and to create answers to user search queries.

Graph-based AI

Graph-based AI (GraphAI) is the practice of using graph data models in conjunction with AI. Graph data models show how data is interlinked, expressing relationships within the data that are hard to communicate using tabular models. Graphs are an excellent tool for inferring relationships and for giving AI the context it needs. CyCognito uses GraphAI to represent assets and their relationships to organizations, people, threats and other technologies.

The Attack Surface: The attack surface is represented as a graph data model including business entities, IT entities, and cybersecurity entities.

Connections on the Graph: Connections (edges in graph lingo) among entities are given confidence scores. They are treated as hypotheses rather than facts.

Hypotheses: A customer can tune its attack surface confidence threshold.

Creating Hypotheses: Hypotheses are generated using various ML techniques, revolving mostly around NLP over multiple data sources.

Fine Tuning Hypotheses: Hypotheses are fine tuned using both a singular approach to validate the connection (mostly using LLMs) and a graph neighborhood approach where we aim to “balance” the graph (using Bayesian ML over the graph data model).

Exploring the Attack Surface: LLMs and GenAI are used to help the customer to explore and analyze the attack surface in the most natural way.

Table 1. The attack surface is represented as a graph model. Connections are created as hypotheses and then confirmed when a confidence threshold is reached. The graph can then be explored.
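The idea of edges as hypotheses with a tunable confidence threshold can be sketched in a few lines of Python. This is only an illustration of the concept: the `Edge` class, the `reachable` function, the entity names other than Acme Corporation, and the confidence values are all hypothetical, and the platform's real data model is certainly richer.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A hypothesized connection between two entities in the graph."""
    source: str
    target: str
    relation: str
    confidence: float  # 0.0 to 1.0; a hypothesis, not a fact


# Hypothetical example data.
EDGES = [
    Edge("Acme Corporation", "Acme Labs", "subsidiary", 0.95),
    Edge("Acme Labs", "shop.acme-labs.example", "owns", 0.80),
    Edge("Acme Corporation", "old-portal.example", "owns", 0.40),
]


def reachable(start, edges, threshold):
    """Return every entity reachable from `start` through edges whose
    confidence meets the threshold (breadth-first traversal)."""
    adjacency = {}
    for edge in edges:
        if edge.confidence >= threshold:
            adjacency.setdefault(edge.source, []).append(edge.target)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


strict = reachable("Acme Corporation", EDGES, threshold=0.7)
loose = reachable("Acme Corporation", EDGES, threshold=0.3)
print(sorted(strict))
print(sorted(loose))
```

Lowering the threshold widens the attack surface to include less certain connections, while raising it keeps only the connections the model is most sure about.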

Large Language Models

Large Language Models (LLMs) are advanced machine learning models trained on vast amounts of textual data, enabling them to understand and generate human-like text across diverse topics and styles. While many machine learning models are designed for tasks ranging from image recognition to game playing, LLMs are specialized for text. This specialization allows them to tackle language-based tasks with a depth and breadth unmatched by smaller models. CyCognito uses LLMs for several tasks, including asset discovery, ownership attribution, data enrichment, and search.

Natural Language Processing

Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on enabling computers to understand, interpret, generate, and respond to human language. Most of us use NLP when digital assistants like Alexa and Siri interpret our voice commands. There are several other subfields, however. Two that are especially useful in cybersecurity are information retrieval, which extracts relevant information from a larger dataset, and named entity recognition, which identifies particular entities such as names, organizations, and locations within a text. CyCognito uses NLP to understand organizational structure by extracting information from business databases and websites, and then matching the entities mentioned within them. Classical NLP techniques like these have a significant performance advantage over LLMs (milliseconds vs. seconds) and are a better choice for large-scale applications like attack surface discovery.
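As a rough illustration of the entity matching step, the sketch below normalizes organization names and compares them with a simple string-similarity score from Python's standard library. It is a toy stand-in, not the platform's method: the suffix list, the 0.85 threshold, and the function names `normalize` and `same_entity` are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Assumption: a small, illustrative list of legal suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "co", "company", "gmbh"}


def normalize(name):
    """Lowercase a name, strip punctuation, and drop common legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def same_entity(name_a, name_b, threshold=0.85):
    """Guess whether two mentions refer to the same organization.

    Uses difflib's similarity ratio on the normalized names. The threshold
    is an arbitrary value chosen for this sketch.
    """
    a, b = normalize(name_a), normalize(name_b)
    if not a or not b:
        return False  # nothing left to compare after normalization
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(same_entity("Acme Corporation", "ACME Corp."))
print(same_entity("Acme Corporation", "Apex Holdings"))
```

Simple comparisons like this run in microseconds per pair, which hints at why lightweight techniques scale well across very large numbers of mentions.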

The development team at CyCognito continues to experiment with new applications of all the techniques above. We'll have more to talk about on LLMs and GenAI in the near future.

