AI at CyCognito

Since our beginning, CyCognito has integrated AI techniques to power robust and accurate attack surface discovery and testing, and we’re adding new capabilities all the time. In this post, we’ll share some of the tasks that CyCognito’s AI helps power or simplify, along with a primer on the techniques we leverage.

What You Can Do with CyCognito’s AI

Can Humans Keep Up with Discovery?

Assuming a human could find all of the information the CyCognito platform does (up to fifty data points per asset, including IP address, name, type, owner, location, included technologies, related assets, known vulnerabilities, and threats), we estimate it would take at least one hour per asset on average for initial discovery and 30 minutes per asset to keep all of the asset metadata up to date.

At one hour per asset, a midsized attack surface of 5,000 assets requires 5,000 hours, or over two work-years, just for initial discovery. Spending 30 minutes per asset once per year to keep it up to date would add another 2,500 hours, over one additional work-year. On top of that, company attack surfaces fluctuate by +/- 10 percent monthly, adding even more hours of initial discovery each year for newly added assets. Lastly, the average attack surface size for an enterprise is 50,000 assets, not 5,000. Scaling the roughly three work-years above by ten, that’s about 30 work-years.
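The estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes a 2,080-hour work-year (40 hours a week for 52 weeks), a figure the post does not state, and it ignores the monthly churn in the attack surface. The function name and parameters are illustrative only. Under that assumption the totals come to roughly 3.6 work-years for 5,000 assets and 36 for 50,000, in the same range as the rounded figures in the text.

```python
# Assumption: 40 hours/week * 52 weeks. The post does not state its own figure.
HOURS_PER_WORK_YEAR = 2080


def discovery_effort(assets, initial_hours_per_asset=1.0, upkeep_hours_per_asset=0.5):
    """Return (initial_hours, upkeep_hours, total_work_years) for one year.

    Defaults follow the post's estimates: one hour per asset for initial
    discovery and 30 minutes per asset per year for upkeep.
    """
    initial_hours = assets * initial_hours_per_asset
    upkeep_hours = assets * upkeep_hours_per_asset
    total_work_years = (initial_hours + upkeep_hours) / HOURS_PER_WORK_YEAR
    return initial_hours, upkeep_hours, total_work_years


for assets in (5_000, 50_000):
    initial, upkeep, years = discovery_effort(assets)
    print(
        f"{assets:>6,} assets: {initial:,.0f} h initial + "
        f"{upkeep:,.0f} h upkeep = {years:.1f} work-years"
    )
```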

Discover All of Your Assets

CyCognito’s asset discovery process starts with a single data point – the name of your organization – and then uses AI to crawl and understand financial databases, news sites and hundreds of websites. The result is a complete view of all of the brands, subsidiaries, joint ventures and other entities related to your organization. Once the organizational structure is understood, our platform populates the model with web applications, services, IP ranges, APIs and more, giving you the full picture of your attack surface.

Get Full Asset Context

CyCognito builds context around all of the assets it discovers. Our AI maps each asset to a business unit or brand, and adds details like the type of asset, underlying technology, related applications, and whether customer data is potentially exposed. Each asset’s risk and attractiveness to attackers are also added automatically. Figure 1 below shows how this is displayed in the UI of the CyCognito platform.

Figure 1. Acme Corporation, its subsidiaries and their assets after being mapped by AI.

Classify Assets by Type or Risk

Assets can be easily organized by type or risk. Our AI classifies assets by examining their API call responses and types of data they store. All of this information is added to the graph.

Ask Any Question

Want to see all of your ecommerce servers not protected by a WAF? Just ask: “Show me all ecommerce servers not protected by a WAF.” Our AI language processing allows you to ask almost any question about what you own and whether it’s at risk, without using a specialized query language.

Automatically Configure Tests

No more time-consuming test configuration. Our AI understands the type of assets you own and automatically configures the test engine to run the proper test types with the right payloads. These varying tests run across thousands of assets and take into account business context, technology type, potential threats, issue types and applicable compliance regimes.

AI Techniques CyCognito Uses

CyCognito uses a number of AI techniques and technologies to create high-precision discovery and testing. Let’s look at some of them and how they are applied in the platform.

Bayesian Machine Learning

Bayesian Machine Learning (BML) combines statistical inference with machine learning. BML allows learning to evolve with the accumulation of evidence. While traditional machine learning excels with abundant data, BML shines when the data is sparse, yet the model is complex, as is the case with attack surfaces. CyCognito uses BML to structure graph data models and test hypotheses around asset ownership and type.
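To make the idea of learning that evolves with accumulating evidence concrete, here is a minimal sketch of a Bayesian update on a single ownership hypothesis. It is not CyCognito’s model: the prior, the evidence labels, and every probability are hypothetical values chosen for illustration, and the pieces of evidence are assumed to be independent of one another.

```python
def update_belief(prior, p_evidence_if_owned, p_evidence_if_not_owned):
    """Apply Bayes' rule once.

    Returns the posterior probability that the organization owns the asset,
    given one new piece of evidence.
    """
    numerator = p_evidence_if_owned * prior
    denominator = numerator + p_evidence_if_not_owned * (1.0 - prior)
    return numerator / denominator


# Hypothetical starting belief that a newly found asset belongs to the organization.
belief = 0.10

# Hypothetical evidence: (label, P(evidence | owned), P(evidence | not owned)).
evidence = [
    ("TLS certificate names the organization", 0.70, 0.05),
    ("IP address falls in a range registered to the organization", 0.60, 0.10),
    ("Page footer shows one of the organization's brands", 0.50, 0.20),
]

for label, p_owned, p_not_owned in evidence:
    belief = update_belief(belief, p_owned, p_not_owned)
    print(f"{label}: belief is now {belief:.3f}")
```

Each observation moves the belief up or down in proportion to how much more likely it is under one hypothesis than the other, which is why a few strong signals can be informative even when data is sparse.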

Generative Pre-trained Transformers

Generative Pre-trained Transformers (GPT) are large-scale machine learning models trained to generate text based on the input received. They are pre-trained on large volumes of text and use the transformer architecture to produce new content from a prompt. CyCognito uses GPT to summarize the relationship between organizations or entities and to create answers to user search queries.

Generative AI

Generative AI (GenAI) is a branch of artificial intelligence focused on creating new content, whether it’s text, images, or other media. It employs generative models that learn from existing data to produce novel outputs that mirror the characteristics of the training data. Popular models include Bard, Claude and GPT. CyCognito uses GPT-3.5/4 to summarize the relationship between organizations or entities, and to create answers to user search queries.

Graph-based AI

Graph-based AI (GraphAI) is the practice of using graph data models in conjunction with AI. Graph data models show how data is interlinked, expressing relationships within the data that cannot be communicated using tabular models. Graphs are an excellent tool for inferring relationships and enhancing artificial intelligence by presenting context. CyCognito uses GraphAI to represent assets and their relationships to organizations, people, threats and other technologies.

The Attack Surface: The attack surface is represented as a graph data model including business entities, IT entities, and cybersecurity entities.
Connections on the Graph: Connections (edges in graph lingo) among entities are given confidence scores. They are treated as hypotheses rather than facts. A customer can tune its attack surface confidence threshold.
Hypotheses, Creating Hypotheses: Hypotheses are generated using various ML techniques, revolving mostly around NLP over multiple data sources.
Hypotheses, Fine Tuning Hypotheses: Hypotheses are fine-tuned using both a singular approach to validate the connection (mostly using LLMs) and a graph neighborhood approach where we aim to “balance” the graph (using Bayesian ML over the graph data model).
Exploring the Attack Surface: LLMs and GenAI are used to help the customer explore and analyze the attack surface in the most natural way.

Table 1. The attack surface is represented as a graph model. Connections are created as hypotheses and then confirmed when a confidence threshold is reached. The graph can then be explored.
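The threshold idea in Table 1 can be sketched as a small graph whose edges carry confidence scores. This is an illustrative toy, not the platform’s data model: the class name, method names, entity names, and scores are all hypothetical. An asset counts as part of the attack surface only if it can be reached from the organization through edges that meet the chosen threshold.

```python
from collections import deque


class AttackSurfaceGraph:
    """Directed graph whose edges are hypotheses with confidence scores."""

    def __init__(self):
        self._edges = {}  # node -> list of (neighbor, confidence)

    def add_hypothesis(self, source, target, confidence):
        """Record a hypothesized connection with a confidence in [0, 1]."""
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self._edges.setdefault(source, []).append((target, confidence))

    def attack_surface(self, root, threshold):
        """Breadth-first search that follows only edges at or above threshold."""
        seen = {root}
        queue = deque([root])
        while queue:
            node = queue.popleft()
            for neighbor, confidence in self._edges.get(node, []):
                if confidence >= threshold and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        return seen - {root}


graph = AttackSurfaceGraph()
graph.add_hypothesis("Acme Corporation", "Acme Labs", 0.95)
graph.add_hypothesis("Acme Labs", "shop.acme-labs.example", 0.80)
graph.add_hypothesis("Acme Corporation", "legacy.example.net", 0.40)

# A stricter threshold yields a smaller, higher-confidence attack surface.
print(sorted(graph.attack_surface("Acme Corporation", threshold=0.75)))
print(sorted(graph.attack_surface("Acme Corporation", threshold=0.30)))
```

Lowering the threshold admits weaker hypotheses, so the same graph can serve customers with different tolerances for false positives.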

Large Language Models

Large Language Models (LLMs) are advanced machine learning models trained on vast amounts of textual data, enabling them to understand and generate human-like text across diverse topics and styles. While many machine learning models are designed for tasks ranging from image recognition to game playing, LLMs are specialized for text. This specialization allows them to tackle language-based tasks with a depth and breadth unmatched by smaller models. CyCognito uses LLMs for several tasks, including asset discovery, ownership attribution, data enrichment, and search.

Natural Language Processing

Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on enabling computers to understand, interpret, generate, and respond to human language. Most of us use NLP when digital assistants like Alexa and Siri interpret our voice commands. NLP has several subfields of its own. Two that are especially useful in cybersecurity are information retrieval, which extracts relevant information from a larger dataset, and named entity recognition, which identifies particular entities such as names, organizations, and locations within a text. CyCognito uses NLP to understand organizational structure by extracting information from business databases and websites, and then matching entities mentioned within. Traditional NLP techniques have a significant performance advantage over LLMs (milliseconds vs. seconds) and are a better choice for large-scale applications like attack surface discovery.
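As a rough illustration of the entity-matching step, the sketch below decides whether two organization names found in different sources refer to the same entity. It is a deliberately simple stand-in: plain string normalization and Python’s standard-library difflib take the place of a trained named entity recognition model, and the suffix list, threshold, and function names are assumptions made for this example.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "gmbh", "co"}


def normalize(name):
    """Lowercase the name, drop punctuation, and remove legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(token for token in tokens if token not in LEGAL_SUFFIXES)


def same_entity(name_a, name_b, threshold=0.9):
    """Return True if the normalized names are similar enough to match."""
    a, b = normalize(name_a), normalize(name_b)
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold


print(same_entity("Acme Corporation", "ACME Corp."))
print(same_entity("Acme Corporation", "Apex Holdings Ltd"))
```

Lightweight comparisons like this run in well under a millisecond per pair, which is the kind of speed advantage the paragraph above describes.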

The development team at CyCognito continues to experiment with new applications of all the techniques above. We’ll have more to talk about on LLMs and GenAI in the near future.

About the Author

Tim Matthews

Chief Marketing Officer

Tim Matthews, Chief Marketing Officer, has been in and around cybersecurity for over twenty years, from encryption and digital certificates to modern analytics and cloud security.

Published on the CyCognito Blog by Tim Matthews, Chief Marketing Officer, May 1, 2024.
