By John Stockton and Neil Wiley

Open source intelligence addresses global threats.

As the intelligence community struggles to upgrade its capabilities for the modern information environment, adversaries continue to press for advantage across the competition spectrum. Through a complex, increasingly globalized web, both state adversaries and nonstate actors are employing a host of proxy entities to exert foreign malign influence. A lack of proper intelligence and the technology to support it makes it difficult to know who to trust, who holds control over key partners and critical resources, and who is ultimately behind exploitative activities across the world.

Open source data represents a largely untapped potential for addressing this challenge. The Open Source Intelligence (OSINT) mission sounds easy compared to classified endeavors. In reality, the effort is difficult for the intelligence community (IC) due to the sheer scale of data collection required, the technical challenges of complex data fusion, the sophisticated means by which malign entities shield their identities and the risk of intentionally manipulated data. These challenges are compounded by an outdated OSINT culture within the IC, as well as general failures to adapt to the modern digital world or to leverage the full potential of artificial intelligence (AI). As a result, malign activities and actors can “hide in plain sight,” escaping detection by living within the noise of too much unprocessed, unfiltered and unlinked data.

A renewed OSINT framework that embraces an aggressive shift toward automation could illuminate the activities of competitors through the construction of a joint common knowledge graph. Fusing a large number of entity data sets (people, organizations and the relationships that connect them) into one graph database increases the odds of discovering essential missing links between entities, and threats, that bespoke analyses would almost always miss. Analysts using graph technology compliant with intelligence tradecraft would increase the scale and speed of threat assessments by orders of magnitude, empowering continuous screening for unanticipated risks and prioritizing only those that require precious human attention.

The risks of foreign malign influence go well beyond the great power competition implications of China’s Belt and Road Initiative or Russian private military companies. Failure to detect and understand this activity would prevent effective management of a range of security-critical challenges, including: vetting potential global partners and supply chain vendors; taking action against weapons and disruptive technology proliferation and illicit trafficking networks (supporting rogue states, terrorist entities and transnational criminal organizations); identifying the root sources behind disinformation campaigns; and preventing the pilfering of resources and “state capture” of developing nations through foreign influence and coercion.

Pursuit of these objectives is challenged by several concurrent trends. The scale and complexity of crucial data sets are evolving faster than government timelines, with much of it being unstructured and in foreign languages. This is compounded by an increase in real-world relationship complexity and interconnectivity, as criminal networks are globalizing just as fast as legitimate economic activity. Layers of obfuscation to launder money and influence are increasing in sophistication from shell companies to shelf companies to offshore accounts. Finally, the threats themselves are changing rapidly, and corresponding models need to be updated regularly to keep pace.

The silver lining of these trends is that to play in the malign influence markets, interested parties must expose their activities in various open source data sets. When viewed as a whole, signals of relationships (from corporate registrations to news articles) can reveal the bigger picture of influence in the networks at play. Open markets make malign activity easier, but they also make it easier to detect.

The intelligence community increasingly recognizes the necessary role of shareable OSINT. The ownership of OSINT in the IC is under review, and the primary recommendation from the House Permanent Select Committee on Intelligence regarding combating the economic threat posed by China reads:

“First, the U.S. intelligence community must enhance its ability to collect and integrate open-source material into its China analysis. … We need to be confident that our intelligence community can simultaneously comb through and identify what among this vast trove of information has actual intelligence value … While an enhanced focus on open source intelligence is of particular relevance to the China mission, the insights have broader applicability.”

Fortunately, all these malign activities are analytically similar. Characterizing them relies on understanding the complex interrelationships between entities and identifying their actions, and can, therefore, be approached and exposed through a common construct. The key to powering this transformation is a comprehensive data model that simultaneously enables data fusion and threat assessment in a single technology platform. This model should take the form of a joint common knowledge graph, where AI is used for automatable tasks, freeing up analysts for augmented decision making.

A knowledge graph is a database that integrates data into a network structure, including the key elements of nodes (persons, organizations, events and other entities) and edges (formal and informal relationships between entities). Knowledge graphs have a rich history in industry with open projects that were eventually adapted into consumer-focused engines such as Google’s Knowledge Graph. The same technology has the potential to be used at scale in intelligence applications, targeting geopolitical actors and terrorist groups instead of celebrities and consumer products.
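The node-and-edge structure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not any agency's actual data model; the entity names and the `owns` relationship type are hypothetical.

```python
# Minimal knowledge-graph sketch: nodes are entities (people, organizations),
# edges are typed relationships between them. All names here are hypothetical.

class KnowledgeGraph:
    def __init__(self):
        self.nodes = {}   # node_id -> attribute dict (e.g., entity type)
        self.edges = []   # (source, relation, target) triples

    def add_node(self, node_id, **attrs):
        self.nodes[node_id] = attrs

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def neighbors(self, node_id):
        """Entities directly connected to node_id, in either direction."""
        out = {t for s, _, t in self.edges if s == node_id}
        inc = {s for s, _, t in self.edges if t == node_id}
        return out | inc

kg = KnowledgeGraph()
kg.add_node("acme_ltd", type="organization")
kg.add_node("j_doe", type="person")
kg.add_edge("j_doe", "owns", "acme_ltd")
```

Production systems would back this with a purpose-built graph database rather than in-memory dictionaries, but the analytic primitives (typed nodes, typed edges, neighborhood queries) are the same.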

Of course, knowledge graphs are not new to intelligence analysts who are historically familiar with “link analysis” tools. Even non-analysts are familiar with fictional investigators connecting criminals on a wall map with push pins and string. Unfortunately, many existing government projects and tools are not so different from those walls of string. They are highly manual and require a user to “bring your own data,” placing the truly difficult work back on the user to laboriously acquire, parse, load, link and label that data without any reusability.

What is required is a widespread, unclassified, entity-centric graph platform, usable across the U.S. government and by allies, that could mitigate this nonscalable pattern. Any effort to fill this gap should recognize several principles.

It must be identity intelligent. A graph-based identity intelligence system must have people and organizations as primary nodes and their formal and informal connections as the primary edges or relationship types, such as ownership and partnership associations. Creating an automated graph with data fusion requires state-of-the-art natural language processing, entity resolution and entity linking using “name science” to confidently link data sets and profiles representing real-world entities. Once integrated, the graph would enable forest-for-the-trees analyses that identify “relationship risk,” pathways and clusters, even when no direct risks exist for an entity.
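The entity-resolution step can be illustrated with a simple heuristic: normalize names (accents, case, punctuation, corporate suffixes) and compare the results for near-identity. This is a toy sketch; real name-science systems use far richer features (transliteration, aliases, shared addresses and officers), and the suffix list and 0.9 threshold here are arbitrary assumptions.

```python
import difflib
import unicodedata

def normalize(name):
    """Strip accents, case, punctuation, and common corporate suffixes."""
    name = unicodedata.normalize("NFKD", name)
    name = "".join(c for c in name if not unicodedata.combining(c)).lower()
    name = " ".join(name.replace(",", " ").replace(".", " ").split())
    for suffix in (" ltd", " llc", " inc", " gmbh", " sa"):
        name = name.removesuffix(suffix)
    return name

def same_entity(a, b, threshold=0.9):
    """Heuristic match: True when normalized names are near-identical."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold
```

Under this sketch, `"Acme, Ltd."` and `"ACME Ltd"` resolve to the same entity, while unrelated names do not; a deployed system would also have to handle deliberate obfuscation, which pure string similarity cannot.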

It needs automated risk assessment. The open source intelligence community has traditionally served as a collector that relays raw data elsewhere, pushing the responsibility for signal extraction downstream to the recipient.

With the modern data deluge, it is no longer acceptable to decouple collection from sense-making. Data must be prioritized “up the funnel” through algorithms that sort results by risk, and only then should the domain training of analysts be applied to interpret the algorithmic suggestions and avoid the traditional bias of hypothesis-first tradecraft. A common risk ontology should be comprehensive, covering criminal activities, adversarial influence, financial health and environmental, social and governance risk. These risks must be considered collectively and not broken into agency silos, because risks are correlated. For instance, traffickers of all kinds, from wildlife to weapons, are known to use overlapping networks.
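The “up the funnel” prioritization above can be sketched as a ranking function that combines an entity's direct risk flags with risk inherited from its graph neighbors, so that clean-looking entities tied to risky ones still surface for review. The entities, edges and the one-hop 0.5 decay factor are all hypothetical illustrations.

```python
# Sketch of risk-ranked prioritization over a small hypothetical graph:
# combine direct risk with decayed "relationship risk" from neighbors,
# then sort so analysts see the highest-risk entities first.

direct_risk = {"acme_ltd": 0.0, "shell_co": 0.8, "j_doe": 0.1, "ngo_x": 0.0}
edges = [("j_doe", "owns", "acme_ltd"), ("shell_co", "partner_of", "acme_ltd")]

def relationship_risk(entity, decay=0.5):
    """Direct risk, or risk inherited from one-hop neighbors, whichever is larger."""
    neighbors = ({t for s, _, t in edges if s == entity}
                 | {s for s, _, t in edges if t == entity})
    inherited = max((direct_risk[n] for n in neighbors), default=0.0) * decay
    return max(direct_risk[entity], inherited)

ranked = sorted(direct_risk, key=relationship_risk, reverse=True)
```

Here `acme_ltd` has no direct flags but ranks above the person and the NGO purely through its partnership with a flagged shell company, which is exactly the kind of indirect signal siloed, entity-by-entity screening misses.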

It needs technically centralized OSINT. The proposed system should integrate a vast number of open source data sets as a one-stop shop. The data sources, both structured and unstructured, should be unclassified to maximize sharing, with a focus on the long tail of international, foreign language news sources. These sources should include both publicly and commercially available information, supported by a system that manages data set prioritization and procurement.

It must be feedback enabled. The system should be designed around the strengths of subject matter experts as users, but it should also enable them as “trainers” who help guide and course-correct machine learning models.

It has to be shareable. A properly centralized system for OSINT management should enable cross-agency sharing by following a modern service-oriented architecture, employing multiple ways to distribute results from a common graph resource across many government platforms, including data exports, reports and APIs. The system should also enable contributions from nontraditional partners, including NGOs, journalists, allies and private financial institutions.
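One of the distribution paths named above, an API or data export from a common graph resource, can be sketched as a function that serializes a one-hop subgraph around an entity into JSON, a format any partner platform can consume. The entities and the response shape are hypothetical, not a real interface specification.

```python
import json

# Hypothetical one-hop subgraph export for cross-agency sharing: the payload
# could be returned from an API endpoint or written out as a data export.

nodes = {"acme_ltd": {"type": "organization"}, "j_doe": {"type": "person"}}
edges = [("j_doe", "owns", "acme_ltd")]

def export_subgraph(center):
    """Serialize an entity, its direct neighbors and connecting edges as JSON."""
    touching = [e for e in edges if center in (e[0], e[2])]
    members = {center} | {s for s, _, t in touching} | {t for s, _, t in touching}
    return json.dumps({
        "nodes": [{"id": n, **nodes[n]} for n in sorted(members)],
        "edges": [{"source": s, "relation": r, "target": t}
                  for s, r, t in touching],
    })
```

Serving structured exports like this, rather than finished reports alone, is what lets many downstream platforms build their own views on one shared graph.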

And it must be responsible. An intelligence platform must not betray American values while protecting American interests. Responsible AI principles have been laid out by the Office of the Director of National Intelligence and the U.S. Defense Department, and the Defense Innovation Unit is working to operationalize them. Products must establish trust through transparency and support principles of augmented intelligence by designing to specific user-centric workflows, as described in the AIM initiative. In addition, the outputs from this platform must conform to intelligence tradecraft standards and be compatible with intelligence workflows.

Developing a common, graph-based architecture can advance the primary mission: making foreign malign influence more visible. This technology would help clear the fog of noise that currently prevents successful intervention strategies in modern geopolitical, military and economic conflicts.

OSINT is the natural starting point on this journey, and it also represents a test. If the IC cannot integrate unclassified data effectively across agencies, then there should be even more skepticism for its ability to collaborate with classified data.

An aggressive vision to increase OSINT automation would bring agencies and allies together operationally, not only with a shared mission but with a shared technology base, offering a chance to compete with global adversaries with eyes wide open.

John Stockton, Ph.D., is a physicist and co-founder of Quantifind, an AI-driven software company developing entity risk screening technology.

Neil Wiley is former principal executive in the Office of the Director of National Intelligence and a former chair of the National Intelligence Council.

The views expressed are the authors’ and do not imply the endorsement of the Office of the Director of National Intelligence or any other U.S. government agency.
