Analyzing the unstructured data available in news, forums, and public social media to increase investigation efficiency and better calibrate your customer risk scoring.

At Quantifind, we take pride in extracting meaningful signals from the unruly world of external, unstructured data by intelligently correlating it with the private data of our clients. While we have grown as a company doing this for major consumer brands, we are now extending our foundational technology to financial crime risk management.

Whereas the goal in our brand business is to discover significant drivers of revenue from masses of customer conversation, our mission within the financial crime vertical is to help financial institutions take advantage of the extremely noisy and ever-expanding information environment to reduce their risk exposure while making investigations more efficient.

To be specific, one core capability of Quantifind is to confidently discover links between internal subjects (individuals and organizations) and incriminating external records at scale. A bank may flag a marginally anomalous transaction, but know nothing else about the entity involved, especially if it is a non-customer. Without further information, the risk may be assessed as “low” and not progress to a government-mandated Suspicious Activity Report (SAR).

By contrast, with Quantifind, an entire extra dimension opens up to reveal, for example, that the individual was previously involved in a relevant crime ring.
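To make the idea of linking an internal subject to external records concrete, here is a minimal sketch. It is not Quantifind's actual method: the record fields (`id`, `name`, `city`), the 0.9 threshold, and the function names are all assumptions chosen for illustration. It normalizes names so that "Doe, John Q" and "John Q. Doe" compare equal, and it requires a corroborating attribute (here, city) before accepting a link, which is one simple way to limit false positives.

```python
import re
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop punctuation, and sort tokens so word order does not matter."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def name_similarity(a, b):
    """Similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(subject, records, threshold=0.9):
    """Return ids of external records that plausibly refer to the subject.

    A link needs both a high name similarity and a matching city
    (hypothetical corroborating field; real systems would use many more).
    """
    linked = []
    for record in records:
        same_city = subject.get("city") is not None and subject.get("city") == record.get("city")
        if same_city and name_similarity(subject["name"], record["name"]) >= threshold:
            linked.append(record["id"])
    return linked


# Hypothetical data for illustration only.
subject = {"name": "John Q. Doe", "city": "Springfield"}
records = [
    {"id": "r1", "name": "Doe, John Q", "city": "Springfield"},
    {"id": "r2", "name": "John Doe", "city": "Shelbyville"},
    {"id": "r3", "name": "Jane Roe", "city": "Springfield"},
]
print(link_records(subject, records))  # ['r1']
```

Requiring corroboration trades recall for precision: the second record is rejected on city alone even though the name is close, which is the kind of conservative choice an investigator-facing tool might prefer.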

Even when a direct link to news or other external data is not available, our algorithms can help recalibrate the customer risk scoring model. We do this by training machine learning models over internal and external data to assess the likelihood that an organization or person is involved in a “high-risk” area. For example, even without explicit evidence, our models can infer that a customer company is likely a cryptocurrency organization (or a Ponzi scheme or a hate group or …), because it walks, talks, and acts like one, prompting a risk escalation.
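As a rough illustration of category inference, the sketch below trains a tiny multinomial naive Bayes classifier on short business descriptions and returns the probability that a new description belongs to each category. The technique, the training examples, the labels, and the function names are all assumptions made for this example; the text above does not say which models or features are actually used.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> token counts
    label_counts = Counter()            # label -> number of examples
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return {label: probability} for a description, with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, count in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(count / total)  # log prior
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalize log scores into probabilities (numerically stable softmax).
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: s / norm for label, s in exp_scores.items()}


# Hypothetical training data for illustration only.
examples = [
    ("token exchange wallet mining", "crypto"),
    ("blockchain wallet trading exchange", "crypto"),
    ("bakery cookies cakes catering", "other"),
    ("plumbing repair installation services", "other"),
]
model = train(examples)
probs = predict_proba(model, "wallet exchange services")
print(max(probs, key=probs.get))  # crypto
```

A probability like this could feed a risk score as one input among many; a real system would need far more data, richer features, and validation against ground truth before prompting any escalation.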

Banks are good at modeling their own internal data in silos, but are typically weak exactly where we have built our core technology.

Our first key ability is to ingest, organize, and featurize massive swaths of diverse, unstructured information (news, forums, blogs, public social media, arrest records, watch lists, …) with strict requirements on being available and up-to-date. In other words, we cast our net wide, to collect a huge number of external data sources in order to capture as many potential signals of interest as possible. The engineering challenge here is obvious: managing huge streams of external data in a reliable manner is difficult, to say the least.

Our second key ability, the flip side of the coin, is solving the resulting science challenge: the more data that comes in, the more noise there is, and the greater the danger that false positives and other spurious output signals will overwhelm the user.

But this is where our data science team excels. We use modern machine learning techniques, validated over truth data, to ensure both that the connections we find are legitimate and that the risk-factors we discover are relevant to the case at hand.

These abilities materially improve on current investigation processes, which push the burden of entity resolution onto the user (with limited “negative news” data sets) and focus only on limited-scale use cases (e.g., small arrest record databases that require a social security number to access). With more modern data and smarter algorithms, we aim to take the know-your-customer (KYC) field to the next level, and turn our customers into domain leaders who surpass even regulator expectations.

While we cannot reveal specifics, we can outline a few examples of where bringing our scientific approach to large bodies of external data has proven valuable to financial institutions.

We have previously linked a subject involved in cross-border narcotics activity to separate felony activity via Department of Justice records.

We have linked a single subject to both a cryptocurrency scam and more classic money laundering methods by referencing online forums. We’ve also identified a “travelling con man” who goes from one country to another propagating the same religious scheme.

We have identified people who have misrepresented their occupation (like supposed truck drivers who were actually in a higher risk occupation), and front/shell organizations with alternate motives (like a marijuana dispensary representing itself to a bank as a cookie shop). Beyond direct linking, we can also use models over names, locations, and even email domains to predict categories of interest. This identity build-out process is especially valuable for non-customers who are flagged in a transaction, where banks are typically blind.

By using the external network of connections, like “co-mentions” in news, we can cluster two cases or groups of subjects that would otherwise have been treated in isolation, hence increasing the efficiency and intelligence of the case management. We have used this to find extended networks around seeds of particular crime rings, such as a rural healthcare scam network in one instance.
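One simple way to picture co-mention clustering is as finding connected components in a graph whose nodes are subjects and whose edges join subjects mentioned in the same article. The sketch below uses a union-find structure for this. It is an illustrative assumption, not a description of the actual clustering approach; the input format (one set of subject ids per article) and the function name are hypothetical.

```python
from collections import defaultdict


def cluster_by_co_mention(articles):
    """Group subjects linked, directly or transitively, by shared articles.

    `articles` is an iterable of collections of subject ids, one per article.
    Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(x):
        # Find the cluster representative, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for mentioned in articles:
        mentioned = list(mentioned)
        for subject in mentioned:
            find(subject)  # register subjects that appear alone
        for a, b in zip(mentioned, mentioned[1:]):
            union(a, b)    # chaining neighbors joins the whole article

    clusters = defaultdict(set)
    for subject in list(parent):
        clusters[find(subject)].add(subject)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical subject ids for illustration only.
articles = [
    {"subject_a", "subject_b"},
    {"subject_b", "subject_c"},
    {"subject_d"},
]
print(cluster_by_co_mention(articles))
# [['subject_a', 'subject_b', 'subject_c'], ['subject_d']]
```

Here subject_a and subject_c never appear in the same article, yet they land in one cluster through subject_b, which is the sense in which cases that would otherwise be handled in isolation can be grouped.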

The biggest efficiency gain comes when we “scour the earth” and find either the absence of any negative signal or the presence of a legitimizing signal. This helps investigators save time by automatically fast-tracking certain cases without having to laboriously mine the internet themselves, one inefficient search at a time.

The need for these techniques is only growing more urgent. Beyond the continued explosion in the scale and complexity of external data, the dynamics of financial crime are evolving so rapidly that it is nearly impossible to keep up. There is an arms race among new technologies (e.g., bitcoin, new digital payment systems, dark web), new laws (e.g., marijuana), and new trends (e.g., increasing international entanglements) that makes it a necessity to reach far beyond a financial institution’s internal data to determine risk.

All parties involved, from banks to regulators to law enforcement, are under pressure to do more with less. In response, banks are incentivized to file SARs defensively and do not always file reports designed to be useful for eventual law enforcement. The number of SARs filed by financial institutions with FinCEN (the Financial Crimes Enforcement Network) is approximately 187,000 per month, and has been increasing steadily by more than a thousand per month for the last five years. Increasing the volume of SARs of sub-optimal quality and passing the burden of sifting out signals to law enforcement cannot be the answer. At Quantifind, we believe that there is an opportunity to significantly increase the quality of SARs by using automation to efficiently reduce the risk exposure of banks while simultaneously increasing the probability of rooting out the criminal activity.

Banks and other financial institutions can no longer afford to ignore the wealth of data sitting outside their walls. If they do, their exposure to risk will continue to grow as the scale and complexity of the financial crime world continue to grow around them. These problems will only be solved by employing transparent, scalable technology that makes investigators, regulators, and law enforcement more efficient at their jobs.

If you are interested in a demonstration of our technology, please contact us.