Nigel's Eyes

20220523 Using actual intelligence to inform artificial intelligence in financial crime risk and compliance

This post started as a comment in someone else's thread on social media, then grew into a post of my own and then, because space limitations undermined how the piece could be presented, I decided to produce a far more comprehensive version as a blog. So here it is, edited and expanded.

"If a RegTech company would like to use my brain to develop something properly clever, not just a flow-chart of basic algorithms and capacious storage of structured data, I'd be very happy to rent it to them :)"

I wrote that in a recent thread about pathways within the brain and the use of various types of input, with particular reference to audio, which led me to language processing (which, so far, no one has got right in typed text, much less audio) and, from that, to the use of language, mathematics and philosophy.

(I know... you are now thinking "he's a money laundering guy, what has any of that got to do with him or, even, me?" Well, if you don't know that, it's because you've not read my books or taken my training.)

I'll give you an example of why I look at so many aspects of what I do and why I can add value to the concepts underlying RegTech, to take it far beyond that which is currently in the market.

There are many aspects to using algorithms (it's really not artificial intelligence) to identify, for example, patterns of speech (now being dramatically undermined by the rapid growth of imprecision in language) and the globalisation of words and phrases, including those which are incorrect in some variants of a language but acceptable in others.

We know that, for some years, speech pattern analysis has proved a useful tool in, for example, insurance fraud.

I am not, here, talking about e.g. bad English nor am I talking about accents, dialects or, even, the baby-like failure to properly pronounce "r" or "th".

The world seems to be focussed on moving away from written communication to call centres, and the analysis of language becomes ever more important - and ever more liable to failure.

False assumptions in analysis of text

If we look at the typed word, we find that language analysis makes fundamental errors based on the most basic of false assumptions. I'll compare English and American, but other variants are available and each has its own idiosyncrasies.

For example, in English a "faggot" is a meatball or a bundle of sticks (see Thomas Hardy, for example). In English, a "fag" is a cigarette or a junior boy performing menial tasks for a senior in a public school; in American, a public school is, well, it's not exactly Eton is it?

In English, a "fanny" is a vagina. A bum is a bottom and that's what you sit on. So a "fanny pack" and a "bum bag" are the same thing if one is bilingual but entirely different if not - and American AI such as Google's is monolingual, and they are supposed to be the world leaders.

An arse is not an ass because an ass is a donkey and it was a donkey that Jesus rode on. Yet, American speech analysis marks all of those English words as offensive.
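The failure mode described above can be made concrete. Here is a minimal sketch - not any real moderation system or API, with an invented word list and invented sentences - of why a monolingual, keyword-based "offensiveness" filter misfires on ordinary British English:

```python
# A toy keyword filter of the kind a monolingual, US-trained system might use.
# The word list and the test sentences are illustrative assumptions only.

FLAGGED_WORDS = {"faggot", "faggots", "fag", "fanny", "arse"}

def naive_flag(text: str) -> list[str]:
    """Return the keywords a naive filter would flag, ignoring dialect and sense."""
    words = text.lower().replace(",", "").replace(".", "").split()
    return [w for w in words if w in FLAGGED_WORDS]

# All three sentences are innocuous in British English, yet all are flagged.
sentences = [
    "Gran served faggots and gravy for tea.",          # faggot: a meatball
    "He nipped out for a fag at half time.",           # fag: a cigarette
    "She wore a fanny pack on the flight to Boston.",  # harmless in American English
]
for s in sentences:
    print(s, "->", naive_flag(s))
```

The point is not that the filter is badly coded; it is that a word list without a model of variant and sense cannot distinguish a meatball from a slur.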

The rapid change in the use of the word "impact" is particularly difficult: impact is a strong word. It means a violent collision. That strength was long diluted by the expression "impacted upon." Not long ago, some people dropped "upon", which reduced the sense but did not obliterate it. More recently, "impact" has come into common use as a replacement for two entirely different words, "effect" and "affect", removing the clarity of both. In total, three words have been corrupted, diminished or lost.

Also problematic is the trend towards the abolition of apostrophes. Even the British government has recently fallen prey to this particular nonsense, for nonsense is what it generates. In passing the very welcome legislation relating to Down's Syndrome (itself a replacement name for "mongolism"), it called the Act the "Down Syndrome Act." In US medical literature, we see "Tourette Syndrome" instead of "Tourette's." This follows a trend in US fiction that, so far as I can tell, started in the 1970s and, as American literature and TV - and the reporters who write about them - have crossed the globe, it has received a powerful boost in the past few years. So now we see "doll house," "butcher knife" and the like. Worse, the Oxford English Dictionary (where you will find -ize spellings and the Oxford comma) approves it, as it does so much American usage.

The point about "mongolism" is important because every so often someone decides that a term in common use is demeaning and becomes vocal about it. So we have "arrested development" instead of "retarded." Same thing, different names. How long before "arrested development" is considered demeaning and has to be replaced? We have lost the use of the technically defined terms "idiot", "moron" and others, which were descriptions of degrees of lack of discernible intelligence.

We know that intelligence tests are misleading because they concentrate on particular forms of intelligence. We know that autism is not a failure of intelligence: it's simply that the brain is "wired" differently to Mr Average's. But we also know that autism is problematic, partly because the symptoms can be very mild or very pronounced and, at the pronounced end of the scale, can result in a wide range of problems for both the autistic person and those around him.

Yet, outside the specialists who deal with it, a single word covers that entire range. Should we, when looking at data, be happy with portmanteau words or should we be looking for very fine-grained information and, if we do, how far do we go? Equally importantly, how far can we justify going?

Imprecision in Language

I have a 1938 Webster's Dictionary. It correctly defines "billion" as one million million, because "bi-" denotes a million to the power of two, and trillion et seq. follow on. But in the 1970s someone (I really don't know who, but it seems to have been in America) decided that a billion should be 1,000 million, i.e. 1,000th of its true amount. That definition was adopted by the International Organization for Standardization (one of a number of mistakes it has made which cause confusion, not standardisation), but even today it is not universally accepted. It is common to hear "how many zeros is that?" in many countries. Worse, while the American definition is used in English, in many other languages a billion still means a billion, and a thousand million is known as a "milliard" or a linguistic variation of it. Until recently, even the US Department of Defense thought it prudent to explain what it meant when it referred to "billion."
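The scale of the ambiguity is easy to demonstrate. A minimal sketch (the scale tables are illustrative, not a localisation library) of how the same word maps to different magnitudes under the traditional "long scale" and the American "short scale":

```python
# Long scale (traditional English/Continental) vs short scale (modern American)
# values for the same number words. Illustrative only.

SHORT_SCALE = {"million": 10**6, "billion": 10**9, "trillion": 10**12}
LONG_SCALE  = {"million": 10**6, "milliard": 10**9, "billion": 10**12, "trillion": 10**18}

def parse_amount(quantity: float, word: str, scale: dict) -> float:
    """Interpret '<quantity> <word>' under a given scale."""
    return quantity * scale[word]

# "3 billion" is a thousand times larger on the long scale.
short = parse_amount(3, "billion", SHORT_SCALE)  # 3,000,000,000
long_ = parse_amount(3, "billion", LONG_SCALE)   # 3,000,000,000,000
print(f"short: {short:,.0f}  long: {long_:,.0f}  ratio: {long_ / short:,.0f}")
```

A system ingesting multilingual financial reporting that does not know which scale the writer meant is off by a factor of a thousand - which, in risk terms, is not a rounding error.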

All of this is simply vocabulary. With a little bit of grammar thrown in. But what about other areas of incomprehension?

A confusion of acronyms

Let's look at two aspects of acronyms.

The first is multiplicity of uses: let's take one example.

SARS: the South African Revenue Service. Oh, wasn't that what you first thought of?
FSA: Financial Services Act (UK, the thing that created the "Big Bang"); Food Standards Agency (UK); Farm Security Agency (US); Flexible Spending Account; Financial Services Authority (UK - superseded - and Labuan, among others); Financial Services Agency (Japan); Financial Separation Allowance (USA); Free Syrian Army; Formal Safety Assessment (USA); Forward Sortation Area (US military - and why it's not "sorting" I have no idea, other than that it looks like one of those words made up because someone thought they were being clever or couldn't think of the right word); full speed ahead; fire support area (US military - about weaponry, not conflagrations). The UK government has published a list of acronyms in the US military. This is what it lists for FSA:

FSA Field Support office
FSA Finite State Automata
FSA Future Security Architecture
FSA Future Surface-to-Air
FSA/CAS Fuel Savings Advisory and Cockpit Avionics System

To give you an idea of how serious this issue has become, those entries are on page 163 of the list. There are 373 pages, single-line spaced. And if you think that you'd get to the end and fall asleep: if you grab some ZZs, it means zig-zag. There's a link to the document in "Further Reading" below.

The list emphasises another problem: the use of acronyms as words.

MIDAS Maintenance Information and Defect Analysis System
MIDAS Make Ideas Develop Assets Successfully (Suggestion scheme)
MIDAS Management Information, Data and Accounting System
MIDAS Marine Inclination Differential Alignment System
MIDAS Maritime Intelligence Dissemination & Analysis System (RAN Command System)
MIDAS Mine and Ice Detection/Avoidance System
MIDAS Missile Defence Alarm System
MIDAS Mobile Integrated Digital Automatic System
MIDAS Motorway Island Detection and Automatic Signalling System
MIDAS Maritime Integrated Defensive Aids Suite

Perhaps, you might say, you don't come across MIDAS often in your daily activities. How about

FIST Facility for Infantry Situation Training
FIST Fire Support Team
FIST Fused Interface SimulaTor (displays)
FIST Future, Infantry, Soldier, Technology

Now imagine that every time you see it, it's written "Fist". Actually, there's no need to imagine: there is a growing trend, led by The Guardian, a UK media group, whose "house style" is to convert acronyms into words by removing the capital letters. That's fine if the purpose of a house style is to reduce comprehension or to engender confusion or uncertainty.
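The two problems - multiplicity of expansions and acronyms turned into words - compound each other for any automated reader. A minimal sketch (the expansion table is a tiny invented subset, not a real glossary) of why naive acronym matching fails:

```python
# A toy acronym lookup illustrating two failure modes at once:
# (1) an exact match returns many candidate meanings, and
# (2) lower-casing the acronym ("house style") erases the match entirely.
# The table is a small illustrative subset.

ACRONYMS = {
    "FSA": ["Financial Services Authority", "Food Standards Agency",
            "Free Syrian Army", "Flexible Spending Account"],
    "FIST": ["Fire Support Team", "Future Infantry Soldier Technology"],
}

def expand(token: str) -> list[str]:
    """Return candidate expansions; only an exact upper-case token is recognised."""
    return ACRONYMS.get(token, [])

print(expand("FSA"))   # four candidates - which one did the writer mean?
print(expand("Fist"))  # [] - the capitalisation signal is gone
```

Resolving "FSA" correctly needs context the lookup table cannot supply; resolving "Fist" needs the system to guess that an ordinary word is secretly an acronym - which is exactly the shifted burden of comprehension discussed below.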

Why do we care?

Why does this matter? And why should we in financial crime risk and compliance care? And why should RegTech and its supporting services be paying attention to it?

It's for the simple reason that it moves the burden of comprehension from the speaker or writer to the listener or reader. In short, from the customer to the financial institution.

We deal in certainty as the basis of the decisions we make.

On a larger scale, we could say that undermining language undermines society and we could examine the way in which extremists contort, modify or exclude words that give a sense of continuity: while many say that truth is the first casualty of revolution, the real first casualty is language because without language, no truth can be communicated and nor can the lies of revolution. It's important to bear that in mind even though it's not the central purpose of this blog.

And yet, even this, and its complex parent, linguistics, is only a fraction of the reason that a new, radical approach to RegTech - and risk assessment - is needed if it is ever to be properly effective.

Tomorrow belongs to....

RegTech is focussed on data matching and the pretence that that's about suspicion.

That's so yesterday.

Tomorrow belongs to finding ways to make computers "think" so that they can actually identify suspicion.

And that's why a serious RegTech company should be renting my brain. I store, access and match vast amounts of data and use them to create patterns and extrapolations. I don't necessarily identify with pinpoint accuracy what we are looking for (sometimes I do) but I do identify a mass which can be examined and explained.

If it can be examined and explained, it can be reverse-engineered and the process replicated.

"AI" prides itself on being a science. It's time to apply art.

In 1996, in How Not To Be A Money Launderer, in which the concept of financial crime risk was introduced (without a name and, especially, without an acronym), I wrote that compliance systems would need audit, but that the audit should be done by lawyers, not accountants, because understanding financial crime is an art requiring flexibility of thought.

It's all about the data. Really?

Today, in 2022, systems audits are about box ticking, not about qualitative assessment. And that's the direction that RegTech is taking. We hear those with a high profile in the industry saying "it's all about the data."

Well, yes, on a superficial level it is. But when we look behind the data, we find an enormous amount of interpretation that simple rules-based analysis is not equipped to handle.
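A minimal sketch of what that gap looks like in practice, with an invented reporting threshold and invented transactions - not a real monitoring system - comparing a bare rule with a crude interpretive check:

```python
# A bare threshold rule versus the pattern behind the data.
# The 10,000 threshold and the deposits are illustrative assumptions.

THRESHOLD = 10_000

def rule_flags(transactions: list[int]) -> list[int]:
    """The simple rules-based screen: flag anything at or over the threshold."""
    return [t for t in transactions if t >= THRESHOLD]

def pattern_flags(transactions: list[int]) -> bool:
    """A crude interpretive check: several amounts sitting just under the threshold."""
    near_misses = [t for t in transactions if THRESHOLD * 0.9 <= t < THRESHOLD]
    return len(near_misses) >= 3

deposits = [9_900, 9_950, 9_800, 9_990, 450]  # classic structuring shape
print(rule_flags(deposits))     # [] - the rule sees nothing
print(pattern_flags(deposits))  # True - the shape of the data is the signal
```

The rule is correct by its own lights and still misses the point: the suspicious thing is not any single data point but the relationship between them, which is interpretation, not matching.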

That's the next frontier, not simply gathering more data and trying to build machines that learn from their mistakes in that simple form of analysis.

We need machines that gather and retain everything that passes by, from the colour of someone's eyes to the rate at which their hair goes grey and thins, and which know how and why that is relevant. We need to identify their speech patterns and to analyse their development over time - and predictively.

These are the obvious open doors of next-generation RegTech systems or, perhaps, the generation after that, or even the one after that.

But now the industry is stagnant: it's just doing the same old stuff, just bigger.

There has long been research into ethical decision making and that's a starting point but it's a long way from where we need to be.

Where do RegTech companies need to be?

Right here. Inside my head. Because, again, what you are trying to build is an electronic version of my brain. Taking it out of a pickling bottle and slicing and dicing it will tell you nothing. You need it while it's still in full-flight.

And while I'm still interested and haven't said "bollocks to this" and gone off to open a beach bar overlooking the South China Sea, selling cold beers and grilled fish for a few hours a day.

'cos, as it almost says on the sales notices, when I've gone, I've gone.

And once I've gone, I ain't comin' back.

---------------

Further Reading:

https://assets.publishing.service.gov.uk/government/uploads/system/uplo…