It is often argued that a single data point cannot be considered to have value, but that is a naive view. There are several reasons why.
The first is that data points are almost never without context. So there is almost never a genuinely single data point.
The second is that even a single data point is – and this is going to sound so obvious as to be stupid – a data point. It is the piece of information which piques interest and the starting point for thought.
Also, all data is valuable data. It might have been regarded as insignificant or, even, irrelevant but that doesn’t mean that it will always be so.
Some people consider data to be only numerical but, in the world of financial crime, that’s flawed reasoning. Yes, if one is trying to calculate the distance from A to B in a straight line, the data is numerical – but only partly so. From A to B is 156. That’s data but what does it tell us?
a) someone wants to know, or at some time wanted to know, the distance between two points that person identified; it was measured and the answer was recorded.
b) we do not know who produced the answer
c) we know that the answer is 156 but that’s almost as useless as saying “42” without additional explanation.
The reason is that we don’t know what unit of measurement is being applied.
But we could use interpretative data. If we knew that the person asking or the person answering, or both, were scientists, or that they were in one of the many countries that use the metric system, we might infer that the unit is metric: metres, or a multiple or fraction of a metre. So it could be, for example, 156mm or 156km or any of many other options. If not, it could be an imperial measurement: feet, inches, yards, miles and so on. So, without additional contextual data, the number 156 has so many possible meanings as to have no meaning. But it still has value, especially if A and B are identified because, then, we have context.
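To make the point concrete, here is a minimal sketch of how far apart the readings of a bare "156" can be. The list of candidate units and the function name `candidate_distances` are illustrative assumptions, not part of any real investigation tool; the conversion factors are the standard definitions.

```python
# Metres per unit for a handful of candidate units.
# The choice of candidates is an assumption made for illustration.
METRES_PER_UNIT = {
    "mm": 0.001,
    "m": 1.0,
    "km": 1000.0,
    "inch": 0.0254,
    "foot": 0.3048,
    "yard": 0.9144,
    "mile": 1609.344,
}


def candidate_distances(value):
    """Return the distance in metres under each candidate unit."""
    return {unit: value * factor for unit, factor in METRES_PER_UNIT.items()}


readings = candidate_distances(156)
for unit, metres in readings.items():
    print(f"156 {unit} is {metres:,.3f} metres")

# Ratio between the largest and smallest reading of the same number.
spread = max(readings.values()) / min(readings.values())
print(f"Largest reading is about {spread:,.0f} times the smallest")
```

The number never changes; only the context does. Knowing that A and B are cities, for instance, would rule out most of the candidates at once.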
So, the distance from London to Paris is… which London, which Paris? Again, we need context or definition – neither of which directly relates to the data point of 156 but which give us an explanation.
In investigations, data tells a story. If we read a novel, a thriller, we expect a certain level of explanation. Thriller writers try to obfuscate: a who-dunnit has no surprise if the who and even the how and the why are revealed in the first few paragraphs. And so, we find the master of suspense – Thomas Hardy – using the first pages of Return of the Native to describe the activities of people on the heath and of the heath itself, with its road showing in the moonlight like a parting on a head of dark hair as people carry their faggots on their backs.
A single data point is open to wide interpretation and that interpretation depends on both the available background knowledge and on the prejudices of the person looking at the data.
It is the addition of other data points that starts to build the story, piece by piece until it becomes clear.
In these coronavirus-ridden days, there is an extraordinary amount of data but much of it is not fixed. More importantly, much of it is not accurate. It is important to segregate data from opinion. Opinion is a pointer, it is information, but it is not data. Except that the identity of the person who opines is data.
This is data: a spammer using the domain “leatshalf.icu” and the user name of “Face Mask” has issued a message headed “Across the World Face Masks Are Becoming Mandatory.” Is that a fact? No. Is it data? Yes. It is data because we know it’s not a fact and therefore it is an indicator of falsehood.
Secondly, the “.icu” top level domain is not in widespread use. Clearly, like the .org TLD, which is supposed to be restricted to, inter alia, non-profit bodies but is not, the .icu domain can be ordered by anyone. Conclusion – it cannot be assumed that the domain is being used by anyone in a hospital with an intensive care unit.
That leads to a conclusion, an opinion, that the user of this domain has chosen it because of its association with medical care, that many coronavirus patients are admitted to the Intensive Care Unit of a hospital and that victims of the spam will be, at least, subliminally encouraged to a positive view of its contents.
A search for leatshalf returned many entries showing a typo – “leats half” in an article, widely replicated without anyone paying attention. It’s talking about “at least half,” by the way. But you wouldn’t know that without context.
The ICANN data search for that domain produced more data. It was registered through Namecheap, one of the domain registration services that we most often see in domains used by criminals. It was registered using an anonymising service on 11 May 2020 – the day the spam was issued. And – as I’ve been banging on about for years – its DNS is at Cloudflare, Inc., a company which steadfastly refuses to put in place measures to detect and deter the use of its service by criminals and which tells us to bog off when we send it information about criminal activity. The FBI’s local field office, for its part, ignores us when we refer suspicious cyber activity involving the company to it. The address is, as so many are these days, shown simply as “Panama.”
So, from that single “icu” data point, we have a story. We have suspicion. Yes, that could be assumed from the fact that there was a spam – and safety says it should be. But it’s not enough to start action. However, by adding context to that and other data points, we end up with a group of dots that can be joined up.
But that’s not all. There’s one more thing. On 30 January 2020, also using Namecheap and the same anonymising service, also showing the country of registration as Panama, and also with the DNS at Cloudflare, there was another registration – leatshalf.rest.
That’s pretty conclusive: now there’s a pattern, and patterns are what those who say a single data point has no value say they need.
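The joining of dots can be sketched in a few lines. What follows is an illustration only: the `Registration` record, its field names and the `shared_attributes` helper are hypothetical, and the anonymising service is reduced to a yes/no flag because it is not named here. The values are the ones described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Hypothetical record of the registration details discussed above."""
    domain: str
    registered: str        # ISO date of registration
    registrar: str
    privacy_service: bool  # simplification: True if an anonymising service was used
    country: str
    dns_provider: str


def shared_attributes(a, b, fields=("registrar", "privacy_service",
                                    "country", "dns_provider")):
    """Return the fields on which two registrations carry the same value."""
    return {f: getattr(a, f) for f in fields if getattr(a, f) == getattr(b, f)}


icu = Registration("leatshalf.icu", "2020-05-11", "Namecheap",
                   True, "Panama", "Cloudflare")
rest = Registration("leatshalf.rest", "2020-01-30", "Namecheap",
                    True, "Panama", "Cloudflare")

common = shared_attributes(icu, rest)

# The part of the name before the first dot is also identical.
same_label = icu.domain.split(".")[0] == rest.domain.split(".")[0]

print(f"Shared attributes: {sorted(common)}")
print(f"Same name under a different TLD: {same_label}")
```

Any one of these matches alone would mean little, because the registrar, the DNS provider and the anonymising service each have a very large number of customers. It is the combination, together with the identical name, that makes the pattern.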
And yet, it all started because of a spam with the letters “icu” where they shouldn’t be.
Paperbacks by Nigel Morris-Cotterill
“Cleaning up the ‘Net” and “Understanding Suspicion.”