There have been a number of stories in the media over the past couple of days about the search keywords ‘migrate’ and ‘migrating’ trending HARD soon after the election. Since it involved data analytics and number crunching, which is a passion of mine and has been a good part of my day job for the past 17 years, it immediately struck a chord.
I first saw it on Rappler. I read through the article and, as soon as I was done, searched online to see what other agencies ran the story and how they presented it. Using Google search tools, I saw that the first run of the story was by Davinci Maru of ABS-CBN at 11:47pm on May 13, 2019, entitled “Dismayed at initial poll results, Filipinos search for ‘migrate’ online.” The Philippine Star soon followed with a more detailed account of the interview. To give credit where credit is due, of all the story runs, only the Philippine Star article actually provided more usable content, rather than the usual strategy employed by most news outfits of (1) picking a specific quote; (2) creating hype and blowing it up; (3) supporting it with a few Twitter/FB screenshots from real people; then (4) making it appear as the overall sentiment of the majority of the populace.
According to ABS-CBN, the data in their report is from a certain Isaac Reyes of DataSeer, a data science and analytics training provider. A quick search of the name shows a profile of a “data scientist” who is the Lead Trainer and the proprietor of DataSeer. On his profile, this is how he describes his decade-long relationship with data: “I live and breathe it.” The profile strongly suggests expertise in data analytics.
I do not claim to be an expert on the subject, but I have a pretty good grasp of data analysis, having done many business case studies, scenario simulations and business process improvement projects, all involving loads of data analytics. But after reading through the story on the “trending” search keywords being promoted in the mainstream media (of course with their own twists and turns), I cannot help but scratch my head and question my understanding of numbers and analytics. It seemed to me that some data analytics fundamentals have been flouted, purposely or otherwise, in the development of this particular narrative. Either that, or I may really have gone so dumb that I cannot understand the very simple story they are trying to tell.
You know how your head hurts when you are certain you know something but your head cannot cough up the information? Like when you see the face of an actor and you know his name but somehow the name just will not come out? That is exactly how my head hurt the past couple of days. I know I understand data and trends, but I cannot seem to wrap my head around the ‘election results’-‘migration’ narrative. So, I decided to dig deeper, analyze the “analysis” and write about it. Please read on. I promise, there will be some really entertaining $#!+ here, especially when you get towards the end. Here goes…
The Beginning
This right here is exactly where it all starts. Our data scientist’s posts on FB relate to data from Google Trends that appears to point to the idea that some people in PH dislike the initial election results so much that they are ready to pack up and leave. This was picked up by news outfits, who decided to run the story.
The Data
Here is the data that created the hype. Indeed, there was a “massive” spike of interest in searches for ‘migrate’ and ‘migrating’ at 8:00pm on May 13, 2019, relative to the rest of the week.
The Narrative
Based on the spike in the Google Trends data for the keyword ‘migrate’ (which coincides with the release of the first election results) and the proposition in the data scientist’s FB post, the media solidified the narrative, which runs along the lines of: Filipinos are so dismayed by the preliminary election results that they have gone straight to the internet and started searching for ‘migrate’ and ‘migrating’.
It suggests that people suddenly started researching ways to get out of the Philippines as soon as possible. Here are a few of the headlines run by the mainstream media.
So What?
There is data to support the narrative, and numbers
do not lie. The analytics can be
recreated simply by going to trends.google.com yourself and you can see that
there really was a spike in interest for the said keywords. So if the numbers
are true and you can recreate the conditions of the study then this is not fake
news! So “what is wrong with the narrative?” you ask.
Let me count the ways…
1. The Data Set
The most common mode of failure in data analysis is
the use of inappropriate data. Selecting
the sample, cleaning up, and grouping the data are always the first things you
do before doing any analytics. The
sample must be a good representation of the population. Rogue data must be eliminated from the sample
to ensure that the trends do not get skewed.
Then the data must be organized and grouped to identify trends.
The trouble with the data from Google Trends is that you have no visibility of the raw data. Presumably, the data has already been cleaned. Granting that this is the case, the next issue is finding the right search keyword for the study, which in this case is very specific, i.e. “migrate” and “migrating”. Under normal circumstances, this would be okay as an independent data set. However, the claim is that the election has something to do with the spike.
By choosing only ‘migrate’ and ‘migrating’ as the search keywords, we only point to a sample population that actually searched for these specific keywords as an independent data set, not as a subset of a bigger data set that contains related subsets. Therefore, with the current data and the study criteria, we can only see the spike relative to the volume of searches during the study period, but we cannot tell whether the search volume that drove up the trend makes up 1% or 90% of the population that was actually interested in the election. This brings me to point no. 2.
2. The Correlation
Data analysis, especially when percentages or relative volumes are used instead of actual counts, depends very much on correlation with other data. Had the data been actual counts of search hits, say 2 consistent hits throughout the week and then a sudden spike of 1,000 search hits in a given period, then from the 1,000 hits we could somehow deduce the relevance of the data against the voting population. Unfortunately, Google Trends data do not show counts. Therefore, to tell a proper story, it is vital that a proper correlation is made. Without correlation, the analysis is meaningless. The data presented to us through the media lacked any workable correlation.
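For the code-minded, here is a minimal Python sketch of why a relative index cannot be read as a head count. It assumes a simplified model in which a series is simply rescaled so that its peak becomes 100; the real Google Trends computation is not visible to us and involves more steps. The function name `to_relative_index` and every number below are made up for illustration only.

```python
def to_relative_index(counts):
    """Rescale raw counts so the largest value becomes 100.

    Simplified, assumed model of a relative-interest index;
    this is not Google's actual computation.
    """
    peak = max(counts)
    if peak == 0:
        return [0 for _ in counts]
    return [round(100 * c / peak) for c in counts]


# Hypothetical search counts per period (invented numbers, not real data).
small_crowd = [2, 2, 2, 1000, 2]            # spike driven by 1,000 searches
large_crowd = [200, 200, 200, 100000, 200]  # spike driven by 100,000 searches

small_index = to_relative_index(small_crowd)
large_index = to_relative_index(large_crowd)

print(small_index)
print(large_index)
print(small_index == large_index)  # both series produce the same chart
```

Under this assumed model, a spike of 1,000 searches and a spike of 100,000 searches draw exactly the same line, which is why the chart alone says nothing about how many people were involved.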
Let us try to make some sense of the data, then, by attempting a correlation. To do this, we first need to state our assumptions. In this case, the proposition is that maybe, just maybe, the election results have something to do with the spike. We can use this as our assumption. To use this information in the analysis, we add another keyword to the comparison alongside “migrate” and “migrating”. For this purpose, I chose “Halalan 2019”, just so we are consistent with the hashtag our data scientist used at the time of posting.
This is what we get.
As you can see, we have somehow established a meaningful comparison between the volume of people in the Philippines searching for the keywords, ‘migrate’
or ‘migrating’ and those people searching for keywords about the election, i.e.
‘halalan 2019’.
Now we can see clearly (for those with microscopic vision) the significance of the data from the selected sample in the report being promoted by the media (see the red and the blue dots) in relation to the total volume of searches from the population interested in the election using the “halalan 2019” keyword. Simply put, the significance is “insignificant”, and there isn't really any workable relationship between the data sets to establish a meaningful correlation.
This is why I cannot seem to understand where the statement, “Search terms ‘migrate’ and ‘migrating’ are
trending HARD in PH right now.”
came from. I personally see no HARD
trend developing anywhere in those graphs, when all things are put in proper context.
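The effect of adding a much bigger keyword to the comparison can be sketched in a few lines of Python. This assumes, as a simplification, that all terms in a comparison are rescaled against one shared peak so that the single largest value becomes 100. The helper `to_joint_index` and all the counts are hypothetical; they are not the real search volumes, which Google does not publish.

```python
def to_joint_index(series_by_term):
    """Rescale several series against one shared peak (largest value = 100).

    Simplified, assumed model of a multi-term comparison chart.
    """
    peak = max(max(counts) for counts in series_by_term.values())
    if peak == 0:
        return {term: [0] * len(counts) for term, counts in series_by_term.items()}
    return {
        term: [round(100 * c / peak) for c in counts]
        for term, counts in series_by_term.items()
    }


# Hypothetical counts per period (invented for illustration, not real data).
hypothetical_counts = {
    "migrate": [2, 2, 1000, 2],
    "halalan 2019": [5000, 20000, 100000, 60000],
}

# Charted alone, the 'migrate' spike fills the whole scale.
alone = to_joint_index({"migrate": hypothetical_counts["migrate"]})
# Charted next to the election keyword, the same spike nearly vanishes.
together = to_joint_index(hypothetical_counts)

print(alone["migrate"])
print(together["migrate"])
print(together["halalan 2019"])
```

With these made-up numbers, the same 'migrate' spike that reaches 100 on its own chart barely registers once the election keyword shares the scale. That is the "microscopic vision" effect described above.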
There are also other comparisons that could have been made to put more perspective into the data. One example is relative volume compared with the Philippine elections of the last decade, to see the then and now. They could also have looked at the historical trend (say from 2004 to date) of the keywords to show the relevance of the data they are presenting in isolation. But they would not, because if they did, it would be clear that overall interest in these keywords is itself actually on the decline.
Presenting the graph above, showing the historical trend from 2004 to date, would definitely not support the proposition and is definitely not in line with the narrative. So this is definitely a graph/analysis that is not fit for purpose. Which brings me to point no. 3.
3. The Driver
There is always a driver – a purpose, when we do
analytics. In business, we do it to
support sound and informed decision-making.
In debates, we use it to convince the opposition to see things from our
vantage point, if not to fully get them to our side of the fence. In quality, we do analytics to find the
balance between the cost of quality, the bottom line, and the satisfaction of
the customer we serve.
There is always a driver in analytics, an end goal if you will. The report we generate in analytics is the tool that we use to get to that designed end goal, be it an event, a condition or a call to action. This makes data analytics deliberate. The end result we want to achieve and the action that we want to happen always dictate the way we select our sample population and how we present the data.
This is the HARD truth about data analytics. And the true measure of brilliant data analytics is whether the end goal is achieved. Analytics without any driver is an utter waste of time.
In this particular case, we all know that the election is over. The votes have been cast. There is no amount of data analytics that can change the results. There are no more decisions to make nor actions to take with regard to the election. So what is this data analytics report, and the media promoting it, trying to achieve? What is the driver? What is the designed end goal?
This is where the fun part that I promised begins…
4. The Keyword
Let us be naïve for a while and assume that the perceived irregularities I pointed out in the first 3 points do not exist and that the data set is pristine. My next big question would be, “What was the methodology used for arriving at the search keywords ‘migrate’ and ‘migrating’?” Yes, it was claimed that the words started showing up in real people’s posts on social media, which prompted the more detailed analysis of the keywords. However, these keywords did not actually trend on Twitter, Facebook, or Google as far as I am aware. They were never in the top 10 of any trending list between May 10 and May 15. Of the many trending election-related words to pick from during the study, ‘migrate’ and ‘migrating’ were chosen. There are also other words like ‘happy’, ‘hopeful’, ‘sad’, ‘scared’, ‘cheating’, etc. that could have been used to at least assess popular sentiment. Even “#halalandayaan2019”, according to coconuts.co, was the no. 1 trending topic on Philippine Twitter as of May 14. Why was there no analytics done on that hashtag instead?
I appreciate that choosing keywords for search analytics is more technical than just choosing random words to anyone’s liking. Having said that, if no selection methodology is presented, my first assumption is that the selection was ad hoc, arbitrary, whimsical, and “maybe” deliberate.
Now it gets more interesting…
5. The Timing
I always believe that in everything, timing is
everything. So, I could not let this
minor detail pass. I would like to draw
your attention to these two curious screenshots, with particular focus on the
highlighted parts…
Yes, you are right! There is no imagining things here. The claim that “people in PH are not liking the preliminary elections results”, thus driving the search terms ‘migrate’ and ‘migrating’, was actually made 15 minutes before the results from the first 15 clustered precincts were actually received.
I said in point no. 4 that the selection of search words, without defined selection criteria, was “maybe” deliberate. This evidence, showing that the claim was made 15 minutes before the actual data was made public, makes what initially seemed an arbitrary selection appear more of a deliberate one. It is as if there were clairvoyant power at play, which enabled foreseeing the upcoming trending of the “randomly” selected keywords in the very near future.
Could it be a case of the local server clocks at Rappler and Facebook, as well as the poster’s local system clock, not being synchronized? It may well be. But in this day and age, when clocks (especially computer clocks) are synced through date and time servers around the globe, which in turn are synchronized with each other, I really find it hard to believe that this is the case here.
My Final Thoughts
There has always been that skeptic in me that makes me tend to question things. Among the things I question is how mainstream media presents facts and data. In the past, and up until now, I have noticed that numbers, data and figures have always been misrepresented, misconstrued or miscalculated by them, intentionally or otherwise. This started off subtly, but this one here is so blatant that it feels like a slap in the face. It is as if the Filipino people do not have brains and are incapable of thought and analysis.
Isolating a small subset of data and presenting it as if it is the entire data, and somehow making it subtly (and sometimes, not so subtly) look like a true representation of an entire population, is a big insult to the Filipino intellect. Case in point: Rappler claims that freedom of the press is under attack under this administration. However, they are the only ones being restricted, primarily because the government sees them as abusing their press privilege. But somehow, they have turned it around such that Rappler is the "press" and the perceived baseless attack on Rappler is an attack on the entire press - an isolation of a small subset of data, turned into a true representation of an entire population. However, when you look around, the remaining majority of the press are able to exercise their privilege without much fuss. I take cases of self-serving generalization such as this as a grave personal insult. This is why I suddenly felt the need to exercise my constitutional right to freedom of expression. This is my expression of outrage and dismay.
While we are on the issue of “freedom of expression”, which many claim is under attack and stifled by the current administration, this I say to you: the fact that the media freely and continuously promotes reports such as this, with very few repercussions, if any, is a testament that freedom of expression, and with it democracy, is very much alive.
Please do not continue to insult the Filipino people. I think we all know what is at play here. Seriously, who are the people who would consider migration as a means of escape from a 'dire' national situation? Those with the financial capacity to migrate! That is the upper class. This lot are surely well traveled and would surely already be aware, more or less, of the migration processes. They would not need to do much research, especially if consultants are used for the migration. Would you have us believe, then, that the lower class are the ones who drove up the trend? How many of them even have passports? In dire situations, the lower class' first worry would be where to get the money to buy the basic necessities for the day. Also, common sense would dictate that, as part of the migration process, the search hits for 'passport' should also show a strong correlation with this data, since a passport is a mandatory requirement. But alas, the available data do not show any meaningful correlation to support this.
What about the middle class then? Well, the majority of the middle working class are OFWs whose search hits (for 'migrate', in case they really are looking to migrate) would not even be counted in the data set for searches in the Philippines. So I am pretty sure they are not part of that Google data. What about the middle class still in the Philippines? Well, for starters, they should have triggered a surge in the 'passport' keyword search at least. In fact, if you look at the data on 'passport' search hits in the past 12 months, that too is on the decline. There really isn't much to work with here to make the narrative credible at the very least.
Amidst these senseless, insignificant data that the media decided to pick up on and publicize, I go back to the question of the 'driver'. What is the 'driver' here? Better yet, who is the 'driver' here? What is the end goal? From what I can deduce from the available data, the 'driver' would be to cast doubt on the electoral process. I can also deduce that the 'drivers': (1) are within the Philippines; (2) are a very insignificant fraction of the voting population; (3) already have passports; and (4) have the capacity to migrate. Hmmmmmmm... Esep-esep pa more (think about it some more).
What is the end goal? I suppose for this, we will all just have to wait and see.
To conclude, this is clearly a very blatant show of disrespect to us thinking Pinoys. I say to the Filipino intellectuals out there and to those Filipinos with considerable reach: please use your gifts and powers responsibly. You never know when your opinions will be picked up and used as an instrument for discord rather than for strengthening our nation. For those who cannot, do not want to, or simply refuse to aid in the development of our nation, kindly follow these steps:
- open up your browser;
- type google.com in the address bar;
- type ‘migrate’ as your search word;
- hit ‘Enter’.
If you need help,
contact Mr. Locsin. I heard he is
willing to help out disgruntled citizens wanting to pack up and go.
Leave the Philippines to patriotic Filipinos who truly care for this nation.