A Datacentric World

What Is New with Data?

By Santi on June 14, 2019



Then and Now

Perceiving the world around us, gathering information, and using it to get ahead in business is certainly nothing new. Picture a medieval tavern: the owner, if he cared, made sure to learn the price of a pint at his nearest competitor, the flow of people in and out of his shop throughout the day, and the villagers’ favorite kind of ale. If he was a crafty brewer, he would also keep his recipe up to date with the best trade secrets. Companies today have more data and better processing techniques, but the same principle applies: appropriate information means better decisions and products.

Data is at the center of the media scene. Its quality and quantity have grown by orders of magnitude, and the techniques for mining and processing it have developed tremendously. Hardware processing power has improved exponentially (roughly following Moore’s Law), so that amounts of data that once took months to process came to take hours, then minutes, and finally seconds. Combined with new mathematical and statistical models and modern software, this has enabled the creation of remarkable digital and physical goods and services, some of which allow machines to surpass human skill on multiple, yet still specific, tasks.

Business owners, executives, politicians, employees, and curious citizens are all paying attention. The Data Revolution has arrived, and the new gold is there for anyone willing to do the panning.

How Things Have Changed

Many have called this phenomenon the Data Revolution, implying a sudden change. A much better expression would be Data Paradigm or Datacentrism, since no single point in time was particularly distinctive. Businessmen, the media, and marketers are sometimes fetishists of big words: Revolution, Business Intelligence, Big Data, Machine Learning, Artificial Intelligence. Experts on these subjects often explain that it all comes down to plain (though complex) math, plus the right equipment and tools. A conspiracy theorist might conjecture that there is a secret plan in place: show off the marvels of technology, spread FOMO (fear of missing out), and get companies to spend more on unnecessary data projects and employees to want to work on them. Reality is hardly ever so blunt, but the hype around the subject is certainly high. In fact, 40% of the European startups classified as using AI technology do not actually use it, yet they attract 15% to 50% more funding.

It is hard to anticipate the consequences of such rapid change. The Internet Revolution was a necessary precursor for the data ecosystem to thrive. Admittedly, one must be careful when analyzing recent history, but it is no exaggeration to say that most of the Internet's benefits are yet to be seen. It has forever expanded the possibilities for human communication, enabling e-commerce, working from home, and procedures that require far less effort and paperwork. Yet adoption has been slow, and the benefits have been unevenly distributed (about 45% of the world’s population still lacks Internet access at the time of writing). If a network that is 50 years old still has fruit to bear, Datacentrism will bring benefits we cannot yet fathom.

Winners and Losers

Silicon Valley has its eyes on the prize. Big tech companies are investing heavily, confident that they can take the lion’s share of the profits in specific links of the value chain, such as hardware, infrastructure, or anything that requires economies of scale or has high barriers to entry. Their internal bureaucracy, however, also creates barriers to innovation. This has allowed many smaller companies to be founded, grow, and be acquihired by the tech giants (which has become their definition of success).

For almost any company of any size, taking appropriate measures to handle data properly can bring real economic benefit (as it always has). It is a good idea for stakeholders to carefully assess the situation and craft a strategy, but caution is advised: bad things happen when consumers don’t understand the product they are buying (as happened during the subprime mortgage crisis).

Before embarking on a Data Project, two key subjects to explore are Data Ethics and Data Epistemology (the theory of knowledge). Society is becoming more aware of the ethical issues involved: privacy violations, discrimination, lack of accountability in AI decision making, and job displacement due to increased automation. But there is another half to these problems, and awareness of it is far less widespread.

Knowing the future is impossible. In a sufficiently isolated system, subject to a limited set of physical forces, predictions can be accurate enough. But for more complex systems, even though simulations are possible, precision is severely limited, because incredibly small changes in the model’s initial conditions can have enormous, disproportionate effects on how the system evolves over time. These are known as chaotic systems, and examples include the weather, road traffic, and the stock market.
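To make this sensitivity to initial conditions concrete, here is a minimal sketch using the logistic map, a textbook chaotic system chosen purely for illustration (it is not a model of the weather, traffic, or markets). The function names `logistic_map` and `divergence` are hypothetical helpers written for this example. Two trajectories that start one billionth apart are iterated side by side, and the gap between them is printed at a few steps.

```python
from typing import List


def logistic_map(x0: float, r: float = 4.0, steps: int = 50) -> List[float]:
    """Iterate x_{n+1} = r * x_n * (1 - x_n) from x0; r = 4.0 is in the chaotic regime."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def divergence(x0: float, delta: float = 1e-9, steps: int = 50) -> List[float]:
    """Absolute gap, step by step, between two trajectories whose starts differ by delta."""
    a = logistic_map(x0, steps=steps)
    b = logistic_map(x0 + delta, steps=steps)
    return [abs(p - q) for p, q in zip(a, b)]


if __name__ == "__main__":
    gaps = divergence(0.2)
    for n in (0, 10, 20, 30, 40, 50):
        # The tiny initial gap grows until the two runs are effectively unrelated.
        print(f"step {n:2d}: gap = {gaps[n]:.3e}")
```

For r = 4 the gap grows on average by about a factor of two per step, so an initial difference of 1e-9 reaches the size of the whole interval after a few dozen steps. Past that point, the starting value tells you essentially nothing about where the system is, even though every step is perfectly deterministic.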

On the business side, resources are sadly being wasted on fool’s errands. Problems in data projects can stem from:

  • Goals
    • Impossible: e.g., perfect weather prediction
    • Unrealistic, given the assigned resources
    • Useless: not yielding a suitable product or actionable insight, or results not properly interpreted
  • People
    • Management: unable to allocate resources where needed
    • Engineers: lack of technical knowledge
    • Sales and Marketing: e.g., creating a product nobody wants
  • Resources
    • Time
    • Money
    • Technology
      • Tools
        • Software
        • Infrastructure
      • Input Data
        • Quantity: too little or too much
        • Quality: unreliable, stale, improperly formatted
      • Processing
        • Data handling: e.g., retrieval and cleaning
        • Modeling and algorithm selection

So

Terminology and problems aside, data as a first-class citizen is here to stay. As this area keeps growing, so will the opportunities for misuse. Let us not forget that data has always been a central part of business, even if org charts have only recently been updated (with titles like Chief Data Officer) and buzzwords have stolen the spotlight.
