The Fourth Industrial Revolution is a fancy tag-line trotted around global think-forums such as the WEF. It represents the current wave of cutting-edge technologies: artificial intelligence, machine learning, blockchain, the internet of things, CRISPR-Cas9, and others. It’s exemplified by world-beating companies such as Google and Facebook, and it’s currently being scrutinized for issues surrounding privacy, mental health, bioethics, and inequality. With the European Union formally launching its GDPR initiative tomorrow, there is heightened skepticism around the trajectory of these trends and what they mean for our society.
We saw similar patterns historically, with prior Industrial Revolutions. In the Second Industrial Revolution of the late 19th century, modern technologies centered around industrial production (iron, steel, rail) and chemicals (kerosene, petroleum, plastics, polymers) were the disruptors of agrarian-rooted society. These nascent technologies brought fantastic advancements, such as urban density, transportation and energy grids, but also unforeseen externalities such as labor exploitation, tenement housing, and pollution. The Titans of their day (Rockefeller, Carnegie, and Morgan) amassed fortunes comparable to those of the Titans of our day (Zuckerberg, Bezos, Page) and ultimately set an example with altruism and charity.
Carnegie’s charitable pursuits are relevant to this piece. He spent, by some estimates, 10–20% of his net worth endowing the creation of ~2,000 public libraries across the country. This article pins the root of his interest in libraries to his early days as a bobbin boy in a textile mill. Hungry to learn about machinery, he went to his local library, but couldn’t afford the subscription fees. This was the reality of the time. Going back to the days of abbeys and monasteries, access to knowledge was hardly ubiquitous and was reserved for those who were either genteel or committed to maintaining the traditional canon of knowledge.
Perhaps Carnegie was enamored by the democratizing impact that universal education could provide. Or as Stephen Jay Gould put it, “I am, somehow, less interested in the weight and convolutions of Einstein’s brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops.” Carnegie set out on his public library agenda in 1881.
Knowledge Classification – Late 19th Century Libraries
Right around that time, a young librarian by the name of Melvil Dewey invented a classification schema for libraries that we now know as the Dewey Decimal Classification. You may remember it from your days as a kid, roaming through a public library, looking for the section on dinosaurs or planets. With the growing proliferation of public institutions of knowledge, classification became important, as public libraries did not have Franciscan monks at hand. Dewey created a system that allowed for the replication of stores of knowledge. At the same time, Carnegie was scaling these institutions across the country. Dewey provided a means of organizing that scale. The combination of the two was a seismic shift in access to knowledge for common citizens.
Dewey Decimals (and perhaps even libraries) are now a relic for today’s youth. For this generation, knowledge is something that is ethereal and infinitely accessible. It’s been simplified from syntax to conversation. It is callable by the whirl of a few thumbs, or a “hey [insert androgynous name]” request. There is no recognition of classification schemas or even encyclopedias. Knowledge is now a reflection of accumulated human experience, callable remotely and instantly, rather than a curated system of histories and canonical sequences. It reflects what the internet provides us as a species.
Knowledge Classification – Late 20th Century Digital Revolution
This evolution wasn’t always obvious, as the early days of the internet (and the conduits to the internet) show. Oversimplifying things, before there was Google, there was Yahoo, and before there was Yahoo, there were AOL, CompuServe and their like. Before Google (B.G.?), there was still a push towards Dewey-style indexation and classification as a means of curating the internet. In 1996, Yahoo’s homepage looked something like this:
It looks like a digital Dewey Decimal system, compartmentalizing and sorting topics, allowing users to navigate through the directed pathway to the curated areas of knowledge. With the nascence of the internet, this was an application of tried-and-true methods on a misunderstood technology. As tech-columnist Gil Press noted in a Forbes article, “Yahoo worked hard and employed many people in organizing in a neat taxonomy the rapidly-growing content of the Web. It even had a Chief Ontologist on staff.”
Knowledge Classification – 21st Century Learning Algorithms
We of course know what happened. Yahoo, with a huge incumbency advantage, was thoroughly surpassed by Google. Why? Because Google didn’t rely on hand-built ontologies and directories; it used a relevance algorithm (a very basic one at that) called PageRank to determine the relevance of websites from the links pointing to them, rather than through subjective indexation. Page and Brin’s insight came from their research days at university. They noted that scientific papers were often “ranked” by the frequency of their citation. And so, without having to read, assess, and curate content, papers could be ranked somewhat reasonably by how often they were cited.
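To make the citation analogy concrete, here is a minimal sketch of a PageRank-style calculation on a toy web of four pages. The page names, the `pagerank` function, and the iteration count are illustrative assumptions, not Google’s production system; the damping factor of 0.85 is the value commonly cited for the original algorithm. Unlike a raw citation count, each link is weighted by the rank of the page it comes from.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical toy web: nobody links to the fan page or the blog.
toy_web = {
    "encyclopedia": ["zoo"],
    "zoo": ["encyclopedia"],
    "fanpage": ["encyclopedia"],
    "blog": ["encyclopedia", "zoo"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:12s} {score:.3f}")
```

No one reads or curates any page here: the ordering falls out of the link structure alone, which is why the approach scales where a team of librarians cannot.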
Purists cried foul. Surely a team of librarians, working studiously at Yahoo, could curate better content than a simple relevance algorithm? In the early days, this was certainly the case, as the studious review of websites helped link Yahoo users to Britannica pages on tigers instead of GeoCities fan pages. Similarly, we saw arguments against unverified knowledge in the halls of higher education. This included professors forbidding citations to Wikipedia as late as the early aughts.
Today Google is the de facto source for retrieving relevant information, and Wikipedia is perhaps the de facto source for encyclopedic information, period. This is not to say a Google query will provide the definitive truth, but in a world where Google is the first and last choice for search queries, the idea of a correct response is somewhat self-fulfilling. The problem with curation through human judgment was the internet’s growth, which was (and is) exponential. It compounds content and information at a rate that humanity had never seen before, which makes the manual curation of truth impossible: no judgment-based system could keep up with exponential growth. Google’s choice to deploy a relevance algorithm in pursuit of its mission “to organize the world’s information” was another seismic shift in access to knowledge, given its infinite scalability and algorithmic means of improving the relevance of results.
Flash forward to 2018, and this exponential accumulation of data has necessitated relevance algorithms to bring order to our lives. Relevance algorithms are founded on beautiful mathematics and geometric principles. Most people would fail to recognize that the geometry behind conical centroids can be used to determine the efficacy of relative rankings, or that graph theory is a wonderful way to develop self-learning ontologies.
Google, Netflix, and Amazon provide great examples of practical implementations of relevance algorithms. Video recommendations are now woven into user experiences across YouTube, Netflix and Amazon Prime because the abundance of choice cripples our ability to enjoy said abundance. Google will boast that it returned 23.5 million results for “Vietnam Travel Recommendations,” but user data show that 35% of users click on the first result, and almost 70% of clicks come from just the top 4. The adoption of data-driven decisions has been central to these consumer-internet giants. Their business models are built upon scale. Scale provides data, which provides fertile learning. Learning improves the algorithm, which improves the user experience, thereby expanding scale. It is truly a beautiful business model, and an unconventional economic model. The work of tech economists such as W. Brian Arthur helped explain the efficacy of these network effects and how transformative they can be.
Financial Dewey Decimal Systems
And yet, in many industries, relevance algorithms are eschewed for human judgment. Finance is, of course, one of those industries. We do not use relevance algorithms to classify investments; we use prescribed buckets. And every organization has a slightly different definition of its buckets. For example, the hedge fund industry has grown from a few hundred to over 10,000 institutions by some counts, and yet there is no Google for hedge funds; no exhaustive, central repository that has quality data from which it can learn. Morningstar has something of a stranglehold on mutual fund data, but their recommendation engine is far more naïve than a Netflix movie recommendation.
This is partially due to the nature of the data, the importance of privacy, and the nature of the industry. Buyers of financial products do not have level access to information, and they are not willing to democratize sensitive data. Creators of this data are sensitive towards its dissemination and would rather guard it than proliferate it. While transparency in price and performance has been revolutionized by passive ETFs, there is still an asymmetry of knowledge between buyer and seller of most alpha-oriented financial products, which often necessitates intermediaries in the form of Financial Advisors (retail) and Consultants (institutional).
FAs and Consultants are like Financial Dewey Decimal Systems. They rely on knowledge accumulated and curated through experience to create classification rules and to apply these knowledge structures. This system provides value to consumers given the asymmetry of information and the sheer amount of folly they can help their clients avoid. The role of the FA or the Consultant is ultimately to tilt clients into making favorable decisions in spite of their biased intentions.
But software is coming and looking for ways to tailor relevance. Various companies incorporate actuarial and behavioral information about individuals, providing pre-canned asset-allocation recommendations to FAs, so they can focus on relationship management and business development. Other fintech firms seek to aid in the determination of behavior, skill, and risk/reward.
The role of Fintech, and the future of Dewey decimals
In the same way that novel fintech companies have disrupted lending and credit evaluation, we think that statistical inference can be used to determine relevance in capital allocation decisions, and this extends beyond quants doing it for their own portfolios. It is a core foundation of the way Epsilon thinks. We often use the analogy of Yahoo -> Google in explaining our investment approach. A few novel techniques that have relevant application:
- Self-Organizing Maps to classify behaviors
  - e.g., take a lot of investment management return streams and use a SOM to create clusters/patterns
- Ranking algorithms to better define conviction
  - Both Allocators and Investment Managers grapple with how to size positions based upon expected risk/reward. Outside of a Markowitz framework, this is often a “gut feel” process that can be optimized through various sorting algorithms
- Neural networks to better understand clustering phenomena
  - Given the reflexive nature of markets, many participants want to understand the ebb and flow of clusters. An example would be hedge fund crowding or flash crashes, and techniques such as neural networks help unpack these phenomena
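As a rough illustration of the first technique in the list above, the sketch below trains a tiny one-dimensional self-organizing map on made-up return streams. Everything in it is an assumption for illustration: the manager names, the six monthly returns, the two-node map, and the learning-rate and radius schedules. It is not a description of how Epsilon or any other firm does this, and a practical SOM would use a larger two-dimensional grid and real data.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Index of the node whose weight vector is closest to sample x."""
    return min(
        range(len(nodes)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(nodes[j], x)),
    )


def train_som(samples, n_nodes=2, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D self-organizing map and return its node weight vectors."""
    rng = random.Random(seed)
    dim = len(samples[0])
    # Small random starting weights for every node on the 1-D grid.
    nodes = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays linearly
        radius = max(radius0 * (1.0 - frac), 1e-3)   # neighborhood shrinks over time
        for x in rng.sample(samples, len(samples)):  # visit samples in random order
            bmu = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighborhood: nodes near the winner on the grid move more.
                h = math.exp(-((j - bmu) ** 2) / (2.0 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes


# Hypothetical monthly returns: three managers share one pattern,
# three others run roughly the opposite pattern.
base = [0.02, 0.03, -0.01, 0.04, 0.01, -0.02]
managers = {
    "trend_a": [1.0 * r for r in base],
    "trend_b": [0.9 * r for r in base],
    "trend_c": [1.1 * r for r in base],
    "revert_a": [-1.0 * r for r in base],
    "revert_b": [-0.9 * r for r in base],
    "revert_c": [-1.1 * r for r in base],
}

nodes = train_som(list(managers.values()), n_nodes=2)
clusters = {name: best_matching_unit(nodes, r) for name, r in managers.items()}
for name, node in clusters.items():
    print(f"{name:9s} -> node {node}")
```

The map is never told which manager follows which style; the grouping emerges from the return streams themselves, which is the contrast with prescribed buckets that the essay draws.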
Fintech firms will continue to sprout, helping organizations use their data to drive insight, and one will eventually develop a platform to drive meta-insight. Perhaps then, we’ll see the Financial Dewey Decimal system replaced.