In AI, an increasing amount of history will cease to matter.
That's perhaps an exaggeration. More precisely: an increasing share of the historical data used in today's models will not matter to the newer ones.
To see how, let's briefly recount the famous story of AlphaGo and AlphaGo Zero. Machines could defeat chess champions by 1997, when Deep Blue beat Garry Kasparov. However, because the estimated number of legal positions in Go, around 10^170, far exceeds the number of atoms in the observable universe, it took nearly two more decades before a machine could beat the best human Go players.
Finally, with deep learning and training on roughly 30 million positions from human games, AlphaGo defeated the human champion Lee Sedol in 2016. Fast forward a year to 2017, and enter AlphaGo Zero. The new machine started with no knowledge of human gameplay, yet it surpassed AlphaGo after just a few days of self-play. Not only did it defeat AlphaGo, it did so with a staggering score of 100 games to 0.
Let's re-emphasize: a machine starting from scratch needs not a single record of human play to beat machines trained on the whole of that history.
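To make that concrete, here is a minimal, hypothetical sketch of the self-play idea on a toy game (single-pile Nim) rather than Go. It uses tabular Q-learning with no neural network and no tree search, so it is nothing like AlphaGo Zero's actual algorithm; it only illustrates how an agent can reach strong play from zero human game records, purely by playing against itself. All names and numbers are illustrative.

```python
# Toy self-play sketch: single-pile Nim (take 1-3 stones; whoever takes the
# last stone wins). Not AlphaGo Zero's method, only the self-play idea.
import random
from collections import defaultdict

Q = defaultdict(float)          # Q[(stones_left, move)] -> estimated value
EPSILON, ALPHA = 0.1, 0.5       # exploration rate and learning rate

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def pick_move(stones, greedy=False):
    moves = legal_moves(stones)
    if not greedy and random.random() < EPSILON:
        return random.choice(moves)                   # explore
    return max(moves, key=lambda m: Q[(stones, m)])   # exploit current knowledge

def self_play_episode(start=15):
    """Play one game against itself, then update Q from the final outcome."""
    stones, player, history = start, 0, []            # history: (player, state, move)
    while stones > 0:
        move = pick_move(stones)
        history.append((player, stones, move))
        stones -= move
        player = 1 - player
    winner = 1 - player                                # the player who just moved won
    for who, state, move in history:                   # Monte Carlo-style update
        reward = 1.0 if who == winner else -1.0
        Q[(state, move)] += ALPHA * (reward - Q[(state, move)])

for _ in range(50_000):
    self_play_episode()

# Inspect the learned greedy policy for each pile size.
for stones in range(1, 16):
    print(stones, "->", pick_move(stones, greedy=True))
```

Run long enough, the greedy policy typically recovers Nim's known optimal strategy of leaving the opponent a multiple of four stones, without ever having seen a human game.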
Even before GenAI began dominating headlines the way Covid did at its peak, Gartner had predicted that synthetic data would make up 60% of the data used for AI development by 2024. No doubt some analysts believe that share could exceed 90-95% by the end of the decade. During training, AI models compress human-generated text, art, and other material into high-level internal representations. Today, that source data is needed over and over again to reinforce and recalculate the models' weights.
However, the day may not be far away when machines train instead on vast amounts of synthetic data (verified, whatever that comes to mean) or high-level symbols, combined with frozen weights and neurosymbolic processes. Imagine a world where billions of synthetic books in Shakespeare's voice, indistinguishable from his originals, already exist. Would new AI models of that era need the original works to create additional synthetic renditions, or could they rely solely on the synthetic corpus?
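As a thought experiment in code, here is a toy, assumption-laden sketch of that question: an "old model" (a simple word-level Markov chain standing in for a far richer system) is fitted on a fragment of original text, a synthetic corpus is generated from it, and a "new model" is then fitted only on the synthetic corpus. Everything here is illustrative; it says nothing about how production LLMs are or will be trained.

```python
# Toy sketch: can a "new model" trained only on synthetic text recover what the
# "old model" learned from the originals? Markov chains stand in for LLMs.
import random
from collections import defaultdict, Counter

ORIGINAL = (
    "to be or not to be that is the question "
    "whether tis nobler in the mind to suffer the slings and arrows"
).split()

def fit_bigram(docs):
    """Fit a first-order Markov 'model': next-word counts, per document."""
    model = defaultdict(Counter)
    for tokens in docs:
        for a, b in zip(tokens, tokens[1:]):
            model[a][b] += 1
    return model

def sample(model, start, length=40):
    """Generate a synthetic document by sampling from the fitted transitions."""
    out = [start]
    for _ in range(length):
        nxt = model.get(out[-1])
        if not nxt:
            break
        words, counts = zip(*nxt.items())
        out.append(random.choices(words, weights=counts)[0])
    return out

old_model = fit_bigram([ORIGINAL])                       # sees the originals
synthetic_docs = [sample(old_model, "to") for _ in range(1_000)]
new_model = fit_bigram(synthetic_docs)                   # sees only synthetic text

def dist(counter):
    total = sum(counter.values())
    return {w: round(c / total, 2) for w, c in counter.items()}

print("original-trained model, after 'to':", dist(old_model["to"]))
print("synthetic-only model, after 'to': ", dist(new_model["to"]))
```

For a model this trivial, the synthetic-only model ends up with nearly the same statistics as the one trained on the originals; whether that survives in vastly richer models, or degrades over repeated synthetic generations, is exactly the open question posed above.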
The legal and ethical battles over AI and data usage are intense and far from resolution. However, it's plausible that, by the time we settle these debates, machines will have advanced so far that they won't need to regress, even if some original materials have to be excluded from their training.
In conclusion, artificial intelligence stands on the precipice of another rapid shift, one that has yet to catch widespread attention. The importance of historical data in AI modeling, currently the bedrock of machine learning, is likely to be dramatically reevaluated within the models themselves.