There are quite a few applications and websites that can create an image of a new person by merging two different pictures, or even show us what we will look like some years from now. Artificial Intelligence (AI) generates these images by mixing features from the two source pictures, or by anticipating how a face will age. The result is called ‘synthetic data’: data, such as the pixels in a picture, that a system generates artificially to imitate real-world data.
The purpose of synthetic data, aside from the fun of seeing how I would look at 80, is to recreate real-world conditions so AI applications can learn from them and transfer that learning to the real world. There is often not enough real data available to train an AI solution, so the missing data has to be created artificially; that is synthetic data. It also has many other applications, including Web 3.0.
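As a minimal sketch of the idea, assuming simple numeric tabular data rather than images, one of the most basic ways to create synthetic data is to fit a statistical model to a small real sample and then draw new records from it. The hypothetical `generate_synthetic` function below fits an independent Gaussian to each column and samples from those distributions. The input rows are made-up illustrative values, and this approach ignores correlations between columns. Generators for faces and photos use far richer models.

```python
import random
import statistics


def generate_synthetic(real_rows, n_samples, seed=None):
    """Hypothetical helper: fit an independent Gaussian to each column of
    real_rows and draw n_samples new rows from those distributions.
    Correlations between columns are ignored in this simple sketch."""
    rng = random.Random(seed)
    columns = list(zip(*real_rows))  # transpose rows into columns
    params = [(statistics.mean(col), statistics.stdev(col)) for col in columns]
    return [
        tuple(rng.gauss(mu, sigma) for mu, sigma in params)
        for _ in range(n_samples)
    ]


# A tiny made-up "real" sample of (height_cm, weight_kg) pairs, for illustration only.
real = [(170.0, 65.0), (182.0, 80.0), (165.0, 58.0), (175.0, 72.0)]
synthetic = generate_synthetic(real, n_samples=1000, seed=42)
print(len(synthetic), synthetic[0])
```

The synthetic rows share the real sample's per-column mean and spread, but none of them copies an actual record, which is the basic trade synthetic data offers.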
Web 3.0 From a Data Perspective
“The fundamental difference between Web 2.0 and Web 3.0 is the pervasiveness of data collection and the deliberate actions and decisions based on that data…Where Web 2.0 gathered data in existing mediums, Web 3.0 actively and purposefully collects data in whatever context necessary. We go from analysis based on limited sets of data to analysis across sometimes seemingly unrelated data…The Semantic Web will enable machines to COMPREHEND semantic documents and data, not human speech and writings”.
This is what Tim Berners-Lee said in 2006, five years after ‘The Semantic Web’ paper was released.
It may seem surprising to read this about Web 3.0, given that the quote dates from 2006 and the paper behind it from 2001, while we are all only now discovering what the Metaverse and Web 3.0 are all about. That early vision already described a World Wide Web that would be fully decentralized.
In this scenario, blockchain seems highly necessary, as its cryptography would add a layer of protection to specific data, such as the personal or sensitive data used within Web 3.0. The result is a fully decentralized model in which the end user has full authority over, and responsibility for, specific sets of data.
Generating Contextualized Synthetic Data
If all this data is contextualized, systems can generate synthetic data that is contextualized as well. The data then carries the valuable context with it, giving systems a better understanding of what it means.
As of today, most of the content on the web is not contextualized, so it carries no meaning for systems such as search engines that rely on keyword matching. When data is contextualized and we search for ‘show me a mustang running,’ the AI application can better tell whether the search refers to a ‘Ford Mustang’ or a ‘wild mustang horse’.
Nor will it stop there. By generating contextualized synthetic data, AI will be able to achieve more with less real data, since better context leads to better results.
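To make the mustang example concrete, here is a toy sketch of context-based word-sense disambiguation, not a description of how any real search engine works. It follows a simplified form of the Lesk overlap idea. Each candidate meaning of ‘mustang’ is described by a hand-written set of context words (the `SENSES` dictionary is a hypothetical assumption for illustration). The meaning whose words overlap most with the rest of the query wins.

```python
import re

# Hypothetical sense inventory: each meaning of "mustang" with context words
# that tend to appear near it. A real system would learn these from data.
SENSES = {
    "Ford Mustang (car)": {"ford", "car", "engine", "drive", "road", "speed", "v8"},
    "mustang (wild horse)": {"horse", "wild", "herd", "prairie", "gallop", "hooves", "ranch"},
}


def disambiguate(query, senses=SENSES):
    """Return the sense whose context words overlap most with the query
    (a simplified Lesk-style overlap count). Ties resolve to the first
    sense in dictionary insertion order."""
    words = set(re.findall(r"[a-z0-9]+", query.lower()))
    return max(senses, key=lambda sense: len(senses[sense] & words))


print(disambiguate("show me a mustang running across the prairie"))  # -> mustang (wild horse)
print(disambiguate("show me a Ford Mustang with a V8 engine"))  # -> Ford Mustang (car)
```

With the bare query ‘show me a mustang running’, neither word set matches. The tie falls back to the first sense, which shows why richer context is what makes the answer reliable.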
Final Thoughts
While combining synthetic data and Web 3.0 offers great technological opportunities, it also brings risks and challenges. End users will need new skills to decide what data or information to share, when to share it, and with whom. Robust security systems and solid blockchain solutions will have to be developed, not to mention full interoperability among different systems, clouds, and platforms. This will require new standards for data transfer and system transfer.