In my previous article on the roadmap to digital transformation, we introduced the Data Science Maturity Model as a simple tool that assesses how reliably and sustainably a data science team can deliver value for its organization. This model is based on the concept of levels, or stages, that the organization traverses as it matures in managing data.
Understanding where the organization stands within this maturity model is crucial, because that position shapes the strategy for moving to the next stage.
Going Beyond Digital Transformation
Many organizations seem to focus on the analysis side of the chart in order to better understand what type of analysis processes and skills are needed given the overall state of the organization. This is a critical exercise that must take into account that Digital Transformation is not just about skills and technology; it also requires a cultural change.
However, we can’t ignore that data, the commodity of digital transformation, needs to be sourced, stored, transformed, staged, and treated for analysis, regardless of the company’s maturity stage. This is known as the ‘Data Architecture Management’ cycle. Successful data management requires a properly sized and managed architecture.
Data Architecture
According to Wikipedia, Data Architecture consists of the models, policies, rules, and standards that govern which data is collected and how it is stored, arranged, integrated, and used in data systems and organizations.
Data Architecture is shaped by the technology available, by financial and budgetary restrictions, and by the organization’s maturity in data analysis and its culture of using data within the decision-making process. Some 15 to 20 years ago, data architecture was concentrated in monolithic systems where data was stored in silos, from spreadsheets to simple databases. As organizations matured, data warehouses were built to deal with more and more data.
Those monolithic systems functioned well, simply because the growth rate of data generated and stored was fairly linear and could largely be anticipated. Because of that, technical, financial, and human resources could also be anticipated. However, things changed dramatically from the early 2010s.
The explosion of Big Data brought a new paradigm to data architecture, commonly synthesized as ‘The 3 V’s’: Volume, Velocity, and Variety. The volume of data generated (and that needs to be stored), the velocity at which that data is generated, and the variety of data sources forced human ingenuity to improve and create new ways to architect how that data should be stored and governed. Welcome to Modern Data Architecture.
Modern Data Architecture
Previous data architectures lived within the organization’s premises: local storage, data centers, and similar equipment owned and maintained by the organization itself. Modern data architecture now happens in the cloud.
Modern Data Architecture nowadays must address the following requirements:
- Shareable data
- High accessibility
- Flexible infrastructure
- Strong governance in place
These are not new requirements, for sure, but today’s context is very different from what it was just a decade ago.
Contextualizing Modern Data
How we understand and define ‘shareable data’ today is not the same as it was in the past. Professionals now need to contextualize their own data with other data points, a process that requires access to ‘data silos’ from other departments, or even to data generated outside the organization (weather data, infrastructure, etc.). That recontextualization of data has turned the data warehouse into a component, not ‘the’ component, of the data repositories, a change that has resulted in the creation of data lakes.
Bringing actionable context to local data that interacts with other sources requires reliable and fast access to the data, a requirement that becomes even more important when real-time analytics become part of the data processing equation. Therefore, all data must be highly available. High availability involves many technical aspects, such as redundancy, transmission speed, and security.
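To make redundancy concrete, here is a minimal Python sketch of one classic availability technique: failing over across replicas until one answers. The replica endpoints are hypothetical, and production platforms typically handle this through managed replication and load balancing rather than application code.

```python
import urllib.request
import urllib.error

# Hypothetical replica endpoints; a real deployment would rely on its
# provider's service discovery rather than a hard-coded list.
REPLICAS = [
    "https://data-eu.example.com/dataset/sales",
    "https://data-us.example.com/dataset/sales",
    "https://data-ap.example.com/dataset/sales",
]

def read_with_failover(replicas, timeout=2.0):
    """Try each replica in turn: redundancy keeps the data readable
    even when an individual endpoint is down or slow."""
    for url in replicas:
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError):
            continue  # this replica is unreachable; try the next one
    raise RuntimeError("no replica available")
```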
Data grows at a very high rate, but it doesn’t follow a uniform pattern: not all data sources grow at the same rate, and some even shrink or stop growing altogether. Previous-generation monolithic architectures carried high maintenance and infrastructure costs. Modern architecture addresses those volatile scenarios with infrastructure that is highly flexible, adapting capacity to the changing size of each source.
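As a rough illustration of that elasticity, the sketch below sizes each source to its current usage plus a safety margin instead of a fixed up-front allocation. The sources, figures, and 20% headroom are all assumptions for illustration; in practice, cloud providers expose this behavior through their own autoscaling services.

```python
# Size each data source to what it actually uses, plus headroom,
# rather than a fixed allocation that must anticipate years of growth.
def rightsize(used_gb: float, headroom: float = 0.2) -> float:
    """Return the capacity a source needs right now: current usage
    plus a safety margin for near-term growth."""
    return used_gb * (1 + headroom)

# Two hypothetical sources growing at very different rates (GB in use).
sources = {"clickstream": 480, "legacy_erp": 120}

for name, used_gb in sources.items():
    print(f"{name}: provision {rightsize(used_gb):.0f} GB")
# clickstream: provision 576 GB
# legacy_erp: provision 144 GB
```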
Governance and security within this tumultuous data environment are incredibly important. They have always been critical, but traditional infrastructure focused mostly on user access control and other internal safeguards such as redundancy. Nowadays, in addition to these internal security controls, organizations of every kind and size face growing challenges from outside.
Types of Clouds in Modern Data Architecture
Let’s go deeper into what the types of clouds in modern data architecture are and when to use each one.
‘Cloud’ is a buzzword that has by now been normalized and adopted by the public. We all have an idea of what ‘the cloud’ is, and we interact with cloud applications all the time on our smart devices, our personal computers, and other infrastructure.
However, in practical terms, do we know what cloud computing is? What are the different types of cloud computing, and when should we use them? Let’s make a high-level summary, as we have noticed that many CXOs want to learn, in very plain language, what cloud computing is, what its main features are, and when to use each type. Here are the three main types of clouds.
Public Cloud
In a public cloud, computing services are offered by a third-party provider over the Internet. A public cloud may offer some services or storage for free or sell them on demand, allowing customers to pay only for the CPU, data storage, applications, software, or bandwidth they consume. Because of this model, there is no upfront investment and no cost of ownership.
A little clarification here: ‘public’ does not mean that your data is accessible to everybody; it means that the third-party infrastructure is shared among many customers. The concept is similar to time-share real estate, where the owner rents the use of a property to multiple customers and remains responsible for maintenance, taxes, and so on. The same concept applies to a public cloud. Public clouds are secure environments.
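To see what pay-per-use means in practice, here is a toy monthly bill computed in Python. Every rate and usage figure below is invented purely for illustration; actual provider pricing varies widely.

```python
# Illustrative pay-per-use rates (all numbers are made up).
rates = {
    "compute_hours": 0.045,      # $ per vCPU-hour
    "storage_gb_month": 0.023,   # $ per GB stored per month
    "egress_gb": 0.09,           # $ per GB transferred out
}

# One month of hypothetical usage.
usage = {
    "compute_hours": 2_000,
    "storage_gb_month": 750,
    "egress_gb": 120,
}

# The bill is simply usage times rate, with no upfront investment.
bill = sum(rates[item] * qty for item, qty in usage.items())
print(f"Monthly pay-per-use bill: ${bill:,.2f}")  # $118.05
```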
Private Cloud
A private cloud can be defined simply as a cloud running within your own premises or within third-party premises. If it runs on third-party premises, however, the infrastructure is dedicated to you and not available to the public. So, a private cloud brings the benefits of a public cloud, but in a private or proprietary environment.
Hybrid Cloud
As the name suggests, a hybrid cloud combines an on-premises data center—or private cloud—with a public cloud. Some parts of data and applications are hosted in proprietary or private environments and other parts of data and applications are hosted in a public domain, and both data and applications can be shared between environments.
There is a variation of the hybrid cloud called ‘multi-cloud’, where different public clouds converge with private clouds. Some organizations use this setup to separate environments, applications, or functions.
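Under very simplified assumptions, the sketch below shows how a hybrid setup might decide where each dataset lives: sensitive data stays in the private tier, and everything else goes to the public one. The tier names, tags, and classification rule are hypothetical, not a prescribed policy.

```python
# Hypothetical storage tiers in a hybrid setup.
PRIVATE_TIER = "private-cloud"  # on-prem or dedicated infrastructure
PUBLIC_TIER = "public-cloud"    # shared third-party infrastructure

# Tags that mark a dataset as sensitive (an assumed classification).
SENSITIVE_TAGS = {"pii", "financial", "health"}

def placement(dataset_tags: set) -> str:
    """Keep sensitive datasets in the private environment; everything
    else goes to the cheaper, more elastic public tier."""
    return PRIVATE_TIER if dataset_tags & SENSITIVE_TAGS else PUBLIC_TIER

print(placement({"pii", "customers"}))    # private-cloud
print(placement({"weather", "traffic"}))  # public-cloud
```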
When to Use Each Cloud Type
Public clouds are a great option for avoiding the investment in purchasing, managing, and maintaining on-premises hardware and application infrastructure. Public clouds can also be deployed almost instantaneously and can scale up or down according to the user’s needs. Also, as a common environment for the organization, all users work with the same applications, the same version of every piece of software, and the same data (facilitating a single source of truth), and with the proper setup a public cloud is more robust in terms of cybersecurity, with better protection against ransomware attacks, better intrusion detection, and less risk of data leakage.
In contrast, setting up the right set of features for a public cloud can be tedious, depending on the customer’s needs and the profile of the cloud provider. The customer must also carefully examine the scope of the services, security features, and service offerings of the third party. Usually, every required service or feature involves a fee or cost.
Private clouds deliver an extra layer of privacy through both company firewalls and internal hosting, ensuring that operations and sensitive data are not accessible to third-party providers. For environments operating with sensitive data, this setup may therefore be recommended.
However, a private cloud requires a dedicated IT team to set up, maintain, and update the entire infrastructure according to the business needs. The relative lack of flexibility of a private cloud or on-prem environment can also add a financial and operational burden to the organization.
Hybrid clouds are a great solution for mature organizations: they can keep sensitive data in private clouds or on-prem environments and operational data in public clouds, optimizing infrastructure and reacting quickly to potential changes.
On the other hand, having separate environments can cause operational inefficiencies due to asynchronous updates, different versions of the same applications and software, and sometimes even incompatibilities.