
Sustainability means doing two things that are easy to define but very hard to implement:
- Understand the organization’s impact on the environment;
- Take actions to mitigate that impact.
Simple, eh? Sure is, if you’re the CEO and just issue orders to your CIO, COO, CFO, and CSO (S for “Sustainability”) to “fix that sustainability thing.” As a CIO who used to be a manufacturing/safety engineer, I feel for the CXOs trying to get their arms around those two problems. One of the first and biggest challenges revolves around the data needed to understand environmental impacts, so that’s where we’ll focus today.
Start with “Understand”: What must we measure to understand our environmental impact? Let’s start by defining boundaries for our measurements. According to one of the sustainability “Bibles,” “The GHG [Greenhouse Gas] Protocol Corporate Accounting and Reporting Standard,” organizations must first determine their “boundaries” (i.e., what’s “internal” and what’s “external” from a reporting standpoint). If you have an accounting or finance background, this part is for you because you’re already dealing with ownership vs. control issues from a financial reporting standpoint.
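To make the ownership vs. control question concrete, here is a minimal sketch of how the same set of facilities can yield different totals depending on whether you consolidate by equity share or by control. The facility names, ownership fractions, and tonnages are invented for illustration, and the functions are a simplification rather than the standard's full rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Facility:
    name: str            # hypothetical facility name
    emissions_t: float   # annual emissions in tonnes CO2e (invented figures)
    equity_share: float  # fraction of the facility we own, 0.0 to 1.0
    we_control: bool     # True if we control the operation


def equity_share_total(facilities):
    """Equity share: count emissions in proportion to ownership."""
    return sum(f.emissions_t * f.equity_share for f in facilities)


def control_total(facilities):
    """Control: count 100% of what we control and 0% of what we don't."""
    return sum(f.emissions_t for f in facilities if f.we_control)


facilities = [
    Facility("wholly_owned_plant", 1000.0, 1.00, True),
    Facility("joint_venture_we_operate", 800.0, 0.50, True),
    Facility("minority_stake_plant", 600.0, 0.25, False),
]

print(equity_share_total(facilities))  # 1000 + 400 + 150
print(control_total(facilities))       # 1000 + 800
```

Same assets, two different "us" totals, which is why the boundary decision has to come before any measurement.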
Once you have divided your universe into “us” and “everyone else,” you must deal with three ever-larger “Scopes” of emissions:
- Scope 1: Direct GHG Emissions
- Scope 2: Electricity-Indirect GHG Emissions
- Scope 3: Other Indirect GHG Emissions
Scope 1 starts with the question, “What are you measuring?” In a nutshell, you must measure every source of GHG emissions produced by assets you own/control. So the focus now turns from accounting data to engineering data:
- What processes/assets emit GHGs?
- What sensors do we need to measure those GHGs?
- How do we get the sensor data into our database?
If your organization hasn’t been set up to bridge the IT/OT (Operational Technology) divide (and most organizations haven’t been, out of fear or just inertia), you need to build those bridges, with lots of attention to cybersecurity.
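The last of the three Scope 1 questions, getting sensor data into a database, might look something like the sketch below. It assumes readings arrive from some OT gateway as simple tuples; the asset names, table name, and figures are all hypothetical, and an in-memory SQLite database stands in for whatever store you actually use.

```python
import sqlite3

# Hypothetical readings as they might arrive from an OT gateway:
# (asset_id, gas, ISO timestamp, kilograms emitted in that interval)
readings = [
    ("boiler_1", "CO2", "2024-01-01T00:00:00", 120.5),
    ("boiler_1", "CO2", "2024-01-01T01:00:00", 118.0),
    ("flare_2", "CH4", "2024-01-01T00:00:00", 3.5),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scope1_readings (
        asset_id TEXT NOT NULL,
        gas      TEXT NOT NULL,
        ts       TEXT NOT NULL,
        kg       REAL NOT NULL CHECK (kg >= 0),
        PRIMARY KEY (asset_id, gas, ts)
    )
    """
)

# Parameterized inserts keep raw OT payloads out of the SQL text itself.
conn.executemany("INSERT INTO scope1_readings VALUES (?, ?, ?, ?)", readings)
conn.commit()

# Roll readings up per asset and gas for reporting.
totals = conn.execute(
    "SELECT asset_id, gas, SUM(kg) FROM scope1_readings "
    "GROUP BY asset_id, gas ORDER BY asset_id, gas"
).fetchall()
print(totals)
```

The primary key and the CHECK constraint are a design choice worth copying: they make the database itself reject a repeated reading or a negative quantity instead of trusting every upstream device.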
Scope 2 sounds easy: Just ask your electric utility for the data you need! And compared to Scope 1, it may be. But take it from me, a former CIO at an electricity retailer, there are a few complications your data folks need to manage.
- Do your utilities have the data you need? Emissions data by customer endpoint was never a priority for regulated utilities, and their data collection tools are often ancient. I expect regulatory, social, and investor pressure to change this someday, but you should expect things to be in flux for a while.
- With how many utilities and resellers do you do business? There is very little data standardization among regulated utilities, so you will deal with many ETL (extract, transform, load) processes and changes to incoming data formats.
- Data quality is an issue. Missing data, duplicate data, or just plain wrong data winds up in utility extract files. If you don’t build a robust data validation process, you will regret it.
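Here is a rough sketch of what the last two bullets imply in practice: one small adapter per utility format, followed by a single validation pass that catches missing, duplicate, and plainly wrong values. The two file layouts, field names, and figures are invented for illustration; real extracts will differ.

```python
# Two hypothetical utilities reporting the same facts in different shapes.
utility_a_rows = [
    {"meter": "M-1", "month": "2024-01", "kwh": "1200"},
    {"meter": "M-1", "month": "2024-01", "kwh": "1200"},  # duplicate row
    {"meter": "M-2", "month": "2024-01", "kwh": ""},      # missing value
]
utility_b_rows = [
    {"MeterID": "M-9", "Period": "01/2024", "MWh": "-0.5"},  # negative usage
    {"MeterID": "M-9", "Period": "02/2024", "MWh": "0.75"},
]


def from_utility_a(row):
    """Utility A already uses our target shape."""
    return {"meter": row["meter"], "period": row["month"], "kwh": row["kwh"]}


def from_utility_b(row):
    """Utility B uses MM/YYYY periods and reports MWh instead of kWh."""
    month, year = row["Period"].split("/")
    kwh = str(float(row["MWh"]) * 1000) if row["MWh"] else ""
    return {"meter": row["MeterID"], "period": f"{year}-{month}", "kwh": kwh}


def validate(records):
    """Split records into clean rows and (record, reason) rejects."""
    clean, rejects, seen = [], [], set()
    for rec in records:
        key = (rec["meter"], rec["period"])
        if rec["kwh"] == "":
            rejects.append((rec, "missing kwh"))
        elif float(rec["kwh"]) < 0:
            rejects.append((rec, "negative kwh"))
        elif key in seen:
            rejects.append((rec, "duplicate meter/period"))
        else:
            seen.add(key)
            clean.append({**rec, "kwh": float(rec["kwh"])})
    return clean, rejects


records = [from_utility_a(r) for r in utility_a_rows]
records += [from_utility_b(r) for r in utility_b_rows]
clean, rejects = validate(records)
print(len(clean), [reason for _, reason in rejects])
```

Keeping the rejects, with a reason attached, matters as much as keeping the clean rows: that list is what you send back to the utility when you ask them to fix their extract.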

Scope 3 represents the most complex data problem by far because it deals with everything upstream and downstream from your owned/controlled assets. Do you use a contract trash hauling company to pick up your trash? You need their GHG data. Does that trash get burned at some incinerator or wind up in a landfill? You need their GHG data (even if you don’t have a direct business relationship with that landfill or incinerator).
That’s upstream . . . but you also care about downstream emissions. If you make motorcycles, will you track and report on the GHG emissions from the use of every bike (and what will your customers think of that)?
The good news is that Scope 3 is usually defined as optional data for reporting — today. So society will have a bit of time to establish norms, and we engineers, IT folks, and accountants will have time to figure out workable mechanisms. I have two thoughts on Scope 3 complexity:
I imagine a “data brokerage” industry that collects firms’ Scope 1 emissions data and sells it to counterparties for use in Scope 3 (and even Scope 2) calculations. When I was in the energy business, we bought lots of data from ZE Data, which collects, validates, stores, and distributes data from numerous industry sources.
Another model of data sharing, built to handle large volumes of individual transactions rather than large industry datasets, is the Covisint platform (now part of the OpenText Business Network Cloud), which U.S. automakers started as a data and transaction switch for parts suppliers.
Life would be much simpler for Scope 3 counterparties if data were moved via a hub-and-spoke model, through a firm or firms that got paid to oversee data quality, rather than a spiderweb of constantly-changing point-to-point connections!
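A quick back-of-the-envelope calculation shows why the spiderweb scales so badly. The sketch below assumes the worst case, in which every firm exchanges data directly with every other firm, and compares it with a single shared hub; the function names are mine, for illustration only.

```python
def point_to_point_links(n: int) -> int:
    """Links needed if every firm connects directly to every other firm."""
    return n * (n - 1) // 2


def hub_and_spoke_links(n: int) -> int:
    """Links needed if every firm connects only to one shared hub."""
    return n


for firms in (10, 100, 1000):
    print(firms, point_to_point_links(firms), hub_and_spoke_links(firms))
```

At 100 firms, that is 4,950 possible point-to-point connections versus 100 connections to a hub, and each of those connections is a data format someone has to maintain.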
Since the Scope 3 “emission chain” extends from pulling the raw materials out of the ground to a product’s end of life (probably buried back in the ground), our sustainability data and process model may look like that of the European VAT (Value-Added Tax), in which we track emissions added by each step in the product’s lifecycle. This data and process model ensures that our calculations aren’t double counting emissions (remember the rule that “My Scope 1 = someone else’s Scope 2 or Scope 3”).
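As a minimal sketch of the VAT analogy, assume each stage in a product's lifecycle reports only the emissions it adds, and the running total is passed down the chain. The stage names and figures below are invented for illustration.

```python
# Hypothetical lifecycle for one product. Each stage reports ONLY the
# emissions it adds (kg CO2e, invented figures), like value added in VAT.
lifecycle = [
    ("mining", 40.0),
    ("smelting", 120.0),
    ("manufacturing", 60.0),
    ("distribution", 15.0),
    ("use", 200.0),
    ("end_of_life", 10.0),
]


def cumulative_footprint(stages):
    """Return (stage, added, running_total) rows for the chain."""
    rows, running = [], 0.0
    for stage, added in stages:
        running += added
        rows.append((stage, added, running))
    return rows


rows = cumulative_footprint(lifecycle)
for stage, added, running in rows:
    print(f"{stage:<14} added={added:>6.1f} cumulative={running:>6.1f}")

# Each stage contributes only its own increment, so the chain total equals
# the sum of the increments and nothing is counted twice.
total = rows[-1][2]
assert total == sum(added for _, added in lifecycle)
```

From the manufacturer's seat in this example, its own 60 kg is Scope 1, while the 160 kg accumulated before it and the 225 kg added after it are the upstream and downstream pieces of its Scope 3.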
I’ve learned from digging into GHG emissions that sustainability is a genuinely staggering data problem. I suggest your C-suite start discussing your “sustainability data lifecycle” without delay so your organization can take an early and active role in shaping this complex and far-reaching exercise in data science.