This episode is brought to you by the Cloud Wars Expo. This in-person event will be held June 28th to 30th at the Moscone Center in San Francisco, California.
Highlights
00:20 — Data wrangling is the process of normalizing, or cleaning up, your data for machine learning.
00:45 — Paul shares an example of a project that he’s been working on involving a 10-year dataset focused on finance data and productivity data in healthcare. Looking at the dataset, they identify fields that are missing key pieces of information.
01:15 — Another example is de-identifying data. This can address security concerns and can also help reduce bias in the data.
01:50 — In some cases, you’re using data science and machine learning techniques to fill in the fields with missing data.
02:17 — Once the data wrangling process is done, you will, ideally, end up with a complete dataset that you can use to build and train your machine learning models.
02:28 — The first steps of data wrangling may require strong data talent and resources before you send off the datasets to your data scientists.
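The wrangling steps mentioned in the highlights (finding fields with missing values, de-identifying data, and filling in gaps) can be sketched in a few lines of Python. This is only an illustrative sketch, not the approach used on the project discussed in the episode: the records, the field names, the salt, and the helper functions `count_missing`, `deidentify`, and `impute_median` are all hypothetical. Median imputation stands in here for the data science and machine learning techniques mentioned, and a salted hash is a simple form of pseudonymization rather than full de-identification.

```python
import hashlib
from statistics import median

# Hypothetical records; field names and values are invented for illustration.
records = [
    {"patient_name": "Ann Lee", "cost": 120.0, "visits": 3},
    {"patient_name": "Bob Roy", "cost": None, "visits": 5},
    {"patient_name": "Cy Diaz", "cost": 80.0, "visits": None},
    {"patient_name": "Di Wong", "cost": 100.0, "visits": 4},
]

def count_missing(rows):
    """Return how many rows have None for each field."""
    counts = {}
    for row in rows:
        for field, value in row.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts

def deidentify(rows, field, salt):
    """Replace an identifying field with a short salted SHA-256 digest."""
    out = []
    for row in rows:
        new_row = dict(row)
        digest = hashlib.sha256((salt + str(row[field])).encode("utf-8")).hexdigest()
        new_row[field] = digest[:12]
        out.append(new_row)
    return out

def impute_median(rows, field):
    """Fill None values in a numeric field with the median of observed values."""
    observed = [row[field] for row in rows if row[field] is not None]
    fill = median(observed)
    return [{**row, field: fill if row[field] is None else row[field]} for row in rows]

# Step 1: see which fields are missing information.
missing = count_missing(records)

# Step 2: de-identify, then Step 3: fill in the gaps.
clean = deidentify(records, "patient_name", salt="example-salt")
for numeric_field in ("cost", "visits"):
    clean = impute_median(clean, numeric_field)
```

After these steps, `clean` has no missing values and no raw names, which is the kind of complete dataset a data scientist could then use to build and train models. In practice, the choice of imputation method and de-identification scheme depends on the data and on the applicable privacy requirements.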
Looking for real-world insights into artificial intelligence and hyperautomation? Subscribe to the AI and Hyperautomation channel: