In this episode, Katie and Chris recap the IBM Data Science and AI Roadshow, IBM’s traveling exhibition for Business Partners that showcases what’s possible with IBM’s suite of tools including IBM Watson Studio, IBM Watson Machine Learning, and IBM Watson OpenScale. Listen in as they discuss key features like reducing and mitigating bias in AI models, regulatory compliance, the foundations of trusted AI, and much more.
What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.
This is In-Ear Insights, the Trust Insights podcast.
In this episode of In-Ear Insights, we are talking about all things IBM data science and artificial intelligence. As an IBM Registered Business Partner, we had the opportunity last week to attend a multi-day workshop on the latest offerings from IBM, including a sneak peek at Watson Studio 2.0. But the first day of the event was all about the business cases for data science and AI. And so Katie, I actually want to start off by asking you: what did you get, in terms of perspective as a business owner, as a business leader, on the way that IBM is talking about data science and AI, since we talk about that as a company an awful lot?
You know, it was a good conference. I think it was a good partner event. One of the things that I appreciated was that, especially on day one, they were really catering to a wider audience. IBM in general, and the Watson products, they’re geared towards technologists, they’re geared towards data scientists, but day one was really more of a general session to help folks like me who aren’t data scientists get a better understanding of what’s going on in their ecosystem. One of the things that people in general typically know about Watson, and that we as partners know very well, is that their marketing and their branding are constantly changing. And so we talked a bit about that at the partner event on day one, in terms of what the product families actually look like, what these different products are used for, and what the upcoming roadmap looks like.
And so as someone who either needs to sell in these products or understand how we ourselves are spending money to use them, it was helpful to know where each of these things stood, and how we, as even a small business, a very small business, could make the most of them. So it was good in the sense of getting the holistic overview of the Watson ecosystem.
One thing I thought was interesting is that there was a lot of emphasis put on the Watson Studio toolkit, the, if you will, fully stocked kitchen to do data science. And there was a lot less emphasis placed on the prepackaged, off-the-shelf products, like Watson Health and the Watson finance stuff. When you think about the conversations that you have with other business leaders, how much are they looking for the fully stocked kitchen approach versus the “I just want to take something off the shelf and run with it” approach, knowing that the prepackaged approach has a couple of extra zeros on the price tag?
It usually splits down the middle. To get the prepackaged, right-off-the-shelf approach, you don’t have to have a full data science team; you have to have somebody who understands what the thing is doing. But you sort of have these two conversations. If you get it off the shelf, then you can get up and running pretty quickly. Versus if you want something that takes a bit more planning and thinking through and manual manipulation to get where you want to go with the models, then you have to have a data science team. So you have to weigh the cost. One data scientist could essentially cost the same as one off-the-shelf product, and a team of data scientists could certainly be a large investment, but then you have all of that expertise built in house.
So it’s really a risk analysis in terms of investing in those folks and training them up to do what you want them to do for the company. And then do they take that institutional knowledge with them? Do you have turnover? Versus having one or two folks who know how to use this off-the-shelf thing, which might cost more, but then the investment is in a piece of software, not in people. So there are two sides to the conversation. And, you know, I’ve always been a fan of build it yourself, so you would invest in the people versus the software. But I can certainly see arguments for both sides, and it really depends on the situation of the company.
Yeah, when I look at what’s happening in data science, one of the reasons I lean towards build as well is the rate of change. When you have a prepackaged product, you have to have things like QA, you have to have things like production testing and load testing, and all those things that make a software product stable. But things are changing so fast in the field. There’s a whole new sub-discipline that appeared literally in the last few months on using active learning for labeled data, which is using machine learning to help you create labels for other machine learning to analyze. So suppose you had, for example, a list of tweets, and you want to label that this is a tweet about healthcare, this is a tweet about finance.
There are now techniques for you to provide a very small number of labels and have the machine learn and then keep asking you, “This is what I don’t understand; what should this be?” And that’s not something that’s going to be in the prepackaged products, not for a while. It will eventually make it there, but it will be 6, 12, 18, 24 months. And in an industry where speed is one of the competitive edges that you have, particularly when we’re looking at the AI landscape overall and China’s dominance of the landscape, that 6 to 24 month gap could be a major deciding factor between your ability to take advantage of technology versus somebody else just eating your lunch. For someone who is risk averse but still needs to be competitive, how should they be thinking about it? Is it back to our old friend, planning?
It really is.
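The active learning loop described above (label a few tweets, let the model ask about the ones it is least sure of) can be sketched in a few lines. This is a minimal illustration and not anything shown at the event: the word-count classifier, the function names, and the `ask_human` callback are all assumptions made for the example, and a real project would use a proper model and a labeling interface.

```python
import math
from collections import Counter


def train(labeled):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {}
    for text, label in labeled:
        counts.setdefault(label, Counter()).update(text.lower().split())
    return counts


def predict_proba(counts, text):
    """Return {label: probability} using add-one smoothed word likelihoods."""
    vocab = {word for counter in counts.values() for word in counter}
    log_scores = {}
    for label, counter in counts.items():
        total = sum(counter.values())
        log_scores[label] = sum(
            math.log((counter[word] + 1) / (total + len(vocab) + 1))
            for word in text.lower().split()
        )
    # Normalize the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    scaled = {label: math.exp(score - top) for label, score in log_scores.items()}
    norm = sum(scaled.values())
    return {label: value / norm for label, value in scaled.items()}


def most_uncertain(counts, unlabeled):
    """Pick the text whose most likely label has the lowest probability."""
    return min(unlabeled, key=lambda text: max(predict_proba(counts, text).values()))


def active_learning(labeled, unlabeled, ask_human, rounds):
    """Repeatedly ask a human (hypothetical callback) to label the least certain text."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(min(rounds, len(unlabeled))):
        model = train(labeled)
        query = most_uncertain(model, unlabeled)
        unlabeled.remove(query)
        labeled.append((query, ask_human(query)))
    return train(labeled), labeled
```

Starting from one healthcare tweet and one finance tweet, the loop would first ask about whichever unlabeled tweet shares the fewest words with either seed, which is the “this is what I don’t understand” behavior described in the episode.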
You always gotta start with a plan. It’s an interesting point that you’re making, where the software might be out of date. So it sounds like you’re making the argument to invest in the data scientist. A couple of things there. One, we know that there is a deficit of data scientists; there are more jobs open than there are people qualified. So that’s problem number one. Problem number two is that you’re making the assumption that people can keep up to date with the changing technology, with the changing models, with the changing methodologies. And so you’re back to that question: which is faster, the machine or the human?
And, you know, I always feel, and I always say this to you, both publicly and within the office, that you’re still way ahead of where a lot of people are. And so even if there’s a new methodology or technique or model that just got launched, we’re still three or four years ahead of a lot of other people, a lot of other companies, in terms of the applications of the technology within the marketing space or within the verticals that we particularly work in.
You know, we experienced this a couple of weeks ago when we were at the General Assembly event. We were talking with a couple of other data scientists who are so far in the weeds with what they’re doing that it’s nearly impossible for them to think about professional development and staying up to date on the latest and greatest things. And so, you know, I don’t have a good answer for you. But I think it’s going to be challenging either way you decide to go. If you’re investing in people, then you need to carve out that time for them to stay up to date on what’s coming. If you’re investing in the software, you need to do your due diligence and understand how often they update the software and what their roadmap looks like.
Yeah, I think that is an important point, particularly since we’re talking in the context of IBM, because IBM is not known as a cutting-edge market leader in a lot of this technology. IBM’s brand is built around trust, security, verifiability, and cover your butt, basically. The old saw, “no one ever got fired for buying IBM,” is true partly because of that reliability, that trust, and that security, not because it is the latest and greatest. When you look at what is in Watson Studio versus what is in, say, H2O’s toolkit, H2O’s toolkit is more advanced and has more cutting-edge stuff, but it requires much more knowledge. You have to know what you’re doing with it, and you have to explore for yourself what the limitations of the platform are, because it’s not well documented.
The documentation on that is not the best. And so that trade-off between cutting edge versus covering your butt is a very real consideration. So for the average marketer, what should they be thinking about? How should they be weighing that trade-off?
Well, so one of the things that IBM does do is, when they’re developing their new machine learning and artificial intelligence projects, products, rather, they approach it like a true academic project. So they have their papers around how they built this: the methodology, the algorithms, the math, all the science behind it. And I don’t know that a lot of other companies that slap the AI label on their software take that same approach. And so if you’re trying to figure out, okay, what can I use, what software is available to me, even if it’s a free trial of something, I would say start with figuring out how they built it. Even if you don’t fully grasp everything that they have in their onboarding materials, that’s okay, because they at least did something. They at least have the documentation available, and they have somebody available to answer your question about how this thing is built.
And so if you’re a marketer looking for some software to help with sentiment analysis, I know that there are a lot of lightweight tools out there that do that. Ask them: how is it built? What is it trained on? Why does it say this is negative versus this is positive versus this is neutral? How did it get there? And IBM answers that question. If you ask them, they will tell you. It might go way over your head, my head, definitely. But they will tell you.
And that was something that I really appreciated about the partner days that they were doing last week: they brought in people from all different departments and skill sets to answer those questions as best they could. And if they couldn’t, they would follow up with you. And so I think that’s where you as a marketer would want to start: how is this thing even built? Because if this is the software that I’m using to report for myself, or my company, or my clients, I need to know what it does and where the data comes from.
One thing I thought was a miss: at Think, IBM explained its trusted AI framework, and I think it was a miss that it was not explained at the partner days, because it speaks exactly to what you’re talking about. The components of trusted AI, at least as IBM Research lays them out, are robustness, fairness, explainability, and lineage. Robustness: is the system secure and reliable? Fairness: is it free of bias, or can it at least help identify and mitigate bias? Explainability: can you understand what the model is doing? To your point, how does the thing work? What’s in the box? That’s a really important thing to ask, especially with deep learning. And then lineage: where did the data come from? Where did the model come from? Where is it going? How trustworthy are those sources? Those are all things that matter when you’re working with open source code or building your own code. If your data science and AI teams are not versed in each of those aspects, there is a risk of bias, or corruptibility, or insecurity, or lack of interpretability that can sneak into your systems. And depending on what regulatory environment you’re operating in, that could be a severely business-harming problem. That’s the one thing I wish they had spent more time on, because I think for anyone working with data science or AI, that’s essential.
So it’s interesting. I don’t remember the name of the product, I’m sure you’ll think of it.
IBM was talking about one of their products or services that helps detect bias. OpenScale. OpenScale, that was the one. And I remember asking the question: how do you guarantee that? How did you build this thing? Because if you’re going to put your name on a product that says it detects bias in an automated fashion, my first question is, how did you pull that together? Because they started the day talking about all of the ways that AI has gotten it wrong. I mean, there’s that famous Amazon example where they trained the model on hiring data, and it immediately brought in some really big bias issues. And so those are the questions to ask. Even if you’re not sure what the answer means when you get it, ask the question: how did this come about? Just ask the question: how does this work?
Yep. And that is partly because there are many, many more sales and marketing people in AI than there are AI practitioners who can explain in detail how the system works. It was interesting talking on day two with the more technical folks who were in the room about how they use the systems, and even their feedback was that some things in the product don’t work as well as others, and so on and so forth. With OpenScale specifically, because I think it’s an important discussion topic for another time but we can briefly touch on it, you have to declare what the protected classes are. If there’s gender in the data set, you have to declare, “I need to protect this, and this is the outcome I expect.” And if there is age, you have to say, “This is the distribution I expect you to hold.” And then the system will measure its modeling against those protected classes. But just as with anything, if you don’t declare “this is a bias I am looking for,” the system does not know to protect against it. Now, what is nice about the OpenScale product is that once you declare something like gender or age or ethnicity, the system will constantly look for that bias, but also look for inferred variables that function equivalently to that bias. And by protecting that aspect, or that distribution, the inferred variables can’t create bias on their own, because the system is constantly towing the model back to what you said. So, for example, if you were modeling with Facebook data, you could absolutely create a model that gets gender or sexual orientation right 99% of the time based on what someone likes. Say you like, I don’t know, pick a movie, and there’s a known bias among people who like that movie, the so-called chick flicks, right? You can still say to the model, “I’m protecting on gender.” So even though the model thinks chick flicks is an important predictor, you still have to have a 50/50 split.
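To make the idea of declaring a protected class and measuring outcomes against it more concrete, here is a minimal sketch of one common fairness check, the ratio of favorable-outcome rates between a monitored group and a reference group. It is not the OpenScale API; the function names, the `approved` field, and the 0.8 threshold are illustrative assumptions.

```python
def favorable_rate(records, attribute, group, outcome_key="approved"):
    """Share of records in `group` that received the favorable outcome."""
    members = [r for r in records if r[attribute] == group]
    if not members:
        raise ValueError(f"no records where {attribute} == {group!r}")
    return sum(1 for r in members if r[outcome_key]) / len(members)


def disparate_impact(records, attribute, monitored, reference, outcome_key="approved"):
    """Favorable rate of the monitored group divided by that of the reference group."""
    reference_rate = favorable_rate(records, attribute, reference, outcome_key)
    if reference_rate == 0:
        raise ValueError("reference group has no favorable outcomes")
    return favorable_rate(records, attribute, monitored, outcome_key) / reference_rate


def is_biased(records, attribute, monitored, reference, threshold=0.8):
    """Flag the model when the ratio drops below the declared threshold (assumed 0.8)."""
    return disparate_impact(records, attribute, monitored, reference) < threshold
```

Note that the check only runs for the attribute you pass in, which mirrors the point made in the episode: a bias you never declare is a bias the system never looks for.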
So it’s interesting, we’re going back to that buy it off the shelf versus have staff that actually know. This is a really good example of, yes, you can buy it off the shelf; however, you still have to have a team of people who understand even how to set it up. What you’re talking about, with protected classes and making sure that you’re accounting for those things, it’s not a slight against marketers, but I would guarantee that most people wouldn’t even know where to start to set that up. And I would imagine that in the next few years, the technology is going to become more sophisticated, where a walk-through setup wizard will start to ask you those questions up front and say, “Hi, I am Watson OpenScale, and I am going to walk through all of the different things that you need to think about with bias. Here’s a list of the protected classes. Which of these do you care about the most?” I would imagine that there’s going to be some little Microsoft Clippy thing next to it to walk you through: “It looks like you’re trying to protect a class.”
You know, it’s going to get there. But for right now, as of today, I don’t believe that kind of setup wizard actually exists. So yes, you can buy it off the shelf with these Watson products, but you still have to have people who know what they’re doing, not just with the setup, but even with what the output is supposed to look like. So you can’t skip that step.
That is 100% right. And the systems can’t solve everything, right? If I feed you a biased data set, you may set up rules saying protect gender 50/50, but if all the data coming in is male only, that protection won’t work, because the data set itself is biased. And that’s one of the reasons why lineage is so important: to be able to say here’s where the data came from, and here’s how we gathered it. And that, again, is something that marketers don’t spend a lot of time thinking about. If you, for example, are ingesting a lot of Pinterest data, there is a known set of biases that comes with Pinterest. There’s a known set of biases that comes with TikTok, for example, with age. And so even in the off-the-shelf situation, best practices can only get you so far. You really do need that domain expertise within data science to be able to guarantee, well, to lessen the chances of an issue. Where I think this will come into play a lot is that this will be regulated upon marketers. GDPR already does that within the EU: you must be able to explain how machine learning is using data, and if you can’t, you are in violation of the law. California is going that way. My guess is, especially as we get to the next election cycle with all the chaos that will come through that, that regulation will happen for all these technology companies. I mean, what’s your perspective on it?
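The point about a male-only data set defeating a 50/50 rule suggests checking the incoming data before any model sees it. The sketch below compares each group’s observed share against the share you expect. The function name, the `expected_shares` mapping, and the 10-point tolerance are assumptions for illustration, not part of any IBM product.

```python
from collections import Counter


def representation_gaps(records, attribute, expected_shares, tolerance=0.1):
    """Return {group: (observed_share, expected_share)} for groups outside tolerance."""
    if not records:
        raise ValueError("no records to check")
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, expected in expected_shares.items():
        observed = counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = (observed, expected)
    return gaps
```

An empty result means every declared group is within tolerance; anything else is a lineage question to answer (where did this data come from, and who is missing from it?) before training.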
That’s my perspective on it.
No, gosh, this is such a heavy topic sometimes.
You know what? I agree with you. That’s my perspective.
I think that’s it. Let’s table that for a future episode of the show. But to recap, the IBM roadshow was, I think, a good starter. I think there were some things that were missed in terms of the big talking points that we could have gotten deeper into. But certainly, you have a much more nuanced understanding of build versus buy and what a partner should offer their customers. And then for all of us as marketers and as business people, there’s lots more to think about, and key questions to be asking of all of our systems, particularly as AI makes its way into every product we use, from Google Analytics all the way up to the most sophisticated models. Anything I left out?
I guess the only other thing that I would say is, if you’re looking to get started with incorporating some AI into your marketing, into your company, we at Trust Insights are well suited to help you. We are data scientists on demand, as needed, as you’re building that practice. So feel free to reach out; we can help.
We will help you cover your butt.
Make sure, of course, to subscribe to the Trust Insights newsletter over at trustinsights.ai, and of course the YouTube channel and the podcast. We’ll talk to you next time. Take care.
Thanks for listening to In-Ear Insights. Leave a comment on the accompanying blog post if you have follow-up questions, or email us at marketing at trustinsights.ai. If you enjoyed this episode, please leave a review on your favorite podcast service, like iTunes, Google Podcasts, Stitcher, or Spotify. And as always, if you need help with your data and analytics, visit trustinsights.ai for more.
Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, Data in the Headlights. Subscribe now for free; new issues every Wednesday!
Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new 10-minute or less episodes every week.