So What? Marketing Analytics and Insights Live
airs every Thursday at 1 pm EST.
You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!
In this week’s episode of So What? we focus on top News and Web Stories. We walk through web content production, how to use SEO measurement tools and AI’s role in top news and web stories. Catch the replay here:
In this episode you’ll learn:
- how much content is being produced for the web
- how to use SEO tools to measure the effectiveness
- how AI tools play a role in top news and web stories
Have a question or topic you’d like to see us cover? Reach out here: https://www.trustinsights.ai/resources/so-what-the-marketing-analytics-and-insights-show/
Katie Robbert 0:27
Well, hey everyone, happy Thursday. Welcome to So What? The Marketing Analytics and Insights Live show. I’m Katie, joined by Chris and John. Hey guys, how’s it going over here? It’s,
Christopher Penn 0:37
it’s that time, and December’s flying by.
Katie Robbert 0:42
Agreed. Yeah, I think we have, what, basically another full week and a half before we can expect anyone to kind of get back to us and help us be productive. And that’s about it.
Christopher Penn 0:56
If that was that,
Katie Robbert 0:58
I was actually going to start keeping a tally of how many times either I said or someone else said, “Let’s revisit that in the new year,” because I feel like that’s where we’re at now. Well, anyway, we are going to be productive today. On today’s show, we are covering top news and web stories. We’re going to look at how much content is being produced for the web, how to use SEO tools to measure the effectiveness, and how AI tools play a role in top news and web stories. If you weren’t aware, every year Chris puts out the 12 Days of Data. He starts on December 1, and he goes through, you know, 12 days going through various topics. And one of the topics, Chris, that you always cover is how much content is being produced in the form of top news and web stories. So where would you like to start today?
Christopher Penn 1:49
I think we should probably start off with talking a bit about the data sources, because depending on your perspective, you get very different numbers and very different types of what constitutes news. So what constitutes news, Katie?
Katie Robbert 2:10
You know, that’s, I think that’s the question, because it’s going to be a different answer for everybody. You know, we can assume the same question. But for me, it’s information about what’s happening in the areas that I care about. And so I would love to say that I care a little bit more about what’s going on politically, I think I’m just kind of burnt out. So I don’t check those particular stories as often as maybe somebody else would. But I do tend to care more about like, what’s going on with the environment and climate change, and, you know, things related to health and seasonality. So, for me, those are news stories, it should be articles that you know, present information about current affairs, essentially. John, what’s news to you?
John Wall 2:59
Yeah, that’s a great question. You know, this is one of the things we really need a journalist on here to explain, because no matter how you try to explain it as a layman, the journalists have a better view of this. I mean, it’s recording history, right? It’s recording what is happening around us at the time. The challenge, though, is there’s just an infinite assortment of topics that people are interested in, right? I mean, there’s a whole gigantic industry of sports journalism, which I just don’t have any time for. It’s not that I don’t like it or don’t care, but that’s not anything I’m looking at today. But there’s plenty over there. So, yeah, the thing that gets me the most is, you know, it’s generating content about what’s going on in the world. And with all the tools we have now, it just seems to be exploding, you know, at an exponential rate. That’s why I’m always excited to see what happens with this. Because over the past few years, there were a couple of years where it finally did trend down, where the automated stuff had to have the dial turned back on it a little bit. I think we had just reached such saturation that all of the spam tools were no longer doing anything good. But now we’re entering a next generation of stuff, with AI writing much better copy. So we may be stepping on the gas again, and maybe going from absolutely horrible to even worse than that, but I’m excited to see what the numbers show here.
Christopher Penn 4:18
So from a technological perspective, news is who your data providers say news is, which is a thoroughly unsatisfying answer. But as a marketer, if you’re trying to figure out what constitutes news, it’s who the data provider says news is. I’ll give you a couple of examples. We’re going to go first to the GDELT database. For those who are unfamiliar, GDELT is an open source project; go to gdeltproject.org. This is a global database of news powered by Google Jigsaw, the contents of which are a substantial portion of what is in Google News, right? So the GDELT database is a gigantic database ingesting news in real time from all over the world. And as with so many different news sources, it’s based mostly on URLs. You know, for example, cnn.com would be a newsworthy domain; some dude’s blogspot.com blog, not a news domain. So from that perspective, news is essentially who Google says news is. And this varies from provider to provider.

So let me show you another example from the Ahrefs software. Ahrefs is SEO software. I put in a search for basic stopwords, “a” or “and” or “the,” because that’s the easiest way to scoop up essentially the entire database. In 2022, according to Ahrefs, there have been 489,280 9 million pages published this year that are in some way notable. But they added a button this year saying, you know, let’s focus on news. And now this slims down to about 65 million pages. So for them, there are about 65 million pages of news this year. If we go to the GDELT database and run the same query, I want to count the number of URLs where the publication date was this year, you get about 53.6 million URLs, which is, what, about a 16% difference between the two. And when you quickly inspect the data to see what’s in there, the data varies and the URLs vary. But you do see, I would say, more traditional accredited news sources, I think would be the best way to put it, in the GDELT database, whereas the Ahrefs database tends to be a little bit less restricted. Not entirely, but you see some things here, like, you know, Microsoft documents from the Microsoft documentation website. I would not personally call that a news source, right, even though that is technically news: this is what’s changed.

So that’s the first and probably the most difficult place to start: what constitutes news. For the 12 Days of Data, when we look at what constitutes news, we used to use the Ahrefs database. And then two or three years ago, we switched over to the GDELT database as the source of news-like things, partly because it scales better and it’s easy to work with, and partly because they do have a lot of interesting features in there that you don’t get out of an SEO tool.
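A minimal sketch of the "count the URLs published this year" query described above, run against a tiny in-memory SQLite table rather than the real dataset. The table name `events` and the columns `SQLDATE` (a YYYYMMDD integer) and `SOURCEURL` are assumptions modeled on GDELT's event schema; the rows are invented.

```python
import sqlite3

# Toy stand-in for the events table; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (SQLDATE INTEGER, SOURCEURL TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        (20220105, "https://example-news.test/a"),
        (20220105, "https://example-news.test/a"),  # same URL, second event
        (20220712, "https://example-news.test/b"),
        (20211230, "https://example-news.test/c"),
    ],
)


def count_urls_for_year(conn, year):
    """Count distinct source URLs whose event date falls in the given year."""
    first_day = year * 10000 + 101    # e.g. 20220101
    last_day = year * 10000 + 1231    # e.g. 20221231
    row = conn.execute(
        "SELECT COUNT(DISTINCT SOURCEURL) FROM events "
        "WHERE SQLDATE BETWEEN ? AND ?",
        (first_day, last_day),
    ).fetchone()
    return row[0]


print(count_urls_for_year(conn, 2022))
```

Counting distinct URLs rather than rows matters here because one article can be attached to more than one encoded event.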
Katie Robbert 7:57
Christopher Penn 7:59
There’s a system of encoding news events called CAMEO. CAMEO stands for Conflict and Mediation Event Observations, created by Penn State University. This is a way for news organizations to encode what kinds of news stories there are, right? So there’s all sorts of things: meetings, armed invasions, you name it. And with this encoding, combined with Google’s assessment of the news, they rate news in the GDELT database based on what’s called the Goldstein scale. The Goldstein scale is a number between minus 10 and positive 10 for how impactful a piece of news is. Minus 10 means negative global consequences, right? Example: Russia invades Ukraine. That’s pretty substantial negative-impact news. Some dope parks a ship in the Suez Canal the wrong way and blocks international trade for 14 days; that event would be a minus 10 on the Goldstein scale. A positive 10 would be the rollout of the mRNA vaccine for COVID, right? It’s going to have a strong positive effect. And then zero, in the middle of the scale, means it has no impact, right? We put out a press release announcing that Chris cleaned his desk; that’s a zero. That’s a 10.
Katie Robbert 9:26
If you’ve ever seen Chris’s desk, and it gets cleaned and perhaps disinfected, that’s a 10. That’s a... anyway. All right, so let’s say I’m following these codes. Who is deciding what’s a negative 10 and what’s a positive 10? Google,
Unknown Speaker 9:47
and we don’t know how,
Katie Robbert 9:48
And that I take issue with, because, you know, as we’ve discussed in previous episodes, if they’re using some sort of automation or AI to code based on the words that are written, or, you know, databases that have been trained and those kinds of things, we know that there’s bias introduced into those things. And so, you know, as a terrible example, terrible, terrible example, the way that I feel about a Russian invasion might be different. Like, I might see it as a negative two, whereas John might see it as a negative four, and Chris sees it as a negative 10. And yet, somehow, all of that is supposed to be, you know, telling us what is good and what’s bad. I take issue with that.
Christopher Penn 10:38
And therein lies the challenge, which is probably a separate show entirely, about how AI is an intermediary between ourselves and reality. Recommendation engines are literally telling us what to see and read and watch and listen to, and it may not necessarily be conscious decisions on our part anymore. So these codes are all inside the GDELT database. Let’s talk about some of the data you get in here, because I think that is always useful. All these tables provide, oh, lovely schemas that explain what’s involved. You get the date the news was published, of course. And generally speaking, in the encoding of the data, there are actors. So you have actor one and all the encoding about it, and actor two and all the encoding about it. For example, when Russia invades Ukraine, Russia would be actor one as a nation state, and actor two would be Ukraine. And you will see all that data in there. You have the event code, based on those CAMEO lists: what kind of event is it? And then you have things like the Goldstein scale: how important is it? Within the first 15 minutes of that event occurring, how many times was it mentioned, how many sources mentioned it, how many articles mentioned it? What was the sentiment, the tone of that? And then more data about the actors and the geographies. All this is put in these lovely tables.

Now, here’s the part where I think, for anyone who’s an aspiring data scientist, or for anyone who’s just curious and likes to poke around, one of the wonderful things about the GDELT system is that it is completely and totally free of cost. Anyone, as long as you can use the Google Cloud console and you know how to use BigQuery, can go in, work with the data in the database, or export it and put it into your own software, totally free of cost, paid for by Google and the GDELT foundation. It is one of like 300 free datasets that Google offers, which I think is pretty cool.

So if you compare that to, say, SEO software: SEO software is not free, and there are some pretty strict quotas, pretty strict limits on how much data you can export from the system, depending on the plan that you’re paying for. In Ahrefs, I think you have to have like the Pro Premium Platinum Ultra plan to download your millions of records. Otherwise, you’re limited to, I think, two or 50,000 records per month, which, when you’re dealing with tens of millions of news stories a year, you run out of runway real fast. That was one of the other reasons we went with the GDELT database when we’re doing our 12 Days of Data: except for the processing cost if you make a copy of it, you can work with this data for free.
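To make the record layout described above concrete, here is a small sketch of one event row as a Python dataclass. The field names are hypothetical, chosen to mirror the fields mentioned in the conversation (date, two actors, event code, Goldstein score, mention counts, tone, source URL); the values in the example are invented, not real GDELT data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsEvent:
    """One encoded news event; field names are illustrative, not the official schema."""
    date: int            # YYYYMMDD
    actor1: str          # who did something
    actor2: str          # who it was done to
    event_code: str      # CAMEO-style event category
    goldstein: float     # impact score from -10 to +10
    num_mentions: int    # mentions counted shortly after the event
    num_sources: int     # distinct sources mentioning it
    num_articles: int    # distinct articles mentioning it
    avg_tone: float      # sentiment of the coverage
    source_url: str      # URL of the article the event was found in

    def is_extreme(self) -> bool:
        """True when the event sits at either end of the impact scale."""
        return abs(self.goldstein) >= 10.0


# Invented example values for illustration only.
example = NewsEvent(
    date=20220224, actor1="RUSSIA", actor2="UKRAINE", event_code="190",
    goldstein=-10.0, num_mentions=50, num_sources=10, num_articles=48,
    avg_tone=-6.5, source_url="https://example-news.test/invasion",
)
print(example.is_extreme())
```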
Katie Robbert 13:35
Gotcha. I mean, that is interesting, because you don’t often see datasets that big, just like, “and here we are.” With documentation, by the way.
Christopher Penn 13:47
With documentation, schemas, and it’s in real time. I think there’s a 15-minute lag between when a piece of news appears on the wire and when it shows up in the GDELT database. We have actually spoken to clients in the past, recommending that in addition to the regular media monitoring, they may want to search for URLs mentioning brand names. It might not be the worst idea in the world to have, you know, a simple monitor set up in there. So that’s the data sources where we get this information. The next part is, what do you do with it? It depends on your level of skill. There is no friendly interface to the software at all; it’s just a SQL-type database. The GDELT project does offer some basic UI stuff if you want to try and mess around on their site, but it does not scale particularly well. They mostly recommend, hey, you should probably go and write some code, which is not super helpful. So if you have skills with SQL, the SQL query language, you can just query straight in the database, like when I was counting the number of news stories. I said, show me the number of URLs in the database for this year. Very straightforward SQL queries. And then it’s up to you to decide how you want to work with that data and process it. We do it with, unsurprisingly, the R programming language. In this case, we take the data out of the GDELT database, which takes a lot of effort, and then we process it and format it into, essentially, some reasonably nice-looking charts and graphs, which is substantially easier.

So for example, this year, we took the Goldstein codes, the ones that were marked either minus 10 or plus 10, and asked what categories they were. The top ones were the use of conventional military force and fighting with small arms and light weapons, and those were predominantly around two big events. One was the invasion of Ukraine, and the other was the ongoing genocide in Ethiopia. Between those two, that comprised the majority of the serious events this year, the things that were either minus 10 or plus 10. And you see a few other things. When we look at the general events, you get a lot more variety, right? You know, made a statement, made an appeal or request, made a visit, praised or endorsed, hosted a visit, consulted, expressed intent to cooperate, had a meeting, right? These codes are a lot more varied, and they allow us to see more of what the news was. But even still, there’s a lot of armed conflict; 2022 was a very violent year.
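The "take the minus 10 and plus 10 events and see what categories they fall into" step can be sketched in a few lines. This is a Python stand-in for illustration, not the R code used on the show; the event rows are invented, and the code-to-label lookup is a hypothetical table that paraphrases the two categories named above.

```python
from collections import Counter

# Hypothetical lookup from event code to a readable label.
CODE_LABELS = {
    "190": "Use conventional military force",
    "193": "Fight with small arms and light weapons",
}


def extreme_event_counts(events):
    """Tally event codes for rows scored exactly -10 or +10."""
    return Counter(code for code, goldstein in events if abs(goldstein) == 10)


# Invented (event_code, goldstein) pairs.
sample_events = [
    ("190", -10.0),
    ("193", -10.0),
    ("190", -10.0),
    ("010", 0.0),
    ("057", 8.0),
]

for code, n in extreme_event_counts(sample_events).most_common():
    print(CODE_LABELS.get(code, code), n)
```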
Katie Robbert 16:45
So for the purposes of this exercise, what you are trying to establish is how much more news was published than in previous years. If I’m not trying to establish that, what are some other use cases for this kind of data?
Christopher Penn 17:08
It depends on what the purpose would be. One thing that we’ve used this for, and this is actually the subject of the next two days’ worth of 12 Days of Data: because the source URLs are built into the GDELT system, you can actually narrow down to specific kinds of news. One of the ones that is, you know, my personal favorite is identifying press releases, right, press releases from Business Wire and so on and so forth. We can extract and count the number of those things per domain. So this year, in 2022, as of last night, we’ve done 297,000 press releases, which is down a little from 2021, still up from 2020. And, you know, we’re only, what, eight days into December. So we’ve still got time for marketers and communicators to issue a couple tens of thousands more press releases,
Katie Robbert 18:14
things that they’re pleased to announce.
Christopher Penn 18:15
Exactly, things they’re pleased to announce. And so if there are specific types of news that you’re looking for, or specific publications, and you want to dig in and see just what the volume of that publication is, this database, if it’s in there, is very helpful. So I could say, let’s see, “and source URL like cnn.com.” And let’s go ahead and run this.
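A small sketch of the per-publication count, written in Python rather than as the SQL typed on screen. One assumption worth flagging: a plain substring match on "cnn.com" would also catch unrelated URLs that merely contain that text, so this version compares the URL's hostname instead. The URLs are invented.

```python
from urllib.parse import urlparse


def is_from_domain(url: str, domain: str) -> bool:
    """True if the URL's host is the domain itself or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


def count_for_domain(urls, domain):
    """Count how many URLs belong to the given publication's domain."""
    return sum(is_from_domain(u, domain) for u in urls)


# Invented sample URLs.
sample_urls = [
    "https://www.cnn.com/2022/12/08/example-story",
    "https://edition.cnn.com/world/example",
    "https://notcnn.com/post",            # different site, similar name
    "https://example.test/?ref=cnn.com",  # mentions the domain, is not on it
]

print(count_for_domain(sample_urls, "cnn.com"))
```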
Katie Robbert 18:43
Now, I know that you said that GDELT scales better for what you’re looking for. But if I was just looking purely to count the number of press releases, or something that contains the words “press release,” could I go back to my SEO tool to do that kind of work?
Christopher Penn 19:04
It depends on how good the tool is. A lot of the SEO tools do very basic filtering. And they don’t allow you to do very elaborate queries.
Katie Robbert 19:16
But what if it’s not an elaborate query? What if I just want to literally find how many press releases like press release?
Christopher Penn 19:24
That’s an elaborate query. I’ll show you why. This is what the query looks like to identify press releases. It selects the count from the table, and then these are all the different URL conditions you’d want: identifying domains, and then certain types of words, like, you know, “for immediate release,” and so on and so forth, within the URL, so that you can export them out. And this is something that an SEO tool is going to really struggle with, because most tools don’t focus on press releases.
Katie Robbert 19:59
They’re out there. Technically a piece of content, though,
Christopher Penn 20:02
Exactly, they’re out there. But there are a lot of different newswire services. One of the things we have to do every year when we do this stuff is a quick bit of Googling: have there been any new press release companies that have come onto the scene this year? And then we add them to our queries.
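The press release query described above is a count with a long list of URL conditions, and that list grows each year as new wire services appear. A hedged sketch of how such a query could be assembled from two editable lists follows. The domain and phrase lists are illustrative only, the `events` table and `SOURCEURL` column are assumptions, and the demo runs on an in-memory SQLite table with invented rows.

```python
import sqlite3

# Illustrative lists; a real query would carry many more entries.
WIRE_DOMAINS = ["businesswire.com", "prnewswire.com"]
URL_PHRASES = ["press-release", "for-immediate-release"]


def build_press_release_query(domains, phrases):
    """Return (sql, params) counting distinct URLs matching any pattern."""
    patterns = [f"%{d}%" for d in domains] + [f"%{p}%" for p in phrases]
    if not patterns:
        raise ValueError("need at least one domain or phrase")
    where = " OR ".join("SOURCEURL LIKE ?" for _ in patterns)
    sql = f"SELECT COUNT(DISTINCT SOURCEURL) FROM events WHERE {where}"
    return sql, patterns


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (SOURCEURL TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [
        ("https://www.businesswire.com/news/example-1",),
        ("https://example.test/press-release/example-2",),
        ("https://example.test/blog/example-3",),
    ],
)

sql, params = build_press_release_query(WIRE_DOMAINS, URL_PHRASES)
press_release_count = conn.execute(sql, params).fetchone()[0]
print(press_release_count)
```

Keeping the patterns as bound parameters means adding a newly discovered wire service is a one-line change to the list rather than an edit to the SQL text.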
John Wall 20:24
You know, it’s just amazing to me that with this whole database, you could easily do a front end of completely customized news, right? You could say, here are the 35 topics I want, and then filter out everything, you know, below a seven in either direction. And you could have news completely tailored to you. But because we’re clickbait-driven, nobody cares.
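John's idea can be sketched as a two-part filter: keep only the topics you chose, then drop anything scoring below seven in either direction. The `topic` field is hypothetical (the conversation does not say how topics would be assigned), and the stories are invented.

```python
def tailored_feed(stories, topics, min_impact=7.0):
    """Keep stories on chosen topics whose impact is at least min_impact
    in either direction (strongly negative or strongly positive)."""
    wanted = {t.lower() for t in topics}
    return [
        s for s in stories
        if s["topic"].lower() in wanted and abs(s["goldstein"]) >= min_impact
    ]


# Invented stories; "topic" is a hypothetical field.
stories = [
    {"title": "Major conflict escalates", "topic": "World", "goldstein": -10.0},
    {"title": "Vaccine rollout expands", "topic": "Health", "goldstein": 9.0},
    {"title": "Local meeting held", "topic": "World", "goldstein": 1.0},
    {"title": "Playoff upset", "topic": "Sports", "goldstein": 8.0},
]

feed = tailored_feed(stories, topics=["world", "health"])
print([s["title"] for s in feed])
```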
Christopher Penn 20:44
That’s true. I mean, and again, there is still the limiting factor of what constitutes news, right? Your site may not be in here. You may even have a very trustworthy publication that is not in here or is very difficult to identify. For example, I read Dr. Jeremy Faust’s Inside Medicine bulletin. He is a Harvard-trained physician. He is one of the top experts in infectious disease. And he writes on Substack. Let’s see if Substack is in here.
Katie Robbert 21:22
Well, I mean, and this goes back to the question of what is news? Like so for you, that is a source of news for you. You know, I could start publishing to my website every day, like, How many cookies I did or didn’t eat? That’s news, technically. But I don’t think it needs to be in there.
Christopher Penn 21:44
Yep. And there are very few pages from the Substack domain in here, so I’m pretty sure his is not in there, given that there are thousands of Substack newsletters, including my own. My lunchtime pandemic newsletter is in there. That is a roundup of news. Am I a credible news source? I mean, no, I don’t think so. And I say that because I’m a marketing guy and a data analytics guy putting together a COVID newsletter. I am in no way, shape, or form a qualified healthcare practitioner; you should not be taking medical advice from me for any reason. But I try to share stuff that you could go ask a qualified healthcare practitioner about: hey, is this a good idea for me to do? I mean, beyond the obvious, like, hey, you should probably wear a mask, because filtration of air is important. But in this database, you know, when we did the CNN one, the CNN one had 117,000 pages in here, right? Substack has 620. Are those both valid news sources? It depends. And therein lies the challenge. So from a marketing perspective, you can zoom in on specific data sources, and you can get a sense of what data is available about those news sources to help you gauge its importance and how a system like Google sees it. Which, again, is one of those things where you don’t really get the inside scoop from the front end of Google News; you don’t have any indication. Let’s take a look at, let’s do a select star from... that’s the Substack one.
Katie Robbert 23:33
I would imagine another use case for this kind of query would be research in a way. So for example, if you are trying to write a press release or trying to write an article about something, theoretically, you want to know how many times this topic has already been covered? And what’s been said about it so that you’re not just regurgitating the same information, like, could you use this to try to figure out a different angle on something that’s been covered to death?
Christopher Penn 24:07
Possibly, but the one thing that the GDELT database does not include is the actual content. You get the URLs, but you don’t get the text of the content itself, at least not in the events table. It might actually be in a different table, in which case you’d have to do a join on it and extract that out. Let’s see what we got here. From Substack, only a couple of newsletters are in there. But the fields that we were looking for are the number of mentions, the number of sources, and the number of articles. As we talked about at the beginning, number of mentions is the number of times this article, this URL, was referenced within the first 15 minutes of its publication. The number of sources is the places that cited it, and the number of articles is the number of articles that this article appeared in within the first 15 minutes of publication. So from an impact perspective, again, you can see, for a specific publication, how Google kind of sees the weight of that news.
Katie Robbert 25:10
So if we go back to our original topic, which is top news and web stories, what metrics would I be looking at to determine that this is a top news story? Like the number of times it was shared, the number of times it appeared? There’s a lot of data here. Where do I want to specifically focus?
Christopher Penn 25:36
The two I would pick: the number of mentions that a piece of news gets, I think, is probably your first piece that’s important. And the second is the Goldstein scale, which is how impactful a system like Google thinks this news is, right? Because you can get a lot of mentions on a story that might not necessarily be super impactful to the world. For example, Taylor Swift’s tickets were selling out, right? It was very difficult to get Taylor Swift tickets, and there were a lot of mentions of it. But in the grand scheme of things, and please don’t kill me, I don’t know that that is world-impacting news in a way that’s going to have massive consequences on society.
Katie Robbert 26:23
I would, I would agree with you, I’ll support you on that one, Chris.
Christopher Penn 26:27
So having those two numbers, your number of mentions and your Goldstein scale, I think are good starting benchmarks. The other thing that you might want to do is bring data from GDELT into a custom system of some kind, so that you can actually read the article text and process it, right? So you could get some data on that. And then if you really want to get ambitious, if you have the bandwidth and the budget to do so, if you’re also getting data out of your SEO tools, you can cross-reference them and see the big stories from one source and the big stories from another source. The SEO tool is going to have different metrics, right? Your SEO tool is going to have things like page traffic and traffic value and domain authority that you can cross-reference your news with and say, okay, are these two things the same? Are these two things equally relevant?
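A rough sketch of the cross-referencing idea: join news rows to an SEO export on URL, then rank by the two benchmarks named above, mentions first and absolute Goldstein score second. The field names (`num_mentions`, `goldstein`, `page_traffic`) and all rows are hypothetical; real exports will use whatever column names the tools provide.

```python
def cross_reference(news_rows, seo_rows):
    """Join news rows to SEO rows on URL and rank by mentions, then impact."""
    seo_by_url = {row["url"]: row for row in seo_rows}
    merged = []
    for news in news_rows:
        seo = seo_by_url.get(news["url"])
        if seo is not None:  # keep only stories both sources know about
            merged.append({**news, **seo})
    return sorted(
        merged,
        key=lambda r: (r["num_mentions"], abs(r["goldstein"])),
        reverse=True,
    )


# Invented rows from the two hypothetical exports.
news_rows = [
    {"url": "https://example.test/a", "num_mentions": 5, "goldstein": -2.0},
    {"url": "https://example.test/b", "num_mentions": 40, "goldstein": -10.0},
    {"url": "https://example.test/c", "num_mentions": 12, "goldstein": 1.0},
]
seo_rows = [
    {"url": "https://example.test/a", "page_traffic": 900},
    {"url": "https://example.test/b", "page_traffic": 150},
]

ranked = cross_reference(news_rows, seo_rows)
print([(r["url"], r["num_mentions"], r["page_traffic"]) for r in ranked])
```

Stories that appear in only one source are dropped here; a fuller comparison might keep them and flag the gap, since a story being absent from one tool is itself informative.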
Katie Robbert 27:18
I would also so as you’re talking about this, I could imagine, you know, a PR firm who puts out, you know, news and articles all the time on behalf of their clients, using methodology like this, to see where the stuff that they put out stands against everything else, like, How good did we do this year, of making sure that the information we’re publishing is newsworthy, is relevant. And I feel like that if I were, you know, in charge of an agency like that, I would want to know that information, like, I’m paying all of this money to get this news out there. But is it just getting buried in the noise compared to everything else? And obviously, you’d have to put it in context of similar topics. Like, if you’re putting out that, you know, Trust Insights, you know, Katie and Chris swapped their roles against, you know, the war in Ukraine. Yeah, I would expect that our news probably wouldn’t be that important. But if you look at it in the scheme of, you know, tech startups, for example, maybe I would want to know that it’s a little bit more newsworthy.
Christopher Penn 28:28
Exactly. News, like politics, is all local. And you can see here, just from some of the top-performing news stories in the Ahrefs database, a lot of these are not news that would necessarily be relevant to us, like the winning lottery numbers in a certain part of India, right, as a big story with 5.2 million visits to that page. Is that a piece of news that would be impactful to what we do? Probably not. So that’s the other consideration with news. And to your point, Katie, you might want to be a little more focused about which news you pay attention to. One of the things that, again, you can do with either of these tools, it’s easy to do with GDELT, and that we saw first popularized by our friend Justin Levy back when he was working at Citrix, is paying attention to share of voice, which generally we hate as a measure, because it’s a stupid measure in aggregate. But the innovation he put together, which I thought was a brilliant twist, was paying attention to your share of voice within a specific set of publications, right? If you are in an industry and you know there are 10 newspapers and magazines that cover, you know, virtualization software, or 10 bloggers or something like that, using these tools, you could then
Katie Robbert 29:44
Junko, he’ll be back.
Christopher Penn 29:47
You could use these tools to say, okay, I want to just focus on those 10 publications, and then find, you know, how much presence do we have within that subset? Again, these tools supply some of the data; you know, your custom tooling to extract the data would be a very useful way of applying this technology,
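Share of voice within a chosen set of publications can be sketched as: restrict to articles from those publications, count brand mentions, and divide each brand's count by the total across the brands being tracked. The article text field is an assumption (it would come from your own extraction, since the events data carries URLs rather than content), and the publications, brands, and articles are invented.

```python
def share_of_voice(articles, publications, brands):
    """Each brand's share of mentions within the chosen publications only."""
    in_scope = [a for a in articles if a["domain"] in publications]
    counts = {
        brand: sum(brand.lower() in a["text"].lower() for a in in_scope)
        for brand in brands
    }
    total = sum(counts.values())
    return {b: (c / total if total else 0.0) for b, c in counts.items()}


# Invented articles; "text" is assumed to come from custom extraction.
articles = [
    {"domain": "pub-a.test", "text": "Acme ships new virtualization release"},
    {"domain": "pub-a.test", "text": "Globex and Acme compared"},
    {"domain": "pub-b.test", "text": "Globex raises funding"},
    {"domain": "elsewhere.test", "text": "Acme everywhere"},  # out of scope
]

result = share_of_voice(
    articles,
    publications={"pub-a.test", "pub-b.test"},
    brands=["Acme", "Globex"],
)
print(result)
```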
Katie Robbert 30:07
which is essentially what I was getting at. So, I mean, I think that, you know, Justin’s smarter than me. So that makes a lot of sense.
Christopher Penn 30:14
The other thing, and this kind of goes back to what we were talking about at the beginning of the show, is that a lot of these publications are pretty well vetted. So should we expect to see a massive explosion of content from AI? Not necessarily. I mean, some publications probably will be experimenting more with first drafts and more trivial content generated by machine. But for the most part, a lot of the content explosion is happening on non-news sites, right? You know, if we put up more blog posts, some machine-written blog posts and things on our site, we’re not an accredited news source, so that’s probably not going to be impactful to the amount of news detected here. It will show up in the SEO tools, right? Particularly for web content in general, you’re going to see a lot more content in the SEO tools, because there’s so much more publishing. One thing that you may want to think about as an organization is, have you tried, and it is a long, slow process, but have you tried applying to become a registered news source, if in fact your blog or your website, whatever it is, is in fact news? Interestingly enough, you’d normally do this in Google Search Console. Search Console is where you would identify a news source to begin with. And then there is an application process for you to be able to go in and say, yes, here’s how the news works.
Katie Robbert 31:54
So to go back to the AI piece, John, you started to bring up a little bit about, you know, how AI is going to impact the amount of content being created. So as we’re recording this, it’s mid-December 2022. In the past week, there have been a lot of AI tools front and center, and they’re doing a really good job of writing content based on prompts. Have you played with any of this, John? Do you think that it will change how you are creating some of your content?
John Wall 32:25
No, I haven’t dug into any of this, really, because, you know, it seems like it’s great for the kind of stuff you do at the corporate level. You know, if you basically need 35 “101” articles on a specific topic or a specific area, it’s great for generating, you know, a new take on stuff that already exists. And it’ll be interesting; there’s definitely a dividing line between all of that ongoing content stuff, which is very easy to do, versus news. You know, and I know there’s already some news generation AI stuff for, like, sporting events. There are whole leagues of baseball where there’s no human that does the reporting; they just feed the box scores into the machine, and it spits back a story based on what it can do. So it’s going to be interesting to see how fast this stuff can scale to a more newsworthy approach. And how does that get managed? You know, I would love to think that there’s some real-time editing going on, and at least, you know, some kind of proofing. But we also know, as we saw during the blog explosion, that there are plenty of people that are just going to post, you know, and look and review later.
Katie Robbert 33:36
The sports, the sports piece of it is interesting, because you’re right, it’s here’s the scores, here’s what happened. And then the AI takes it and writes up whatever the news story is. So, you know, to your point, John, it sounds like AI is starting to play a larger role in news, just that particular segment of news, maybe not political news or other news. I mean, who knows? It might be so this is where I would turn to Chris to say, what don’t John and I know about where AI is in writing the news.
Christopher Penn 34:11
People have not really seen what the newest language models are capable of and what the next generation ones will be capable of. They are so much better now than they were even six months ago in terms of their capabilities. For example, this is the Davinci 3 model, part of GPT-3, which is OpenAI’s product. And I put together this prompt. This is a very complex prompt: write a press release about us, right, and these facts here. Let’s put in one more thing: the Trust Insights URL is trustinsights.ai. And I say, I want you to write me a press release about this thing. Again, it’s a very detailed prompt. There’s a lot of extra stuff in here that you don’t normally think of. And what you get is a decently written press release. Now, there are some things here that are factually incorrect, right? The Trust Insights CEO is... that’s incorrect.
Katie Robbert 35:14
So I was gonna say, Who the heck is John Smith? And when was I getting replaced?
John Wall 35:17
Christopher Penn 35:22
And so let’s take that out. But we can see like this prompt is essentially, well, it’s kind of like factual coding.
Katie Robbert 35:30
But, you know, it’s funny, as you’re doing this prompt, you’re taking the time to write all these facts. Wouldn’t it just be the same as you actually just writing the press release, because you just listed everything that the AI is going to regurgitate back to you?
Christopher Penn 35:46
It is, but this is a lot easier, because a good chunk of it is going to be templated in ways that are going to be incorporated uniquely within the output, which you wouldn’t just get from writing it yourself. And you actually could write it yourself. But think about this: by varying a few of the facts, and then feeding this in not just through the web interface, but through the actual API, you could generate 1,000 of these, 2,000 of these, in about the same amount of time.
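The templating idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the prompt used on the show: the template wording, the field names, and the example facts are all hypothetical, and the step of sending each prompt to a language model API is deliberately left out.

```python
from string import Template

# Hypothetical prompt template; the wording and field names are illustrative only.
PROMPT = Template(
    "Write a press release about $company.\n"
    "Use these facts:\n"
    "- Product: $product\n"
    "- Launch date: $launch_date\n"
    "- Website: $url\n"
)


def build_prompts(fact_sets):
    """Return one filled-in prompt per dictionary of facts."""
    return [PROMPT.substitute(facts) for facts in fact_sets]


# Vary one fact (the product name) while keeping the rest fixed.
fact_sets = [
    {
        "company": "Example Co",            # made-up company
        "product": f"Widget {n}",           # made-up product names
        "launch_date": "January 1",
        "url": "https://example.com",
    }
    for n in range(1, 4)
]

prompts = build_prompts(fact_sets)
for prompt in prompts:
    print(prompt)
```

Each string in `prompts` would then be submitted to whichever model API is in use; scaling from three prompts to a thousand only changes the length of `fact_sets`.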
Katie Robbert 36:11
And I’m sure that this is probably a conversation to have later. But we have a new product? Or did you make that up for the sake of this? This is totally made up? Okay, I was like, wait a minute, what don’t I know? Not only am I being replaced, but we have new products.
Christopher Penn 36:28
This is a bit of tongue-in-cheek fun: artificial intelligence and machine-based natural intelligence of humans. So your brain has 330 trillion neurons, right? So that is the largest neural network platform ever deployed, which is true, technically true. And now it’s gotten those facts correct, right? Now, we’ve used this, we were doing this in Analytics for Marketers, writing song lyrics and poems about Google Analytics 4, writing rap lyrics. And the underlying model is extremely powerful and very, very flexible, in ways that people do not fully understand. And this is a generation sort of 3.5 of this particular model, while the generation four model will be out sometime next year. And this is untrained, right? So this is the big generic model. There are ways for organizations to fine-tune this, to say I want to feed in all of our existing blog content, or I could feed in all the transcripts from Marketing Over Coffee. And that would add weight to have the output sound more like the kinds of things that you’ve already generated, making it very difficult to distinguish. So from a news perspective, if you wanted to capture the tone of CNN to make your own news site, you could absolutely fine-tune a model like this and say, okay, I want to train on the way CNN publishes stuff and put it in here, or Scientific American, or Fox News, or whatever news source you want. From the GDELT database, with some code like we saw, you could extract that data, and then use that to fine-tune these models.
Katie Robbert 38:24
All right, so John, let’s start placing bets on how long before news is 100% AI-written and people aren’t needed anymore. Or rather, how long before John decides to upload all of the old Marketing Over Coffee transcripts and have the AI pretend to be John and Chris for a few episodes?
John Wall 38:48
Yeah, you know, the thing with all this is, it’s always built on existing stuff. You know, I mean, there’s no exploration of the frontier. Where I think it’s amazing is, yeah, in code, being able to validate code and fix bugs, you know, being able to do pair programming without a second person. Or in research, you know, if there’s research where you can determine how to do an experiment, to be able to have a machine run 50 billion different, you know, variations on a single thing. That’s amazing there. But as far as, yeah, I don’t see it doing a lot of creative writing on current topics, that’s going to be a challenge. I mean, you can see how it would work here, though. If you basically just put the key story points in there, you would be able to bang out an article. Plus, I think another angle of that is to be able to bang it out in 25 languages all at the same time. You know, now you’re talking about something pretty interesting, pretty quick.
Christopher Penn 39:49
And what’s interesting about this is, and this is already a problem. We’re already seeing this happening online, but we’re seeing it happen with very unsophisticated tools. As the language models evolve, you can do a lot of things with them that are ethically questionable, or just outright illegal. So let’s take a story like this from CNN. Right, let’s take this here, a copy of that
a professional tone, a reading level of grade 12, and correct any grammar and spelling issues. And so you can take something that perhaps was written for a sixth grade audience and upscale it. You can change the language, you can move things up. And now, if we put these two pieces of text side by side,
here’s the rewritten text, and here’s the original text. These are different, these are structurally different enough that a search engine is going to see this as separate content, different content, right? Grammatically, it’s not a copy-paste. The plagiarism we’ve seen online so far has been really rudimentary, like swapping out one adjective for another, really clumsy. And you can tell very easily by reading it: okay, I know exactly which blog this was cribbed from. Like, when the folks over at content marketing put out a new blog post, instantly, in our social media mentions and media monitoring tools, we get notifications of all the copycats that are taking that content, either republishing it as is, or making really, really awkward adjustments. This is going to change that game. And this is going to change the game to the point where it’s now unique enough that it will pass muster.
Katie Robbert 42:17
So to the question of top news and web stories, if I were running CNN, and I said, Okay, I want six different versions of this story for different audiences. This would be a good use case for using something like OpenAI, I write the initial story with all the facts in it. And then I can put in this prompt to say rewrite it for a reading level of grade three, grade six, grade 12, so on so forth.
Christopher Penn 42:48
Exactly right. And that helps me scale,
Katie Robbert 42:51
the amount of content that I’m then creating and putting out there. And because I’m a trusted news source, then I’m dominating the field in terms of the content that’s going out.
Christopher Penn 43:04
Look what happened when I said make it grade three, with a more casual tone: “Mortgage rates have gone down again this week. It’s the fourth week in a row that rates have dropped.” Right? It is textually a very different tone, different content. So to your point, Katie, you can absolutely re-spin your existing content programmatically into very, very different formats, into very different-sounding versions that still preserve the factual data, right? These rewrites are all still preserving the data correctly, but they’re creating different variations. So yes, as a marketer, as a content creator, this offers you an incredible amount of potential. As an intellectual property defender, this is kind of a nightmare scenario, because with the rewritten stuff, you can plausibly say it’s based on facts, and facts cannot be copyrighted. So yeah, somebody cribbed your work and re-spun it into something brand new, and it may do better than your content.
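The rewrite-for-different-audiences workflow discussed here can be sketched as a small prompt builder. This is an illustration under assumptions: the prompt wording is a paraphrase of what is described on the show, the `rewrite_prompt` helper and the sample story sentence are hypothetical, and no model API is called.

```python
def rewrite_prompt(text, grade, tone):
    """Build an instruction asking a language model to rewrite `text`
    for a given reading level and tone (wording is illustrative)."""
    return (
        f"Rewrite the following text in a {tone} tone, "
        f"at a reading level of grade {grade}, "
        "and correct any grammar and spelling issues.\n\n"
        f"{text}"
    )


# Placeholder story text; a real story would be pasted in here.
story = "Mortgage rates fell for the fourth week in a row."

# One (grade, tone) pair per target audience.
audiences = [(3, "casual"), (6, "casual"), (12, "professional")]

rewrite_prompts = [rewrite_prompt(story, grade, tone) for grade, tone in audiences]
for prompt in rewrite_prompts:
    print(prompt)
    print("---")
```

Each prompt keeps the same source text and changes only the audience settings, which is what lets one fact-checked story fan out into several versions.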
Katie Robbert 44:12
Well, on that happy note,
John Wall 44:16
I’ve seen more than one report that this is the end of the college and high school essays like it’s game over with this, it’s, oh, just write
Christopher Penn 44:22
it. I was in one of my Discord servers the other day, and a friend was saying, I’m really stuck trying to write a paper. It was: write a five-paragraph analysis of the artistic techniques of The School of Athens by Raphael, focus on the techniques and the cultural context. And they were like, I’m not really sure what to write, how much to focus on this or that. And I said, here, and I copy-pasted out of OpenAI. So that’s the starting point for your paper. There have been some interesting conversations, particularly on Reddit, of students whose grades are fantastic now because they have AI just generate their papers for them. And the teachers give the stamp of approval, and these pass plagiarism checks because they’re original text. And so yes, this is 100% the end of your college essays, which raises the very valid question, what’s the point? Right, what is the point of having a student write an essay when a machine can write it better?
Katie Robbert 45:41
Well, I think that is a topic for another show. And something for us to ponder.
Christopher Penn 45:47
It is. But I want you to think about that. What is the point of your content marketing? Right? If a machine can generate better content than you, then what do you need to do to rise above what the machines are capable of? We’ve been saying, and you can go on to the Trust Insights website or our YouTube channel, for the last five years that the bar of competency keeps going up. Five years ago, when we started Trust Insights, machine-generated language was pretty awful, it was word salad, right? And so even the drunk intern was going to do a better job. We don’t have a drunk intern. For the record, we don’t even have a sober intern. Three years ago, the GPT-2 series came out and some of the machines could do okay, not great. GPT-3 came out, you know, two years ago, and now the machines are starting to write reasonably well. And now with GPT-3 and 3.5, the machines are writing well, machines are writing very well. And so the challenge for us as marketers, and as creators, is what do we need to do to uplevel our skills to stay ahead of the machines? Because as of right now, like with a college student writing a paper, the machine is going to do a better job, period.
Katie Robbert 47:01
Well, I think that, you know, you’ve brought up a couple of things that we can cover on other podcasts and shows, including, you know, AI-powered recommendation engines, and what is the point of content marketing if AI can write it better. I think we can absolutely dig more into those topics at another time. So for today, in terms of your top news and web stories, it sounds like AI is definitely going to play a part in it. But if you’re looking to just do general research, your SEO tools are pretty good. Something like the GDELT database is better. But know that it doesn’t have a user-friendly interface. So be prepared to at least start doing some research on basic SQL commands. Those aren’t very hard to put together, and they’re pretty straightforward if you know what you’re looking for, which, you know, what’s the question you’re trying to answer before you start all of this? So Chris, final thoughts? John, final thoughts?
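For a sense of what “basic SQL commands” look like in practice, here is a small self-contained sketch using Python’s built-in `sqlite3` module. The `articles` table, its columns, and the rows are all hypothetical stand-ins; the real GDELT tables have their own schema, which this does not attempt to reproduce. It only shows the shape of a count-by-source question.

```python
import sqlite3

# In-memory database with a made-up table standing in for a news dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (published_date TEXT, source TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [
        ("2022-12-01", "cnn.com", "Story A"),
        ("2022-12-01", "foxnews.com", "Story B"),
        ("2022-12-02", "cnn.com", "Story C"),
    ],
)

# The question being asked: how many stories did each source publish in December 2022?
query = """
    SELECT source, COUNT(*) AS story_count
    FROM articles
    WHERE published_date BETWEEN '2022-12-01' AND '2022-12-31'
    GROUP BY source
    ORDER BY story_count DESC, source
"""
counts = conn.execute(query).fetchall()
for source, story_count in counts:
    print(source, story_count)

conn.close()
```

The pattern of `SELECT`, `WHERE`, `GROUP BY`, and `ORDER BY` covers most “how much content was produced, and by whom” questions; what changes between datasets is the table and column names.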
John Wall 48:00
Oh, yeah, Cameo writes the articles, but only Google can say “Word Up.”
Katie Robbert 48:10
Oh, you gotta top that, Chris.
Christopher Penn 48:12
I’m not even going to try it. Thanks, folks. Until next week. Bye.
Unknown Speaker 48:24
Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources, and to learn more, check out the Trust Insights podcast at trustinsights.ai/tipodcast, and our weekly email newsletter at trustinsights.ai/newsletter. Got questions about what you saw in today’s episode? Join our free Analytics for Marketers Slack group at trustinsights.ai/analyticsformarketers. See you next time.