University of Wisconsin–Madison

Tag: copyright

Ethics of robot journalism: How Automated Insights poses issues for data collection and writing

Algorithm journalism is now available for everyone.

A beta version of Wordsmith, a program that creates journalism from data, was made available on the parent company’s website Tuesday morning, the company announced.

But one of the world’s largest news organizations already uses the software to automatically generate some stories, and its standards editor said the ethics of the software have to be carefully considered.

“We want to make sure that we’re doing everything the way we should,” Associated Press standards editor Tom Kent said. “We take our ethics very seriously.”

During the past year, the AP multiplied its publication of earnings reports tenfold. The number of earnings reports the AP publishes each quarter increased from 300 stories to nearly 3,000.

The mass quantity of stories that the software can produce has led to the moniker “robot journalism,” but Kent said questions have turned from the capacity of robot journalism to the ethics of this new production tool.

“Robot journalism” is less a C-3PO-like character typing the news, and more automation of data collection using algorithms to produce text news for publication. The process is forcing journalists like Kent to address best practices for gathering data, incorporating news judgment in algorithms and communicating these new efforts to audiences. The AP announced its partnership with Automated Insights, a Durham, North Carolina-based software company that uses algorithms to generate short stories, in June 2014.

Collecting Data

Accuracy is just one concern when it comes to algorithmically gathering data.

According to a January press release from Automated Insights, automated stories contain “far fewer errors than their manual counterparts” because such programs use algorithms to comb data feeds for facts and key trends while combining them with historical data and other contextual information to form narrative sentences.

New York Magazine writer Kevin Roose hypothesized similarly, saying, “The information in our stories will be more accurate, since it will come directly from data feeds and not from human copying and pasting, and we’ll have to issue fewer corrections for messing things up.”

When errors do surface, the process for editing is more or less the same as for human writers: review, revise and repeat. Once editors catch algorithmic mishaps, developers change the code to ensure they don’t happen again.

“We spent close to a year putting stories through algorithms and seeing how they came out and making adjustments,” Kent said.

He added that a less obvious concern has less to do with what data is collected than with how it is collected.

“Data itself may be copyrighted,” Kent said. “Just because the information you scrape off the Internet may be accurate, it doesn’t necessarily mean that you have the right to integrate it into the automated stories that you’re creating – at least without credit and permission.”

Synthesizing and Structuring Data

Besides collecting data, the way in which algorithms organize information requires additional ethical consideration.

“To make the article sound natural, [the algorithm] has to know the lingo,” BBC reporter Stephen Beckett wrote in a September article. “Each type of story, from finance to sport, has its own vocabulary and style.”

Stylistic techniques like lingo must be programmed into the algorithm at human discretion. For this reason, Kent suggests that robot reporting is not necessarily more objective than content produced by humans and is subject to the same objectivity considerations.

“I think the most pressing ethical concern is teaching algorithms how to assess data and how to organize it for the human eye and the human mind,” Kent said. “If you’re creating a series of financial reports, you might program the algorithm to lead with earnings per share. You might program it to lead with total sales or lead with net income. But all of those decisions are subject – as any journalistic decision is – to criticism.”
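Kent’s example can be made concrete with a minimal sketch. It is not Wordsmith’s or the AP’s actual code; the names EarningsReport, LEAD_TEMPLATES and write_lead, the company and the figures are all hypothetical. It only illustrates that the same data yields a different opening sentence depending on which metric a person tells the program to lead with.

```python
from dataclasses import dataclass


@dataclass
class EarningsReport:
    """Hypothetical record for one company's quarterly results."""
    company: str
    quarter: str
    earnings_per_share: float  # dollars
    total_sales: float         # millions of dollars
    net_income: float          # millions of dollars


# Each template is an editorial choice written by a person, not derived from the data.
LEAD_TEMPLATES = {
    "earnings_per_share": "{company} reported {quarter} earnings of ${earnings_per_share:.2f} per share.",
    "total_sales": "{company} reported {quarter} sales of ${total_sales:,.1f} million.",
    "net_income": "{company} reported {quarter} net income of ${net_income:,.1f} million.",
}


def write_lead(report: EarningsReport, lead_with: str) -> str:
    """Return the opening sentence, leading with the metric an editor chose."""
    if lead_with not in LEAD_TEMPLATES:
        raise ValueError(f"unknown lead metric: {lead_with}")
    return LEAD_TEMPLATES[lead_with].format(**vars(report))


if __name__ == "__main__":
    # Invented figures, for illustration only.
    report = EarningsReport("Example Corp.", "third-quarter", 1.25, 980.4, 112.0)
    for metric in LEAD_TEMPLATES:
        print(write_lead(report, metric))
```

The numbers never change between the three sentences; only the emphasis does, and that emphasis is set by whoever chooses the value of lead_with.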

Since news judgment and organization are ethical questions that carry over from traditional reporting to robot journalism, Kent suggests addressing them in the same way.

“Everyone has a different idea about what fair reporting is,” he said. “The important thing is that you devote to your news decisions on automated news the same amount of effort you devote to your ethics and objectivity decisions at any other kind of news.”

Sharing Data

Finally, journalists must make decisions about the extent to which they engage audience members in the process of robot journalism.

University of Wisconsin-Madison journalism Prof. Lucas Graves, whose research focuses on new organizations and practices in the emerging news ecosystem, says disclosure is a must.

“I absolutely think outlets should be disclosing the use of algorithms,” he said. “If they aren’t, then they need to be asking themselves why.”

Kent echoes that sentiment and goes further in his Medium piece, suggesting that outlets provide a link that identifies the source of the data and the company that provides the automation, and that explains how the process works.

As algorithms become more complicated, Kent also advises that journalists remain diligent in understanding both how algorithms work and how they interact with journalism ethics.

“As complicated as the algorithm gets, you have to document it carefully so that you always understand why it did what it did,” he said. “Just as any journalist who covers a particular area should periodically reevaluate how she writes about the topic, we should always reevaluate algorithms. There is nothing about automated journalism that makes it less important to pay attention to ethics and fairness.”

Getty Images changes the game by allowing free non-commercial use of some 35 million images

Reporting for the British Journal of Photography, Olivier Laurent details how Getty Images is introducing a game-changing policy that allows free non-commercial use for a large portion of its digital photography library:

The controversial move is set to draw professional photographers’ ire at a time when the stock photography market is marred by low prices and under attack from new mobile photography players. Yet, Getty Images defends the move, arguing that it’s not strong enough to control how the Internet has developed and, with it, users’ online behaviours.

“We’re really starting to see the extent of online infringement,” says Craig Peters, senior vice president of business development, content and marketing at Getty Images. “In essence, everybody today is a publisher thanks to social media and self-publishing platforms. And it’s incredibly easy to find content online and simply right-click to utilise it.”

: : :

To solve this problem, Getty Images has chosen an unconventional strategy. “We’re launching the ability to embed our images freely for non-commercial use online,” Peters explains. In essence, anyone will be able to visit Getty Images’ library of content, select an image and copy an embed HTML code to use that image on their own websites. Getty Images will serve the image in an embedded player – very much like YouTube currently does with its videos – which will include the full copyright information and a link back to the image’s dedicated licensing page on the Getty Images website.

BJP offers this editorial note at the end of the article: BJP will analyse the full impact of today’s news in a series of articles to be published later today and this week. Stay tuned.

Read the entire article here.

Updated 3.5.2014 10:30 CST:

There’s a lively discussion going on regarding the Getty Images policy change under a post by Russell Brandom at The Verge. Read it here.