Algorithms – Digital Humanities Portfolio

Introduction

Recent developments in machine learning are likely to have lasting impacts on the organization, productivity, and wealth distribution of the economy. Tasks associated with human intelligence are already being automated by machine learning algorithms that can make predictions and judgements. Despite hype surrounding machine learning, there are many unanswered questions about what the effects will be. One domain that is both tethered to the economics of information and is full of uncertainty is that of intellectual property. Concerns over machine learning tie together questions regarding fair use, copyright protection algorithms, community-driven knowledge production, patents, networked supply chains, and market transparency.

Policy makers and information specialists should be aware that in each of these fields there are mounting adverse effects for consumers. The ways in which machine learning transforms the relationships between data is at odds with traditional conceptions of transformative fair use. Machine learning helps copyright holders infringe upon the legitimate fair use of intellectual property because existing copyright law penalizes human oversight while giving automated systems a free pass. Google’s position with respect to machine learning doubly benefits from patent law. It uses open-sourced forms of knowledge production to capture insights from un-patentable fundamental research while patenting specific applications of machine learning. Lastly the penetration of proprietary machine learning algorithms into the networked supply chain may lead to market failures despite satisfying neoclassical assumptions of rationality and information transparency.

What is Machine Learning?

Machine Learning algorithms encompass a growing set of computational statistical approaches to analyzing data for the purposes of automated categorization, pattern finding and decision making. Because machine learning technology has the ability to mimic some of the faculties normally associated with intelligent behavior, proponents believe that machine learning will have a lasting impact on the economy (NSTC, 2016). As a computational statistical method for discovering patterns in information, the most basic machine learning technique is linear regression. Machine learning researchers, however, are concerned with more complex approaches such support vector machines and neural networks which have wide ranging applications from spam filtering to self-driving cars. Already, machine intelligence is penetrating many steps of the value chain in several industries up and down the technology stack (Zilis & Cham, 2016).

Commonly cited reasons for machine learning’s booming popularity are its disciplinary maturity, the abundance of data and advances in computational power. Machine learning as a field that unites statistics and computer science has a long history, but in the last few years there have been several break throughs in technical knowledge, especially in the field neural networks (NSTC 2016, 9-10). With improvements to both hardware and software, computers have become thousands of times faster than they were in the 1980s (Ford 2016, 71). Alongside advances in research, the magnitude of available data has increased (Ford 2015, 89). This story might make it seem as though the rise of machine learning was natural or even inevitable. What it neglects to mention are the economic motivations behind machine learning.

Machine Learning promises to impact the economy pervasively and productively. Machine intelligence already appears in several sectors such as transportation, finance, healthcare, education, agriculture, gaming, and energy (Zilis & Cham, 2016). One of the first uses of machine learning was to automate check processing (Nielsen 2015). Kroger has fully automated its warehousing except for the unloading and loading of trucks (Ford 2015, 17). Facebook uses a smart agent called Cyborg to monitor tens of thousands of servers for errors and solves them (106). Smart traffic systems can reduce “wait times, energy use, and emissions by as much as 25 percent…” (NSTC 2016). Unlike traditional software, machine learning will leave a unique mark on businesses because it can be used to automate more jobs associated with the service and information sectors.

Despite the hype, there is still a lot of uncertainty about how machine learning will affect the economy. Concerns about labor, security and the distribution of wealth are forefront (see NSTC 2016 & Ford 2015). Intellectual property offers a crucial lens to investigate the nuanced ways in which machine learning will affect the information economy. Creative uses of machine learning in art and design bring up unanswered questions about the relationship between machine learning and attempts to control information as a private, excludable good.

Electric Sheep: Fair Use in the Age of Automated Creativity

As more artists are turning to machine learning to create new forms of expression, artists, design firms, lawyers and intellectual property holders are starting to wonder if machine-generated works can be litigated under existing copyright laws. This uncertainty is typified by a user on the platform Quora who asks: “Would it be considered as a copyright infringement if a machine learning algorithm were trained on streamed frames from YouTube videos in an “unsupervised learning” fashion? (Quora 2016). The novice answers are anything but conclusive, ranging from “probably not” to a lukewarm “I think it won’t be, however…” Unfortunately, expert responses are not any clearer. Partly, because this territory is very new, and what counts as legal precedent is not yet well defined.

Earlier this year, the AI artist Samim Winiger developed a bot that can narrate image descriptions in the lyrical style of Taylor Swift (Hernandez 2015a). As an example of one of many bots that attempt to produce text in the style of an author using recurrent neural networks, this work prompts the question of whether output generated by machine learning algorithms is protected under existing fair use laws. The Stanford library blog on fair use explains that most cases of fair use fall between two categories: “(1) commentary and criticism, or (2) parody” (Stim 2010). Examples such as Winiger’s “Swift Bot” do not fit neatly into these usual categories. Arguably however, Winiger’s use of machine learning algorithms constitutes what fair use law calls “a transformative use” of Swift’s lyrics because, while his process uses intellectual property as an input, the output is something new or significantly different from the original (U.S. Copyright Office 2016a). This interpretation hinges on understanding machine learning algorithms as simply an extension of a human’s creative process, and therefore clearly already covered by existing fair use doctrine (Hernandez 2015b). What this perspective misses, is that if machine learning algorithms can truly capture the “style” of an author, musician, or other creative artist, as Winiger believes, style may become commodifiable or licensable (Winiger 2016).

In another example, the artist and machine learning researcher Terence Broad uploaded a neural-network generated version of Blade Runner—the film adaptation of the novel Do Androids Dream of Electric Sheep? by sci-fi author Philip K. Dick (Broad 2016). Like the previous example, Broad uses copyright protected property as an input, and outputs something different. But is the output transformative enough to constitute fair use?

Broad’s algorithmic adaptation mirrors Blade Runner frame by frame, using a machine learning technique called “auto-encoding” that attempts to embed a process for duplicating images within the neural network itself. The Holy Grail of auto-encoders can produce a perfect copy of the input with the caveat that the input must pass through a mediating layer with the smallest possible number of artificial neurons. Some researchers joke that auto-encoding is the most expensive ‘times one’ because, when it is effective, it superficially seems as if it has not effected any changes. In the machine learning community auto-encoding is not seen as simple copying though; it is understood as reducing an image to a minimal representation or a set of variables for further manipulations (see Kogan 2016). The process has real applications in generating new images and in CSI style digital zoom (see Neural Enhance 2016).

From the perspective of someone unaware of auto-encoding (or from a corporation’s content fingerprinting algorithm), an output generated by an auto-encoder may not be distinguishable from the original. Unsurprisingly, Broad’s project was issued a DMCA take down notice by Warner-Bros (Romano 2016). While the take down notice was later reversed, the scenario triggered reflection on what constitutes fair use when producing and reproducing images with machine learning algorithms. In Broad’s case, the output resembled a poorly compressed version of the original and could not be reasonably used as a substitute for the actual Blade Runner film. In principle however, an auto-encoding algorithm could be used to create a near duplicate rendering of the original. Such use signals an uncharted grey-area in existing fair use policy (Broad 2016).

This case triggers an underlying ontological question for information researchers and information policy makers regarding what constitutes the “property” that is protected by intellectual property law? Can digital information be defined entirely by the bytes the compose digital files, or does a digital work also include, to some extent, the process that produced it or the intent of the producer? These questions are deeply tied with the now age-old question of whether digital information has the properties of a private or public good. Because information can be both consumed without depleting it and without barring others access to it, information does not have the properties that we normally associate with private goods. Nevertheless, instead of answering these questions, intellectual property law and dominant online protocols manage to transform information into a private good, regardless of what its nature might be.

Copyright Policy and Copyright Protocol

The same genres of algorithms artists use to create the artwork that falls within the edge cases of copyright law are now being used to police copyright infringement. Algorithmic procedures undergird the mechanisms by which copyright holders restrict the supply of their intellectual property on the internet, thereby imposing rivalry and excludability on goods that are otherwise inexhaustible in use. In attempts to combat the piracy of digital property—such digital audio and video—law makers passed the Digital Millennium Copyright Act (DMCA) in 1996.

Considering the magnitude of information that is added to the internet every day, the DMCA is unenforceable without automated means of recognizing copyright protected material. For example, Youtube’s Content ID program allows copyright holders to submit files of audio and visual work they own to a database containing millions of reference files. Youtube then uses an algorithm to automatically test if the content contains intellectual property owned by others. The content holder has the ability to block, leave up, or make money off of the content that matches their reference file. Youtube purports that this system balances piracy’s positive effects of free advertisement against the negative effect of lost revenue by making copy protected material a potential source of advertising revenue for both Youtube and the copyright holder (Youtube 2016).

Not only are machine learning techniques required to sift through the hundreds of years worth of audio and video content that is uploaded to the internet every day, the status of machine readership as non-subjective has become the cornerstone of judicial applications of DMCA. Under the Online Copyright Infringement Liability Limitation Act (OCILLA) in the DMCA, copyright holders must knowingly misrepresent the fair use of intellectual property in a DMCA takedown request to be held liable for infringing upon legitimate uses (U.S. Copyright Office 2016a). Because copyright owners do not know exactly what automated algorithms do on their behalf, they have argued successfully that they cannot be held liable for false claims. Under this legal regime, the romantic readership of subjective interpretation becomes a bug while robotic non-subjective readership becomes a feature.

Professor of Law, James Grimmelmann traces the outcomes of several court cases where fair use laws come up against robotic readership. The author demonstrates that there are two tracks for fair use: one for human use and one for robotic readership. (Grimmelmann 2015). Historically U.S. courts have argued that algorithmic readership does not engage with the “expressive content” of works protected by copyright. If not meant primarily for human audiences, the algorithmic output is broadly construed as “transformative” and therefore protected under fair use laws. He speculates that because general level intelligence may one day match human interpretive abilities, subjective tests for “romantic readership” (readership that engages with the expressive content of works) should not be the basis of law. “[The doctrine of] Romantic readership asks a question nobody will be able to answer.” (Grimmelmann 2015, 680). Therefore, the philosophical basis by which robots get a “free pass” in copyright cases should be re-examined.

Because robotic readership gets a free pass, Content ID casts a wide net: Content ID flags any material that might contain copyright owner’s material. As the Electronic Frontier Foundation reports, “The current Content ID regime on YouTube is stacked against the users” (Amul Kalia 2015). Often Youtube’s Content ID system comes into conflict with legitimate fair uses of copyright protected material. Copyright owners using automated take-down services have faced some legal battles, because their system issues take-down notices before considering questions of fair use. While copyright owners are supposed to consider fair use before they send out takedown notices, the automation of the takedown notice system protects copyright owners from being liable. It is because the system is automated that it is impossible to prove that the owners issued the notices “deliberately or with actual subjective knowledge that any individual takedown notice contained a material error” (United States District Court Southern District of Florida 2012, 9).

With the help of the Electronic Frontier Foundation, Stephanie Lenz filed a suit against Universal Music Corp for issuing her a takedown notice after she published a video of her child dancing to a 20 second clip of music by Prince (Harvard Law Review 2016). She argued that Universal Music Corp did not take fair use into consideration before issuing the takedown notice. On the face of it, the court ruled in favor of Lenz, however, the decision may not substantially curb DMCA abuse. According to the Harvard Law Review, the decision may encourage copyright holders to use algorithmic takedown systems with even less human oversight.

Monetizing Learnings from Machine Learning

While Google uses machine learning to profit off piracy and the potential fair use of copy protected material it also champions open-source software development in the production of its own machine learning algorithms, architecture, and libraries. There has been booming activity in open-access research for machine learning techniques such as neural networks. The open-access journal ArXiv.org has seen over a 700% increase in machine learning related publications. In 2006, researchers submitted only 12 articles containing the keyword “Machine Learning”. In 2016, researchers summited 942 “Machine Learning” articles. Following this trend, the AI behemoth Google has open-sourced its R&D machine learning libraries in a framework called TensorFlow.

On one hand, this signals a continuity with existing copyleft permissive software licenses for popular deep learning research and development frameworks such Theano and Torch. On the other hand, the fact that Google made TensorFlow free for high-skilled researchers to use may signal that the monetization of Google’s products depends more on the quality and magnitude of their data and their data processing centers than the fundamental research that their data analysis techniques are based on (Thompson 2015). Therefore, Google can reap the rewards of information economics two-fold. First, by open-sourcing their deep learning R&D framework, Google can capture the creativity that thrives in open-source communities. Second, Google can appropriate the creative output for use within the context of their own locked-in networks for data gathering and distribution. Because of the nature of US patent law and policy, the community driven fundamental research is not monetizable on its own, but there are still profits to be made within the wider context of Google’s defensible patents and pipelines.

In legal theory, patents serve three basic functions: incentivizing, contracting and disclosure (Grimmelmann 2016). A functioning patent system is supposed to encourage innovation by balancing the circulation of knowledge with incentives for invention. As the theory goes: if fundamental knowledge were patentable, the circulation of knowledge would falter. However, if no protections were available for inventors, people would have no defensible economic incentive to innovate. Economists have argued that patents do not actually encourage innovation. After an economic analysis, Machlup famously said that if the patent system did not already exist, he would not suggest it, but because it has existed for so long, he could not conscientiously advocate abolishing it (Machlup 1958, 79-80). Today, the economic benefits of the patent system are not any clearer (Bessen & Meurer 2008, p 145-146). Nonetheless, only designs that are framed as specific enough inventions can be granted patents.

While the fundamental research in machine learning is not patentable, some applications of machine learning are. At the heart of many new technology companies, such as Uber and Amazon, are patents that protect algorithmic processes essential to the organizations’ business models. Amazon has patented an anticipatory shipping algorithm that uses machine learning to determine if it likely for a user to buy an item before they click (Kopalle 2014). Uber recently acquired an AI firm called Geometric Intelligence with patent pending machine learning techniques. Technology writers speculate that this might have to do with the race to produce self-driving cars (Newcomer 2016). However, from Uber’s job listings it is evident that they are looking for people with machine learning competencies in several other departments beyond vehicular automation such as: fraud, security, forecasting, finance, market research, business intelligence, platform engineering, and map services (Uber 2016). While machine learning often gets press for impressive feats like self-driving cars and besting human Go champions, the reality is that machine learning will likely have a more pervasive impact on digital marketplaces and value chains.

By integrating machine learning into every step of the production process, the oligopolistic tendencies of network economics are potentially worsened. It is not necessarily the case that networked production models and market transparency lead to freer, more competitive markets. Many retailers with an online presence cannot compete with Amazon’s proprietary pricing algorithm. Amazon prices can fluctuate more than once per day, with changes that may double or cut prices in half (Ezrachi & Stucke 2016, 13). Blindly matching prices is not strategic enough to ensure optimal profits (49). Instead, companies subcontract pricing to third-party technology vendors capable of programing more sophisticated and dynamic pricing software using “game theory and portfolio theory models” (14). This form clientalization dramatically decreases competition in the marketplace. Now many retailers such as Sears, Staples and Groupon Goods all turn to same subcontractor to price their wares: Boomerang (48). By concentrating pricing into only a few hands, this development has the potential to promote effects associated with collusion.

When online market prices are so transparent and de facto oligopolistic, the small pool of pricing algorithms can discover optimal supra-completive price points. First, when many companies all turn to the same vendor to set their prices, there is the possibility for hub-and-spoke collusion—i.e., their prices will move up in concert (Ezrachi & Stucke 2016, 49-50). Second, dynamic pricing algorithms are programmed to respond to one another. Far from being a race to the bottom, this can be a way to coordinate the increase of prices without sales loss (64). If the strategies are similar enough, and if prices changes can be made quick enough that customers cannot flee to cheaper retailers, the algorithms can settle on prices above what we should expect from a perfectly efficient market. Moreover, because of how much data is collected on users, sites can know if users are likely to practice comparison shopping. If they do not, pricing algorithms may charge them more than other shoppers (91). By integrating smart and adaptive algorithms into every step of the supply chain we have a scenario where highly rational actors with near perfect information do not necessarily set the most efficient prices.

Conclusion

At the intersection of information economics, machine learning and intellectual property are several concerns regarding social warfare—both in terms of freedom of information and economic inequality. Machine learning, bolstered by the DMCA, threatens fair use with copyright protection algorithms. Asymmetrical incentives rooted in patent law may continue to exploit community-driven machine learning research and drive rational, transparent, and yet inefficient online markets. It is time for information researchers, economists, and policy makers to come together and answer the unanswered questions about intellectual property.

References

Alexjc. 2016. “Neural Enhance.” https://github.com/alexjc/neural-enhance.

Amul Kalia. 2015. “Congrats on the 10-Year Anniversary YouTube, Now Please Fix Content ID.” Electronic Frontier Foundation. https://www.eff.org/deeplinks/2015/05/congrats-10-year-anniversary-youtube-now-please-fix-content-id.

Bessen, James, and Michael James. Meurer. 2008. Patent Failure: How Judges, Bureaucrats, and Lawyers Put Innovators at Risk. Princeton : Princeton University Press.

Broad Terence. 2016. “Autoencoding Blade Runner” Medium. https://medium.com/@Terrybroad/autoencoding-blade-runner-88941213abbe#.yd5ykmk9q.

Grimmelmann, James. 2015. “Copyright for Literate Robots.” SSRN, 1–31.

Ezrachi, Ariel, and Maurice E. Stucke. 2016. Virtual Competition : The Promise and Perils of the Algorithm-Driven Economy.

Harvard Law Review. 2016. “COPYRIGHT LAW — DIGITAL MILLENNIUM COPYRIGHT ACT — NINTH CIRCUIT REQUIRES ANALYSIS OF FAIR USE BEFORE IS- SUING OF TAKEDOWN NOTICES.”

Hernandez, Daniela. 2015a. “New Bot Uses Taylor Swift-Inspired Lyrics to Describe What It Sees” Fusion. http://fusion.net/story/228458/taylor-swift-neural-network-ai/.

Hernandez, Daniela. 2015b. “What If a Robot Stole Your Work and Passed It as Its Own?” Fusion. http://fusion.net/story/229139/computational-creativity-copyright-law/.

Knight, Will. 2016. “Uber Launches an AI Lab.” MIT Technology Review. https://www.technologyreview.com/s/603016/uber-launches-an-ai-lab/.

Kogan, Gene. 2016. “Game AI and Deep Reinforcement Learning.” ITP-NYU. http://ml4a.github.io/classes/itp-S16/06/.

Ford, Martin. 2015. Rise of the Robots : Technology and the Threat of a Jobless Future.

Machlup, Fritz. 1958. An Economic Review of the Patent System. Washington: U.S. Govt. Printing Office

Nielsen, Michael A. 2015. “Neural Networks and Deep Learning.” Determination Press.

Quora. 2016. “Would It Be Considered as a Copyright Infringement If a Machine Learning Algorithm Were Trained on Streamed Frames from YouTube Videos in an ‘Unsupervised Learning’ Fashion?” https://www.quora.com/Would-it-be-considered-as-a-copyright-infringement-if-a-machine-learning-algorithm-were-trained-on-streamed-frames-from-YouTube-videos-in-an-unsupervised-learning-fashion.

Romano, Aja. 2016. “A Guy Trained a Machine to ‘watch’ Blade Runner. Then Things Got Seriously Sci-Fi.” Vox. http://www.vox.com/2016/6/1/11787262/blade-runner-neural-network-encoding.

Stim, Rich. 2010. “What Is Fair Use?” Stanford Copyright and Fair Use Center. http://fairuse.stanford.edu/overview/fair-use/what-is-fair-use/.

Thompson, Ben. 2015. “TensorFlow and Monetizing Intellectual Property” Stratechery. https://stratechery.com/2015/tensorflow-and-monetizing-intellectual-property/.

U.S. Copyright Office. 2016. “Copyright Law: Chapter 5.” Copyright.gov. https://www.copyright.gov/title17/92chap5.html.

U.S. Copyright Office. 2016. “More Information on Fair Use.” Copyright.gov. https://www.copyright.gov/fair-use/more-info.html.

United States District Court Southern District of Florida. 2012. Disney Enters., Inc. v Hotfile Corp. No. 1:11-cv-20427-KMW.

U.S. Science and Technology Council. 2016. “PREPARING FOR THE FUTURE OF ARTIFICIAL INTELLIGENCE,” no. October.

Winiger, Samim. 2016. “Samim A. Winiger – Generative Design” The Conference. http://videos.theconference.se/samim-a-winiger-generative-design-1.

Youtube. 2016. “How Content ID Works.” YouTube Help. https://support.google.com/youtube/answer/2797370?hl=en.

Zilis, Shivon, and James Cham. 2016. “The Current State of Machine Intelligence 3.0” O’Reilly. https://www.oreilly.com/ideas/the-current-state-of-machine-intelligence-3-0?twitter=@bigdata.

Automated Futures

Category Archives: Algorithms

Aesthetics and Politics of Algorithms

Documenting Financial Infrastructure

Immortal Zugzwang

Machine Learning Economics & Intellectual Property