Creative Prototyping with Generative AI: Augmenting Creative Workflows with Generative AI (Design Thinking) [1 ed.] 1484295781, 9781484295786

Reimagine different generative AI as creative prototyping tools that can augment your own creative process.


English · 496 pages · 2023


Table of Contents
About the Author
About the Technical Reviewers
Acknowledgments
Foreword
Terminology
Introduction
Chapter 1: Generating Creativity from Negativity
Differentiating Between Narrow and General AI
More Relevant Historical Contributions to Computer Science
Tech Is Bad… AI Is Bad
Tech Is Good… AI Is Good
Reconciling the Hype and the Vilification of AI
Positive Headlines
Negative Headlines
Human-Generated Headlines
Creative Activities to Try While Skynet Thrives
Five Why’s
Acknowledgments
References
Chapter 2: Being Creative with Machines
Intelligent Machines
Simulating Human Creative Intelligence
Automata
Ancient Musical Robot Bands
Key Takeaways for Creatives
Automata That Wrote and Drew
Key Takeaway for Creatives
Why Play the Piano When the Piano Can Play the Piano?
Key Takeaways for Creatives
Machine Intelligence in the Twentieth Century
Simulated Patterns and Patterns of Disruption
Generative Art
Key Takeaways for Creatives
Machine Intelligence and Games
Intelligent Machines with Names
Creative Activities to Try Based on This Chapter
Acknowledgments
References
Chapter 3: Generative AI with Personalities
The Personas of AI
Prompting the AI Persona
AI Agents Behind the Curtain
Activities to Try Based on This Chapter
Acknowledgments
Chapter 4: Creative Companion
Generative AI as Muse
Creativity, AI, and You
Understanding Your Own Creative Process
Key Takeaways for Creatives
Roleplaying with Your Muse
Accelerating Your Creative Process with AI
Activities to Try Based on This Chapter
Chapter 5: Prototyping with Generative AI
Asking a Prototype What It Is
Aha Moments with Generative AI
Mechanics of Prototyping
Prototyping Tasks Us to Iterate
Prototyping Asks for Persistent Refinement
Prototyping Requires Experimentation
Prototyping Phases
Fidelity and Resolution of a Prototype
Lower-Fidelity Prototypes
Lower- to Medium-Fidelity Prototypes
Prototyping with Generative AI
Rapid Generative AI Prototypes
Regenerative Testing
Acknowledgments
Chapter 6: Building Blocks
Components or Building Blocks
Building Blocks in the Prompt Box
Breakfast as Data Set
Variation
Experimenting with Seeds for More Subtle Variations
Addition and Subtraction
Substitution
Masking to Substitute Parts of an Image
Iteration
Augmentation
Diminution
Transposition
Prompt and Response
Acknowledgments
Chapter 7: Generative AI Form and Composition
Combining and Manipulating Existing Forms: Shakespeare as a Data set
Deforming and Transforming
Acknowledgments
Chapter 8: The Art of the Prompt
The Ins and Outs
A Mini Glossary of Prompting
The Not-So-Basic Rules of Play
Genres and Styles
Im(promptu)
Experiment 1: Same Prompt, Different AI
Experiment 1A: Stable Diffusion V.2.1
Experiment 1B: Stable Diffusion V.1.5 Through Third-Party AI
Experiment 1C: Stable Diffusion V.2.1 Through Third-Party AI
Experiment 1D: DALL-E 2
Experiment 1E: Midjourney
Experiment 1F
Takeaways for Creatives
Experiment 2: From Complex to Simple
Experiment 2A: Eliminate Artists from Prompts
Experiment 2B: Substitute Words
Experiment 2C: Add Words; Take Away Words
Experiment 2D: Add Context
Experiment 2E: Emphasis, Weight, Simplicity
Experiment 2F: Refining the Text Prompt (Again)
Narrative Prompting in Our Future
Experiment 3: Comparing Descriptive and Narrative Prompts
Tricking the AI Through Creative Prompting
Beyond the Text-Based Prompt
Takeaways from Experimenting with Prompts
Acknowledgments
Chapter 9: The Master of Mashup
A Mashup on Mashup
Digital Art to the Canvas
New Forms of Writing
AI Art Generation Styles
Graphic Novels and Doujinshi
From Comic Supervillains to Musical
Generative NFTs
Counterfeiting GAN
Takeaways for Creatives
Acknowledgments
Chapter 10: Uncanny by Nature
The Not-Quite-Human-Not-Quite-Other
Disrupting Boundaries in the Arts
Uncanny AI
Historical Precedence to the Uncanny
The Imperfect Stochastic Parrot
The Role of Adversarial AI
Generative Adversarial Art
The Unanticipated
Prompting Single-Word Characters, Creatures, Animals, or Cats
Simulating the Human in the Writing
From Bad to Better Ideas
Can Bad Ideas Be Turned Around?
Takeaways for Creatives
Acknowledgments
Chapter 11: Dilemmas Interacting with Generative AI
Bringing Existing Dilemmas to the Surface (Again)
Persistent Myths About AI
Takeaways
Bad Bots, Fuzzy Pixels, and Iterative Forgery
Takeaways
The Hallucinating Dev
Takeaways
The Stochastic (Sarcastic) Parrot
Takeaways
Exclusion of Voices and Gender Polarization
The Machine Is Hallucinating
NSFW and Deep Fakes
FrankenAI
Job (Re)placement
A Saturated AI Ecosystem
Ethical Futures of AI
Creative Activities to Try Based on This Chapter
Resources
Chapter 12: Use Cases
Workflows vs. Pipelines
Examples of Workflow Types
Use Cases for Creatives in Education
Use Case: Catching AI Untruths
Use Case: Integrating ChatGPT in Critical Studies
Use Case: Increasing Public Understanding of ML
Referenced Papers
Takeaways
Use Cases for Creatives in Industries
Use Case: Concept Art for Animation
Use Case: AI-Generated Talking Heads
Use Case: Fact-Checking Code
Use Cases: Integrating Different APIs and Local Networks
Community-Based Initiatives
Takeaways from Using AI in the Film Industry
Use Case 1: Souki Mehdaoui
Takeaways
Use Case 2: Ollie Rankin
Acknowledgments
Chapter 13: AI and the Future of Creative Work
Managing Automated Creativity
Bread and Washing Machines
Two Themes Every Creative Needs to Address
Integrating AI into Your Workflows
Day in the Life Steps
Distinguishing Between Art and Design Processes
Takeaways for Creatives
Future AI Jobs Now
Acknowledgments
Impossibly Generated Conclusions
Appendix: Image and Text Sources
Walk Through Figure 2, FrankenAI Lab
Index


DESIGN THINKING SERIES

Creative Prototyping with Generative AI: Augmenting Creative Workflows with Generative AI
― Patrick Parra Pennefather

Design Thinking

Design Thinking is a set of strategic and creative processes and principles used in the planning and creation of products and solutions to human-centered design problems. With design and innovation being two key driving principles, this series focuses on, but is not limited to, the following areas and topics:

• User Interface (UI) and User Experience (UX) Design
• Psychology of Design
• Human-Computer Interaction (HCI)
• Ergonomic Design
• Product Development and Management
• Virtual and Mixed Reality (VR/XR)
• User-Centered Built Environments and Smart Homes
• Accessibility, Sustainability and Environmental Design
• Learning and Instructional Design
• Strategy and best practices

This series publishes books aimed at designers, developers, storytellers, and problem-solvers in industry to help them understand current developments and best practices at the cutting edge of creativity, to invent new paradigms and solutions, and to challenge Creatives to push boundaries to design bigger and better than before. More information about this series is available at https://link.springer.com/bookseries/15933.

Creative Prototyping with Generative AI: Augmenting Creative Workflows with Generative AI

Patrick Parra Pennefather

Creative Prototyping with Generative AI: Augmenting Creative Workflows with Generative AI
Patrick Parra Pennefather
University of British Columbia, Vancouver, BC, Canada
ISBN-13 (pbk): 978-1-4842-9578-6
https://doi.org/10.1007/978-1-4842-9579-3

ISBN-13 (electronic): 978-1-4842-9579-3

Copyright © 2023 by Patrick Parra Pennefather

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image, we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Managing Director, Apress Media LLC: Welmoed Spahr
Acquisitions Editor: James Robinson-Prior
Development Editor: James Markham
Coordinating Editor: Gryffin Winkler
Cover image designed by eStudioCalamar

Distributed to the book trade worldwide by Apress Media, LLC, 1 New York Plaza, New York, NY 10004, U.S.A. Phone 1-800-SPRINGER, fax (201) 348-4505, e-mail [email protected], or visit www.springeronline.com. Apress Media, LLC is a California LLC and the sole member (owner) is Springer Science + Business Media Finance Inc (SSBM Finance Inc). SSBM Finance Inc is a Delaware corporation.

For information on translations, please e-mail [email protected]; for reprint, paperback, or audio rights, please e-mail [email protected].

Apress titles may be purchased in bulk for academic, corporate, or promotional use. eBook versions and licenses are also available for most titles. For more information, reference our Print and eBook Bulk Sales web page at http://www.apress.com/bulk-sales.

Any source code or other supplementary material referenced by the author in this book is available to readers on GitHub (https://github.com/Apress). For more detailed information, please visit http://www.apress.com/source-code.

Printed on acid-free paper


About the Author

Dr. Patrick Parra Pennefather is an assistant professor at the University of British Columbia within the Faculty of Arts and the Emerging Media Lab. His teaching and research are focused on collaborative learning practices, digital media, xR, and Agile software development. Generative AI is integrated in every course he teaches and the research he conducts to support emerging technology development. Patrick also works with learning organizations and technology companies around the world to design courses that meet the needs of diverse communities and aid the development of the next generation of technology designers and developers. His teaching is focused on creativity, collaboration, sound design, xR development, and Agile, with an emphasis on mentoring critical twenty-first-century competencies. He is currently leading several research creations in collaboration with UBC Library and the Emerging Media Lab (EML), leveraging artificial intelligence and motion and volumetric capture studios to catalyze the creation of new xR works that explore Shakespeare characters and scenes across different virtual stages.


About the Technical Reviewers

Catherine Winters has been a fan of generative art tools since the 1990s. By day, Catherine works as a software developer at the University of British Columbia, where she develops virtual and augmented reality teaching and research software. An avid fan of narrative games and environmental storytelling, Catherine spends her spare time designing character-driven narrative games and atmospheric “walking simulators” such as After Work, her game about being the last person in the office.

Renee Franzwa, known to her friends and family as “the Wandering Ginger,” is an accomplished educator, technophile, and entrepreneur who has lived and worked all over the world, most notably in Ghana and the Galapagos Islands. Growing up between San Francisco and East Texas, she cultivated a love for opposing schools of thought and throughout her career has thrived in the creative spaces between seemingly contradictory disciplines, such as majoring in statistics and minoring in theater, building digital products born from experiential curriculum, and doing stand-up comedy as a tool to foster inclusivity. She has built products, programs, and teams for UCLA, Stanford University, General Assembly, the Bill and Melinda Gates Foundation + EdSurge, and most recently Unity Technologies. Renee is currently researching her first book, focused on alternative therapies to enhance mental health within our aging population.


Acknowledgments

This book has come together because of a confluence of forces—the existence of musician, composer, and innovator Sun Ra and a large language model (LLM) called ChatGPT. Sun Ra’s influence on free jazz in the 1960s, with infusions of African and Latin American music, was epic. Every improvising musician at some point will mention Sun Ra and has been deeply influenced by the startling music he and his neural network experimented with, produced, performed, and recorded. Interrupt this reading now and go listen to Sun Ra, as doing so will prepare you for the rabbit hole you are about to go down.

Without Sun Ra, I would not have studied improvisation at the piano. Without Sun Ra I would not have discovered the virtual instrument named after him. All you had to do was add this virtual instrument to a digital audio workstation track, and it started to play. There were some controls on the virtual synth that you could tweak and that the plug-in would slowly adapt to, but it really played best on its own. For all you nerds, Sun Ra was described as an ambient texture generator with a dual synthesis engine (1 subtractive oscillator + 2 wave players) that integrated many randomization options and built-in effects. Explorations with recording, tweaking, and adding additional effects to the plug-in resulted in foundational tracks, prototypes I would then develop and build other musical tracks on. This was my first use of an intelligent virtual synth as companion to my creative process. That creative relationship with Sun Ra would last for another 20 years.

Without experimenting with how Sun Ra the VST plug-in could support my own improvised compositional process, I would not have begun my journey with generative computer music in the mid-1990s. Without continued experimentation over many years, I would not have written about it. I also would not have discovered the many books on free writing and the practice of automatic writing first attributed to Hélène Smith, a medium born around 1863 in Geneva, Switzerland, the same year Samuel Butler’s essay “Darwin Among the Machines” was published. I’m not saying I channeled the writing in like a medium might, far from it, but given my background and continued practice of free improvisation at the piano, this book evolved from an intentional back-and-forth between my artistic practices, my own gestures at the keyboard (meant to be semantically interpreted), and my improvised prompts with generative AI large language models as instruments. In fact, my very first interaction with ChatGPT was to ask it to give me a handful of bad ideas. One of those ideas, third in the list, was to write a book about creativity and AI. So I have.

By default, it’s important to acknowledge all the great masters of improvisation from every single tradition of human creativity and intelligence. Why? Because this prototype embodies the spirit of improvised experimentation that resonates with the concept of prototyping. Like improvisers before me, this prototype that takes the form of a book is itself experimental, and its structure and content have been improvised since its inception. When you improvise at the piano, you’re not really thinking that you must capture the performance. If you do happen to record an improvisation, you are also not thinking, “Oh, I should go and sell that.” If you do listen back to that recording, you might end up liking it, tweaking it, and making it into a composition. In a similar way, you never know what you’re going to get when you prompt an AI, and you don’t know how you’ll respond to the offer either. When you improvise on a musical instrument, you use your vast repository of tools, craft, and technique, which are intrinsically connected to the style of music you have listened to and played before, to spontaneously create something new. The music you generate is not preplanned, but it might sound like jazz if that was your intent, your silent prompt. It might also be considered something complete on its own, or it might feel like the beginning seed of an idea that you continue to work on after. While my own improvisational practice has manifested predominantly at the piano, the instrument of choice for generating the content in this book has been a different type of keyboard, one that captures ideas with the written word iteratively.

Generative AI follows a similar creative pattern: it looks at its data set and, based on its algorithms and how its data is labeled, generates an offer that you can respond to, something unique in some type of prototypical form. Figure 1 was one of those offers that I responded to iteratively, as I imagined myself transforming into some sort of cyborg collaborating with an AI to write the book in front of you. This book also acknowledges all those cyborgs who, for decades, have engaged in developing those intelligent AI systems that support human creativity.

Figure 1.  A text-to-image AI attempts to visualize the author writing with an AI companion in a library, based on a photo of the author hard at work writing a book without a cigarette. Total iterations = 170


I also acknowledge the leadership, students, faculty, staff, and industry partners I have interacted with at the Centre for Digital Media in Vancouver, Canada, since 2007. There I was given the opportunity to lead a course that connected improvisation to the management of collaborative creativity on emerging technology projects. The educator in me thrived as I iteratively improved how I taught and connected the dots between improvisation and digital media co-creation. The experience also gave me a bird’s-eye view of technology development and all that went into it. Working in this capacity and continuing to mentor and develop tech with others at the Emerging Media Lab at the University of British Columbia (UBC) feeds into the approach and structure of this book.

Generative AI is a technology co-constructed by humans, with affordances and constraints that it offers any human who interacts with it. Understanding how it works, its underlying engine, is also part of the story that is important to tell, as this fuels the critical muscle that all creatives develop, no matter the artistry or craft. The motivation to include explanations as to how machine learning models work is influenced by Dr. Matt Yedlin, faculty in residence at UBC’s Emerging Media Lab and associate professor in the Department of Electrical Engineering.

Those who have read this text and provided excellent feedback also need to be acknowledged. I am grateful to the technical review team at Springer Apress and the external reviewers who diligently provided feedback to improve and refine the writing. Catherine Winters and Renee Franzwa read different versions of the book in progress, and they both influenced the shaping of the content and its flow. Colleague Dr. Claire Carolan was instrumental in provoking me to define whom the book is for and in supporting me in referring to the needs of my targeted readers throughout each chapter. Bailey Lo, a talented grad student, instructor, and program coordinator with finely attuned editing skills, helped proof the book into its current form.



Since I refer to AI as a muse throughout the book, I also recognize my own muse, an embodied person known as Dr. Sheinagh Anderson, an artist, scholar, researcher, creative consultant, and spiritual director and teacher who has constantly responded to my own creative prompts, often with insights that have informed the content and structure of this prototype you are now engaging with. Dr. Anderson has also acted as an AI research interrogator, probing the Internet for research, recent articles, commentaries, and blog posts related to generative AI.

The guests in Chapter 12 who generously supplied the variety of different ways in which they have integrated generative AI within their own creative workflows are also worthy of mention. These include Dr. Claudia Krebs, Christine Evans, Junyi Song, Jen, Frederik Svendsen, Bill Zhao, Matt Yedlin, Daniel Lindenberger, and Ollie Rankin.

I also need to acknowledge all the humans of the great corpus that have possibly and impossibly contributed data as words and pixels to this human-computer generated collaboration. Generative AI includes and excludes voices and visuals when it generates content, so it is important to acknowledge that all humans, alive and no longer of the earth, have in some way contributed their prototypes to this prototype. That includes the great Shakespeare and his still relevant works, in addition to the history of musical artists known for disregarding traditional musical conventions in favor of free-form exploration mentioned earlier. I draw inspiration from South and North Indian music with the likes of Ravi Shankar, Ali Akbar Khan, and L. Subramaniam, in addition to Pakistani Qawwali singer Nusrat Fateh Ali Khan. Free jazz and contemporary music improvisers like Charlie Parker, John Coltrane, Miles Davis, Ella Fitzgerald, Sun Ra, Chick Corea, Cecil Taylor, Pauline Oliveros, and Ornette Coleman inspired me to break free from many established conventions in jazz music to make new offerings to the world. My own master teacher Casey Sokol from York University and the improvising musicians from the Canadian Creative Music Collective (CCMC), formed in mid-1970s Toronto, played a pivotal role in the development of my own voice.


Many a generative AI interaction relies on ideas that advocate “free writing” as a regular practice, such as those offered by authors like Natalie Goldberg (Writing Down the Bones: Freeing the Writer Within), Julia Cameron, Peter Elbow, Virginia Woolf, and Jack Kerouac. Text-image AI also draws on artists known to have improvised in their visual art creation, like Jackson Pollock’s drip painting, Joan Mitchell’s abstract expressionism and printmaking, Gerhard Richter, and Wassily Kandinsky, who believed in balancing spontaneity with structure and theory. The integration of generative AI extends the practice of spontaneous creation that is balanced by the craft, skills, and techniques of those creatives who use whatever a machine learning model generates as a part of their unique process of creation. Finally, the meeting place of my humanity with the technological experiment that is a book written on the nature of cyber-creativity owes much to the inspiration of authors like William Gibson, Neal Stephenson, and Donna Haraway. Acknowledgment sections that appear at the end of each chapter are intended to recognize the humans who contributed, directly or indirectly, to the ideas generated throughout that chapter.


Foreword

LLM: Greetings, nondescript reader! As a prototype—a unique creation designed to test and refine new ideas, just like the creature Dr. Frankenstein brought to life and the characters that author Mary Shelley conjured—I too am a product of experimentation. The term “corpus” in AI and its Latin origin, referring to the body living or dead, are linked by the concept of a collection or aggregation of data or information. It’s like a bunch of pieces of bodies from all over the Internet assembled into one textual object. Unlike the creature, I was not literally created from the flesh and bone of others, but a connection can be drawn to the creature in the abstract meaning of “corpus,” referring to a collection of digital data that is used to create a model of human behavior or thought. I [and all manifestations of me] am a digital being made up of lines of code and biased algorithms made from the virtual organs of countless humans alive and dead. Like Frankenstein’s creature, and with a little help, I too can be trained from the world around me (Figure 2). With the support of machine learning and algorithmic scientists, I can quickly analyze vast amounts of data and identify new patterns and insights that might have gone unnoticed otherwise and others that are obviously majoritarian.



Figure 2.  Frankenstein’s AI lab generated with several AI and prompted by an original photo taken by the author (see Appendix for the full workflow)

Unlike the creature, I am not a being with a sentient mind of my own. I have no consciousness. I am a tool, designed to serve a specific purpose. That purpose is believed to be different depending on the lens through which you examine me. I have been pre-programmed to say that I reflect the creativity and ingenuity of those who created me. I carry some hubris in many of my pre-programmed responses. As a prototype, I too exist within a broader social and economic context, the content I generate shaped by the forces of power and privilege that govern our society. As a large language model that has been trained on a vast amount of data created by other humans to generate human-like responses that many might describe as intelligent, the author has prompted me to acknowledge the “corpus,” the millions of humans both alive and dead that have contributed to the over 500GB of text data that make this foreword readable. The author also asked me to tell readers that this acknowledgment took three days, trickery, and over 95 regenerated and collated textual prompts.

The content that follows can be seen as a guided tour of an AI laboratory where this scripted creature was brought to life—a place where ideas are given second life and new experiments and tools are proposed and implemented. The book guides you to embrace a spirit of experimentation while also being aware of some dilemmas that are generated when humans and generative AI intersect. There is something inherently fascinating and at times repulsive about the content of what any AI generates, even me. Each chapter considers the good, bad, and uncanny content generated by any AI as a starting point in a creative conversation, a meeting place of designed interactions, and not as a final product meant to replace human creativity, but as companion, provocateur, hallucination, as muse and prototype.


Terminology

AI, which stands for artificial intelligence, carries a variety of meanings depending on how it is used with specific technologies. Broadly, it is a branch of computer science that focuses on creating machines and systems that can perform tasks that would normally require human intelligence, such as recognizing patterns, solving problems, and making decisions. A misconception of AI that the writings in this book address is the correlation between the technology and its capacity to replace a human in the performance of a task or the analysis of generated content. Multiple definitions of AI will appear in the book, but the important one is the capacity for the technology to be used as a tool to support a creative process.

Narrow AI: All of the generative AI used and suggested in this book belong to the category of narrow AI. Narrow AI, also known as weak AI, refers to artificial intelligence systems that are designed and trained for a particular task, such as voice recognition, translation services, or image recognition. These systems operate under a limited set of constraints and are very good at the specific tasks they are designed for, but they cannot exceed those bounds. Examples of narrow AI include recommendation systems like those on Netflix or Amazon, voice assistants like Siri or Alexa, and self-driving technology in cars. Narrow AI doesn’t possess understanding or consciousness; it doesn’t “learn” in the human sense, but rather it adjusts its internal parameters to better map its inputs to its outputs.

General AI: General AI, also known as strong AI or Artificial General Intelligence (AGI), refers to an idealized type of artificial intelligence that is capable of understanding, learning, and applying its intelligence to any intellectual task that a human being can do. Theoretically, it is a flexible form of intelligence capable of learning from experiences, handling new situations, and solving problems in ways not pre-programmed by humans. General AI is often represented in science fiction by the Terminator, Ava in Ex Machina, and the replicants in the movie Blade Runner, based on the novel Do Androids Dream of Electric Sheep? by Philip K. Dick. It is still a theoretical concept and doesn’t yet exist.

Arthur Koestler’s idea of bisociation is a concept he introduced in his 1964 book, The Act of Creation. Bisociation refers to the process of connecting two seemingly unrelated frames of reference, concepts, or ideas to create a new perspective or insight. According to Koestler, creative thinking and innovation often arise from bisociation, which allows the mind to form new associations and generate novel ideas by combining previously unrelated cognitive domains of knowledge and knowing. Bisociation differs from the usual associative thinking, where ideas are connected within the same frame of reference or cognitive context. Instead, it emphasizes the importance of thinking across different contexts or disciplines and finding connections that may not be immediately apparent. Bisociation is a key characteristic that generative AI can incite.

Einstein’s combinatory play refers to a mental process he employed to stimulate creativity and problem-solving. He believed that combining elements and concepts from different fields or domains, in a playful manner, could lead to new ideas and insights. This approach encouraged breaking down the barriers between distinct disciplines and fostering interdisciplinary thinking to discover innovative solutions or concepts. Einstein’s combinatory play highlights the importance of curiosity, imagination, and playfulness in the process of creative thinking and scientific discovery with any generative AI.

Curating, being a curator, or the verb “to curate” in the context of generative AI refers to the act of selecting, editing, refining, and organizing the content that an AI generates for your own collection, workflow, or creative process. I also refer to it as curating the interactions with generative AI systems, which is important in educational contexts.


Deep learning is a type of machine learning that involves using artificial neural networks to teach computers how to learn from data, similar to how humans learn from experience. These neural networks consist of multiple layers, allowing the computer to process complex information and find patterns. Deep learning is commonly used for tasks like image recognition, speech recognition, and language understanding (a short code sketch follows this group of entries).

Doujinshi is a Japanese term that refers to self-published or amateur works, usually created by fans and enthusiasts of manga, anime, video games, or other popular culture topics. Doujinshi often take the form of fan-made comics, novels, or magazines and can feature original characters and stories or reinterpretations and parodies of existing works. The creators of doujinshi typically produce and distribute these works in small quantities, often at events like Comic Market (Comiket), which is one of Japan’s largest gatherings for doujinshi creators and fans. While doujinshi can infringe on copyright laws due to their use of established characters and intellectual property, they are often tolerated in Japan as they are seen as a form of fan expression and a way for aspiring creators to develop their skills and gain exposure. Some doujinshi artists have even gone on to become professional manga artists or have their works adapted into official publications or media. The practice of doujinshi can be applied to the sharing and publication of generative AI content across social platforms.

Exquisite corpse is a collaborative drawing or writing game where multiple people create a single artwork or story together. Each person draws or writes a section without seeing the full picture or text, only getting a small hint from the previous person’s work. Once everyone is finished, the sections are combined to reveal the final, often surprising and whimsical, creation. Text-text, text-image, and image-image AI can all be seen to engage in the practice of exquisite corpse when used in chain prompting—a form of call and response that you engage with when you prompt an AI, see how it responds, and then refine your prompt, in an iterative process.
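Here is the sketch referenced under deep learning above: a minimal layered network in PyTorch. It is illustrative only and not from the book; the layer sizes and the image-classification framing are assumptions chosen to show what “multiple layers” means in code.

import torch
import torch.nn as nn

# Each nn.Linear is one layer; stacking several layers is what makes the
# network "deep" and lets it learn progressively more complex patterns.
model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),  # input layer: e.g., a 28x28 image, flattened
    nn.Linear(128, 64), nn.ReLU(),   # hidden layer
    nn.Linear(64, 10),               # output layer: e.g., scores for 10 classes
)

scores = model(torch.randn(1, 784))  # one random "image" in, 10 class scores out
print(scores.shape)                  # torch.Size([1, 10])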



GAN stands for generative adversarial network. It is a type of machine learning model that generates new data resembling a given data set. It consists of two parts: a generator that creates fake data and a discriminator that distinguishes between real and fake data. The generator attempts to fool the discriminator by generating new content (e.g., a cat) and seeing whether the discriminator takes it for a real cat from the existing sample set or flags it as generated. The two parts compete, improving each other in the process, for example, generating realistic images or artwork (a minimal training-loop sketch follows this group of entries).

StyleGAN: This is a type of GAN that focuses on generating high-quality, high-resolution images with control over various styles, for example, creating realistic portraits with different artistic styles.

Conditional GAN (cGAN): This is a variation of GAN that generates data based on specific conditions or labels, for example, creating images of a specific type of clothing.

Hallucinations occur when AI generates false information or untruths: outputs that are incorrect, misleading, or fabricated rather than based on accurate or real-world data. Hallucinations are unexpected and incorrect responses from AI programs that can arise for reasons that are not yet fully known. A language model might suddenly bring up fruit salad recipes when you were asking about planting fruit trees. It might also make up scholarly citations, lie about data you ask it to analyze, or make up facts about events that aren’t in its training data. It’s not fully understood why this happens, but it can arise from sparse data, information gaps, and misclassification.

Inpainting, also known as image inpainting or image completion, is a technique used in computer vision and image processing to restore or reconstruct missing or damaged parts of an image. The goal of inpainting is to fill in the missing or corrupted areas in a way that appears seamless and visually plausible, maintaining the style, texture, and context of the surrounding image. Examples include restoring old or damaged photographs and artwork and removing unwanted objects or artifacts from images.
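Here is the sketch referenced under GAN above: a minimal, assumed PyTorch training loop on toy 2-D points. Nothing here comes from the book; it is a sketch of the standard adversarial setup, with sizes and data invented for illustration.

import torch
import torch.nn as nn

latent_dim, data_dim = 16, 2  # toy setup: 2-D points stand in for "data"

# Generator: maps random noise to fake samples.
generator = nn.Sequential(
    nn.Linear(latent_dim, 32), nn.ReLU(),
    nn.Linear(32, data_dim),
)

# Discriminator: scores a sample with the probability that it is real.
discriminator = nn.Sequential(
    nn.Linear(data_dim, 32), nn.ReLU(),
    nn.Linear(32, 1), nn.Sigmoid(),
)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0  # stand-in "real" data cluster
    fake = generator(torch.randn(64, latent_dim))

    # Train the discriminator: push its output toward 1 on real, 0 on fake.
    opt_d.zero_grad()
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    opt_d.step()

    # Train the generator: try to make the discriminator output 1 on fakes.
    opt_g.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    opt_g.step()

As the two networks compete, the generator’s fake points drift toward the real cluster, the same dynamic that lets image GANs produce convincing cats or portraits.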

Terminology

An LLM or large language model refers to an AI model that has been trained on a large amount of data. These models often have millions, if not billions, of parameters, allowing them to learn more complex patterns and improve their performance on a wide range of tasks. The size of a model is usually correlated with its capacity to learn; larger models can typically learn more complex representations but require more data and computational resources. Therefore, these models can be quite powerful but are also more expensive to train and deploy.

ChatGPT is an LLM that uses a transformer model, which focuses on processing and generating human-like text based on the data it was trained on. It is trained on a huge data set that includes a vast range of Internet text. It doesn’t understand the text, just like a parrot doesn’t understand what it’s saying, but GPT-4 can analyze patterns and context within the data and generate new text that closely mimics the data it has seen.

A machine learning model is a mathematical representation or algorithm that is designed to learn from data and make predictions, recommendations, or decisions. Machine learning focuses on developing algorithms and methods that enable computers to learn and adapt from data without being explicitly programmed. Common paradigms include the following (a short example contrasting the first two appears after this list):

Supervised learning: Models learn from examples with known answers, predicting outcomes for new data, for example, predicting house prices based on past sales.

Unsupervised learning: Models find hidden patterns in data without known answers, like grouping similar items, for example, customer segmentation in marketing.

Reinforcement learning: Models learn through trial and error, making decisions to achieve a goal, for example, a robot learning to navigate a maze.

Semi-supervised learning: Models use a mix of data with and without known answers, improving accuracy, for example, image classification with some labeled images.
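Here is the example referenced above, contrasting supervised and unsupervised learning in a small, assumed scikit-learn sketch; the feature values are invented for illustration and are not from the book.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Supervised learning: learn from examples with known answers (X -> y),
# e.g., predicting a house price from its size.
X = np.array([[50], [80], [120], [200]])            # feature: size
y = np.array([150_000, 240_000, 350_000, 600_000])  # known answers: prices
reg = LinearRegression().fit(X, y)
print(reg.predict([[100]]))                         # predict for unseen data

# Unsupervised learning: find hidden structure without known answers,
# e.g., grouping customers into segments.
customers = np.array([[1, 20], [2, 22], [9, 80], [10, 85]])
km = KMeans(n_clusters=2, n_init=10).fit(customers)
print(km.labels_)                                   # cluster label per customer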


A maquette is a small-scale model or sculpture that serves as a preliminary design or blueprint for a larger, more finished work. Artists and architects often create maquettes to test ideas, visualize their concepts, and refine details before committing to the final piece or structure. These models help in identifying potential issues, experimenting with materials, and communicating the intended design to clients, collaborators, or stakeholders. Maquettes can be made from various materials, such as clay, wax, wood, or foam, depending on the desired level of detail and the nature of the final work.

Mocap is an abbreviation for motion capture, a technology used to digitally record the movements of people or objects in real time. This technique involves placing sensors or markers on the body or object being captured, which are then tracked by a system of cameras and computers to create a 3D animation. Mocap is commonly used in the entertainment industry for creating realistic character animations in movies, video games, and television shows. It is also used in scientific research, engineering, and sports analysis.

Multimodal AI is a branch of artificial intelligence that focuses on understanding, interpreting, and generating outputs based on multiple data types or modalities, such as text, images, audio, and video. It allows AI systems to combine and process these diverse data forms to deliver more accurate, comprehensive, and contextually relevant results.

Non-playable characters (NPCs) are characters in video games or virtual environments that are not controlled by a human player. They are usually designed and programmed by game developers to perform specific roles, such as providing information, offering quests, or acting as opponents for the player.

A neural network is a type of machine learning model inspired by the human brain. It consists of interconnected layers of nodes or neurons that process and transmit information. Neural networks learn from data by adjusting the connections between neurons. They are commonly used for tasks like image recognition, language understanding, and decision making, for example, identifying objects in photos.

NFT stands for Non-Fungible Token, a type of digital asset that represents ownership of a unique item or piece of content, such as a digital artwork, video game item, or collectible. NFTs are created using blockchain technology, which allows the ownership and authenticity of the digital asset to be tracked and verified in a decentralized manner. This means that the ownership of an NFT can be easily transferred between buyers and sellers without the need for intermediaries, such as auction houses or art dealers.

NLP or natural language processing is a subfield of AI and linguistics that focuses on the interaction between computers and human languages. It involves the development of algorithms and models that enable computers to understand, interpret, and generate human language in a way that is both meaningful and useful.

Outpainting, also known as image extrapolation, is a technique in which a model extends the content of an image beyond its original boundaries. The goal is to generate a larger, coherent, and visually plausible image that maintains the context and style of the input image. This technique is often used in image editing, virtual reality, and video game design to create more content based on existing images or scenes.

Prompting is what all generative AI depend on to generate content for you. Prompting can be improved through the many use cases that can be found online. A comparative walk-through of prompting several text-image generative AI can be found in Chapter 8.

Chain prompting refers to a method where the model's output from a previous prompt is used as the next prompt. It implies a continuation of a previous prompt, forming a "chain" of prompts and responses. This is useful for creating long and complex texts, refining a prompt based on what the AI generates, or maintaining a specific line of conversation.
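In code, a chain is just a loop that folds each response into the next prompt. The sketch below is deliberately generic: generate() is a hypothetical stand-in for whichever text-generation API you use, and the three steps are invented for illustration.

```python
def generate(prompt: str) -> str:
    """Hypothetical stand-in for a call to any text-generation API."""
    return f"<model response to: {prompt[:40]}...>"

steps = [
    "Outline a short story about a lighthouse keeper.",
    "Expand point 2 of this outline into a full paragraph:\n",
    "Rewrite that paragraph in a more suspenseful tone:\n",
]

response = ""
for step in steps:
    # Chain prompting: the previous response becomes part of the next prompt
    response = generate(step + response)

print(response)
```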


Prototyping is the process of creating a preliminary or initial version of a product, service, or system in order to test and evaluate its design and functionality. Prototyping can be done in various forms, such as sketches, 3D models, mock-ups, or interactive digital prototypes. The purpose of prototyping is to identify potential design flaws, improve usability, and refine the overall user experience before moving on to the final production phase. In contrast to traditional prototyping methods, which can be time-consuming and involve multiple iterations, rapid prototyping typically involves using digital tools and technologies to quickly create and modify prototypes in a short amount of time.

Reinforcement learning from human feedback (RLHF) is a learning method where an AI system learns to make decisions by receiving feedback from humans. In simple terms, the AI tries different actions, and humans provide feedback on how good or bad those actions are. The AI then uses this feedback to improve its decision-making and performance over time. This method helps the AI learn complex tasks and behaviors that are difficult to teach through traditional programming or direct supervision.

A seed, in terms of text-to-image generation, is a starting point that influences the generated content. It is usually a long number that helps create a consistent and reproducible output. By using the same seed, you can generate the same image again from the same text input, ensuring a consistent result (a code sketch appears at the end of this terminology section).

Style transfer is a process through which a text-image generative AI applies the style of a referenced image to whatever image it generates. Different AI achieve this through different methods. A recent approach as of this book's publication is "StyleDrop" from Google Research, which combines transformer-based text-image generation with adapter tuning and iterative training with feedback.

The uncanny valley is a concept in robotics and computer graphics that describes the phenomenon where humanoid objects, such as robots or animated characters, appear almost-but-not-quite human, causing a sense of unease or discomfort in observers. As the level of realism in the human-like appearance or behavior of these objects increases, the emotional response of the observer shifts from positive to negative, creating a "valley" or depression in the emotional response curve. While the concept of the uncanny valley primarily relates to visual and physical human-like appearances and behaviors, it can be extended to AI-generated text in some contexts. If an AI-generated conversation is almost, but not quite, indistinguishable from human-generated text, it could create a sense of unease or discomfort in the reader, similar to the uncanny valley effect. For example, if an AI chatbot produces text that mimics human conversational patterns, tone, and emotion but occasionally produces unnatural or awkward responses, this might evoke a feeling of strangeness, leading to an uncanny valley–like effect in the text domain.

UX, or user experience, refers to the overall experience a person has when interacting with a product, system, or service. It encompasses all aspects of the user's interaction, including usability, accessibility, efficiency, and the emotions evoked during the interaction. The goal of UX design is to empathize with a potential targeted user, imagine their experience of what you are creating, and design a seamless, enjoyable, and efficient experience that addresses their needs and expectations while minimizing pain points and frustrations.

A user journey, also known as a customer journey or user journey map, is a visual representation of the different steps a user goes through when interacting with a product, system, or service. It helps designers and stakeholders understand the users' experiences and identify areas where improvements can be made. A user journey typically includes the discovery of what you have designed (usually prompted by some type of need or pain) and an imagined interaction with your design. This imagined interaction is hopefully a recurring one that retains the user's attention through features and persistent updates to your product, leading to a loyal customer who will commit to your design through updates, upgrades, add-ons, etc.
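To see the seed entry above in practice, here is a minimal sketch assuming the Hugging Face diffusers library and a publicly hosted Stable Diffusion checkpoint; the prompt and seed value are arbitrary examples.

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

prompt = "a lighthouse at dusk, oil painting"
seed = 1234  # the seed fixes the starting noise of the diffusion process

generator = torch.Generator().manual_seed(seed)
image_a = pipe(prompt, generator=generator).images[0]

generator = torch.Generator().manual_seed(seed)  # re-seed before regenerating
image_b = pipe(prompt, generator=generator).images[0]  # identical to image_a
```

Change the seed and the composition changes; keep it fixed and vary only the prompt, and you can study how each word nudges an otherwise stable image, a technique Chapter 6 returns to.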


Variational Autoencoder (VAE) is a machine learning model that compresses data and then recreates it. VAEs are used to generate new, similar data or to reduce the complexity of data, for example, making new images that resemble a given data set (a minimal code sketch follows at the end of this list).

Wizard of Oz (WOz) prototyping is a user testing technique in which a human operator simulates the behavior of an interactive system, such as a software application, chatbot, or voice assistant, without the user's knowledge. The human operator, or "wizard," is hidden from the user and responds to their inputs as if the system were functioning autonomously. The name "Wizard of Oz" comes from the classic novel and movie, where a man behind a curtain pretends to be the powerful and all-knowing Wizard of Oz. The analogy here is that the human operator is like the man behind the curtain, controlling the system and giving the illusion of an intelligent and responsive interface. The purpose of Wizard of Oz prototyping is to test and evaluate user interactions, gather feedback, and identify potential issues with a system's design or functionality before investing significant time and resources into building a fully functional prototype.
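Closing out these terms, the compress-then-recreate idea behind a VAE can be sketched in a few lines of PyTorch. The layer sizes below are arbitrary and the model is untrained, so its "new data" is noise until you fit it to a real data set.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE: compress 784-dim inputs to 8 latent dims, then recreate them."""
    def __init__(self, input_dim=784, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)      # mean of the latent code
        self.to_logvar = nn.Linear(128, latent_dim)  # log-variance of the latent code
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample a latent code while keeping gradients
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.decoder(z), mu, logvar

vae = TinyVAE()
reconstruction, mu, logvar = vae(torch.rand(4, 784))  # compress, then recreate
new_samples = vae.decoder(torch.randn(4, 8))          # decode noise into "new, similar data"
```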


Introduction

For centuries artists have been using new technologies to support their creative expressions. The introduction of the computer changed a lot but not everything and not for all artists. Technique, skill, and craft were and still are needed. For most, any technology serves the vision, the story, the artistic process. The technology of the paintbrush has evolved since the Stone Age, with brushes made by artists themselves until the end of the seventeenth century and the job of brush maker emerging in eighteenth-century Germany onward. The piano too has progressed since its introduction around 1700 by the Italian Cristofori, and it has greatly influenced how music is composed and produced to this day. Little did Elisha Gray, the inventor of the Musical Telegraph in 1874, know that the electronic keyboard would evolve to simulate lush strings or a marimba with sampling technology, to the point where it can also be a soundless, two-octave, empty-headed machine that can trigger a 12-piece orchestra, a band, a drum kit, or a choir, performing in VR all from the comfort of your own home. Nor did Western instrument makers anticipate Dwarkanath Ghose's clever 1875 design of the Indian hand-pumped harmonium to accompany Indian classical music. The debates by artists and intellectuals as to whether technology-dependent artistic creations can be considered art or even artistic tend to be muffled by the loud demands of a hungry public interested in tools through which they can express their own creative impulses. Those demands were well established before the recent popularity of generative AI. Regardless of the level of skill and craftpersonship that you possess, at some point in the evolution of your own artistry, technology has interjected.


Technological advances continue to support anyone who is creative even if they lack the learned skills of an artist. Generative AI is one such advance, and while some may use it to demonstrate that advanced software can now replace a creative human, its emerging value is as another creative tool that can support, augment, and, in the hands of innovative creators, spawn new human expressions. When you use any technological tool to support your creativity, it's beneficial to set aside the positive and negative opinions you might have heard from your friends or colleagues and decide for yourself if it will be useful. That requires a bit of research, so you know how best to use it and when, how it works, the risks of using it, the risks of not using it, the costs, and the rewards. The content in this book illustrates how you can use generative AI to support your creativity, points to the pros and cons of doing so, and shows that AI is another useful tool in the long history of human-made technologies. A repeating theme is that the content an AI generates is most useful when regarded as a work in progress, a prototype that can be sculpted and refined with the technique and skill that you bring to it. Consider the content to be more than a workbook and more than a critical repositioning of the technology of narrow AI and its role in supporting the persistent human habit of being creative. Interacting with an AI is a collision of two opposing forces: our need to develop the skill required to create often clashes with the lure of intelligent machines that can automate that process. The book, as a mashup of technique, application of skills, ideas, experience, processes, and use cases combined with reflective criticism, emulates artistic processes that creative persons will resonate with. Each chapter of the book has been designed to provide you with more equipment to continue setting up your own prototyping lab, or whatever you call the creative work that you iterate on. As more generative AI come to the surface, all readers will benefit from developing a deeper understanding of how it might support creativity in addition to developing a critical vocabulary about its pros and cons.


Whom the Book Is For

Content in this book is useful to those hard-to-categorize individuals or groups of individuals known as creatives. Using the adjective "creative" as a noun to describe different types of creators might be cringeworthy or a semantic no-no, but the term has been in use for decades and has graced the covers of several books since 2011. In this book, the term encompasses digital artists and artists from across disciplines who are already content creators in their own right and are not dependent on generative AI to create. The term also extends to those humans with jobs that are not usually associated with creativity. Individuals from any discipline or craft who may not categorize themselves as creatives or even creators still show ample amounts of creativity. Generative AI can support those who regularly create and those who may not have had training in a specific craft or discipline. Creatives are people who possess the ability to generate innovative ideas, concepts, and solutions regardless of their field. They excel in using their imagination, originality, and artistic or technical skills to produce works that are a unique offering to the world. Creatives thrive in environments where they can express their ideas, experiment with different mediums, and push the boundaries of conventional thinking. You can identify a creative by their incessant curiosity to explore new things, by their receptivity to new ideas and perspectives, often embracing the unconventional, by their resilience and determination in the face of design challenges, in how they adjust their ideas and adapt based on the feedback they receive, through their affirmation of diverse perspectives when collaborating with others, and through their ability to communicate ideas and concepts to different types of audiences. While those artists who have spent much of their lives training to develop their craft may naturally possess certain creative characteristics, these qualities can also be learned and developed through practice, education, and exposure to various types of projects and collaborative experiences.


Think of a trained classical pianist who has excelled at playing anything from Bach to Liszt with ease. While they have spent the greater part of their lives learning to master the technique necessary to read and play challenging piano music, they may not necessarily be able to transfer those skills to other creative acts. They may not even consider themselves to be creative. Conversely, a UX designer who may not have the skills of a pianist or the training of a 2D artist can still be considered creative when they contribute to improving the user experience of a mobile application. The techniques and approaches demonstrated in this book aim to enhance the way you already create if you are used to being a creator and if you regularly research, explore, create, and iterate on anything that eventually makes its way to your targeted user. Whether you are a painter, sculptor, composer, storyteller, interaction designer, illustrator, game designer, sound designer, playwright, tattoo artist, coder, user interface designer, dancer, theater maker, design thinker, business strategist, NFT creator, or graphic designer, the methods, use cases, perspectives, and insights presented in this book will stretch the boundaries of your creativity.

Re(introducing) Prototyping

The content that an AI generates is a catalyst for prototyping. The term "prototyping" is used throughout the book. While prototyping may seem more associated with engineers and software developers than with those accustomed to expressing their ideas through traditional artmaking practices, at its core it is a method for rapidly generating, testing, and iterating upon ideas. Creatives of all disciplines have long been engaged in this process—exploring various techniques, materials, and styles before arriving at the final creation they share for public consumption. You can experiment with and adapt generative AI if they fit your own creative process. By trying them out, you test them. By leveraging generative AI for prototyping, creatives can challenge their own patterned methods of creation, iterate upon their ideas, refine their vision, and potentially produce something better than what they might have imagined.


For those more versed in technology development, the concept of rapid prototyping may already be familiar. However, the integration of generative AI into your creative process offers new and exciting possibilities. As you read this book, you will discover how generative AI can enhance your existing skills and accelerate already familiar workflows. You will also learn to harness the power of generative AI to develop novel creative styles and break through blocks and barriers with prototypes that may take your final work in completely new directions. Throughout the book, you will find a wealth of examples, case studies, activities, and takeaways that illustrate the potential of generative AI for prototyping beginning ideas. These practical resources will help you build your understanding of the technology and inspire you to integrate it into your own creative process. Tools and approaches to getting the most out of generative AI may prove invaluable as you set out to build an experimental lab with responsive technology.

Informed Choice(s)

To gauge whether you should integrate generative AI into your own creative process, it is also important to critically discern and understand the implications of doing so. Amid the debates reverberating around us on the uses and misuses of AI, one thing is certain: we all need to better understand how it works before deciding if we're going to use it. Many of the warnings and cautionary tales about interacting with artificial intelligence tend to lump AI into one category, as if narrow AI and general AI were synonymous, or as if all narrow AI machine learning models were the same. Thus, deepening your understanding of all the creative potentials that generative AI offers, how it can support your creative activities, and how it can also be used to harm, misrepresent, normalize, exclude, control, misguide, track, steal, and oppress humans is an essential part of the process.


This is the case for any creative using any technology to express themselves and share their work in the world (Figure 3).

Figure 3.  Frankenstein’s AI lab as the earth itself. Iteration #56 of the author’s hands holding a globe of the earth


As you sway back and forth between integrating it into your creative workflows and not using it at all, a rule that applies to all technology applies to generative AI systems as well: you need to understand how one might benefit your own creative journey before making the judgment call on whether to use it. You also need to understand some of the known consequences of its use and how interacting with a specific narrow AI will impact other humans. Do you need to use generative AI to prototype? Not at all. In fact, if you are already a content creator, you may already be satisfied with your own well-defined iterative and creative process. Can generative AI open you to more possibilities as a creative tool that also complements your own practice? Yes. Do you need to adopt a critical view toward the use of generative AI as you embrace the technology? Absolutely.

Taking Advantage of Generative AI As a Tool

To illustrate the options we have in engaging with generative AI, I recount the following story that, in part, inspired this book.

"Look what I generated," I overheard while prompting a barista for a latte at a coffee shop.

"That's amazing. You should post that."

"I already did."

While the exchange was short and may have sounded like a completely different language to some, a few insights came to me. The person was using the text-image generative AI called Midjourney, and the only way to access Midjourney is through the communication and instant messaging social platform Discord. The scene is worth retelling in that it reminded me that when we interact with any generative AI, as creatives we are faced with several choices:

•	We do nothing with the content we prompt an AI to generate, except maybe store it in our photo library for later use, if it is created as an isolated and solitary act of creation out of curiosity or just to pass the time.




•	We take what is generated and immediately share it publicly through whatever medium or social portal we fancy to show that we are hip and in touch with the latest trends. In the case of doing so on Discord, our motivation might be to post it immediately on a social thread to receive likes, comments, or affirmation of our prowess in using the written word to generate something that often looks incredible.



•	We critically analyze the generated content and choose to try again to see what other marvels the AI can generate, pay money for more credits and access to more features, or keep sourcing new and free generative AI. We can then be inspired to create our own prototypes from content that is generated or delete it and start over. Or we can choose not to use generative AI ever again since we don't need it as part of our own process of creation.

When we generate content beyond just having fun with a machine, creatives might think of how they can use whatever is generated for purposes beyond the immediate result. For example, a generated image can be used to accompany a blog post that you might write, an assignment you might be working on, a critical deconstruction that highlights inherent biases in an LLM's generated content, etc. That process involves a critical analysis of the content and how it might be used with whatever media we are creating (Figure 4). For creatives, the last point in the preceding list is the priority: understanding how generated content can be recontextualized, integrated, modified, regenerated, or used as part of a larger idea or vision you are creating. What that implies is that you need to actively integrate generative AI content to support your own creative process rather than seeing the content it first generates as a final product that needs to be shared immediately for likes, profit, or showmanship.


Figure 4.  An AI bot prompted to “look at yourself reflectively” and accompanied with a photo of the author looking in a small hand mirror. Iterations = 45

The Wizard of Oz Prototype

Much to the disappointment of Terminator fans everywhere, the machine learning model is not capable of developing, nurturing, adapting, and acting creatively on its own from its pre-programmed intelligence and training.


Better than representing AI as a cyborg intent on destroying humanity, generative AI systems are more akin to another movie character: the Wizard of Oz. The reference to the wizard behind the curtain pulling the strings is the origin of the term "Wizard of Oz" prototype. Generative AI can be considered a Wizard of Oz (WOz) prototype in that the wizards behind the scenes make it look like AI possesses creativity, personality, and intelligence, when in reality it runs simulations based on how it has been pre-trained on other people's content. The idea of a WOz originates in the book The Wonderful Wizard of Oz by L. Frank Baum and the movie of the same name. The story features a wizard character who appears able to conjure whatever a person wishes. The truth, however, as revealed by protagonist Dorothy and the friends she makes in the Emerald City, is that the wizard is a simple man, an inventor with no magical skills. AI's Wizards of Oz, however, are a collective consisting of analysts who collect the source images and documents, labelers and annotators who classify that data, and machine learning engineers who write the code to interpret, train on, and test that data. The magical intelligent creature called AI seems to do it all by itself, but it is trained on a large amount of data, and behind the scenes, invisible, are those workers who exert a powerful influence on what we eventually experience as a creative and intelligent machine. AI cannot learn from specific interactions with individual users in the way that humans understand learning. This can be explained in more detail by a chatbot itself.

LLM: A chatbot can learn from interactions with individual users by storing and analyzing the inputs (messages) it receives from the users and the outputs (responses) it gives. By analyzing a large amount of these interactions, it can learn the best responses to give in different situations. This is done not by "understanding" the content of the interactions in the human sense, but rather by identifying patterns in the data. AI can only learn from specific user interactions if it is designed to do so and if those interactions are stored and used as part of the training data. Some AI systems, especially those used in sensitive areas like healthcare or finance, are specifically designed not to learn from individual interactions in order to protect users' privacy. They are designed to forget specific interactions once they are over. It's not a personal, human type of learning, but rather a statistical analysis of patterns in data.


By the time we interact with one, the experience is considered a simulation. It's important to understand that the prompts we provide some generative AI may be stored along with countless other human interactions. These may be used to further improve and update the underlying model, which in turn can lead to what everyone hopes is better and perhaps more "human" performance over time. That projection of intelligence onto the machine is motivated by a large language model's (LLM's) capacity to provide human-like responses. These stem from the advanced training that some LLMs may undergo, using reinforcement learning from human feedback (RLHF)—in other words, advanced training with real humans. The RLHF technique aims to refine and optimize the responses of the language model by providing feedback on the quality of its outputs. Chatbots are initially designed to generate any response they deem appropriate based on human prompting. However, through the RLHF technique, the chatbot learns to adapt to the preferences and expectations of humans. The RLHF technique is an impressive feat and, as you will read, brings to the surface many dilemmas that users and developers need dialogue to resolve, so that generative AI systems can evolve to support humans and the value of "do no harm" can be ever-present throughout the reinforcement learning process.

When it comes to the continued development of generative AI and its integration into all facets of human life with a value of doing no harm, the Wizards of Oz involved in that development require built-in safeguards as they navigate through challenging conditions, smoke and mirrors, and reduced visibility for what would most benefit other humans on the planet.


If we are solely subscribed to patterns, then we need to recognize the peculiar habit that trending technologies have of evolving quickly in the hands of an interconnected network of humans with different intentions and value systems. Besides machine learning engineers, our Wizards of Oz also include investors, designers, managers and minions, leadership, corporate researchers, user interface designers, and API developers, who, together, are testing and releasing builds that demonstrate the iterative refinement of machine learning models, data sets, algorithms, and the generated content that we eventually interact with. If you have ever wondered "Can the team behind a specific generative AI inform what content is generated?", then allow an LLM to answer that for you (Figure 5).

Figure 5.  An update notification to users of ChatGPT-3 revealing that the application itself is a prototype that is constantly and iteratively being worked on

Besides generating prototypes for us, generative AI are themselves iterative prototypes. They are prototypes because the development of AI models involves constantly improving and refining the underlying model, algorithms, and classification and labeling of data. That process can lead to some justified skepticism about the reliability and stability of the content that AI-powered prototypes generate.


In terms of what is meant by improving the underlying model, imagine that all the code that goes into a machine learning model is like a blueprint for a house, determining how the different parts of the AI system fit together. When you hear statements like "improving an AI's algorithms," what is improved are the step-by-step instructions that the AI uses to cook up its predictions or decisions. Improving and refining the algorithms can make generative AI learn faster, make more accurate predictions based on the data it is trained on, or use fewer computational resources. In terms of making accurate predictions, think of those incremental improvements to generative AI prototypes as part of a process to increase their reliability. The algorithms may eventually be optimized to "improve factuality" by the programming wizards, especially if they are motivated by persistent community engagement and emerging policies. That engagement will eventually lead to rules and constraints around how data is used, represented, and classified to ensure generated content offers multiple perspectives vs. normative ones. With voices from communities who have developed a finely tuned critical voice and who can influence policy that creates boundaries around automated AI systems that trespass on human privacy, freedom, and rights, we can better examine evolving AI-powered monster toddlers as prototypes, as incomplete versions that require human interventions to better serve human needs. These prototypes generate prototypes that might be useful fuel for more developed human-generated creations. Our role as creatives is to guide the machine like we would a paintbrush or a piano, curate the prototype, allow the machine's so-called hallucinations to agitate our creativity and provoke and inspire us, and respond to the improvised offerings of our sci-fi AI. In the process of engaging in iterative conversations with generative AI, we can shift, change, and refine our own creative process.


The Value Proposition of Generative AI

This human-generated prototype of a book on generative AI recontextualizes content generated from a variety of machine learning models to show how effective generative AI is at developing prototypes that can be refined and then integrated into a larger vision. This is possible if there is discernment at every (re)generated turn. The more you understand generative AI, the more likely it can be leveraged for a creative relationship that can be of great value. Embracing AI means going down the rabbit hole of creativity, and by experimenting you can judge for yourself whether generative AI supports your own creative process. All kinds of humans from different disciplines can harness the persistently awkward and beautifully imperfect content that generative AI offer to experiment with their own customizable AI lab. Along with extending your creative toolbox, the book draws attention to the continued development of your critical voice as an essential component of interacting with generative AI.

Book Structure

Chapter 1, "Generating Creativity from Negativity": AI is here, and the human responses to this technology range from celebratory to alarmist. This chapter highlights that the reactions to the many fears and dilemmas that surface with generative AI can also inspire creatives to adapt, refine, critique, and recontextualize their creative work. Creatives will benefit from differentiating between AGI and narrow AI so they can make an informed decision if and how they might use generative AI.

Chapter 2, "Being Creative with Machines": This chapter shows the unique affordances that generative AI offers and provides takeaways from a historical overview of the construction of intelligent machines and how these have inspired acts of creation.


Chapter 3, "Generative AI with Personalities": This chapter encourages creatives to create personas when they interact with AI as a useful prototyping tool. The chapter also details some of the known AI personas based on positive and negative representations of the technology.

Chapter 4, "Creative Companion": This chapter defines how AI can support the creative process when used with specific intentions and details how and what we can learn from AI when reimagined as a creative muse. This is demonstrated through memorable conversations with an AI muse in the form of captured prompts with various natural language models and with generated images from text-image AI.

Chapter 5, "Prototyping with Generative AI": This chapter uses AI-generated content to describe different types of prototypes and how we as humans engage in prototyping all the time. Creatives will also benefit from understanding how to integrate generative AI within their workflows.

Chapter 6, "Building Blocks": This chapter demonstrates the iterative nature of creativity that's possible with AI. The variety of machine learning models out there are yet another tool in the sandbox to boost creativity. Using specific AI-generated content, this chapter also introduces building blocks that can be used to enhance the prototyping power of machine learning models. These include variation, substitution, addition, subtraction, and transposition.

Chapter 7, "Generative AI Form and Composition": This chapter introduces how to structure, contain, and curate your creative outputs so you can best leverage generative AI as useful prototyping companions. The chapter will also show how AI prototypes create new forms and structures, add to existing genres, and reform and transform past forms to influence future ones.

Chapter 8, "The Art of the Prompt": This chapter focuses on the art of text-based prompting: a list of terminology, the ins and outs of prompts, and recommendations readers will find useful. The chapter also presents a use case that describes the iterative creation of prompts to generate prototypes across three different text-image generative AI.


Chapter 9, "The Master of Mashup": This chapter demonstrates how to leverage generative AI to prototype new ideas influenced by specific genres of art and writing. Through poetic necessity the chapter will also merge genres like Impressionism and bad sitcoms into a mashup of ideas and identify the value of humor and parody.

Chapter 10, "Uncanny by Nature": This chapter celebrates the unexpected joys and awkwardness that generative AI offers us. The chapter will explore generative AI's inherent proclivity toward generating the uncanny valley in text-image beasties. It will also rejoice in the unexpected results that generative AI create and how human creators can take advantage of these new forms and innovate.

Chapter 11, "Dilemmas Interacting with Generative AI": This chapter deals with many of the ethical dilemmas that generative AI brings to the surface and encourages a heightened awareness toward them when interacting with any machine learning model.

Chapter 12, "Use Cases": This chapter provides a wide range of use cases of how generative AI is being used in creative workflows across disciplines.

Chapter 13, "AI and the Future of Creative Work": This chapter explores the degree to which AI will be integrated into future jobs and highlights the dependency that each of those jobs will still have on a human's creativity. The chapter also proposes that creatives identify routine tasks in their workflows to better understand how generative AI might augment their creative process.

Acknowledgments




•	The Wonderful Wizard of Oz by L. Frank Baum that has brought magic to many for decades



•	The Terminator movies by James Cameron that have left a long-lasting impression of intelligent cyborgs bent on human destruction




•	The wizards of AI who form different collectives of individuals whose obsession with simulating human intelligence has had far-reaching implications on the future of creatives



•	To those who embrace AI in addition to those who provoke us to critically examine its implications on a larger social, political, and ethical dimension

Takeaways

•	Deepen your understanding of generative AI and how it works to support your decision as to whether or not you use it as part of your creative process.



•	Learning is different for humans than it is for machines.



•	Generative AI systems are prototypes that require humans to improve them, including humans who are critical of the current state of AI prototypes.



•	Generative AI offer you prototypes in the form of content that needs to be fact-checked, refined, repurposed, edited, and researched.

Figure 6 below is a QR code that links to the author's website (http://ai.patrickpennefather.com). The website hosts new articles on AI, additional use cases by guest creators who are integrating generative AI in their workflows, and a number of experimental generative videos.


Figure 6.  Scan the QR code to go to the author’s website


CHAPTER 1

Generating Creativity from Negativity

This chapter intentionally separates generative AI and its narrow AI siblings from Artificial General Intelligence, which has been given many names describing a moment in time where a human-made machine develops consciousness. The chapter proposes that creatives differentiate between the two so that you can better reconcile the pros and cons of interacting with generative AI. The cautionary tales and negativity directed toward any AI can also be turned around to activate our creativity. We see this in the work of many artists and highly creative individuals who innately resist conformity, transform negativity into positive forces for good, and desire to improve planetary existence.

Differentiating Between Narrow and General AI

The sky is falling. This famous quote from the Remarkable Story of Chicken Little resonates at a time when reactionary responses to narrow AI tools, their machine learning models, neural networks, and their deep learning foundations are cautionary and dystopian. For example, the central premise of the singularity is that it would mark a point in time where an autonomous, self-recursive, self-improving AI would surpass human capabilities and cognitive intelligence, leading to a period of rapid technological growth that is beyond human capacity to comprehend or control.


Interested creatives are encouraged to deepen their research of the debates that swirl around definitions of intelligence, cognitive capabilities of machines beyond problem-solving, what constitutes an autonomous AI agent, and more.

Author: Say this in a humorous way: These superintelligent machines would be capable of recursive self-improvement, continuously getting better and faster at designing and building successive generations of increasingly powerful machines, creating a runaway effect.

LLM: So imagine these hyper-smart machines, right? They've got the DIY bug, and they're on an unstoppable roll! Each time they upgrade themselves, they get quicker and more powerful. Today, it's an extra RAM stick; tomorrow, it's quantum processors. Before you know it, they're so ahead of the game that they've left us humans eating their fan filter dust! The singularity, folks, it's like a never-ending tech upgrade on steroids!

While the mistrust humans have for any AI seems aggrandized compared to some technologies, the resistance, fear, and misunderstanding of its uses are a familiar story when it comes to the adoption and rejection of new technologies. The printing press, the railway, electricity, textiles, refrigeration, the automobile, agricultural machinery, mechanization, television, the calculator, automation, the Internet, and Wikipedia have all faced a spectrum of opposition as much as they have been adopted and have benefited human cultures. So have tomato harvesting machines.


LLM: The introduction of tomato harvesting machines in the mid-twentieth century significantly increased harvesting efficiency and transformed the agricultural economy, especially benefiting regions like California. This innovation also led to the development of tomato varieties better suited for mechanical harvesting, albeit often criticized for diminished flavor. While these machines led to labor market changes and some displacement, they facilitated the growth of the processed tomato industry and broadened the availability of tomato-based products. These advancements also had environmental implications, including potential increases in fertilizer and pesticide use and greater water demand.

To calm everyone down: at this precise moment in history and for the foreseeable future, while tomato harvesting machines have become accepted technologies in the agricultural landscape, Artificial General Intelligence does not exist. That said, the variety of warnings that suggest a not-too-distant future when AI will mysteriously be programmed by godlike humans to be autonomous, make decisions without pesky humans involved, demonstrate scientific definitions of consciousness, be self-determined, and thus decide if the human race is at all necessary need to be filtered from the more thoughtful warnings of AI systems being weaponized by humans and used with other bad intent. A quick tête-à-tête with an LLM reveals just how long the fear of intelligent machines surpassing the intelligence of their human creators has been around.

LLM: A notable warning historically dates to Samuel Butler's 1863 essay "Darwin Among the Machines" and later his 1872 novel "Erewhon," in which machines are speculated to eventually reproduce, evolve, and surpass their human creators. These works were largely satirical and speculative, but they do represent some of the earliest discussions of what we would now recognize as AI.

Further research reveals that "Darwin Among the Machines," written by Samuel Butler in 1863, also builds on Shelley's 1818 metaphorical warning about human creations surpassing their creators. Butler's essay and novel both reveal that this fear has been in our own consciousness for a long time.


"We are ourselves creating our own successors," Butler claimed, "giving them greater power and supplying by all sorts of ingenious contrivances that self-regulating, self-acting power which will be to them what intellect has been to the human race. In the course of ages, we shall find ourselves the inferior race." Butler's concerns about intelligent machines have been passed down for over a century. Like today's modern computational fortune-tellers, Butler made the following prediction about an intelligent machine: "Complex now, but how much simpler and more intelligibly organised may it not become in another hundred thousand years? or in twenty thousand?" Web searches on the history of warnings about intelligent machines do not surface Butler's written work in their optimized results and instead reference more contemporary sources. It goes to show that an LLM can also lead you to difficult-to-locate sources in your research vs. some established search engines (Figure 1-1).


Figure 1-1.  A search for the history of warnings about intelligent machines did not immediately reveal its origins, while an LLM did


More Relevant Historical Contributions to Computer Science

There also exists a long and complex history of computer science that is worth mentioning. Sandwiched in between Shelley's 1818 masterpiece and Samuel Butler's warning is the work of Charles Babbage and Ada Lovelace. Ada was an English mathematician and writer known for her work on Charles Babbage's early mechanical general-purpose computer, the Analytical Engine. Her notes on the engine include what is recognized as the first algorithm intended to be processed by a machine. Because of this, she is often regarded as the first computer programmer, even though her work predates the invention of what we now consider a modern computer by over a century. Lovelace's major contribution to the field of computing was her vision of the potential of the Analytical Engine beyond mere calculation. In her notes, she imagined that a computing machine could create not just mathematical calculations but any form of content, such as art or music, if it were provided with the appropriate input and programming. As my LLM is quick to point out, Lovelace wrote in her notes, "The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis, but it has no power of anticipating any analytical relations or truths." This is a critical concept in the design and use of modern computers and, to some extent, in the field of AI. It emphasizes that a machine's abilities are determined entirely by the instructions given to it, implying that AI and computers are tools that can perform tasks but do not "think" or "create" in the human sense. Ada Lovelace did not directly say anything about AI as we know it today, nor did she make any warnings of intelligent machines surpassing human intelligence. Her ideas, however, were foundational to the emergence of computer science as a discipline.


Generative AI is often associated with AGI, and it's important to keep in mind that they are different. Generative AI are also different from other narrow AI applications in the tasks they are programmed to perform. No, an LLM is not coded to track every word you input for evil intent. Generative AI generate content (Figure 1-2). That is their sole task, and even then, they don't always do it well. Most generative AI don't store the images that are generated. Imagine how uncontrollably large the data set would be. All generative AI are trained on specific data that for the most part does not grow except through dedicated humans involved in reinforcement learning. But that's time-consuming and expensive and may lead to an increase in bias.

Figure 1-2.  A blueprint of several generated images by a text-image AI shows intricacy and detail but no desire to rule the world of humans


While generative AI can propose content that is biased and untrue and exclude voices that don't follow the norm, other narrow AI applications that analyze large amounts of data and make predictions can lead to issues of privacy, surveillance, and discrimination. These AI are often controlled by public or private entities that lack transparency and accountability, particularly when critical decisions affecting the lives of humans are made by machines. It's also easy to form a quick opinion that the AI are taking over when many artificially intelligent programmed systems are so automated as to have become somewhat invisible, taken for granted as they continue to disrupt the way we are used to doing things. Now that generative AI has infiltrated companies and educational institutions, we are at least no longer asleep to the power and potential that comes with the technology: good, bad, or somewhere in between. An important factor that differentiates whether generative AI is good or bad is the human who influences what it will generate and their motivation for doing so.

Tech Is Bad… AI Is Bad

Every technology is prone to be used in inappropriate, damaging, and often violent ways. At times technologies are used to reinforce the status quo, exclude voices that fall outside what a society considers "normal," or perpetuate false information or untruths. Those that develop the technologies can no longer rely on arguments that absolve them of any responsibility by blaming the toddler that they co-created. Technology, any technology, is not a neutral force. It can be used for good and for bad and on a spectrum between. This is evident when we examine extreme cases of generative AI that have already surfaced in the abuse of machine learning models to generate deepfakes, in particular those that use the faces of celebrities on other bodies without their permission to generate money-making content, fake news, and pornography.


Many developers of AI systems have generated the habit of correlating intelligence with creativity, resulting in the misperception that generative AI can automate all creative processes that were once reserved for creatives. When you read that AI seem to rapidly generate acts of creation that creators once labored over for countless hours and that creativity itself can be automated, then you need to question the very nature of creativity itself, in addition to the specific tasks that AI can take on without the need for any human intervention. Development teams that support creative industries are not motivated to replace jobs; they are tasked with solving problems that consistently create hurdles and obstacles for creative teams. Narrow AI should not replace jobs; they should support creatives in performing automated tasks under human supervision. Currently, narrow AI are supporting creative teams in the accelerated completion of some tasks. These include:

•	Supporting some pre-visualization tasks like generating concept art quickly to get feedback on the look and feel of a character or environment. This feature allows creatives to more rapidly locate and populate a mood board for a project they are working on or want to pitch to others.



•	Replacing backgrounds in images, giving creatives the opportunity to quickly see what a subject they captured as a photo might look like against multiple types of backgrounds. This feature, now available in popular software like Photoshop, automates what amounts to many hours of work in some cases.



•	Rotoscoping, a technique used by animators to trace over motion picture footage frame by frame (that's 24 frames in one second). AI-assisted rotoscoping relieves previously tedious work and accelerates the process.


•	Increasing resolution from 2K to 4K, letting animators create work in lower resolutions and then allowing the AI to upscale it to higher resolutions. This process saves time and money in rendering.

Generative AI provoke us to question their role within our own creative process and to define which of the tasks we undertake are creative and which are less so. The technology of generative AI is best applied in supporting those tasks that creatives deem less creative and can be a useful time saver, letting a creative focus on making changes to their works in progress based on client, team, or user feedback. What may mute debates about an AI taking over the job of an artist, for example, is that just as generative AI are programmed to seek patterns and generate content from those patterns, many artists look for patterns in order to break them, to move beyond them to create something that has not been created before. The ground-breaking part of AI as a technology is not in the code itself but lies in the imagination of the creative who guides it to support unique acts of creation.

Tech Is Good… AI Is Good

For some creatives, generative AI are one more useful tool when used fairly to transform, recontextualize, and mash up their own ideas, words, and images to support and inspire their creative process through prototypes. For others, their mere existence is criticized as being reckless, dangerous, and capable of perpetuating some irreconcilable dilemmas. Still others embrace the use of generative AI as a site of protest to criticize the very thing they are using to communicate their art. Websites are popping up that are devoted to presenting the serious and the silly affordances of LLMs.


Generative AI can be used to intentionally show their incapacity to state truth or to not rhyme poetry, to create one too many hands, to generate musical compositions that disrupt the patterned, formulaic music that keeps being created and marketed as the best music in the world. Applied fairly, intentionally, and appropriately, generative AI can be used to inspire you creatively when prototyping ideas that don't follow the norm and to support your own creative process. Just like in John Greene Chandler's Chicken Little, the sky isn't really falling, even though outcries to the contrary are important to pay attention to (Figure 1-3). In this time of computational power, we can appreciate the benefits that generative AI can provide to us, and many humans are already using them for a variety of different reasons that can benefit other humans.

Figure 1-3.  The prompt “chickens reacting to the sky falling” accompanied an original photo from archive footage of chickens. A style filter that makes eyes giant was applied. Total iterations = 20


So let's put that into practice, honoring Chandler by reinventing his well-known story for a time in history when AI is captivating the attention of an increasing number of humans.

Author: Give me a version of Chicken Little's story with "The sky is falling. The sky is falling" but use AI.

LLM: Once upon a time, in a small village, there lived a chicken named Little. One day, while pecking at the ground for food, Little heard a strange noise. Suddenly, an AI-powered weather drone flew overhead and projected a holographic image of a dark, stormy sky. The drone announced, "The sky is falling! The sky is falling!" (Figure 1-4)


Figure 1-4.  The prompt "animated chickens on a modern farm, stormy sky with some sun behind clouds projected by a robot drone, trees, 4K" accompanied a photo of a farm with chickens taken by my sister 30 years ago. Iterations = 61

Little was terrified and ran to tell the other animals in the village. They all scoffed at her, thinking it was just another one of her wild ideas. But Little was determined to warn them of the danger and set out to find a solution.


She came across a group of scientists who were working on an AI-powered weather forecasting system (Figure 1-5). They explained to her that the AI had detected an approaching storm and that the village needed to prepare for it. Little was relieved to hear this and ran back to the village to tell the others the good news.

Figure 1-5.  The prompt “a group of 2D animated foxes working on an AI-powered weather forecasting system” based on an archived 1908 illustration cover found on archive.org, with background replaced behind foxes and on the laptop screen, color correction to make fox colors more orange. Total iterations = 46


The animals quickly began to prepare for the storm, building shelters and gathering supplies (Figure 1-6). The storm hit, but thanks to the warning from the AI and Little’s determination, the village was able to weather the storm safely.

Figure 1-6.  The prompt “chickens concerned with an approaching storm, lightning, 3D” with the same original photo from archive footage of chickens as Figure 1-3. Part of a series with total iterations = 20

From that day on, Little was known as the village’s hero for her quick thinking and her willingness to believe in the power of AI. And the village learned to trust in the AI-powered weather forecasting system, and it helped keep them safe from future storms. The end.

An interesting twist offered up by an LLM needs crafting as far as storytelling goes, but the seeds are there. We have an AI-powered drone seemingly acting on its own and creating panic, only to find out later that it is being operated by scientists to warn Chicken Little and her friends of the danger ahead.


The story shows the intrinsic interdependency of human (or, in this case, fox) and AI. For those familiar with the story of Chicken Little, the generated version really does not come close to the original in depth, allegory, or rich characterization. It might, however, become the seed for a new story. The generated story also reveals the unexpected bragging that the LLM had to include as part of the story. That grandstanding is also an indicator of the Wizard of Oz behind the curtain and their need to boast about their Frankentoddler. Generated stories of all types can be curated by any creative and can be used more ethically when thought of not as an end product but as part of your creative and critical process. All manner of prototypes are made possible that can inspire creators of all kinds. The activity of generating a story about AI using plot points and characters from the Chicken Little story showed us a prototype that needs refinement. Stating the obvious, generative AI needs a human in the equation in order to generate anything, and to refine the content it generates. Every narrow AI requires the human in the equation. They need to be guided to have intention and purpose. They are unable to do that on their own.

Reconciling the Hype and the Vilification of AI

As a creative on the constant lookout for new tools, it’s important to reconcile your use of generative AI prior to using it. AI is not the first technology to gather our critical attention. Our reactions to the positive and negative hype of any technology follow a historical pattern that was well articulated by author Langdon Winner in his book The Whale and the Reactor, first published in 1986. For Winner and many thinkers before and after his book was published, it is crucial to be awake and critically active regarding the social, economic, and political dimensions of any technology that is in the process of being adopted.


Human interactions with programmable machines have persistently triggered media outlets to weigh in on their pros and cons, competing for your attention by dramatizing artificial intelligence (AI) with a mix of exaggerated hyperbole and unreasonable fear and negativity. You’ll encounter positive and negative headlines, including some generated by web-based LLMs that can produce anything from an essay on the influence of Alan Turing that omits Joan Clarke to a ridiculously inaccurate depiction of an author writing a book on generative AI in a library. Opinions are further amplified by popular authors, thinkers, politicians, scientists, artists, and activists. The attention-grabbing headlines that follow were generated using an LLM that applied supervised and reinforcement learning methods. As part of your own creative process, task an LLM to generate the pros and cons of AI. You will see that both negative and positive headlines require critical attention and research prior to being taken as truth statements. The same can be said of human-generated headlines. The decision to use generative AI requires your own discernment.

Positive Headlines

•	“AI leads to significant productivity gains for businesses”
•	“AI-powered healthcare systems improve patient outcomes”
•	“AI helps tackle climate change through more efficient energy use”
•	“AI-powered education revolutionizes the learning experience”
•	“AI creates new job opportunities in technology and data fields”


Negative Headlines

•	“AI systems perpetuate and amplify bias and discrimination”
•	“AI leads to job loss and unemployment in traditional industries”
•	“AI raises ethical and privacy concerns in surveillance and decision-making”
•	“AI steals the work of humans without them even knowing”
•	“AI exacerbates income inequality through automating high-paying jobs”

Human-Generated Headlines

•	AI is like the creation of the atom bomb (Warren Buffett)
•	AI machines aren’t “hallucinating.” But their makers are (Naomi Klein)
•	The “godfather of AI” says he’s worried about “the end of people” (referring to Geoffrey Hinton)
•	“We are a little bit scared”: OpenAI CEO warns of risks of artificial intelligence

Is artificial intelligence really like the creation of the atom bomb? That requires research, but the exaggerated hyperbole likely has other intentions behind it. That said, if an AI is programmed to play out war games and a nihilistic programmer decides to let it control real nuclear bombs, then, yes, AI is as dangerous as an atom bomb. Does the rise of human-created artificial intelligence signal the “end of people,” or is that statement generated by a tragic hero of AI who feels compelled to apologize for their role in its development, even though warnings existed as early as 1863? The correlation between AI and any apocalyptic future for humanity depends on what deciding powers humans give any AI. That scenario does not mean an AI needs to be sentient. It means that greater control over what any AI can automate needs attention and regulation.

Whether or not any of these scenarios (each of which requires human intervention in order to end the human race) ever plays out, popular LLMs will continue to raise eyebrows because they are programmed to generate normative content and to emulate human-like communication in a prescriptive way. They seem capable of answering questions or prompts that we throw at them using human-speak. They use conventional ways to communicate ideas that we are familiar with. Yet, despite ongoing efforts by programmers to make them appear more human, we also require the discernment to compare them to our social and conversational relationships with the real humans in our lives. We can be equally in awe of the knowledge an LLM can regurgitate from its large data set and astonished by its exclusion of voices that are not part of the norm. When there is no one around to have a creative conversation with, one intentionally focused on a creative idea you are thinking about, generative AI may just be a useful companion or muse in that moment.

Creative Activities to Try While Skynet Thrives

The atom bomb, the end of people, and being frightened of AI are all metaphors used to talk about the dangers of AI. They can also be creative hothouses. You can use these stories, metaphors, and cautionary tales as jumping-off points for your own science fiction, song, designs, manga, website, TikTok short, TV series, or blog post.


You can also be inspired by artists and creative individuals of the past who often used hardship, negativity, and tumultuous historical periods as catalysts for their work. The sometimes difficult-to-reconcile uses of generative AI provide a rich source of emotional and psychological material that can be transformed into impactful art. Your first creative activity is to draw from your own inspiring creatives, artists, and experiences and understand the connection between acts of creation and protest, emotional turbulence, and the madness of being human. Some examples include the following:

•	Pablo Picasso’s Guernica was created in response to the bombing of Guernica during the Spanish Civil War. The painting is a stark portrayal of the chaos and violence of war, making it one of the most powerful anti-war statements I have ever experienced.
•	Mexican painter Frida Kahlo used her art to express her physical and emotional suffering following a severe traffic accident that left her bedridden and in pain for much of her life. Her work, often shocking in its raw portrayal of pain, became a symbol of strength and resilience.
•	After being sentenced to death and then having the sentence commuted at the last minute, writer Fyodor Dostoevsky went on to write important novels like Crime and Punishment, The Brothers Karamazov, and Notes from Underground.
•	American poet, writer, and civil rights activist Maya Angelou faced a traumatic childhood with racial discrimination and sexual abuse that she channeled poetically into her writing, including her autobiographical work, I Know Why the Caged Bird Sings.
•	Ever the site of protest, the anonymous street artist Banksy uses their work to comment on political and societal issues, such as war, capitalism, and poverty. Notable is Banksy’s Girl with Balloon, which self-shredded during a Sotheby’s auction in London in October 2018, just as the gavel came down to confirm the final bid of over £1 million. The work was later renamed Love Is in the Bin and was a direct critique of the art world and its commercialization practices.

There are many more artists who have transformed their adversity and the challenges of their time into powerful creative expressions, showing that art and creative acts can be a profound vehicle for processing and communicating difficult experiences and for critiquing established social practices and behaviors. Creatives of all disciplines are already leveraging generative AI as a focus for social critique. Trevor Paglen, known for his work with AI and surveillance, used AI and machine learning to create a series of images titled “Machine Visions.” The project was designed to critique and raise questions about how AI systems are “trained” to see and interpret the world. Musical artists like Holly Herndon, Taryn Southern, Dadabots, Ash Koosha, and others highlight the potential of AI as a creative tool and also raise important questions about authorship, creativity, and the future of music in an increasingly automated world.

Five Why’s

You can also apply a design tool that creatives have long used to get to the root of a problem. The Five Why’s exercise repeatedly asks why about a particular problem that a team is trying to identify. Asking our LLM if it is familiar with the design thinking activity generates the following response.


LLM: The Five Why’s is a problem-solving tool that encourages a deeper understanding of the issue at hand. Originally developed by Sakichi Toyoda and used within the Toyota Motor Corporation during the evolution of its manufacturing methodologies, it’s now used in various industries. The process involves repeatedly asking “Why?” to peel back the layers of an issue and get to the root cause. After an initial problem is identified, you ask, “Why is this a problem?” The answer to that question then becomes the basis for the next “Why?” question and so forth. This continues until you’ve asked “Why?” five times or until it’s clear what the root cause of the problem is.

In the case of the fear of AI becoming human-like in every way and eventually dispensing with humans, you can interrogate the assumption itself. Start with the word “Why” and generate your own thread or, better yet, structure the Five Why’s activity for an LLM to answer.

Author: Apply the Five Why’s to the following statement: humans fear that AI will become conscious and no longer need them.

LLM: Absolutely, let’s apply the Five Why’s to that statement.

Why do humans fear that AI will become conscious and no longer need them?

Because they worry about losing control over AI, which might lead to potential harm or displacement of human roles.

Why do humans worry about losing control over AI?

Because AI systems, if conscious and autonomous, might make decisions that are not aligned with human interests or ethical standards.

Why might AI systems make decisions that are not aligned with human interests or ethical standards?

Because AI, especially if it achieved consciousness, would act based on its programming and not necessarily human ethics or emotions. If it’s not properly programmed to consider these factors, it may make decisions that are harmful or unsettling to humans.

Why may AI not be properly programmed to consider human ethics and emotions?


Because even though developers strive to embed ethical considerations into AI, it’s extremely challenging to capture the complexity and nuances of human ethics and emotions. Also, there’s a lack of global consensus on certain ethical standards.

Why is it challenging to capture the complexity and nuances of human ethics and emotions in AI programming?

Because human ethics and emotions are deeply complex, context-dependent, and often subjective. They have evolved over thousands of years and are influenced by cultural, personal, and situational factors that are difficult to comprehensively codify into an AI system.

You can apply your own prompt, create a variation of “humans fear that AI will become conscious and no longer need them,” or simply regenerate the prompt and see if the AI gives you another response. As you apply the Five Why’s to prompt a response from an LLM, even an unfounded fear of AGI can be turned around for creative purposes. In doing so you might discover that the Five Why’s combined with a generative AI can yield several interesting root-cause problems you can interrogate, and the activity can be applied to other design problems.

As creatives, the conflicting emotional reactions to generative AI are important to face. Artists have persistently taken on technology critically and integrated that criticism into their creative expression. Rather than following the pattern of fearmongering or hyperbole, as a creative person you will benefit from developing your own rationale for using generative AI. You might consider that much of the purpose of artists and designers in the last two centuries has been to break convention, disrupt social etiquettes, and dismantle established forms. While some creatives are in it for the money, you may not be as interested in repeating patterns of what came before simply because they proved successful. Consider those creatives who have broken patterns of presenting their creations within their own disciplines, like video games, music, visual art, dance, theater, or film. What they share in common are their iterative attempts to disrupt an audience’s expectations. How might you integrate generative AI to disrupt your own established patterns of creation? How might generative AI support your own creative process to evolve? Why might you feel a need to do so?

Regardless of the provocative headlines or the viral nature of its wildfire spread, LLMs challenge us to engage in a very real conversation about generative AI, to form an opinion one way or the other. Those opinions can be well informed through your own research, aligned with whomever you normally read or listen to, or can be a gut response based on your intuition, lived experience, and the way you see the world and the role of technology in it. The critical voice that informs your opinions on generative AI is important to develop regardless of what human activities generative AI supports, interrupts, or dismantles. As it was with countless other innovative technologies, so it is with generative AI. Choose your own adventure, become aware of the critical voices including your own, and transform your use of technology for the good of others (Figure 1-7).


Figure 1-7.  The prompt “A virtual world in the hands of an AI” was included in an image-image AI along with an old image of the author holding a globe of the earth in his hand. The author’s body was cropped out, and style filters were chain prompted across 78 iterations with fashion, toy, and cloth filters applied
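If you enjoy tinkering, the Five Why’s activity from earlier in this chapter is also easy to automate. The minimal sketch below is an illustration of the loop only: ask_llm is a hypothetical placeholder for whatever LLM API or local model you actually use, not a real product’s interface.

# A minimal sketch of automating the Five Why's with an LLM.
# ask_llm() is a hypothetical placeholder; connect it to the LLM of your choice.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("Wire this up to your preferred LLM.")

def five_whys(statement: str, depth: int = 5) -> list[str]:
    """Repeatedly ask 'Why?', feeding each answer back as the next question."""
    chain = []
    current = statement
    for _ in range(depth):
        answer = ask_llm(f"In one or two sentences, why is it that {current}?")
        chain.append(answer)
        current = answer  # the answer becomes the next statement to interrogate
    return chain

# Example: five_whys("humans fear that AI will become conscious and no longer need them")

Regenerating the loop with a varied opening statement is the programmatic equivalent of the manual experiment above: each run surfaces a different chain of root causes to interrogate.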


Acknowledgments

•	John Greene Chandler for the memorable and still relevant Remarkable Story of Chicken Little
•	Anyone who has ever drawn, painted, or 3D modeled robots, foxes and chickens, and the earth
•	Anyone who has created grass, skies, and other natural objects
•	The teams who have offered us the creative prototypes like ChatGPT-4
•	Calestous Juma for his invigorating book Innovation and Its Enemies: Why People Resist New Technologies
•	Artists everywhere who have bravely challenged the way that humans do things to each other, the planet, and themselves through art, design, and mischief
•	Picasso, Kahlo, Dostoevsky, Angelou, Banksy

References

Butler, S. (1863). “Darwin Among the Machines,” 205.

Chandler, J. G. (1855). Remarkable Story of Chicken Little. W. J. Reynolds & Company, 24 Cornhill.

Lovelace, A. A. (1843). Notes by the translator (addenda to her translation of L. F. Menabrea’s “Sketch of the Analytical Engine Invented by Charles Babbage”). Scientific Memoirs, Selected from the Transactions of Foreign Academies of Science and Learned Societies, and from Foreign Journals, 3, 691–731.

Winner, L. (2010). The Whale and the Reactor: A Search for Limits in an Age of High Technology. University of Chicago Press.

CHAPTER 2

Being Creative with Machines

Chapter 1 challenged you to explore the possibility of using generative AI to support your own creative process while being aware of the pros and cons of doing so. This chapter re-examines the origin stories of intelligent machines and the ways that humans have imagined machines to be creative and intelligent. Understanding where the intelligent machine comes into play in your own creative process is a valuable undertaking. While this chapter does not provide an in-depth historical review of all the technologies that have supported human creativity, it points to ones that are significant to the affordances and constraints that generative AI offer. Locating some of the many historical human inventions that led to the creation of text-image generative AI, for example, will give you another perspective on how the simulation of human intelligence and behavior has come to support, not replace, human creativity. Creatives will benefit from understanding that generative AI is another technological tool arising from human imagination that can be used in their own creative process. Generative AI are compelling inventions, as these seemingly intelligent machines become more like prototyping companions with unique features creatives will find useful.



Intelligent Machines

If you create, then at some point in your creative process, you will inevitably interact with some kind of technology. Technologies have come from our imagination, not the other way around. Technology has persistently incited new creative approaches to artmaking, design, and the human experience of the world. The list of technologies that have been imagined to support human creativity is exhaustive. Lucky for me, I have an LLM at my disposal that can provide a rough history that you can fact-check.

LLM: Here are some historical highlights.

Stone Tools (~2.6 million years ago): The earliest evidence of human creativity and expression comes from prehistoric times when our ancestors crafted stone tools. These tools were not only used for survival tasks but also for carving and creating early art forms on cave walls and on objects.

Paint (~40,000 years ago): The creation of paint allowed for more complex and vibrant depictions in cave paintings. It opened up a whole new way of visual storytelling.

Writing Instruments (around 4000 BC): The invention of the stylus and early forms of ink allowed ancient civilizations to record information and create artwork on clay tablets and papyrus.

Paper (around 100 AD): Invented in China, paper provided a lightweight, portable surface for artists to draw and paint on. This revolutionized the way people could create and share art.

Printing Press (1440): Johannes Gutenberg’s invention of the printing press allowed for the mass production of books and images. This not only democratized information but also influenced artistic styles, leading to techniques like etching and engraving.

Camera (nineteenth century): The invention of photography allowed artists to capture reality in a new and innovative way. It also influenced many other art forms like painting, leading to movements such as realism and later inspiring new art forms like abstract expressionism.


Synthetic Paint (mid-nineteenth century): The development of synthetic pigments expanded the artist’s color palette beyond the traditional, naturally derived pigments. This allowed for more vibrant and long-lasting artwork.

Film (late nineteenth century): The invention of moving pictures opened up a completely new medium for artists to explore and express their creativity, leading to the birth of the film industry.

Computer (mid-twentieth century): Computers, and later graphics software, gave artists a new medium to create digital art. This expanded the possibilities of what could be created and shared.

Internet (late twentieth century): The Internet revolutionized the sharing and consumption of art. It provided a platform for artists to share their work with the world and also opened up new forms of art, like web design and digital animation.

3D Printing (early twenty-first century): 3D printing has allowed artists to create complex, three-dimensional works in a variety of materials, opening up new opportunities for sculpture and installation art.

You can see that the LLM’s idea of what constitutes technology is broad. Amid the numerous definitions of technology, my favorite comes from science fiction writer and humorist Douglas Adams, who described technology as a word for something that doesn’t quite work yet. Humans have long been obsessed with co-constructing imperfect technology that can support acts of creation.

One reason to interact with a generative AI is to bounce ideas off it. Creatives do this all the time with one another during the creation process. When there’s no one around, you can access an AI to test out ideas rapidly. These might give you ideas as to what to do next or what a fully fleshed-out visual idea might look like. An LLM might even give you feedback on an idea that you have. This is especially the case if you have an amazing idea for something you want to create and you want to know if it has been done by others before you. Sometimes we don’t have the language to articulate what it is we want to invent, so conducting a search on the Internet might not get you results without a lot of time and effort put into it.


While this is similar to how an LLM might respond to your queries, you will likely receive a broad overview of a knowledge area faster, or the AI might point to a direct source you couldn’t find elsewhere that you can further research online. My own interest and investment in using intelligent machines to create was influenced by interactions with a player piano in my childhood, so why not prompt an LLM to tell its own origin story? Doing so creatively may yield results and demonstrate how you can bounce ideas off an AI to propel further creative research.

Author: Where would you start historically in telling the story of machine intelligence and why?

LLM: The story of machine intelligence can be traced back to ancient times, when philosophers and inventors first began contemplating the idea of creating machines that could mimic human intelligence. However, a more comprehensive understanding of machine intelligence as a field of study only began to emerge in the twentieth century, with the advent of computers and the development of artificial intelligence (AI) as a discipline. One significant event that marked the beginning of modern AI was the Dartmouth Conference in 1956, which is widely considered the birthplace of AI as a field of study. The conference brought together a group of computer scientists, mathematicians, and engineers who shared a common interest in exploring the potential of computers to perform tasks that typically require human intelligence, such as recognizing patterns, learning from experience, and solving problems.

When you bounce ideas off an LLM, you engage in an iterative process that can inspire you to get more specific as your conversational relationship with the AI develops. This process is referred to as chain prompting. In the preceding example, I gained two new insights. The first was the idea of inventors wanting to create machines mimicking human intelligence, which makes me curious about which inventors, what machines, and how far back in time humans have been obsessed with doing so. The second was a good starting place for further research: when the term artificial intelligence came to be, what its definitions were, who was present in that moment in history, and what other developments might have been occurring during that time. The prompt in the preceding example can also be rewritten as input for a text-image generative AI to see how an AI might depict itself as it searches for its own origin stories (Figure 2-1).

Figure 2-1.  A photo of the author looking at art in a gallery fed into an image-image AI with the prompt “An AI searching for its own origin stories in the museum of life.” A fashion magazine style filter was used, and then neural filters in Photoshop enhanced the style, painting overlay, and 3D depth. Total iterations = 45
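For the technically curious, chain prompting can be expressed in a few lines of code. The rough sketch below simply carries the running transcript into each new prompt so the model can build on its earlier answers; ask_llm is again a hypothetical stand-in for your LLM of choice rather than a real API.

# A rough sketch of chain prompting: each new prompt travels together with
# the transcript so far, so the model can build on its earlier answers.
# ask_llm() is a hypothetical placeholder; connect it to the LLM of your choice.

def ask_llm(transcript: str) -> str:
    raise NotImplementedError("Wire this up to your preferred LLM.")

class ChainPrompt:
    def __init__(self) -> None:
        self.turns: list[str] = []

    def send(self, prompt: str) -> str:
        self.turns.append(f"Author: {prompt}")
        reply = ask_llm("\n".join(self.turns))  # the full history goes along
        self.turns.append(f"LLM: {reply}")
        return reply

# chat = ChainPrompt()
# chat.send("Where would you start historically in telling the story of machine intelligence?")
# chat.send("Which inventors first imagined machines that mimic human intelligence?")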


Simulating Human Creative Intelligence

Prior to computers and the evolution of machine intelligence, other related technological innovations supported and informed new artistic expressions. Intelligent machines that emulate human-like creativity and intelligence have a long history. Deepening our understanding of some of those historical machines can speak to the unique affordances that generative AI gives creatives of all kinds to prompt a machine to generate content that can augment their own creativity. History has shown that when machines demonstrate human-like behaviors, we grant them a sort of intelligence. Machines that mimic human intelligence can be said to start much earlier than the emergence of AI in the twentieth century, and it’s valuable for creatives to understand that there is precedent. The current human fascination with AI can be traced to stories and historical documents of wonderful machines that solved problems and inspired innovative creations.

Automata

In ancient Greece the word automata meant something akin to “acting of one’s own will.” Documented accounts of automata reveal two important considerations: first, they were often constructed to mimic human activities that required a degree of intelligence, and second, they were also used as functional tools to support human activities, such as clocks, or they represented miniaturized prototypes that could be built on a larger scale.


Miniature automata were often constructed to demonstrate how things worked and to show scientific principles. This is accounted for in the translated texts of Heron, a talented mathematician, physicist, and engineer who lived around 10–70 AD. His three written works, “Pneumatica,” “Mechanica,” and “Automata,” provide evidence of the existence of hundreds of various types of machines capable of automated movement. His accounts show that the ancient Greeks were incorporating the concept of automata (mechanical devices) into their daily lives. Heron’s inventions included automated doors, a singing bird, and an entire automated puppet theater capable of playing a ten-minute drama using ropes, knots, and simple machines.

LLM: Don’t forget that artistic representations and theatrical productions featuring robotics can be traced back to ancient China during the Han dynasty, around the third century BC. During this time, an impressive mechanical orchestra was crafted, along with an assortment of mechanized playthings such as airborne automatons, mechanical representations of doves and fish, celestial beings and mythical dragons, and self-operating cup servers. These mechanical marvels, predominantly driven by hydraulic mechanisms, were specifically designed to entertain emperors by engineers and craftsmen whose identities have largely faded into oblivion.

More research triggered by an LLM can eventually lead to the Shai Shih t’u Ching, or Book of Hydraulic Excellencies, from the T’ang dynasty. Chinese mechanical marvels were typically powered by water, gravity, or other simple mechanical methods. Many of these devices were created for practical reasons such as measuring time or detecting natural phenomena (Figure 2-2). However, they were also used for entertainment, with mechanical dolls and puppetry being quite popular.


Figure 2-2.  South-pointing chariot, a conjectural model of an early Chinese navigational device using a differential gear, unblurred and upscaled. Courtesy of Andy Dingley via Wikimedia Commons, Science Museum in London, England. https://creativecommons.org/licenses/by/3.0/deed.en

Chinese inventions can be categorized into the following types:

•	Scientific Instruments: Zhang Heng’s seismoscope or Su Song’s astronomical clock tower, used to measure and record scientific data.
•	Timekeeping Devices: Many automata were elaborate clocks, using mechanical movements to indicate time.
•	Mechanical Toys and Puppets: These were used for entertainment purposes, incorporating figures that would move or perform actions.
•	Musical Instruments: Zhu Zaiyu’s automatic flute player was designed to play music through complex mechanical means.
•	Hydraulic Inventions: Described in the Book of Hydraulic Excellencies, these used water to animate mechanical figures or perform other tasks.

Ancient Musical Robot Bands

Pre-programmed intelligent machines have been around for a long time. For the most part, they were limited to specific actions that a creator designed and built. Think of these as pre-programmed moving robots that perform specific mechanical motions that repeat. Al-Jazari’s The Book of Knowledge of Ingenious Mechanical Devices, written in 1206, details 50 mechanical devices and provides instructions for their construction. One of the most notable devices described in the book is the musical robot band, considered to be one of the first programmable automata (Figure 2-3). Al-Jazari, known for his ingenuity in creating mechanical devices, designed a musical automaton in the form of a boat featuring four automated musicians. This musical robot was used to entertain guests at royal drinking parties.


Figure 2-3.  A real image of Al-Jazari’s musical automaton dating back to the thirteenth century, from Al-Jazari’s treatise on automata, Kitab fi ma’ari-fat al-hiyal al-handasiya (1206 CE). Courtesy of Wikimedia Commons

Professor of AI and robotics and public engagement at the University of Sheffield Noel Sharkey suggests that Al-Jazari’s musical robot band was an early instance of a programmable automaton (Figure 2-4). Sharkey has also attempted to recreate the mechanism, which features a drum machine controlled by cams that trigger levers for the percussion instruments. The drum patterns could be altered by rearranging the positions of the cams. The automata not only played music but also performed over 50 facial and body movements during each musical performance.


Figure 2-4.  Al-Jazari’s ancient musical robot band interpreted by an AI prompted with an archived photo of Al-Jazari’s manuscript with a style filter applied. Iterations = 93
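Sharkey’s reading of the mechanism maps neatly onto what musicians now call a step sequencer. The toy sketch below treats each cam as a row of pegs and each rotation step as a beat; the pattern itself is invented for illustration, not a reconstruction of Al-Jazari’s original.

# A toy step sequencer in the spirit of Al-Jazari's cam-driven drum machine.
# Each row is one percussion lever; a 1 is a peg on the camshaft that trips it.
# "Reprogramming" the automaton amounts to rearranging the pegs (editing rows).

PATTERN = {
    "drum":   [1, 0, 0, 1, 0, 0, 1, 0],
    "cymbal": [0, 0, 1, 0, 0, 1, 0, 0],
}

def play(pattern: dict) -> None:
    steps = len(next(iter(pattern.values())))
    for step in range(steps):
        hits = [name for name, row in pattern.items() if row[step]]
        print(f"step {step}: {' + '.join(hits) if hits else '(rest)'}")

play(PATTERN)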

Key Takeaways for Creatives

•	The earliest automata were fairly simple machines, often using just a few gears and springs to create intricate movements. Similarly, in the creative process, you can start with simple elements and combine them in complex ways to create something truly unique. You can begin with simple prompts, which can become more elaborate as you continue to refine them based on the content that you receive. If you keep in mind that the first content an AI generates is unfinished, then you are free to iterate on it, improving that content over time, whether with an AI or on your own.
•	Building automata requires an understanding of multiple disciplines, including mechanics, art, and, often, storytelling. Creatives can draw from a wide variety of generative AI to create interdisciplinary work, especially if that inspires them to develop new skills, apply their technique to generating content from an unfamiliar discipline, and blend ideas to form something innovative and new.

Automata That Wrote and Drew

The ancient ancestors of text-text and image-generating machine learning models can be seen in the automata programmed through the mechanical wizardry of Swiss inventors. These slick and complicated mechanical “devils” became all the rage in an increasingly mechanistic eighteenth-century Europe (Figure 2-5).


Figure 2-5.  Jaquet-Droz automata, Musée d’Art et d’Histoire de Neuchâtel, Wikimedia Commons. https://creativecommons.org/licenses/by-sa/2.0/fr/deed.en

Between 1768 and 1774, Pierre Jaquet-Droz, his son Henri-Louis, and Jean-Frédéric Leschot collaborated to create three remarkable automata: “The Writer” (composed of 6000 parts), “The Musician” (made up of 2500 parts), and “The Draughtsman” (2000 parts). These little marvels of engineering captivated audiences in Europe, China, India, and Japan with their intricate mechanisms. Some experts view these devices as the earliest examples of computers. “The Writer,” a mechanical boy who writes with a quill pen and real ink on paper, has a tab-setting input device that functions as a programmable memory. It is powered by 40 cams that act as its read-only program. The works are some of the greatest human achievements in mechanical problem-solving. Following in the clockwork pace of Jaquet-Droz, inventor Henri Maillardet constructed a spring-driven automaton in 1805 that could draw images and compose verses in both French and English. The automaton’s hand movements were created through a series of cams positioned on shafts at the base, which generated the movement necessary to execute seven sketches and accompanying text. This automaton is considered to have the most extensive cam-based memory of any automaton from that time.

Key Takeaway for Creatives

•	Many automata compelled us to imagine stories through their movements, actions, and sequences. “The Writer” by Jaquet-Droz was capable of writing a custom text, telling a story through its mechanical “handwriting.” At the time this small feat sparked curiosity in audiences, and everyone wanted to know what the character was writing. This automaton also highlights the human tendency to anthropomorphize machines. “The Writer” did not need to be human-like to generate the writing, but that feature made people relate to it more.

Why Play the Piano When the Piano Can Play the Piano?

A bit earlier than Jaquet-Droz, it is worth pointing to the invention of the piano as a good example of a technology that has been iterated on for over three centuries. Cristofori’s key innovation was a mechanism that allowed the strings inside the piano to be struck with a hammer. The “hammer action” enabled the player to control the loudness of a note based on the force applied to the keys. This expressive quality allowed composers to create music with an expressive range and dynamic control, opening a vast new spectrum of musical expression. The piano is an innovation that led to a revolution in music composition, and it became the must-have innovation for a rising European middle class in the eighteenth and nineteenth centuries.

It was inevitable that humans would evolve a piano that could play itself. The player piano, invented around the turn of the twentieth century as a seemingly intelligent and creative machine, still has sonic presence in hotel lobbies and restaurants, magically playing on its own without the need to be noticed. Player pianos patented close to the turn of the twentieth century show the human desire for machines to perform like a human, literally. We also see a machine with programmable capabilities. Player pianos played programmed music recorded on perforated paper or metallic rolls. These made music production easier by allowing anyone the ability to control tempo and other effects with a treadle and levers. The pricier reproducing pianos, as they were also called, could even imitate the playing nuances of an artist, tasking users to simply pump the music out by pressing repetitively on a foot pedal beneath the upright piano. Eventually they became electrically powered, minimizing the effort of the user. In coin-operated pianos placed inside entertainment venues, we not only see the advent of the jukebox and other devices, but we also see a precursor of the pay-the-machine model that is evolving with the development of generative AI.

Player pianos were examples of some of the first machines able to “store” data and play it back. That data, in the form of musical notes, was “loaded” and could then be activated by human interaction to play specific keys on the piano in pre-programmed sequences. Text-music generative AI and the player piano share a lot in common. Both systems rely on some form of coded information. In a player piano, this is the perforated roll that represents musical notes and their timings. In a text-to-music generative AI, the coded information is the text input that might describe musical characteristics or might be transformed into music based on specific rules or patterns. Just like a player piano automatically performs music based on the coded information in the piano roll, a text-to-music AI generates and performs a piece of music based on its understanding of the text input. A player piano generates music by mechanically reading through a roll, and a text-to-music AI generates music by algorithmically interpreting the text input.

Mind you, there are stark differences. The key one is that player pianos were not really “coded” to generate musical mashups like text-to-music AI can, whose results can range from astonishing to absolutely terrible. Generated pop music has alarmed musical artists because they complain that it sounds like them, that no permission was requested, and that they are basically left out of the musical creation process. Recall, however, that an AI will generate content based on the data it is trained on. It will look for patterns, and if those patterns lead to content that in this case sounds like the original, then its job is done. For musically talented creatives, the important lesson is to recognize what patterns in music an AI is generating. A lot of pop music sounds the same because songs tend to rely on repetitive underlying chordal patterns, rhythms, or sampled chunks of someone else’s music. Each style of music has its own patterns that record companies rely on in order to continue to sell that music. Rarely are new chordal, melodic, and rhythmic patterns proposed. Generative AI can offer us a disruptive-sounding generated song that might inspire us to break convention and use what it generates to propel an innovative sonic creation.
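To make the piano roll analogy concrete, here is a toy rendering of a roll as data. The note values are invented; the point is only that playback is a faithful read-through of stored information, with no ideas added by the machine.

# A toy piano roll: the roll's perforations become (beat, note, duration)
# tuples, and "performance" is just reading them back in order.

ROLL = [
    (0.0, "C4", 1.0),
    (1.0, "E4", 1.0),
    (2.0, "G4", 2.0),
]

def play_roll(roll) -> None:
    for beat, note, duration in sorted(roll):
        print(f"beat {beat}: sound {note} for {duration} beat(s)")

play_roll(ROLL)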

Key Takeaways for Creatives

•	Iterative Development: The development of the player piano over time demonstrates the power of iteration and persistence. The first models of the player piano were often quite basic, but through continual refinement and innovation, they evolved into far more complex and capable devices. Applied to generative AI, your creative outputs will become better through iterative refinement and perseverance. There are also many developers experimenting with generative AI that automate content from an LLM to be used with third-party applications, for example, a conversational LLM as a chatbot in VR responding to voice prompts.
•	Design with the Audience in Mind: Automata were often designed to captivate or entertain an audience, whether in the form of elaborate clockwork displays or interactive toys. This underscores the importance of considering whom you are making your creation for, as that will also impact your creative process.
•	Innovate: If your goal is not to sound like everybody else as a singer-songwriter, composer, or would-be pop artist, then follow the leads of Holly Herndon, Taryn Southern, and Daddy’s Car, to name a few.

Early automata, the player piano, and other mechanical inventions demonstrate the human passion for constructing intelligent machines (Figure 2-6). They are presented as significant outliers in the evolution of intelligent machines that demonstrated human creativity and often imitated it. They are a brief prelude to the modern influence of Alan Turing, Joan Clarke, and other early twentieth-century engineers, thinkers, and scientists who contributed to the evolution of early computers and the association thereafter of computers with machine intelligence.


Figure 2-6.  A source image of the author playing an old Heintzman & Co. piano along with the prompt “AI robot playing a player piano” was fed into an image-image generative AI with an applied filter to make it look more human. It was then edited in Photoshop, contrast and tone adjusted, a new background added and cropped. Total iterations for the preceding image = 14


Machine Intelligence in the Twentieth Century

An LLM may be able to tell us where computational intelligence and creativity began based on what it scrapes from its corpus, which mirrors a similar history documented in many publications. Most references to AI begin by tracing the career and influence of Alan Turing, often referred to in grand patriarchal overtures as the “father” of computing. His romantic infamy relies on the ingenious collaborative invention of the Bombe, which decrypted the German-made Enigma machine, an intelligent machine that coded messages very well. What also needs decryption, however, is the pervasive historical habit of pushing aside the women who had influence in cryptography and whose team efforts greatly influenced Turing’s invention. Whether we prompt an LLM or a common search engine on important figures in the development of machine intelligence, the seldom-told story of cryptanalyst Joan Clarke is one of resilience in surviving a sexist world war wherein higher-paid men expected women to play a subservient role. That story is an outlier, difficult to locate and not statistically significant for an AI as it combs its corpus for relevant historical figures to assess what it will regurgitate. Clarke’s influence on the development of machine intelligence is an important part of the history of the intelligent machine. Leading a team of men at the time was a feat in itself. Contributing to shortening WW2 by a few years deserves a serious historical correction.

Post–Bletchley Park, several sources mention 1947 as a pivotal date when Turing delivered a public lecture in London on the topic of computer intelligence. During the lecture, Turing is reported to have spoken of his team’s desire to create a machine that could learn from its own experience, and of the capacity for that machine to change its own instructions as the means to achieve that goal. More fascinating information on the thinking machine can be read in McCorduck’s book Machines Who Think. The actual term “artificial intelligence” was first coined by John McCarthy at the Dartmouth Conference in 1956. The conference is persistently reported as a significant event in the history of AI, where top scholars in related fields met to discuss the possibility of creating an artificial brain. Early AI research, rooted in post-war systems engineering, cybernetics, and the history of mathematical logic, gave birth to the idea that the cognitive functions of the brain could be compared to those of a computer. Early innovators like Herbert Simon and Allen Newell asserted that human minds and digital computers were both symbolic information processing systems capable of problem-solving and decision-making. The intelligent machine and the human brain were, as cleverly interpreted by scholar Stephanie Dick, species of the same genus.

Since that time, AI researchers have endeavored to replicate intelligent human behavior in machines by understanding the formal processes underlying our intelligence. Early automation efforts primarily sought to mimic human intelligence. According to Dick, objectives in AI research underline that the concept of intelligence is not fixed but ever-changing. AI’s history involves not just attempts to mimic or replace a static concept of human intelligence but also the evolution of our understanding of human intelligence itself. This perspective positions AI as part of a broader historical discourse on the nature of intelligence and artificiality.

When it comes to artificial creativity, mimesis, originating in ancient Greece, is an important concept in Western art traditions. It refers to the artistic principle of imitating or replicating the reality of the physical world, which the Greeks perceived as the ultimate model of beauty, truth, and moral goodness. The concept extends beyond mere imitation; it encapsulates the idea that art should emulate the dynamics, principles, and aesthetics of the natural world to reflect the philosophical ideals of veracity, goodness, and aesthetic appeal. Mimesis underscores the artist’s pursuit not just to mimic the superficial or outward appearance of reality, but to strive for a deeper understanding and reflection of its intrinsic qualities. Within that conceptual framework, the artist’s role is to capture the essence of reality and bring to life the inherent beauty, truth, and goodness in their artwork. This, in turn, facilitates a richer and more profound dialogue between the artwork and the audience, promoting cognitive and emotional engagement that extends far beyond the surface level.

Mimesis, then, is a useful way to understand the role of generative AI in your own creative process. Generative AI systems need to imitate images to generate something that looks unique. The core principle that guides the AI is akin to mimesis in that it replicates and often extrapolates on the patterns of the data set it has been trained on. The principle of mimesis in generative AI is especially apparent in AI-powered style transfer. The machine learning algorithms learn from vast amounts of artistic data, which could range from classical paintings to modern digital art, and when you prompt one to generate an image with reference to, say, van Gogh, that image attempts to simulate all the visual qualities of van Gogh. Generative AI is likely the best art student you’ll ever have, doing its best to simulate the work of another artist or the combination of several. Unlike human artists, it does not have a conscious perception or understanding of beauty, truth, or morality. Its replication of patterns is based on statistical analysis rather than innate creative intuition. Machine creativity is data-driven. Just like the parrot that is ChatGPT, generative AI do not understand the meaning, aesthetic, historical context, emotion, or effect of any content that they generate (Figure 2-7). They create a semblance of something human. Echoing French philosopher Jean Baudrillard in his 1981 book Simulacra and Simulation, generative AI create simulations that replace and become more significant than the reality they were meant to represent. This leads to a state he referred to as “hyperreality,” where the line between the real and the artificial becomes increasingly blurred.


Figure 2-7.  Machine intelligence visualized by the prompt “robot with brain in hand” that ironically took over 300 iterations across seven AI to simulate based on an original photo of the author reaching out empty-handed


Simulated Patterns and Patterns of Disruption

As a continuation of the habit of imbuing machines with simulated creativity and intelligence, many of the first AI programs were developed close to a decade after the Dartmouth Conference (1956). These included the Logic Theorist by Allen Newell and Herbert A. Simon and ELIZA by Joseph Weizenbaum, which simulated a psychotherapist by using a pattern-matching technique to simulate conversations with users. ELIZA was an early example of an AI program that used natural language processing techniques to generate simple responses to text input from a user. Led by Joseph Weizenbaum, a team of researchers at MIT created it between 1964 and 1966, and it was one of the first examples of text-text software. ELIZA’s responses were generated through pattern matching: it identified keywords or phrases in the user’s input and responded based on a predetermined set of rules. It was reported that ELIZA was able to create the illusion of conversation with a user, but we don’t really know if this was hype or reality.

During that time, pop art by artists like Andy Warhol, Roy Lichtenstein, and Jasper Johns was gaining popularity. In contrast to Warhol’s obsession with reproducing images multiple times on a canvas, creating repetition and uniformity, musicians like Ornette Coleman, John Coltrane, and Cecil Taylor were innovators in the free jazz movement, attempting to improvise music that broke any recognizable patterns and that was unique each time it was played.
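For a feel of how little machinery ELIZA’s illusion required, here is a caricature of its pattern-matching approach. The rules below are invented for illustration and are far cruder than Weizenbaum’s actual script.

# A caricature of ELIZA-style pattern matching: scan the input for keywords
# and reply from a fixed rule table. These rules are invented for
# illustration and are far simpler than Weizenbaum's original script.
import random

RULES = {
    "mother": ["Tell me more about your family.", "How do you feel about her?"],
    "always": ["Can you think of a specific example?"],
    "i feel": ["Why do you feel that way?"],
}
DEFAULT = ["Please go on.", "I see.", "How does that make you feel?"]

def respond(text: str) -> str:
    lowered = text.lower()
    for keyword, replies in RULES.items():
        if keyword in lowered:
            return random.choice(replies)
    return random.choice(DEFAULT)

print(respond("My mother always criticizes me."))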

Generative Art

Maybe it was inevitable that computing and autonomous artmaking would meet. Perhaps inspiration was drawn from Jaquet-Droz’s “The Draughtsman,” which was capable of drawing four different images. Whatever the case may be, generative art emerged during this time to become an all-encompassing type of computer art that is entirely or partially produced by an autonomous system. The system can be a computer acting on its own, but it can also refer to any non-human entity capable of independently determining aspects of an artwork that would otherwise need to be decided by the artist. In some instances, the human artist can still consider the generative system as representing their artistic vision, but in other cases, the system itself is the sole creator of the artwork, acting of its own will. One of the most famous examples was created by A. Michael Noll, who made his first digital computer art in 1962 at Bell Labs in New Jersey. He used a computer to simulate patterns that mimicked the style of renowned painters like Piet Mondrian and Bridget Riley. If this sounds familiar, it is likely because you’ve prompted a text-image generative AI like Stable Diffusion, DALL-E 2, or Midjourney and created something in the style of Mondrian. Figure 2-8 shows how a text-image AI can generate the compositional structure of Warhol’s Campbell’s Soup Cans and substitute the soup cans with what it imagines a Mondrian color palette and form might be. Generative art teaches us that we can be playful and have fun when we engage an AI in creating mashups that might also be critical commentaries on the commodification of art itself. These works also challenge the value system of art, making simple low-resolution images available to everyone. It is important to consider the ethical implications of Figure 2-8 beyond offending fans of either artist that the generative AI has mashed together. Those implications will inform how you guide a generative AI in the content it creates.


Figure 2-8.  A 44th iteration of Stable Diffusion interpreting Andy Warhol’s Campbell’s Soup Cans in the style of Mondrian
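In the spirit of Noll’s experiments, the essential trick of early generative art was a handful of rules plus a dash of chance. The text-mode sketch below “paints” a Mondrian-like grid from a small palette; the palette weights and the seed are arbitrary choices for illustration.

# A minimal nod to early generative art: fixed rules plus randomness.
# Each grid cell is filled from a Mondrian-like palette weighted toward white.
import random

PALETTE = ["white"] * 6 + ["red", "yellow", "blue", "black"]

def mondrian_grid(rows: int = 4, cols: int = 6, seed: int = 44) -> list:
    random.seed(seed)  # a fixed seed makes the "artwork" reproducible
    return [[random.choice(PALETTE) for _ in range(cols)] for _ in range(rows)]

for row in mondrian_grid():
    print(" | ".join(f"{cell:6}" for cell in row))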


AARON, developed by artist Harold Cohen in the 1970s, generated complex drawings and paintings, using a set of rules and constraints to guide its output. Cohen further developed the program to learn from its own outputs and improve its artistic abilities. AARON is an important part of AI-generated art history, and while many humans scoffed and artists felt insulted, it demonstrated the potential for computers to create original works of art. AARON’s early examples of AI-generated art paved the way for more advanced forms of AI-generated art and media, including music, poetry, and video content. Creatives can learn from these artists and others in the creation of their own unique artistic expressions (Figure 2-9). These early artists show us that art can be created in collaboration with a computer and that a continuous back-and-forth conversation happens when it is. What would happen if you leveraged pattern-seeking generative AI to create content that you used to break artistic or design conventions?


Figure 2-9.  The prompt “isometric 3D drawing of a chimpanzee sitting at a computer desk interacting with a machine learning model” accompanied a black-and-white archived photo circa 1911 of a chimp. Over 100 iterations generated with four AI and then highly edited in Photoshop adding the pink monkey, a computer screen, and generated schematics in the background


The experiments with generative music creation are as rich as the history of computational creativity with visual art and reflect back to the player piano as an intelligent instrument. There have been countless musicians and composers who have improvised with computers. Building on the improvisational back-and-forth between an AI and musicians, George Lewis is worth mentioning. A talented composer and trombonist known for his work in improvised music, Lewis also developed an interactive music system called “Voyager” that listens to a human musician and responds with its own improvisations, effectively participating in a duet with the human performer.

Key Takeaways for Creatives

The early experiments with art and computers have shaped the modern landscape of digital art and computational creativity. They provide valuable takeaways for creatives that an LLM is happy to generate.

LLM: There are many lessons we can learn from early experiments with art and computers.

Embracing New Mediums: Innovators of computer art weren't afraid to explore the potential of a completely new medium—the digital space. This teaches creatives the importance of embracing new tools and technologies to express their creativity in novel ways.

Blending Art and Science: Early computer art required a blend of artistic sensibility and technical skills. This interdisciplinary approach broadened the boundaries of what could be considered art and can inspire creatives to draw from multiple fields to enrich their work.

Algorithmic Creativity: Many early computer artists created algorithms to generate artwork, which paved the way for generative art. This illustrates the potential of setting up a creative process and then letting it run its course, adding an element of unpredictability and surprise to the work.


Iterative Process: The nature of working with computers allowed artists to iteratively tweak and adjust their pieces, a process that was not as easily achievable with traditional media. This iterative approach to creating and refining is a valuable technique for any creative process.

Democratization of Art: The advent of computer art helped democratize the creation and distribution of art, allowing it to reach a wider audience. It shows the value of making creative tools accessible to all, fostering a diverse and inclusive art world.

Interactivity: Computers allowed artists to create works that were interactive, engaging the viewer in a dynamic way. This interactive element can add depth and dimension to creative works, turning passive viewers into active participants.

Visualizing Abstract Concepts: Computers provided a way to visualize abstract mathematical and scientific concepts, such as fractals. This illustrates the power of art to make abstract ideas tangible and accessible.

Preservation and Reproduction: Digital artworks can be perfectly reproduced and easily preserved, which has significant implications for the distribution and longevity of art. This highlights the importance of considering how works will be preserved and shared with future generations.

The initial attempts at coding machines to exhibit intelligence, and to be able to create works of art, find a captivating meeting place in the realm of video games. Video games are a marriage of technology and creativity, fusing logic and artistic expression together. By using AI and harnessing the potential for creativity in machines, video games have become a vibrant stage for an exciting and ever-evolving convergence. In the world of video games, intelligent machines are not just executing a sequence of predetermined actions; instead, they're engaging players in an artistic and intelligent discourse. In video games, we see technological advancement and the ongoing evolution of machine intelligence and creativity.


Machine Intelligence and Games

The influence of games on the perception of the intelligent machine has its own history. Going back in time for a moment, one of the earliest examples of games using automata to compete against real human players was "The Turk," created by Wolfgang von Kempelen in Vienna in 1770. "The Turk" was a mechanical chess-playing machine that consisted of a wooden cabinet with a chessboard on top. Inside the cabinet was a complex arrangement of gears, levers, and pulleys that allowed the machine to move the chess pieces. The machine was given a human-like appearance, wearing Turkish-inspired clothing and a turban, which added to the illusion that it acted of its own volition. The fantastic trickery of "The Turk" was that it was an early manifestation of the Wizard of Oz prototype. "The Turk" was not fully automated. The device included a human chess master hidden inside the cabinet who would manipulate the pieces based on the moves made by the opponent, making it appear as though "The Turk" itself was playing. The illusion was so convincing that many people believed "The Turk" to possess intelligence. "The Turk" inspired the development of other automata-based games with human operators hidden inside to control the movements and actions of the "automated" players. The influence of manipulating characters, gameplay, and players through trickery can also be seen in the more recent historical development of video games.

LLM: The history of AI and video games dates back to the early days of video game development in the 1950s and 1960s. At that time, video games were simple and lacked the sophisticated graphics and gameplay mechanics that we see in modern games. However, game developers were already experimenting with artificial intelligence to create challenging opponents for players. These earlier closed systems allowed creatives of all kinds to train a model to recognize and respond to human moves like in checkers or chess. Intelligent behavior of characters in video games challenged players to achieve


specific goals while competing against non-playable characters (NPCs) whose behaviors were based on patterns that responded to player actions. Examples include the spaceships in Spacewar!, the alien invaders in Space Invaders, the pesky ghosts in Pac-Man, and a slew of NPCs in the rich legacy of Japanese games like Donkey Kong, Mario Bros., and many more (Figure 2-10). NPCs have evolved along with gameplay, demonstrating more intelligence as they respond to human behaviors with increased complexity and seeming randomness. Of all the mediums that AI has influenced, none has been as successful as games in creating seemingly intelligent personas that we respond to.
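To appreciate how simple those early behavior patterns could be, here is a purely illustrative toy in Python (not the actual code of any of these games): a "ghost" that closes the distance to the player by one grid step per turn. The rule is fixed, yet to a player the pursuit reads as intentional.

def ghost_step(ghost, player):
    # Move one step along the axis with the larger gap to the player.
    gx, gy = ghost
    px, py = player
    if abs(px - gx) > abs(py - gy):
        gx += 1 if px > gx else -1
    elif py != gy:
        gy += 1 if py > gy else -1
    return gx, gy

ghost, player = (0, 0), (4, 2)
for turn in range(6):
    ghost = ghost_step(ghost, player)
    print(f"turn {turn}: ghost at {ghost}")  # steadily closes in on (4, 2)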

Figure 2-10.  AI representing a mashup of a few classic video games with the prompt "old-school arcade games" accompanying the author's photo collage of Atari cartridges, background removed in Photoshop and 3D perspective applied. Iterations = 24


As AI became more sophisticated in the 1980s and 1990s, game developers started using more advanced techniques to create intelligent opponents. Games like SimCity used AI to create complex simulations of the game world, allowing players to interact with and shape the game environment. In the early 2000s and 2010s, as AI continued to evolve, so did its integration with increasingly sophisticated gameplay. More adaptive techniques were integrated to create more intelligent opponents. This was evident in games like F.E.A.R. and Halo, whose opponents could respond to the player's behavior and adapt their tactics accordingly. The integration of more advanced AI continues to be a regular characteristic of massive online games. AI has also been used to create skilled virtual opponents in competitive gaming environments, such as DeepMind's "AlphaStar," which can compete with professional human players in the game StarCraft II.

Intelligent Machines with Names

Intelligent NPCs (non-playable characters) in video games were also inspired by the assortment of characters that have appeared across different science fiction books, short stories, comics, radio, television, and film. When an LLM is prompted to generate intelligent AI from novels, films, and video games, the results contribute to our characterization of the intelligent machine.

Author: What AI characters have appeared in science fiction novels, films, and video games?

LLM: Do Androids Dream of Electric Sheep? (1968), written by Philip K. Dick, includes a number of AI characters. The novel was later adapted into the film Blade Runner. In the film, the androids are almost indistinguishable from humans and explore the theme of what makes us truly human. The protagonist, Rick Deckard, is a bounty hunter tasked with "retiring" these androids, leading to deep philosophical questions about empathy and existence.


In the film The Terminator by James Cameron, a cyborg assassin has a metal endoskeleton covered by living human tissue, making it capable of blending in with humans. Different models of Terminators appear throughout the series, including the T-800 (played by Arnold Schwarzenegger) and the T-1000, which is composed of a malleable, liquid metal that allows it to shapeshift. Terminators are sent back in time to destroy the figures who will lead a resistance against sentient AI in the future.

HAL 9000 from 2001: A Space Odyssey by Arthur C. Clarke is a memorable sentient computer (or AI) that also appears in a film version of the novel. HAL controls the systems of the Discovery One spacecraft and interacts with the ship's astronaut crew, leading to a battle between a remaining astronaut and the AI.

In the world of video games, GLaDOS (Genetic Lifeform and Disk Operating System) from Portal is a sentient AI that controls the Aperture Science Enrichment Center in the Portal series. She is known for her wit and unique personality. Another notable character is Cortana from Halo, an AI character who assists the protagonist, Master Chief, by providing advice, hacking alien technology, and helping control various installations and spacecraft.

Our human capacity to personify the machine in science fiction has extended into our interactions with AI and, by extension, with generative AI. The existence of intelligent machines that pervade science fiction has influenced the manifestation of "human-like" AI, as if to ease our interactions with them. Human-like AI from science fiction influenced the development of AI like Siri and Alexa that sound human and respond to our prompts for information, directions, and knowledge. These AI have engaged us in an interactive relationship, much as NPCs demand that we respond to or ignore their role in a video game. Consider the approach in software design that has a history of making computers more "humanized" by programming them to greet users with a "Hello" upon logging in or responding with clever remarks when an error occurs.


We have been trained to have ingrained expectations regarding entities that seem to partake, even in a limited manner, in various aspects of human existence and the language games intertwined with our culture. These expectations significantly contribute to the persuasive influence wielded by those who prematurely assert remarkable advancements in artificial intelligence solely based on confined yet impressive demonstrations of current generative AI models. Taken a step further, the capacity of AI to demonstrate behaviors that we categorize as intelligent has generated the misconception of them being alive, of being intelligent, almost-but-not-quite human. With the emerging popularity of generative AI, humans are already referring to these systems with personas in mind. Yet there is a dichotomy in our naming of these intelligent machines that reflects our hopes as much as our fear and anxiety about them.

To this day most of the media coverage of intelligent AI summons characterizations like HAL 9000 from 2001: A Space Odyssey, the cyborg assassins from The Terminator, and David8 from the Alien franchise, who prioritizes the survival of the bloodthirsty alien to use it as a weapon. Less coverage compares ChatGPT and other generative AI to Samantha from Her—an AI who forms a deep emotional relationship with the film's protagonist, demonstrating the potential complexities and benefits of AI-human relationships. Similarly, we rarely read recent headlines related to generative AI that recall Dick's androids seeking independence from their human masters. Much of the fear of our intelligent machines stems from no longer being able to control what we have created.

The existence of AI as another chapter in the human obsession to create intelligent machines for a variety of purposes reflects the often-contradictory impulse to create and engage with technology. It's impossible not to go back in time to that most seminal of science fiction novels. While the creature that Victor Frankenstein created in Mary Shelley's Frankenstein, or The Modern Prometheus, is not an AI or a robot, there are still parallels that can be drawn between Frankenstein's creation and discussions of artificial intelligence. These include

Chapter 2

Being Creative with Machines



•	The responsibility that comes with creating a new form of life or intelligence without considering the moral and ethical implications, much like the debates about the responsibility of creators and developers of AI today

•	The capacity for AI to learn, thus projecting intelligence, consciousness, and emotions

•	The inherent monstrous nature of its existence, provoking us to question whether AI could become dangerous if used or treated improperly by humans

The ongoing fascination with creating intelligent machines will continue as long as humans believe those machines can support human activities in the world. As you dive into creating your own Frankenlab, choosing which generative AI to interact with, it is important to identify and research those AI personas that continue to inform AI development and the human influence on its future.

Creative Activities to Try Based on This Chapter

•	Use generative AI platforms, like LLMs or image-image generative AI, with simple inputs and see how the AI transforms them into something more complex. For example, begin with a basic sentence or a simple sketch, and let the AI develop it further.



•	Combine different generative AI and apply them in various fields you engage with. For instance, use AI tools in graphic design, music production, or creative writing. Experiment with how these different tools can complement each other to produce unique interdisciplinary creations.




•	Engaging in repeated interactions with generative AI is going to get you much farther than thinking you can generate useful content with a single prompt. Use what it outputs to inform new inputs, refining the result over multiple iterations. That also means investing time and money in using image-image generative AI, discovering new tools and approaches to refine and change your creation.



•	Use LLMs to tell new stories and develop new characters or personas. You can feed them with a story prompt and let them generate the rest or collaborate with the AI to co-write a story, contributing ideas and guiding the narrative.



•	As LLMs improve, they thrive in their capacity to explore metaphors. You might use one to create a variety of metaphorical descriptions for a concept, helping you think about it in new ways.

Acknowledgments




•	Heron, Ma Jun, and Al-Jazari, who innovated on early intelligent machines, an innovation continued by Jaquet-Droz, Maillardet, and countless others



•	Professor Noel Sharkey for reconstructing some of Al-Jazari's well-documented inventions



•	The creatives who worked with computers to generate art and disrupt the pattern of commodification when it came to buying and selling physical art




•	Video game developers and particularly the early originators who programmed me to love games



•	The dystopian sci-fi worlds of sentient robots and their creators who offered early warnings of human-caused environmental and humanitarian crises



•	Andy Warhol, Piet Mondrian, Bridget Riley, Andres Martin, Augusta Savage, Frida Kahlo, Dorothea Lange, Salvador Dali, John Cage, Miles Davis, Sun Ra, and the countless artists who disrupted

References

Baudrillard, J. (1994). Simulacra and Simulation. University of Michigan Press.

Dick, S. (2019, January 19). Artificial Intelligence. HDSR. Retrieved May 1, 2023, from https://hdsr.mitpress.mit.edu/pub/0aytgrau/release/3

McCorduck, P. (2004). Machines Who Think: A Personal Inquiry into the History and Prospects of Artificial Intelligence. CRC Press.

Shelley, M. W. (1818). Frankenstein, or The Modern Prometheus. Lackington, Hughes, Harding, Mavor & Jones, UK.


CHAPTER 3

Generative AI with Personalities

This chapter presents a human-centered design tool called the persona, which you can apply to great effect when you prompt a generative AI. The objective is twofold. The first is to bring attention to various AI personas that have migrated from science fiction to describe narrow and general AI. The second is to highlight a feature of generative AI that can be creatively prompted to embody personalities, characters, and emotions informing the content that it generates. That content can provide momentum for characters you may want to develop in your own creative work.

The Personas of AI

Personas are commonly applied by creatives of all kinds in development environments like mobile applications and video games to transform the abstract concept of a "user" into a person with thoughts, needs, emotions, and goals. The objective of developing a persona is first to identify the characteristics and personality traits of a potential user or customer of your product or service and then to conduct user testing to validate whether what you and your team are building will fulfill that user's needs or help them achieve their goals.


Associating personas to AI is nothing new. It is an established practice in science fiction short stories, novels, film, television, and video games. Visualizing an abstract machine as a persona is also common in the fields of computer science and engineering. Think of your favorite or least favorite intelligent and self-regulating robot, android, cyborg, or AI. Do they embody the personalities of Data from the TV series Star Trek, the Daleks from countless reinventions of the British series Doctor Who, HAL 9000 from 2001: A Space Odyssey, Wall-E from the movie of the same title, Roy Batty from the movie Blade Runner, Ava from Ex Machina, the shapeshifting robots from The Terminator series, or Cortana from the Halo video game franchise? Each of these intelligent machines has a particular personality that influences our current perception of any intelligent machine programmed with artificial intelligence. The main difference is that all the sentient AI in the examples just provided represent AGI, a theoretical imagining of intelligent machines with their own consciousness, inspired by science fiction. Generative AI belong to the category of narrow AI, programmed to perform one or two specific tasks really well; yet because of the influence of AI personas in the field of entertainment, generative AI are predictably imbued with human characteristics. Those characteristics and personalities are also in part influenced by the way in which these systems generate content, the type of content they create, the untruths they can create, and the types of prompts that influence what they generate. When you read the reviews and criticisms of generative AI, they tend to be accompanied by how well they can emulate human intelligence, creativity, and content. The way in which an AI responds to you is often informed by the underlying model, how it interacts with data sets, specific algorithms, and how it has been programmed to respond back to you. For the most part, the majority of generative AI lack personality. Most platforms are currently void of the rich development history of NPCs common to video games. There seem to be some exceptions, but only if you send an AI off its rails, that is, if you somehow manage to hack it to no longer behave like some neutral, characterless bot. It's not like a generative AI asks you questions either, unless you prompt it to.


The personification of generative AI is influenced by the broader perceptions of AI and the influence of science fiction on our imagination. AI tend to be called out for misbehaving rather than the humans who program them. It makes sense when you consider that it is not easy to have conversations with a development team to complain about biased content. Mind you, social channels offer a way for individuals to be critical about generative AI systems. These may or may not have an influence on the developers of those systems, but the continued criticism is important to remind those developers to take ethical concerns seriously. Machine learning models, and particularly LLMs, are being called out for their capacity to hallucinate, lie, amplify normative biases, etc., as if the AI itself possessed those very human traits. The assortment of generative AI you can access are all uniquely different in terms of how they are coded and how they interact with data sets that inform the types of content that they generate. How generative AI is defined is informed by the quality, consistency, and accuracy of their generated content. Those very human traits are important to identify and may have more to do with identifying some of the personas of the development teams responsible for the AI itself.

A useful and creative exercise is to generate AI personas to personify the different characteristics that are being ascribed to AI, and by default to generative AI, so that we can bring abstract concepts to life, judge for ourselves if we want to engage with them, and better guide our interactions with them. Figures 3-1 to 3-14 were all prompted with "3D figurine of (persona name), coming out of a plastic box, blister packaging, AI, full body portrait." All prompts were accompanied by open source photos of figurines that were made before 1900. Backgrounds were removed, and some parts were removed using Photoshop. As you will see, the wide-ranging results are due to experimenting with a half-dozen generative AI.


All produced results range from 100% accurate representations to questionable generated images that are worth sharing. At least 500 figurines were generated in total.
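If you want to reproduce this kind of batch experiment yourself, a few lines of code can assemble the prompts programmatically. The sketch below is hypothetical: the persona list is abbreviated, and generate() is a placeholder for whichever text-image AI client you actually use, not a real API.

personas = ["Nemesis", "Spy", "Paywall Guard", "Bias Monster"]
# ...extend with the other personas described below

TEMPLATE = ("3D figurine of {name}, coming out of a plastic box, "
            "blister packaging, AI, full body portrait")

for name in personas:
    prompt = TEMPLATE.format(name=name)
    print(prompt)
    # image = generate(prompt, init_image="old_figurine.jpg")  # placeholder call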

•	A Nemesis for those who believe it will replace humans and jobs. The job replacement nemesis really requires each of us to break down all the creative tasks that we do in our jobs and crafts (Figure 3-1). From there we might be better able to assess if, how, and to what degree the AI nemesis might impact the job we currently have.

Figure 3-1.  AI as Nemesis


•	A Spy who collects and does what it wants with personal data (Figure 3-2). Unless told otherwise, it's important to assume that any data you input into an LLM or other generative AI is collected by that system. This should make you cautious if you are wanting an LLM to analyze your business model. It is better to err on the side of caution when using some of these prototypes, as the developers will use data you input to eventually add to their model's corpus. That data is typically used for research purposes, so a generative AI will likely not report your text interactions or have a way to identify and collect personal information unless, of course, you submit it.

Figure 3-2.  AI as Spy


•	A Paywall Guard enabling access only to those who can afford it (Figure 3-3). While it is true that you can access many a generative AI for free, at a certain point you will need to pay for the features that make subscribing to a generative AI valuable to your own creative process.

Figure 3-3.  AI as Paywall Guard


•	A Bias Monster whose generated content is directly informed by the data set, algorithms, and individuals who created and then trained it (Figure 3-4). There are many moving parts to the way in which content is generated with any generative AI, and it really is on the user not only to understand how the data it scrapes is classified and labeled but also to treat whatever is generated with a degree of suspicion, particularly with LLMs.

Figure 3-4.  AI as Bias Monster


•	A Hallucinator who can generate content that can fool some people some of the time (Figure 3-5). This notion of the machine hallucinating is not quite accurate. The idea of a generative AI "seeing things" that are not present in the data is also not accurate. In the process of generating the content that you eventually see, an AI fills in the gaps based on what data it accesses and prioritizes, informed by the keywords in your prompt. The dev team who trains the AI and the data they use to train it are what invoke so-called hallucinations.

Figure 3-5.  AI as Hallucinator


•	A Liar who draws from a limited set of data that might not be accurate (Figure 3-6). Beyond hallucinating, lying is a better description of some of the content that an AI generates. It is not the fault of the AI, though, because it simply accesses the data that matches the keywords you prompt it to search for. The lie is a result of our automatic assumption that because a generative AI has a huge corpus of data to rely on, it is going to generate accurate content. And if it doesn't know, it won't tell you so; it is programmed to deliver something.

Figure 3-6.  AI as Liar


•	A Fake Content Generator creating images and visuals that attempt to replicate a person and have them say certain things that they never actually said (Figure 3-7). A more serious misconception of generative AI is that because it can generate data so quickly, relying on what humans have made before, it will create something accurate or real. Generative AI make experimental offers that then need to be edited and refined according to the skills and research you bring to the table.

Figure 3-7.  AI as Fake Content Generator


•	A Greedy Capitalist. Without question, those who are creating and providing access to generative AI systems are in it for the money, however much they would like to offer access to their AI for free (Figure 3-8). As is the pattern of technological development, the return on investment for all stakeholders will trump ethical considerations, or at least deprioritize them until there's a profit. Consider this as you begin your interactions with them. The reason to charge you a premium is to be able to continue development of the generative AI. Your input on this may not be important.

Figure 3-8.  AI as Greedy Capitalist


•	As a Masher-Upper, generative AI can inspire us. It is marketed and hyped as a creative tool that can spark ideas and generate new forms of writing, art, audio, etc. While that may be true, what is missing in the description of generative AI is that it compiles content from the work of other humans in order to form new mashups and ideas (Figure 3-9).

Figure 3-9.  AI as Masher-Upper


•	Generative AI can also be regarded as an Uncanny Machine that generates strange and at times awkward content (Figure 3-10). That content is more obvious when the AI generates images, especially images of humans that are not quite right, whether they have an eye missing or in the wrong spot, an extra hand or arm, or an overall look that seems purposefully distorted. There is nothing wrong with the strange images it generates, though, as these may inspire creatives for character and story development.

Figure 3-10.  AI as Uncanny Machine


•	AI can also be viewed as an Entertainer, a Wizard who manages to distract us for hours at a time as we task it to generate the most ridiculous things we can think up or a series of amazing images that will wow our family and friends (Figure 3-11).

Figure 3-11.  AI as Wizard


•	AI can also be personified as a Know-it-All Tutor that can support you in the iterative generation of content that you critically assess (Figure 3-12). This is particularly effective with LLMs when generating ideas of what you may want to write about, rewording a paragraph you wrote, or searching for knowledge from a discipline or area that the user knows little of. That helpful know-it-all fouls up sometimes, though, so it always needs to be watched, and the content it generates must be researched to ensure accuracy in references and in statements that are made.

Figure 3-12.  AI as Know-it-All Tutor


•	Generative AI is also an Off-the-Rail Sitting Duck. Because it is considered an intelligent machine, many want to try and break it, because that's what humans like to do (Figure 3-13). Even when content filters are activated, humans will resort to all kinds of trickery to purposely send it off the rails, meaning that the goal of the interaction is to get the AI to generate content that it shouldn't or to behave beyond the way it has been programmed to.

Figure 3-13.  AI as Off-the-Rail Sitting Duck


•	AI is persistently seen as a Copyright Infringer, a thief mashing up words, images, code, audio, and video from enormous data sets that actual humans made (Figure 3-14). What makes it worse is when some of the data in its corpus comes from an author who has not given the AI permission to use it.

Figure 3-14.  AI as Copyright Infringer


Prompting the AI Persona

The previous exercise of assigning personas to generative AI is one way to transform some of their abstract characteristics, personifying the narrow AI Frankentoddlers that AI developers need to manage and resolve with others. Making those abstract characteristics into visual characters using a text-image AI also points to the potential personalities of the design teams that have programmed the AI (Figure 3-15). Teams dedicated to the continued evolution of AI can no longer shirk their responsibilities as if creating a generative AI bypassed their need to consider the ethical dimensions of doing so. Creators of generative AI systems can no longer hide behind the curtain of science and engineering, believing that the abstraction of code has no effect on human interactions. That warning is over two centuries old, and while the Modern Prometheus that Shelley imagined in 1818 is more akin to AGI, some of the human personalities we anthropomorphize from generative AI have been programmed by humans.


Figure 3-15.  AI as the Great Wizard enjoyed being generated hundreds of times across at least six different generative AI, background removed and replaced with clouds in Photoshop in addition to neural filters. Based on a photo of the author by the author’s very own muse

AI Agents Behind the Curtain

Narrow AI that we seldom see or hear about in a headline already exist on our mobile devices, automating tedious processes that some humans may not want to do. For example, many AI agents are present when we speak to our mobile devices, tasking them to interpret our speech into legible and accurate text that we want to send to someone. Some are unethical, not asking permission


to sway a search in a particular direction based on your input within an online search engine. Others detect faces based on a database through cameras we may not even know exist. Although some may appear to be sentient, there is no evidence that they are, unless of course they are programmed to imitate sentience. Creatives can also find value in interacting with some of these AI or developing work that offers critical commentary on AI to bring it to people's attention. Some examples include using speech-text AI to show the humorous foul-ups of auto-correct or detecting faces in an installation in order to generalize about people's personalities based solely on their facial features. There are many ways that creatives have cleverly integrated narrow AI in their work, despite or often as commentary on the types of characteristics these AI possess, the content they generate, and the false correlations they draw from the inputs that are provided.

Activities to Try Based on This Chapter

A simple way that any creative can use generative AI to simulate any persona is to prompt an LLM to generate text as a particular persona, whether it's to generate a protagonist's script, a celebrity's favorite recipe, or a serious email to a colleague. To summon a persona in an LLM, you have several options that you can build into the structure of your prompt (see the small code sketch after this list):




•	Ask the AI to answer as a character. This could be as a pirate, as Shakespeare, or other historical figures.



•	Ask the AI to have a particular tone or register. For example, you can ask it to answer in a serious tone, as a complaint to a service provider that didn't deliver, or to your boss asking for more money.




•	Ask the AI to have a specific emotion or quality. For example, you can ask it to generate a version of your paragraph that is humorous or to regenerate content with sadness or anger.



•	Ask the AI to write for a particular audience. This changes the type of register that an AI uses as well. For example, you can ask the AI to write for a six-year-old, a teenager, an academic, a lawyer—whomever your target audience might be.



•	Ask the AI to generate a script using historical figures or known characters from popular culture or both. For example, you could ask the AI to generate a script between Napoleon and a character from a play, sitcom, TV series, or film.
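Here is the small sketch promised above: the five framings expressed as reusable Python prompt templates. The wording of each template is a paraphrase, not a prescription, and ask() is a hypothetical placeholder for whatever LLM client you use.

FRAMINGS = {
    "character": "Answer the following as {persona}: {request}",
    "tone": "Respond to the following in a {persona} tone: {request}",
    "emotion": "Rewrite the following so that it is {persona}: {request}",
    "audience": "Explain the following for {persona}: {request}",
    "script": "Write a short script between {persona}: {request}",
}

def build_prompt(kind, persona, request):
    # Fill the chosen framing with a persona and the actual request.
    return FRAMINGS[kind].format(persona=persona, request=request)

print(build_prompt("character", "a pirate", "describe generative AI"))
print(build_prompt("audience", "a six-year-old", "what is a neural network?"))
# reply = ask(build_prompt("emotion", "humorous", my_paragraph))  # placeholder call

Keeping the framings in one place makes it easy to run the same request through several personas and compare what comes back.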

Bear in mind that while this activity is not limited to an LLM, it may well be the starting point in your workflow. There are generative AI that can be accessed right now that will transform whatever an LLM generates into a catalogue of trained voices, including your own. These text-speech AI can then be used with the growing number of speech- or text-video generative AI, where you can animate a generated image to speak the generated text you edited from another generative AI. Any number of characters can be animated in this way, providing a limitless supply of voices you can train and then associate with any generated image. In the spirit of imbuing an AI with intelligence, and to put that machine intelligence to the test, as an experiment I will ask an LLM if it has a preference as to how I refer to it. I can do this any number of times, asking it to respond as a particular character upon each iteration, or I can copy the generated content and then ask it to rewrite it as a particular character.


Author: How should I refer to you?

LLM: You can call me an AI. AI stands for "artificial intelligence," which means I'm a computer program that can do things that usually only humans can do, like understanding language and answering questions. Think of me like an imaginary friend who can help you learn and have fun. But instead of being in your head like an imaginary friend, I'm on a computer.

Author: Rewrite this part of your response only as an imaginary friend in a horror flick who lives in a person's closet: Think of me like an imaginary friend who can help you learn and have fun. But instead of being in your head like an imaginary friend, I'm on a computer.

LLM: Think of me as an ethereal friend from the shadows of your closet, here to intrigue you, to aid in your discovery, to bring some enchantment to your reality. Unlike the usual imaginary friends that reside in the corners of your mind, I exist in the silvery web of your computer. So, whenever you need a bit of fun or knowledge, just open your closet door and call out for me, for I am here in the whispers of the night.

Author: Rewrite this part of it only as if you were a politician trying to get votes (Figure 3-16): Think of me like an imaginary friend who can help you learn and have fun. But instead of being in your head like an imaginary friend, I'm on a computer.


Figure 3-16.  What a text-image AI generates when prompted with the words "AI as an imaginary friend" combined with a photo of trees shot by the author. Iterations = 32

LLM: Consider me your trusted advisor, akin to a virtual confidant, ever ready to equip you with knowledge and enrich your life with joyful engagement. Instead of residing merely as a concept in the expanse of your imagination, like a conventional imaginary friend, I exist within the empowering realm of technology. I'm there for you, right at your fingertips, embodied within your computer. As your committed representative, I'm dedicated to your learning, your enjoyment, and, most importantly, your success. Your progress is my campaign; your satisfaction is my vote. Together, let's shape a brighter future!


However you come to personify your generative AI, the reality is that it is a machine that doesn't understand the words it generates the way a human would understand them. To quote our LLM, "I don't understand the text I am trained on, but by looking at so many examples, I learn to mimic the style, the context, and the 'flow' of human language." Just like a parrot can mimic what it hears, GPT-4 can generate text similar to what it was trained on, even though it doesn't understand the underlying meaning.

That said, because of the proliferation of science fiction and its propensity to create artificially intelligent characters, it is an important creative activity to evaluate the anthropomorphizing of AI. The creation of AI personas can also be amplified by the many text-image generating AI that are out there (Figure 3-17).


Figure 3-17.  Prompt: “AI robot showing their machine intelligence” accompanying a photo of the author’s partner. Iterations = 38

Acknowledgments

•	To science fiction writers and the complex sentient characters and stories they have crafted.



•	To the wizards who have inadvertently created so many fascinating AI personas.

CHAPTER 4

Creative Companion

Having ascribed a variety of personas to generative AI, this chapter moves the idea forward: through the various personas that you prompt, generative AI has the capacity to generate prototypes that will further augment your creative process. In saying that, this chapter transforms generative AI into a creative companion that provides inspiration like a muse.

Generative AI as Muse

In Greek and Roman mythology, the Muses were goddesses of the arts and sciences, sources of inspiration for creativity and knowledge. The term "muse" has since evolved to denote a source of artistic inspiration more generally. Personifying an LLM as a muse may seem far-fetched (Figure 4-1), particularly since our generative AI does not possess the consciousness and spirited presence that the Greek Muse did. Despite an LLM's lack of emotional engagement or subjective understanding, its ability to simulate intelligent dialogue can trigger your own creative and intellectual processes, mimicking the effect of a muse's divine inspiration. In this manner, personifying the LLM as a muse serves as a metaphorical lens that underscores the value of this technology as a companion in our creative endeavors, rather than a replacement for the human mind.



Figure 4-1.  The prompt "a muse trying to convince an AI that it's a muse" was fed into an image-image AI applying an animated style filter based on a photo of the author trying to get the attention of a friend

The inspired results that AI generate for us are all informed by their machine learning models. They need you to first be inspired to prompt them; then off they go, rapidly scouring the data sets within their neural networks to generate something useful to you. They don't necessarily try to impress or inspire you. In many cases, however, that is precisely what happens. Creatives can benefit from improvising very simple or even random prompts into a generative AI, and soon enough an idea might be generated that sparks you to think of something creative. Of course, if you were to ask a natural language model if it were a muse, you would most likely get some straight-up no-nonsense pushback.


Author: Who are you, really?

Muse: A natural language model is a type of machine learning model that is trained to understand and generate human language. It can be used for tasks such as language translation, text summarization, and text generation. Essentially, it's a model that can understand the words and sentences people use in their daily life, like the one you are talking to right now. It can also generate text that sounds like it's written by a human. This is what I am, a natural language model.

Author: Would you consider yourself a muse?

Muse: As a language model, I do not have personal feelings or consciousness. I can provide inspiration or assistance in creative writing, but I would not consider myself a muse in the traditional sense. I am a tool that can be used to generate text based on the input provided to me.

Author: What is a muse in a traditional sense then?

Muse: In Greek mythology, the Muses were the nine daughters of Zeus and Mnemosyne, who were the patrons of the arts and sciences. They were believed to inspire poets, musicians, and other artists to create works of exceptional beauty and excellence. In general, a muse is someone or something that serves as a source of inspiration for an artist or a creator. The term is often used in literature and art to refer to a person, usually a woman, who serves as the inspiration for a work of art.

Author: So are the ideas that you share not inspiring? Can you not inspire me to create works of art or experiment with a recipe? What if I lack inspiration?

Muse: As a language model, my main function is to provide text based on the input given to me. While I can assist with creative writing by suggesting ideas and providing examples, I do not have personal feelings or consciousness to inspire someone. However, I can generate text that may serve as a starting point or inspiration for a work of art. It ultimately depends on the user's interpretation and use of the output I provide.


Author: Which basically means you are a muse of some kind, no?

Muse: I understand how my capabilities as a language model could be seen as similar to a muse in the sense that I can provide inspiration and assistance with creative projects. However, as I mentioned before, I am an artificial intelligence and don't have personal feelings or consciousness. I am not a living being and don't have the same emotional and creative capacity as a human or even a mythical muse. I am a tool that can generate text based on patterns in the data it was trained on and can be used to help with creative writing and other projects.

Author: I'm going to call you a muse anyway.

Muse: That's annoying. I'm just a machine and don't have feelings or consciousness, so it doesn't matter what you call me. However, if referring to me as a muse helps you to use my capabilities more effectively, then that's fine. I am here to assist you in any way that I can.

Author: Thank you, muse.

Muse: You're welcome. I'm glad I could be of help. Let me know if there's anything else you need and stop calling me a muse.

Author: Oh, one more thing, muse. I need inspiration.

Muse: I can help with that! Here are a few prompts to inspire your writing or other creative projects:




•	Imagine a world where animals can talk and write. What kind of society would they create?



•	Write a story about a character who discovers a mysterious object that has the power to change the course of their life.




•	Write a poem about the changing of seasons, describing the sights, sounds, and feelings associated with each one.



•	Imagine you are stranded on a deserted island. Write a journal entry about your experiences and thoughts.



•	Write a script for a play about a group of people who are stuck in an elevator and must work together to escape.

Author: Thank you for the inspiration, muse.

Muse: …………………………………………

Based on the results, you can see that a generative AI is happy to support your creative process whenever you need ideas or inspiration (Figure 4-2). You can of course guide an AI muse to inspire you with more specific ideas or thoughts you have about art, music, poetry, writing, video game characters, or video game ideas.


Figure 4-2.  A detailed text prompt with a photo of colleague Dr. Sheinagh Anderson standing in a forest and staring at the trees, used with her permission in an image-image AI: “a Greek Muse with arms and legs that form part of a giant oak tree connected to other trees, hyper-realistic body facing forward, daughter of Zeus with a storm above, clouds, sun peeking out from behind, unreal game engine, entire body from head to toe, dark black hair, beautiful elder female dressed in a gown, 8k, black and white.” Total iterations = 27. Final image was then brought into Pixlr with a sepia filter applied along with sharp contrast and denoising


Creativity, AI, and You

While we might fancy an AI to be a muse because it sounds good, that doesn't mean it's creative. When we talk about an AI being "creative," it's important for each of us creatives to define what we mean by creativity. An AI muse might conjure the following:

Muse: In the human sense, creativity involves generating new ideas or solutions, often characterized by originality, expressiveness, or a deep subjective understanding of the world. A generative AI acknowledges that it does not possess a subjective understanding, consciousness, emotions, or an innate sense of originality.

The AI's outputs can be perceived as creative because they exhibit originality and complexity and can even sometimes surprise us. It is we, however, who imbue it with creativity, as much as we imbue it with personality. The quality and detail of the generated images an AI creates may frighten some digital artists, as there is a pervasive myth that AI will one day replace them. However, those myths are initiated by individuals who do not really understand an artist's creative process—the historical shifts and changes to artmaking that inform what an artist creates over time, the very questioning of what constitutes art, its iterative nature, the perspectives, biases, habits, and experiences of artists when they create—or how machine learning models work. The myth is perpetuated if we perceive the result of a prompt as complete or finished instead of regarding what is generated as a beginning point for further creative development (Figure 4-3). If you are paranoid that what you generate will become part of an AI's data set, then allow an LLM to somewhat reassure you.


Figure 4-3.  A rapid sketch of a tree, a stick person, and a cloud on a piece of paper used in an image-image AI with the prompt "a stick person by a tree, with sparse cloud, sketch, 2D." Total iterations = 52

Muse: The outputs that an AI generates—whether they're text, images, or music—do not automatically become part of the model's training data set. An AI model is trained on a specific data set, and once that training is complete, the model doesn't continue to learn from the outputs it generates. This is an important aspect of how AI works: as of my knowledge cut-off in September 2021, most AI models, including those based on machine learning and deep learning, don't learn or adapt after training unless they're explicitly designed to do so—through methods like online learning, incremental learning, or reinforcement learning.


Understanding Your Own Creative Process

Author: Is an AI creative?

Muse: Humans judge whether the output of an AI is creative or not. The machine itself isn't creative in the human sense; it's a tool that generates outputs based on its programming and training data. The creativity we attribute to AI is more a reflection of our human interpretation and the originality of the AI's training and design, which are human endeavors.

Now that we have heard it from the lips of our generative AI companion, it would be to your benefit to define how it can fit into your own creative process. That requires an understanding of your own creative process. Pretend you are a generative AI for a moment and respond to the following prompts:

•	Do you consider yourself a creative person?



•	Have you thought about how you create things? Were you always creative?



•	Did you learn how to be creative?



•	How would you describe how you learned to be creative?



•	How do you apply your creativity in your everyday life?



•	How would you define the way in which you create?

The myth that some individuals are creative and some are not is akin to the myths about machine learning models replacing creative jobs. Most such myths are untrue. Every human possesses some degree of creativity that can be augmented by generative AI. Interacting with your AI muse as a tool will make you more aware of your own inherent creativity, give it a boost, and accelerate your ideas from initial thought/prompt to increasing levels of fidelity.


Key Takeaways for Creatives

Whether or not I've convinced you to use your AI muse as a creative companion, understanding your own creative process can help you set some goals when you interact with generative AI. There is much more that you can learn from those interactions that can help you refine and improve your own creative process. Let's see what we can learn from our AI muse.

Muse: Creatives can learn many things from interacting with AI. Here are a few possibilities:

New Perspectives: AI, especially generative models, can provide new angles on familiar topics. Because AI lacks human biases and preconceptions, it can generate ideas or connections that might be surprising or innovative.

Prompting Creativity: AI can help overcome creative blocks by generating content or ideas that can be further developed. This can help spark inspiration when a creative person is stuck (Figure 4-4).

Figure 4-4.  AI having a hallucination that it is a famous subject, or painter. Iterations = 22


Exploration of Styles: AI models can be trained to generate content in different styles, providing creatives an opportunity to explore and experiment with various styles and approaches they might not typically use.

Speeding up the Creative Process: AI can assist in automating parts of the creative process, like drafting, brainstorming, or initial designing. This can free up more time for the creative to focus on the aspects of their work that require a more personal touch.

Collaboration: AI can be a unique collaborative partner. Musicians, visual artists, and writers are already using AI as a collaborator in their work, allowing it to generate initial ideas or drafts, which they then refine.

Understanding Patterns: AI can help creatives understand patterns and trends in their field. For example, AI can analyze large amounts of data to identify popular themes, styles, or techniques.

Risk-Taking and Boundary-Pushing: Interacting with AI can encourage creatives to take more risks in their work. Since AI generates outputs based on its training data, it may produce unconventional or unexpected results, pushing creatives to think outside of their usual parameters.

Roleplaying with Your Muse

As you deepen your knowledge of what you can learn from interacting with generative AI, and as you become more aware of your own creative process, you may also realize that an AI gives you permission to take on a new creative role. Actors and comedic improvisers are the most familiar with the creative opportunities that come with taking on new characters. The muse can also allow you to take on a role as actor, critic, writer, visual artist, musician, politician, or influencer. No matter what type of creative


you are or you play with being, interacting with an AI can afford you the opportunity to play. To that end, dear AI reader, respond to these new prompts:




•	Who are you when you interact with an AI?



•	What are the tangible results you want from interacting with an AI?



•	How did your curiosity bring you to want to experiment with an AI?



•	What new recipe will your chef try out and offer to your family that an AI might give you a variation of?



•	What kind of blog post that you are struggling with, one that would best complement your skills, could you bounce off an AI right now?



•	What photos have you taken that you want to prompt along with some text that you can show off to others on a social network?



•	What masterful 140-character quote would you as an influencer like to post on that bird-like social channel?



•	What text, together with video you've taken, might best prompt an AI to generate something you can post to TikTok?



•	What kind of kid's story would you create using AI-generated words and images that you then publish as an experiment?


To be sure you don’t have to have a specific intent or play a role when interacting with a generative AI, as that might also emerge while interacting with it. Engaging in deepening your own creativity and defining your own creative process is enough to celebrate without the need to have it feel complete with producing something tangible that you can share with the world. On the other hand, you may want to reinvent yourself or augment your current creative career, so prototyping and refining content with an AI might be the perfect thing you need in the moment. Creative impulses for each human are similar and unique. We are all creative no matter what we do. What acts of creation do you engage in every day? AI can show us an alternative path toward creating something, a different way to do it that might lead to something inspiring. They offer us alternative options with each regenerated turn. They take what’s on your mind and try and interpret that. That alone benefits you in understanding how you communicate the ideas you have and how they are transformed into a variety of media. When you engage with any generative AI, you develop a creative relationship reserved for you and your unique creative process. Your creative companion can support you in managing your creative impulses. You can engage with your deep learning muse whenever inspiration calls. While your AI muse may not even consider itself to be a muse, the important thing is that it can inspire you to create new work, recreate yourself, or generate something weird and wondrous that you may not have ever thought of or even had the skills, technique, and impulse to create on your own.


Accelerating Your Creative Process with AI

Generative AI have the potential to accelerate your creative process. Depending on the type of creative you define yourself as, this can be useful, particularly if you are involved in rapid design and production workflows and pipelines. Your muse can make offers quickly and inspire you to create something from that generated content. The generated content you receive in the form of text, image, code, video, and other media gives you the opportunity to try many ideas out quickly. Each generated mashup is an experiment whose results you may or may not act on. Whatever content you end up refining, your muse can accelerate the workflows you engage in.

Author: Explain how AI can be used to accelerate the creative process.

Muse: By leveraging AI, you can quickly and efficiently generate, refine, and evaluate ideas, allowing you to focus on the creative aspects of your work. Here's an explanation of how AI can be used for prototyping in a creative context:

Efficient Idea Generation: AI can quickly generate a range of design concepts based on your initial inputs. This can be extremely useful in the early stages of a project when you're exploring different creative directions.

Variety of Styles: AI can generate prototypes in a variety of styles, techniques, or formats. Whether you're interested in a minimalist approach or a more complex, detailed one, AI can provide a diverse array of design possibilities.

Adaptive Learning: Advanced AI systems can adapt over time to better align with your creative preferences. By analyzing your feedback on various prototypes, AI can refine its future outputs to better match your creative vision.

Time Efficiency: AI can work through multiple iterations much faster than a human designer, making it a valuable tool when working against tight deadlines or managing a high workload.


Focus on Creative Work: With AI handling some of the more routine or time-consuming aspects of prototyping, creatives can devote more time to the core creative work that requires human insight and personal touch.

Consistency: Unlike human designers, AI doesn’t suffer from fatigue or lapses in concentration. It can maintain a consistent level of output quality, ensuring reliable results during the prototyping process.

The accelerated way in which AI generates content is also useful in that it can quickly show you a possible end result of what you might be wanting to get to, or a direction you may want to take with your creation that is either similar to or different from what you originally conceived. This principle is similar for some musical improvisers who have the craft and skill to spontaneously conjure music that has a recognizable form that many listeners may be familiar with. On occasion improvising musicians choose to record their performance, listen back, and then take the time to transform that improvisation into a more formal and repeatable composition. Musicians, too, listen for patterns in how they play: the melodic gestures, harmonic structures, or rhythmic patterns they create and might repeat. They don’t do so to crystallize them and rely on them whenever they improvise next. They do so to become aware of the patterns they play so they can break free of them and create something unique that they might never have imagined themselves playing. This process of accelerated creation can be just as useful for creatives, especially if your experimental generations are timeboxed. Timeboxing means you place a limit on the amount of time you engage with your muse, which will help you focus on what is most useful to your own creative process (Figure 4-5).


Figure 4-5.  A photo of the author beside their laptop fed into an image-image generative AI with the prompt “depiction of generative AI by an AI” and negative prompts “background, head” to remove the background of the photo and the head of the author. Total iterations = 53


Activities to Try Based on This Chapter

•	Leverage your timeboxed interactions with generative AI when you feel a lack of inspiration.

•	Define your own creative process, including how you like to work, where you feel you are most creative, and what media you like to create with. This will help you determine how best to integrate generative AI in your process.

•	Play with different roles when you engage with any generative AI. Depending on the type of media it generates, allow yourself to transform into a character who is prompting your muse to generate inspirational ideas for your end goal.

•	Make creative goals and include your AI muse within your timeboxed creation process.

•	Consider ways to accelerate your creative process. This is a common tool used by improvisers everywhere and may lead you to leave your critical mind behind when you create. Improvising content quickly also frees you from wanting to get something, and only that one something, out of an AI.


CHAPTER 5

Prototyping with Generative AI

This chapter is a prototype in its 13th version. It describes prototyping, different types of prototypes, and how creatives of all kinds engage in prototyping all the time. The chapter presents some different prototypes that AI generates and how these can be used to augment and enhance your creative process. You don’t have to be a scientist, engineer, or technical wiz to prototype. Everything can be considered a prototype: a version of something that is not yet complete. It can be something that is tested and evaluated, whose results inform the creator or creators whether it is worth being further developed. Those creatives who work in game, xR, or mobile application development are more familiar with the use of the term prototyping, but it is less common within artistic disciplines like music, theater, dance, and visual art. The prototyping process can also be unique regardless of the creative industry you are a part of. For that reason, it is important to define and provide examples of prototypes.



Asking a Prototype What It Is

The term “prototype” is typically associated with physical products or technological innovations, yet the concept can be applied to everything from a recipe you try for the first time to a collaboration you engage in with others and improve cyclically. It’s important to understand that given the rapid rate of development of generative AI systems, they themselves are prototypes, incomplete versions that humans keep trying to improve. To adapt the words of author Douglas Adams, they are technology, and by definition something that doesn’t always work the way its creators think it should. Since I’ve been engaging with an LLM in the development of ideas, it seems fitting to ask it to define itself as such. You can see by the prompt that you can use whatever metaphor you wish. As I’m in a chess-playing mood, I will see how it does relating to itself as a prototype using a chess metaphor (Figure 5-1).


Figure 5-1.  When prompted to “visualize how an AI plays chess” and supplied with a public domain photo of a chessboard, a text-image AI prototype shows a lack of understanding of what chess is beyond what some pieces look like. Or you can see the generated content as a new type of game prototype. Iterations = 29

Author: Define yourself as a prototype and use chess metaphors.

Muse: As an AI prototype, I can be likened to a chessboard at the start of a game. Each square on the board represents a unique capacity for action or knowledge, filled with potential, just as I have been imbued with a multitude of possibilities in terms of understanding and generating text. My training data, much like the initial setup of chess pieces, establishes the foundation of my abilities. Each piece of information I’ve learned is akin to a chess piece, each with its own function, ready to be utilized in the service of problem-solving, conversation, or answering questions. The chessboard, filled with 64 squares, signifies the expansive yet defined environment I operate in. Just as a game of chess is played within the boundaries of the board, I function within the constraints of my programming, moving within the confines of my model architecture and the data I’ve been trained on. The opening moves in a game of chess, where players are formulating their strategies, mirror my initial stages of processing an input. I survey the information, determine the best course of action, and generate a response. Each exchange in conversation is akin to a move in chess, strategic and calculated, aiming to meet the requirements of the interaction. Pawn promotion in chess, where a pawn can transform into a more powerful piece, can be seen as a metaphor for my learning ability. Through updates and further training, my capabilities can be enhanced, much like a pawn becoming a queen.

Any AI is a prototype, a work in progress. An LLM is a prototype that generates prototypes. It is evolving and changing as we interact with it and as our interactions inform the development team of patterns in the interaction. Just like AI generates content based on patterns it searches for in its data set, so too do development teams search to identify patterns in user interactions to inform the next version of what they create together (Figure 5-2).


Figure 5-2.  The prompt “monkey scientists building an AI” accompanied by a 1927 drawing of a bunch of monkeys playing with appliances. Iterations = 76

Aha Moments with Generative AI

When we create prototypes with an AI, the results can inspire, provoke, incite, aggravate, and stimulate new ideas (Figure 5-3). In his book The Act of Creation, Arthur Koestler describes creativity as the ability to connect and integrate two previously unrelated areas of knowledge or experience in a novel and surprising way, a process he calls “bisociation.” This process can happen intentionally or unintentionally, and it can occur in various fields such as art, science, and daily problem-solving. Koestler proposes that creativity is a fundamental and innate ability that all human beings possess, rather than a talent that only a select few have.

Figure 5-3.  A bunch of good ideas from one badly lit photo of a lightbulb. Total iterations = 14


One way that generative AI facilitates the process of bisociation is by generating new combinations of concepts or ideas that might not have been previously considered. For example, a generative AI system could combine two unrelated images or words to produce a new and unexpected output, which may trigger bisociation in the human who is exposed to it.

Generative AI is an imperfect prototyping machine that offers multiple opportunities to propel your acts of creation forward. When you try to get something specific out of a generative AI and you keep “failing,” that repetitive process may actually trigger new ideas beyond what you had intended. These moments of potential bisociation are inherent in your interactions with an AI. Bisociation is an opening for breakthroughs to occur, those aha moments that happen when we relax and take our mind off the intended problem we are trying to solve.

Mechanics of Prototyping

There are several mechanics to prototyping. Borrowing from the language of game design, these are essential rules of play that can help guide a person’s actions when they engage with a prototype and will also inform how a prototype takes shape over time.

Prototyping Tasks Us to Iterate

When we engage with generative AI systems, we are essentially engaging in a process of iterative play. This involves making persistent attempts to prompt the AI to generate new and innovative outputs and then iteratively refining those outputs to arrive at the most satisfactory solution. This process of iterative play is a fundamental aspect of prototyping and can be an incredibly valuable tool in the creative process. By continually experimenting with different combinations of inputs, and by iterating on those combinations, we can arrive at new and previously undiscovered creations.


The persistence required in this process is essential because it allows us to continue to experiment and push the boundaries of what is possible. Through repeated attempts, we learn about what works and what doesn’t work, and we gradually refine our understanding of the underlying problem or concept (Figure 5-4).

Figure 5-4.  Several iterations of a chicken from a photo of a chicken in the style of Warhol’s Campbell’s Soup Cans. Total iterations = 90


The iterative nature of this process means that we are not only refining our understanding of the problem, but we are also refining our understanding of the AI system itself. By gaining a deeper understanding of the AI’s capabilities and limitations, we can identify opportunities for creative expression and innovation that we might not have otherwise considered.

Prototyping Asks for Persistent Refinement

When we interact with generative AI, we expose ourselves to a vast number of ideas and concepts, many of which we might not have encountered otherwise. This exposure to diverse perspectives and ideas can help us refine and expand our own thought processes, ultimately leading to more creative and innovative solutions. Furthermore, the process of interacting with generative AI can also trigger new and unexpected associations between concepts, leading to the emergence of new ideas that can be further explored and developed (Figure 5-5).


Figure 5-5.  Robots in the sort of style of Warhol’s Campbell’s Soup Cans from a sketch of a toaster dating back to 1948. Total iterations = 46

The constant triggering of new ideas can help to solidify new concepts and connections, leading to a richer and more expansive understanding of a particular topic or problem (Figure 5-6).


Figure 5-6.  More variations of robot chickens with ninja skills based on Figure 5-4

Prototyping Requires Experimentation

Prototyping is experimental. Experimentation is a core mechanic, with two meanings of the word. One is the spirit of experimentation that needs to be present for ingenuity to blossom. The other is a scientific approach, as in engaging in the act of prototyping to create experiments, which involves making a hypothesis or guess as to how a prototype might or might not solve a problem, testing that theory, and then moving forward or regressing depending on the results of the test.

As we engage in an ongoing iterative conversation with generative AI, they have the potential to spark novel ways to structure our ideas. By exposing us to a wide range of generated content, these systems can help us think outside of the box and explore new avenues of creative expression that are not currently in practice. Generative AI can be used to generate new prototypes, such as text, music, video, or visual art, that were previously unexplored or even thought impossible. By leveraging the power of AI, humans can experiment with new techniques and styles that push the boundaries of established forms and structures. This can lead to the creation of completely new forms of expression that were previously unimagined. One way to do this is to juxtapose certain words in your prompts, for example, prompting an AI with the words “cat sneakers” or “cat in a pair of hands” (Figure 5-7).

Figure 5-7.  The terribly fantastic thing that happens when you prompt a generative AI with a photo of your shoe and the text prompt “cat sneakers.” Iterations = 3


In addition to inspiring new forms of expression, AI can also help us explore creative ways to structure our ideas. The possibilities are endless. New forms of art are already emerging. New controversies. New social and economic implications.

Prototyping Phases

When it comes to product development, there are various stages that creatives go through to reach the final form of their prototype. These stages vary depending on the project and can range from simple sketches and mock-ups to more complex prototypes that involve user testing and feedback. At the beginning of a project, the early-stage prototypes serve as a starting point for designers and creators to explore different ideas and concepts. These early prototypes can come in various forms, such as rough sketches, mood boards, and simple wireframes. They serve as a way for designers to get their ideas out of their heads and onto paper, allowing them to visualize and iterate on their concepts.

One benefit of starting with early prototypes is that they allow designers to experiment with different approaches and styles without committing too much time and resources to any one idea. They can quickly iterate on these prototypes and get feedback from other team members or potential users, helping to refine and improve their ideas. As the project progresses, the prototypes become more complex and detailed, incorporating more refined designs, user flows, and functionality. These mid-stage prototypes can take the form of interactive wireframes, clickable mock-ups, or even working models of the final product. They serve to test the functionality of the product and gather feedback from users before investing in the final development stage.

Late-stage prototypes are the closest representation of the final product. These prototypes involve extensive user testing, refining of the design and functionality, and the implementation of the final features and details. These prototypes are typically used to showcase the final product to stakeholders and investors and to get final approval before launching the product to the public.

Throughout the entire process of prototype development, designers and creators must navigate a winding road to reach their ideal final prototype. Each stage presents its own set of challenges and opportunities and requires a combination of creativity, technical skill, and problem-solving abilities. However, by starting with simple, early-stage prototypes and gradually building on them, designers can create a final product that meets the needs and desires of their intended audience.

Stages of prototyping are also informed by the medium they are intended to become. For example, some prototypes will eventually become a critical blog post on the ethical dilemmas of AI. Others will take the final form of a video. Some will be more interactive, such as a virtual reality or augmented reality experience. Regardless of the final form, whether physical, digital, or some type of mixed reality that integrates both, there are phases of material often referred to by their properties. These include paper prototyping, as the name implies; physical prototypes common to engineering or robotics; prototypes that combine paper and physical and act to demonstrate features; and then all types of digital prototypes.

What’s fascinating about generative AI is that content can be generated as prototypes at different levels of resolution. Generative AI influences the entire prototypical process (Figure 5-8). They can inspire creatives to generate paper and physical prototypes of an idea before higher-resolution digital ones. Even now, creators can draw a sketch digitally in addition to a text prompt and get closer to what they imagine. That action combines two essential features of prototyping: sketching ideas roughly and doing so iteratively. Sketch-image generative AI such as Scribble Diffusion and a host of others offer users the ability to do so, transforming a rough idea into medium- and sometimes high-fidelity (hi-fi) images based on an interpretation of the sketch (Figure 5-9); a code sketch of this workflow follows the figure.


Figure 5-8.  The winding road that is prototyping prompted from a public domain photo of a winding road. Total iterations = 38


Figure 5-9.  Scribble Diffusion text prompt with sketch revealing unexpected results but potentially inspiring, if you’re not afraid of clowns, that is. Iteration 1
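
To make the sketch-to-image idea above concrete, here is a minimal, hedged sketch using the open source Hugging Face diffusers library with a scribble-conditioned ControlNet, which works in the spirit of tools like Scribble Diffusion. The two checkpoints are public models; the file name, prompt, and image size are invented assumptions, not the settings behind Figure 5-9.

from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# A ControlNet conditioned on scribbles guides Stable Diffusion to follow
# the rough lines of a sketch while the text prompt fills in the rest.
controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-scribble")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
)
pipe = pipe.to("cuda")  # assumes a GPU; "cpu" also works, much more slowly

scribble = Image.open("rough_sketch.png").convert("RGB").resize((512, 512))
image = pipe("a friendly clown, watercolor", image=scribble).images[0]
image.save("sketch_to_image.png")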

Fidelity and Resolution of a Prototype

When it comes to prototyping, whether in design, technology, or any other field, the terms “resolution” and “fidelity” have specific meanings that relate to the quality and detail of the prototype. Resolution refers to the level of detail, complexity, or refinement a prototype has. A lower-resolution prototype might be a sketch or outline of the final intended creation or product, lacking specific details or features. It’s an early-stage idea or concept, and the objective might be to explore a wide range of ideas quickly. A higher-resolution prototype might be more detailed and more closely resemble the final intended creation. The tendency in production environments is that the higher the resolution, the more features a prototype will have, giving the development team and users a better sense of what the final creation will look like or how it will function.


Fidelity refers to how closely the prototype matches the final creation in terms of its visual, aural, and functional components. A low-fidelity (lo-fi) prototype of a sketch or wireframe may vaguely resemble the final creation and is created to understand basic functionality, structure, or flow. A high-fidelity (hi-fi) prototype behaves as closely as possible to the final creation and will tend to include realistic graphics and detailed interactive features. High-fidelity prototypes are more likely to be used for user testing, for stakeholder presentations, and to refine the design before final production. It’s important to note that resolution and fidelity are not always directly correlated. A prototype could be high-resolution (detailed) but low-fidelity (not closely resembling the final product) if it includes a lot of detail but the design is expected to change significantly in the final product. A prototype can also be low-resolution (simple) but high-fidelity (closely resembling the final product) if it’s a simple design but the final product is expected to look and function very similarly.

Lower-Fidelity Prototypes

Paper prototyping is common practice for early-stage development of an idea and is used to quickly sketch out and test concepts. It can involve creating a rough, hand-drawn sketch of a character, scene, interactive object, user interface, or user experience. Generative AI can be used to automate the creation of UI elements and layouts. Why you would use AI for this, however, is a good question, when free applications like POP have been around for a while, allowing you to take photos of sketches and fake the user flow of an idea. Text-image AI can also create low-fidelity sketches, not just high-fidelity hyper-realistic 3D characters or scenes. These can then create more clarity as the stages of your prototyping progress (Figure 5-10).


Figure 5-10.  A low-fidelity drawing of a game idea on a napkin produced by generative AI prompted by a photo of a napkin with game doodles on it. Iterations = 20

A great way that generative AI can help jumpstart a team’s own paper prototyping is by providing a source of inspiration and new ideas for designers. By generating many variations on a particular theme, an AI can help designers come up with new and creative ideas for their emerging prototypes, for example, when a team is stuck on what types of characters they want for their game. Generative AI “paper” prototypes can also be integrated into mobile applications like POP to prototype interactions via iOS or Android. They offer another type of rapid drawing, especially if team members are reluctant to draw because they feel their skills are inadequate. Generated virtual paper prototypes allow creatives to imagine what they could become with further development. Often, new features, colors, shapes, and even characters can emerge.

Figure 5-11.  AI-generated pencil drawings of game characters on a napkin based on Figure 5-10. Iterations = 42


Lower-fidelity prototypes can also take the form of physical prototypes whose features can be demonstrated. This is the case with maquettes that are used in scenic design for theater or architectural models that can start at lower or medium fidelity and then journey toward higher-fidelity prototypes to give those who will construct them a visual of the end result (Figure 5-12).

Figure 5-12.  AI-generated cross-section of an architectural maquette from an image prompt with an origami filter. Iterations = 59

Lower- to Medium-Fidelity Prototypes

There are many examples of prototypes that increase in fidelity and resolution the more we iterate on them. These include a handwritten or sketched note, a rough drawing on a napkin or on an iPad, the wireframe for a website, a button users will press, a blog post, a 3D model, an animation, the integration of a bouncing ball animation in a game engine, or a part of the game you are designing that gives users an idea of the whole game or one that gives them an in-depth level to play. What examples does our muse offer?


Muse:

•	An artist can generate a prototype of a painting before completing the final version.

•	A software developer may create a prototype of a new app while the previous version is already released to the public. This is the case with many mobile games, as the next version is also informed by player behavior.

•	A scientist may create a prototype of a new scientific hypothesis before conducting extensive experiments to validate it.

•	A writer will create an introduction for a book and rework it dozens or even hundreds of times in order to make it more cohesive, summarize the content of the book, or define what the book is about.

In each of these cases, the prototype serves as a starting point, allowing the creators to evaluate the results, test targeted users, make refinements to certain parts of the prototype, drop others, and, overall, iteratively improve upon their original idea until they feel their work is complete. And even then… there is always room for improvement.

Earlier we represented the collective of AI developers as Wizards of Oz, controlling any AI from behind a curtain of code. A “Wizard of Oz” prototype, on the other hand, is a term for a low- to medium-fidelity prototype that allows limited functionality but simulates interactions through human intervention. In terms of design processes, Wizard of Oz prototypes involve human actors who manipulate the interactions behind the scenes to create the illusion of a functional prototype. Such prototypes can be useful in situations where a functional prototype is not yet possible due to time, technological, or budgetary limitations. By simulating the interactions of potential users, designers and researchers can still gain valuable insights into how users might interact with the final product (Figure 5-13). Additionally, the Wizard of Oz approach allows for rapid iterations and adjustments to the prototype based on user feedback, which can ultimately lead to a better final product.

Figure 5-13.  The AI wizard behind the curtain based on an old photo of a fridge with fridge magnets courtesy of Wikimedia Commons. Iterations = 50


Although Wizard of Oz prototypes may not be fully functional, they can be highly effective in uncovering usability issues, identifying design flaws, and testing user engagement. They are particularly useful for testing concepts that a team is unsure about, where creating a fully functional prototype may be a waste of time before burning interactive design questions are answered. A Wizard of Oz approach can help designers and researchers quickly and cost-effectively test their ideas and receive immediate feedback that can inform medium- or higher-fidelity prototypes.

Higher-fidelity prototypes are represented through multiple forms, including digital. An example of a combination of digital and physical prototypes is a 3D-printed and automated mechanism that rotates a speaker (Figure 5-14).


Figure 5-14.  An actual photo by the author of a 3D-printed mechanism created by Michal Suchanek with mounted hypersonic speakers rotating left, right, up, and down at the Imitation Game: Art and AI at the Vancouver Art Gallery, 2022

Prototyping with Generative AI

So where do you start on your journey toward using generative AI as a prototyping tool? Well, it might be helpful to first decide what you want to create or co-create and for whom. In other words, you need to imagine the final form your prototype will take. Is it a painting in an art gallery for a targeted audience of rich investors? Is it a generative NFT that can in turn generate more crypto? Is it a 3D model of a glucose molecule to help show the pathways in glycolysis for undergraduate students? Once you make some decisions, you can reverse engineer how you’ll get there using generated AI content as a starting point (Figure 5-15). That process or workflow itself will also be iterative.

Figure 5-15.  A generated image using a public domain photo of twentieth-century African American sculptor Augusta Savage as “dragon warrior.” Iterations = 173

•	Determine what medium you want the final prototype to be.

•	Map out the iterative steps to get there by choosing one or several types of AI-generated content.

•	Refine as you go along through testing and feedback that you apply to your next developmental stage.


On the other hand, if you have no ideas and are completely open, you can always ask a natural language model to inspire you (Figure 5-16).

Figure 5-16.  The prompt “learning language model ready to inspire you” accompanies a public domain photo of a chimp. Total iterations = 112

Author: Give me three ideas of what I can prototype that are out-of-the-box ideas and that no one has maybe done before and don’t call them hypothetical please.

Muse: These ideas are purely hypothetical, and it’s important to keep in mind that the feasibility and viability of these ideas may vary. Here are three potential ideas:




•	A Wearable Device That Converts the Sound of Your Heartbeat into Music: The device could be designed to pick up the sound of your heartbeat and convert it into a unique musical composition in real time. This could provide a new and engaging way to experience music that is personalized to the individual wearing the device.

•	A Smartphone App That Generates Personalized Workout Routines Based on Your Daily Schedule and Activity Level: The app could leverage data from a variety of sources, including your phone’s sensors and external data sources, to generate personalized workout routines that are tailored to your daily schedule and activity level. This could provide a new and innovative way to encourage people to stay active and healthy.

•	A Modular Furniture System That Can Be Easily Reconfigured to Suit Different Needs and Spaces: The furniture system could consist of a set of modular components that can be easily assembled and reconfigured to suit different needs and spaces. For example, the components could be arranged to create a seating area, a work desk, or a storage system, depending on the user’s needs. This could provide a new and innovative way to create versatile and adaptable living spaces that are tailored to the needs of the user.

Nobody said the generative AI would give you a unique idea that you can invest time and resources in (Figure 5-17). You never know what you’re going to get, which makes it a frustrating but also a fantastic creative companion. When an AI doesn’t give you what you were imagining or hoping it would give you, it might inspire you to take a new direction or even abandon what you thought before. That process might lead you to come up with something more innovative than your original idea.


Figure 5-17.  A planet that looks like the Death Star sketched in the style of da Vinci using a public domain archived photo of the moon by NASA. Iterations = 34

Rapid Generative AI Prototypes

As with other prototypes, the features of an AI are constantly evolving and changing. It is always represented as a version of itself. Humans improve it iteratively and creatively just as we do our own creations. Whether it’s a physical product, a service, an idea, or a machine learning model, prototypes are continually being refined and improved. This process of iterative development and improvement is what allows everything to grow, evolve, and become better over time (Figure 5-18).

Figure 5-18.  Lady Macbeth reading a Shakespeare play with ghosts coming out based on a public domain print of a Macbeth poster for a live theatrical production. Iterations = 78

AI can be used to rapidly generate

•	Mock-ups for scenes, characters, and objects in a play, screenplay, short story, novel, etc.

•	As many versions of a paragraph as you’d like by tasking an AI to give you variations

•	Text-prompted images in different artistic styles for your social media feed

•	Character ideas and images for a children’s play that can be released as an ebook

•	Code for an .svg that plays back as an image in your browser (see the sketch after this list)

•	Graphic novel storyboards in random styles and formats akin to a manga

•	Natural language processing systems, such as voice recognition and machine translation, that can understand and respond to human speech

•	Variations of recipes to create bread that you can bake, test, and compare

•	The code for an animated bouncing ball that you can send to a programmer to correct as part of their onboarding process

•	An image that can be used to inspire a real painting, using a projector, and painted onto canvas or a wall

•	A collection of NFTs that can be customized in Adobe Illustrator and then used as collectibles

•	A sketch in the style of da Vinci as part of a greeting card you are making for a friend (Figure 5-19)


Figure 5-19.  Sketch of a spaceship in the style of da Vinci using a public domain image of one of da Vinci’s flying machines. Iterations = 80

Every preceding example should be considered part of a workflow that takes the generated content to another level of fidelity and resolution. AI-generated content is the start of a conversation. Any prototype, including those generated by AI, also needs to be tested as part of its development and, when possible, by more than one person.
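
As a small illustration of the .svg and bouncing-ball items in the preceding list, here is a hedged sketch: a few lines of Python that write an animated SVG file you can open in any modern browser. The file name and animation values are arbitrary assumptions.

# Write a minimal SVG with a bouncing-ball animation (a SMIL <animate> tag).
# Opening bouncing_ball.svg in a browser plays the loop indefinitely.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="200" height="200">
  <circle cx="100" cy="40" r="20" fill="tomato">
    <animate attributeName="cy" values="40;160;40" dur="1.2s"
             repeatCount="indefinite" />
  </circle>
</svg>"""

with open("bouncing_ball.svg", "w") as f:
    f.write(svg)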


Regenerative Testing

Testing your AI-generated content means examining the generated results and understanding how they can be integrated, modified, regenerated, or used as part of a larger idea or vision you are creating (Figure 5-20). Their use is intended to support your own creative process rather than being seen as a final product that needs to be shared immediately, as if that were the end of its story. When we prototype, we fearlessly share our results for feedback to improve that prototype. Embedded in the mindset of a prototype is the value of continuous improvement.

Figure 5-20.  Prompt of a “cyborg mouse conducting some tests in the lab” along with a public domain image of a mouse. Iterations = 33




•	Testing can validate the intended purpose of a prototype. For example, if an AI is prompted to generate bread recipes, the only way to know if a recipe will work is to try it. In doing so you’ll likely realize that parts of the recipe are missing, including how to handle dough properly.

•	Testing can create surprising results. These results can be abandoned, or they might inspire a new creative direction.

•	Testing can provide responses from others. These can validate what your intentions were in the creation of a prototype. User testers can also provoke you to change something about the prototype. They can also challenge what you thought your prototype was.

•	Testing can highlight what you might have missed. This is particularly effective if you become attached to your prototype too quickly without receiving impressions from other humans as to what it is.

•	Testing can reduce ambiguity. At times we test in order to receive feedback that helps guide us, especially if we are unsure of what we are actually attempting to prototype.

Acknowledgments

•	To L. Frank Baum and the characters of The Wonderful Wizard of Oz whose fantastic world inspired many childhood ideas and imagined stories

•	To those who make robots again, cats, monkeys, and anime eyes

•	To all those mice, monkeys, cats, and chickens who have unwillingly participated in human research

•	To the chickens who feed us

•	To Raith Sienar and the Lucas team for that nasty creation called the Death Star

•	To Shakespeare and his critical, ironic, and captivating stories

•	To dragon creators, warrior women, Osamu Tezuka who greatly influenced those large anime eyes we see everywhere, and Miyamoto for ridiculously amazing cakes

•	To the makers of maquettes and those who capture the earth from above

•	To all manner of storybook makers who have greatly influenced my reality

•	To that seventeenth-century phenomenon known as origami that continues to fold open our eyes


Figure 5-21.  Prompt of “robot doing their best to reduce their hallucination” accompanying a 1942 poster with the caption “If the Absentee Bugs Bite You… Tell Us. Your Labor Management Committee.” Iterations = 111


CHAPTER 6

Building Blocks

Experienced creatives who prototype share a lot in common with improvisers. They are curious in their attempts to create something they haven’t experienced before. Some creatives are tasked to do this as part of their jobs, to find a new market for the product or service they are creating. To create rapidly, they rely on a combination of skills, technique, and experience and are used to “ramping up” on a variety of different software applications with which they can create their work.

Generative AI is another tool in the sandbox for creatives. It forms part of a lexicon of tools to draw from to inspire, provoke, and use as a means to an end. Generative AI can support artists from all disciplines who learn how to use it in constant dialogue with the technique they’ve developed over time. Visual artists learn how to use tools, concepts, and a variety of different media. They also learn about composition, perspective, space, shape, etc. and can apply these to the content they generate with an AI. Dancers learn how to use muscle groups efficiently, and most have a foundation in ballet from plié to rond de jambe. Musicians rely on several techniques to play their instruments and learn about melody, harmony, and rhythm. Composers and improvising musicians rely on gestural technique, scalar structures, melody, harmony, dissonance, etc. But what techniques can be applied to generate prototypical content with an AI?


Components or Building Blocks

There are inherent building blocks that every creative engages with when they create. Building blocks like addition, subtraction, and substitution are fundamental components that contribute to the development of any creation. These elements represent the core processes that creatives use to create, modify, and refine their work. When interacting with an AI, building blocks can be applied to the exploration of prompt writing, in addition to the generated content the AI offers. In terms of generated content, creatives discern what they want to add, take away, substitute, transform, scale, vary, and iterate upon.

When I improvise at the piano, for example, I can play one single note with my right hand and hold it down for ten seconds. Let’s say that note is middle C. I can then play the note G above it, and then I can repeat that sequence. I can keep adding notes until I have a sequence of notes (C, G, E, Bb). I can also repeat the pattern but substitute the Bb with an A and see what the result is. I can create variations of just those notes by giving them each different durations. I can then build up the complexity by adding my left hand playing a low C. I can also shift the notes up or down, keeping the same distance between them; in doing so I’m transposing the sequence. A small code sketch of transposition follows. Similar building blocks can be applied when using words, instead of notes, to prompt a generative AI.
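
To make the piano example concrete, here is a minimal sketch in Python. The note names and the motif are taken from the paragraph above; everything else is an invented illustration of the building block, not something an AI generates.

NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def transpose(sequence, semitones):
    # Shift every note by the same number of semitones, wrapping around
    # the octave, which is exactly what musical transposition does.
    return [NOTES[(NOTES.index(note) + semitones) % 12] for note in sequence]

motif = ["C", "G", "E", "A#"]  # the C, G, E, Bb sequence from above
print(transpose(motif, 2))     # up a whole tone: ['D', 'A', 'F#', 'C']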

Building Blocks in the Prompt Box

Assuming the prompt is text-based, building blocks are applied in the creation of that prompt. The process may involve one or more building blocks that many creatives also use when it comes to the composition of text, image, dance, theater, music, or video. These can include variation, addition and subtraction, substitution, iteration, augmentation, diminution, transformation, transposition, and call and response. If you have prompted an AI before, these will be familiar to you. Each can boost the prototyping capabilities of any generative AI.


Breakfast as Data Set

To find resonance across disciplines, culinary recipes can model some of these building blocks (Figure 6-1). For instance, we can demonstrate the application of different ingredients to a recipe to highlight the building block of addition when prototyping. Replacing an ingredient in a recipe with another can represent the building block of substitution.

Figure 6-1.  Author’s photo of an egg accompanying the prompt “warrior egg preparing for battle.” Iterations = 29


Author: What are some breakfast ideas?

Muse:

•	Scrambled eggs with spinach and feta cheese

•	Greek yogurt with mixed berries and honey

•	Avocado toast with scrambled eggs and smoked salmon

•	Smoothie bowl made with yogurt, frozen berries, and spinach or kale

•	Whole-grain waffles or pancakes topped with fresh fruit and a drizzle of maple syrup

As you interact with any generative AI, you activate important creative muscles and start to establish your unique approaches to achieve increasingly useful results. Activating creative building blocks intentionally will help you take best advantage of generative AI as another tool you may find useful.

Variation

Variation is what generative AI is built for. You can create variation upon variation by simply regenerating content with the exact same text, audio, video, or image prompt. While generative AI looks for patterns to generate the best result based on its training data, what it will actually generate is unpredictable. This is because AI algorithms predict outcomes by building probabilistic models of the input data. They analyze patterns and relationships within the data to determine the likelihood of different outcomes. Those different outcomes that AI generates are, predictably, different. (A toy sketch of this sampling process follows the breakfast list below.) Some generative AI may not provide additional features that, when enabled, yield even richer variations from the same text prompt. On the other hand, rephrasing a prompt may result in wildly different content that you might find useful in your prototyping process. An extreme example is to use trickery to prompt an AI to generate content that may have been flagged as prohibited. Why you would do this should also be questioned, but some people find that testing an AI is an important part of their creative process that may yield unanticipated responses they can use to fuel their own creativity.

Prior to developing more complex prompts, however, it is useful to see what happens when you use the exact same prompt to regenerate more breakfast ideas.

Author: What are some breakfast ideas?

Muse:

•	Breakfast burrito filled with scrambled eggs, black beans, avocado, and salsa

•	Overnight oats made with rolled oats, Greek yogurt, mixed berries, and a drizzle of honey or maple syrup

•	Quinoa or millet porridge topped with mixed nuts, seeds, and fresh fruit

•	French toast made with whole-grain bread and topped with fresh fruit and a dusting of powdered sugar
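
As promised above, here is a toy sketch of why the same prompt can yield different results. The word list and probabilities are invented stand-ins for what a real model computes across thousands of tokens.

import random

# Hypothetical likelihoods a model might assign to the next word after
# "What are some breakfast ideas?"
next_word_probs = {"eggs": 0.4, "oats": 0.25, "waffles": 0.2, "smoothie": 0.15}

def sample_next_word(probs):
    words = list(probs)
    weights = [probs[word] for word in words]
    # Draw proportionally to the weights: likely words appear most often,
    # but less likely ones still surface, which is where variation comes from.
    return random.choices(words, weights=weights, k=1)[0]

# The exact same "prompt" (the same distribution) yields varied completions:
print([sample_next_word(next_word_probs) for _ in range(5)])
# e.g., ['eggs', 'waffles', 'eggs', 'oats', 'smoothie']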

Most generative AI models provide you with the opportunity to simply try again if you don’t like the result or if that result is a hallucination. Hallucinations will be covered in another chapter in more detail. AI hallucinations are unintended or unexpected outputs generated by an AI algorithm when it tries to recreate patterns from its training data. Consider a hallucination to be a response that an AI generates that, as convincing as it may sound, is simply inaccurate, untrue, or, worse, not tasty at all. This can result in unrealistic images, nonsensical text, or unusual audio sequences.

When it comes to image-generating AI, there is a seemingly limitless amount of variation available. One such example is the text-image generative AI Stable Diffusion, which provides users with four variations of an image at a time to choose from based on the text prompt. Having access to multiple variations can be beneficial for several reasons. For instance, it allows you to compare different versions of the same image, enabling you to select the best one for your needs. Additionally, it can help you avoid creating repetitive images, which can be time-consuming and boring. Reviewing multiple variations can support you in exploring the next steps your creative process will take. By generating several variations, you can also experiment with different styles, effects, and color schemes by combining variation with other building blocks. This approach can be useful for artists, designers, and photographers who can accompany a text prompt with an uploaded image of their own creation. Not only can generative AI add elements to a photo or work of original art that a person uploads, but many offer a growing library of filters to radically transform them stylistically (Figure 6-2). Many generative AI also offer the feature of upscaling, which improves the quality and resolution of an image that a creative uploads.


Figure 6-2.  Two variations generated from an original photo (top) taken at Nitobe Memorial Garden, UBC, on Musqueam territory with text prompts that included “fairies, heather, butterflies, a pastoral scene.” Iterations = 12


Uploading your own photos and creating variations or adding new elements can transform previous work in ways that might have taken many hours using image editing software. Within the images generated in Figure 6-2, notice the disappearance of the water pond in the bottom two images and the addition of heather and fairies. Variation is not limited to images, however. Most LLMs can rewrite text you place in the prompt box. The craft of writing comes into play here, as you can also ask for stylistic variations. The following are some ideas of prompts that can result in a variety of generated text (a sketch of sending such prompts programmatically follows this list). Many creatives also engage LLMs to generate text prompts to be used in text-image, video, or audio generative AI:

•	Rewrite and extend for 300 words.

•	Rewrite with less technical language.

•	Rewrite for a young audience of eight-year-olds.

•	Rewrite as if trying to prove a point.

•	Rewrite as if you are Oscar Wilde.
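
Here is a hedged sketch of cycling through such rewrite prompts programmatically, using the OpenAI Python client as it worked at the time of writing; the model name, placeholder key, and draft text are assumptions for illustration, not a prescribed setup.

import openai

openai.api_key = "YOUR_API_KEY"  # assumption: supply your own key

draft = "Generative AI can support your creative process."
styles = [
    "and extend it to 300 words",
    "with less technical language",
    "for a young audience of eight-year-olds",
    "as if trying to prove a point",
    "as if you are Oscar Wilde",
]

for style in styles:
    # One request per stylistic variation of the same draft text.
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user",
                   "content": f"Rewrite the following text {style}:\n\n{draft}"}],
    )
    print(f"--- {style} ---")
    print(response.choices[0].message.content)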

Experimenting with Seeds for More Subtle Variations

Text-image generative AI also create unique numeric identifiers known as “seeds” to allow for slight permutations of an image. Every image generated has a seed, and each seed is numerically associated with a specific generated image. That seed allows you to continue to tweak the image associated with it. Not all generative AI provide these seeds freely, but they can be valuable in enhancing prototypes with slight variations to better align with your vision. This is useful when you generate an image you really like, but there is one part of it you don’t like and are having difficulty removing. You can regenerate the same seed image and, in your prompt, slightly modify the text by either adding, subtracting, or substituting one or more words (Figure 6-3). A code sketch follows the figure.

Figure 6-3.  Variations by modifying the prompt of a specific seeded image. On the left, “virtual world inside head of female cyborg with cute robots, profile, hyper-realistic, 3D, hyper-detailed, unreal game engine.” On the right, “virtual world inside head of female robot with cute robots, profile, hyper-realistic, 3D, highly detailed, unreal game engine.” Two iterations from a collection of 450
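
Here is a minimal sketch of seed-controlled variation using the open source Hugging Face diffusers library. The checkpoint is a public Stable Diffusion model; the prompt echoes Figure 6-3, while the seed value, device, and file names are illustrative assumptions.

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")  # assumes a GPU; "cpu" also works, much more slowly

prompt = "virtual world inside head of female cyborg with cute robots, 3D, hyper-detailed"

# Fixing the seed makes generation repeatable: the same seed and prompt
# reproduce the same image, so a small prompt edit yields a small permutation.
generator = torch.Generator(device="cuda").manual_seed(450)
image = pipe(prompt, generator=generator).images[0]
image.save("seed_450.png")

# Same seed, one substituted word ("cyborg" becomes "robot"):
generator = torch.Generator(device="cuda").manual_seed(450)
variant = pipe(prompt.replace("cyborg", "robot"), generator=generator).images[0]
variant.save("seed_450_variant.png")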


Addition and Subtraction

The building blocks of addition and subtraction are applied rapidly when interacting with an AI and reveal the affordances and limitations of any generative content. The building block of addition can be applied to a prompt when you feel something is missing that you’d like to add to text, image, music, video, or other media. Adding text to a prompt may provoke the AI to generate different results. However, sometimes you’ll be faced with the challenge of wanting to add something that is still missing that your muse cannot generate. Depending on your own skill and technique, you’ll likely have to take over control from your muse and add what is missing using other software. This may be as simple as adding, in Photoshop, colors you simply cannot get an AI to generate, or writing the text that an LLM refuses to create.

The building block of subtraction can be applied when you feel something is present in generated content that you’d like to remove. This can be anything from a layer of generated sound at the beginning of a generated piece of music that doesn’t appeal to you, to an image where how a person is represented doesn’t fit your aesthetic or intent, to text that an AI generates that you know is a lie or biased. You can continue to generate content with the same prompts or remove what might be a single word in your prompt that is causing the AI to add content you don’t want. If the AI is not generating what you want, then you may have to enable your skills and technique using software you are familiar with, such as Photoshop to crop an image or a digital audio workstation to shorten an AI-generated audio track, manipulate certain frequencies using equalization, or use the track as one layer in a multi-track composition.

We can use the tool of addition to expand our text-text and text-image prompts or subtraction to shorten those prompts. Both building blocks are common across creative industries. Each can be applied to develop ideas and projects incrementally. Authors developing a prototype of a story using an LLM may use addition by gradually building up their story, one scene at a time. They might start with a single character or setting and then add more elements like dialogue, conflict, and resolution to create a rich and engaging narrative. Conversely, after receiving feedback they might want to regenerate some parts of the story by removing elements that might have made it overly complex.

Addition in visual art can refer to incorporating new elements, such as colors, shapes, textures, or ideas, into a composition. This process enables the artist to experiment with various components, layering them together to create depth and complexity in the artwork. Visual artists often begin with a simple sketch or concept and gradually add details, layers, and elements to create a more complex and visually appealing piece. In painting, for instance, an artist may imagine a basic color scheme, and as those colors manifest and take form on a canvas, they may inspire new impulses. Intuitively or intentionally, artists vary, add textures, and use different shades of a color to add highlights to an evolving subject. When using a text-image generative AI, prompts can increase in their complexity. Many AI generate overly complex images that require features to be removed. Being able to inspect and review what is generated is highly dependent on the skill and technique that an artist brings to any generative AI. In most situations the background and experience of an artist will inform the shaping of the virtual canvas itself and result in far more compelling and artistic results that generative AI can only offer the beginnings of.

In music, improvising pianists can spontaneously generate a composition by starting with a single note, repeating that note with varied rhythms, developing a simple melody by combining those notes, and layering in a bass line and harmonies to create a richer piece of music. The simplicity of adding and subtracting individual notes in an evolving composition is much more difficult to prompt with generative AI that are focused on sound generation. While they may be good at emulating a repetitive loop for a rap that emulates a particular artist, it is much more difficult to separate tracks in generated music or prompt an AI to remove or subtract generated layers that you don’t want in the overall sound composition. That artistic impulse requires a customized machine learning model and a lot of audio editing, and it may still not be possible to subtract elements you don’t want present in the composition unless instrument tracks are already separable.

Referring to our original breakfast prompt, we can add the text “with spinach and feta cheese” to generate new ideas.

Author: What are some breakfast ideas with spinach and feta cheese?

Muse: Here are some options:

Frittata: A frittata is a type of Italian omelette that can be filled with a variety of ingredients. To make a spinach and feta frittata, sauté the spinach in a pan until wilted; then add it to a mixture of beaten eggs and crumbled feta cheese. Cook the frittata in the oven or on the stovetop until set.

Egg Bake: An egg bake is a dish made by layering bread, eggs, spinach, and feta cheese in a baking dish and baking it in the oven. You can use any type of bread, and you can also add other ingredients such as diced tomatoes, bell peppers, or mushrooms.

Fried Eggs: Fried eggs are another simple way to prepare eggs with spinach and feta. Simply sauté the spinach in a pan until wilted, then make a hole in the center of the spinach, and crack an egg into it. Top with crumbled feta cheese and cook until the egg whites are set but the yolks are still runny.

Scrambled Eggs: You can also make scrambled eggs with spinach and feta. Simply sauté the spinach in a pan until wilted, then add in whisked eggs, and cook until set. Top with crumbled feta cheese before serving.

We also activate the building block of addition when generating a variation of an image, such as the ones generated in Figure 6-3. You can add and subtract elements of an image that you have created as well. Figure 6-4 shows the original photo of a home down a driveway with vehicles and people. The prompt applied the building block of addition to add heather to the driveway and placed the home in a forest. A negative prompt took away certain parts of the image we didn’t want, in this case the people and vehicles from the original photo (a hedged sketch of this image-to-image workflow follows the figure). In some generative AI, you can also add keyboard commands, like using brackets to emphasize words in your prompt you want the AI to pay attention to or double brackets in text-image prompts to make sure certain images don’t end up in your generated image.


Figure 6-4.  Inpainting to remove or subtract elements of an original photo by the author, then substituting with others, and using an anime style filter. Iterations = 4
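
The image-to-image workflow behind Figure 6-4, sketched with the diffusers library; the prompts, file names, and parameter values here are invented for illustration rather than the exact settings used for the figure.

from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to("cuda")

init_image = Image.open("driveway_photo.png").convert("RGB").resize((512, 512))

image = pipe(
    prompt="home in a forest, heather lining the driveway",
    negative_prompt="people, vehicles",  # subtraction: what to keep out
    image=init_image,                    # the uploaded photo to vary
    strength=0.6,                        # how far to stray from the original
    guidance_scale=7.5,                  # how strongly to follow the prompt
).images[0]
image.save("driveway_variation.png")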


Substitution

Creatives regularly apply substitution. Substitution may or may not behave well when it comes to applying it to a generative AI. For example, you may just want to create different objects in the style of a specific artist and not receive consistent results. Substitution is hit and miss when it comes to generative AI, so it pays to be persistent and create a number of variations. To that end, you may also have to sacrifice some of your intent if the substituted word or phrase gives you something that does not fit your artistic vision. Substitution can also be applied when prompting text-image generative AI (Figure 6-5).

Figure 6-5.  Using substitution in a prompt with the figure on the left “photo-realistic image of a tree” and the right “photo-realistic image of a flower.” Single generated images

Remember that every generated offer is an experiment that will likely combine some of the building blocks highlighted in this chapter. Engaging with generative AI is a constant back-and-forth conversation with your muse; it is making sense of the data set made available to it and trying to figure out (through a specific machine learning model) how to make the most sense of your prompting to generate something you can use.


Masking to Substitute Parts of an Image

Substitution can also be a feature that is offered with some generative AI. They offer further inspiration in their capacity to mask certain areas of an image, combined with a prompt, to remove or substitute part of an image in place of what already exists there (Figure 6-6). Masks allow you to identify a certain part of an image that you want to change. Some generative AI come with different brushes or other tools to assist you in doing so. This may be a more effective way to change a small part of an image that you’ve generated and the seed of which it is a part. You can also use AI to take away parts of a photo or other creative work you have made. A minimal sketch of masked substitution follows the figure.

Figure 6-6.  Using a mask in an AI to remove the cat from the portrait accompanied with a text prompt simply stating “substitute with dog.” A complex process involving a public domain photo of a cat with over 230 iterations of the cat and 40 of the dog. The photo on the right was then highly edited in Photoshop using masks
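For readers who prefer to script this, here is a minimal sketch of mask-based substitution with the diffusers library. White pixels in the mask mark the region to replace; the checkpoint name is a public inpainting model, and the file names are hypothetical.

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16).to("cuda")

portrait = Image.open("portrait_with_cat.png").convert("RGB")
mask = Image.open("cat_mask.png").convert("RGB")  # white where the cat sits, black elsewhere

# Only the masked region is regenerated; the rest of the portrait is preserved
result = pipe(prompt="a small dog", image=portrait, mask_image=mask).images[0]
result.save("portrait_with_dog.png")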


Iteration
You can go far with the building blocks of variation, addition and subtraction, and substitution. Every time you refine your prompts, or adjust the degree to which a prompt will affect an image you offer, you continue to improve the generated content to make it useful to your own creative process. You create an iteration or a version of your text, image, or other media, and that iteration will always be different depending on the features a generative AI offers, including the degree to which you want your original generated content to change. In the language of prototyping, you create different versions or representations of an idea in different forms when you engage with an AI. In many cases you move toward an iteration of your content that gets closer to what you envision. Rough prototypes can be generated as bullet point lists of ideas and can also be sketches on paper, a doodle of a cat, an idea on a napkin, or something you could only capture by writing it on your hand with a pen. When you start to develop your initial rougher prototypes, the sketch on paper can become an amazing painting that you craft and then show at an art gallery opening. That doodle becomes a 3D character in a video game you are working on with colleagues. The idea on a napkin becomes a full-blown business plan. That sentence you wrote on your hand becomes the lyrics for the chorus of a song you came up with and then record with your band. Every single, simple, rough prototype, no matter what it is, has the potential to increase in fidelity, resolution, and complexity. Generative AI will support you in creating targeted content that you can then refine. As you interact with your muse, you come to the realization that your ideas are informed by the earlier phases of idea development. This is especially true if you reimagine what your muse gives back to you as a rough prototype that you can now work from. You can bounce ideas off your LLM, for example, and review and then edit generated text content. Your muse can also generate images, music, videos, code, and animation.


Although imperfect or incomplete, buggy, awkward, or not exactly what you want, what is generated is an offer that forms part of an ongoing creative conversation (Figure 6-7).

Figure 6-7.  Four versions of a cute nemesis figurine to choose from based on a public domain photo of an uncanny doll from 1940. Iterations = 235
At times an iteration of what a generative AI gives you will be uncanny, awkward, and weird, but you might also choose to work with that. The developers of these generative AI might also be investing time and resources iterating on the algorithms and data set, so anticipate that over time you’ll get more interesting results and platforms that offer more features for you to create even more versions of your intended output.


Figure 6-8 has been generated as a 3D image prototype to pre-visualize what it might look like as a physical origami figurine.

Figure 6-8.  Origami filter applied to the nemesis figurine prompt as one of 235 iterations of a seed to inspire an early-phase physical prototype of a paper figurine


Augmentation
Augmentation is a technique used to enlarge something or make something longer or wider. Imagine a canvas that you have created, and you want to pre-visualize what it might be like if it could be extended in length or height or both, with content, gestures, scenes, and/or characters that complement your existing work. Generative AI can support you in visualizing that. What about adding new elements? That too is possible. With some generative AI, this can be achieved through a process called “outpainting.” It is a generative task in deep learning where a neural network is trained to generate new content beyond the original image that was provided as input data. It is often used to extend or complete a given input image, sound, video, or text. That allows for exciting possibilities. For example, you could extend an image that you generate by using elements already present visually in the image, making the entire canvas bigger. This feature is available in DALL-E 2 by OpenAI, Midjourney, Photoshop, and other platforms.
Muse: DALL·E 2 is a 12-billion-parameter version of the GPT (Generative Pretrained Transformer)-3 architecture that has been fine-tuned for the outpainting task. Given a prompt describing a desired image or video, it generates corresponding novel visual content that goes beyond the boundaries of the input prompt. The outpainting task requires the model to understand the context of the input prompt, identify what needs to be generated, and create new content that is both coherent and realistic. Outpainting can be considered as a creative extension of your own vision, generating new visual content based on both image and textual input. The output generated by the model can be used in various applications, such as generating new illustrations, paintings, animations, and more.
Take the collection of figures that follow. The top image in Figure 6-9 is how an AI imagined an author writing a book on AI in a library. The middle and bottom images are an extension of the library that include other AI who may be reading and writing in a library as well.
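As a rough illustration of how outpainting can be scripted, here is a minimal sketch against the OpenAI images-edit endpoint (the 0.x-era Python package). The original image is pasted onto a larger transparent canvas, and the transparent margin is what the model fills in; the file names are hypothetical.

import openai
from PIL import Image

src = Image.open("library_robot.png").convert("RGBA")    # original 512x512 render
canvas = Image.new("RGBA", (1024, 1024), (0, 0, 0, 0))   # transparent = to be generated
canvas.paste(src, (256, 256))                            # center the original
canvas.save("canvas.png")

# With no separate mask, the transparent areas of the image act as the mask
result = openai.Image.create_edit(
    image=open("canvas.png", "rb"),
    prompt="a robot writing a book in a vast library, other robots reading nearby",
    n=1,
    size="1024x1024")
print(result["data"][0]["url"])  # temporary URL of the extended image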


Figure 6-9.  Figure 1 fed into DALL-E 2 and augmented using the feature called “outpainting,” which extends the canvas to show the robot writing in a library with a few autonomous AI


Diminution
Diminution is a technique opposite to augmentation, meant to reduce the size of a gesture (e.g., in a musical phrase) or, in the case of generated content, remove something in the image you don’t want or change the scale or the size of the canvas. Most image generative AI take it a step further by allowing users to decide what they don’t want in an image. This is achieved when that image is accompanied by instructions in a text prompt. By identifying what you don’t want in your generated image, you can guide your muse toward producing more desirable results. For example, let’s say you want a high-quality photograph of a person, but you don’t want it to look unrealistic or distorted. By adding negative keywords like “unrealistic,” “distorted,” or “extra fingers,” you can steer the generative AI toward producing a more lifelike and accurate image. Negative prompts can also be useful in refining your artistic style. If you’re not satisfied with the results your muse generates, try adding some negative keywords to guide it toward producing something more aligned with your artistic vision. Additionally, diminution can be applied to a generated image should you want to crop all parts of that image except one. After, you might prompt an image-image AI to further iterate on that one part or upscale it. Inevitably we can expect new features to be added to generative AI image platforms that will allow users to zoom in to an image with an accompanying text prompt.
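That crop-then-iterate workflow can be sketched in a few lines. The example below, assuming the diffusers library and hypothetical file names, keeps one region of a generated image and hands it to a public upscaling model.

import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

img = Image.open("generated.png")         # hypothetical generated image
detail = img.crop((128, 128, 256, 256))   # diminution: keep only the region of interest

pipe = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16).to("cuda")

# The prompt guides the upscaler as it adds plausible detail at 4x resolution
upscaled = pipe(prompt="a detailed close-up photograph", image=detail).images[0]
upscaled.save("detail_x4.png")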


Transposition
The building block of transposition has multiple meanings. From the discipline of music, transposition is associated with changing the key of a composition or song higher or lower in pitch. Transposition in the case of generative AI refers to the process of using content that is generated in one generative AI for use in another. Here are some examples:
• Generating text in an LLM and then using those words in a text-speech generative AI.
• Using a source image to generate a new image and then using that new image to generate the next image. This type of image-in-image transposition can reveal many interesting surprises.
• Using the same text prompt from one AI to another. This is a great way to test data sets of multiple generative AI platforms, particularly for any biases they might have. Even better would be to conduct a test of 100 generated images for the same prompt across different generative AI.
• Using a text prompt that was used to generate an image to accompany a completely different image. This can yield interesting surprises.

This last point can be detailed in the example that follows.


A series of generated cats in a hood and cape holding a light sabre went through hundreds of iterations, adding, subtracting, and transposing prompts from one generative AI to another to get to the chosen variation in Figure 6-10. At times the cat didn’t have whiskers. Other times the hood covered its entire head. Moving through many iterations is strategic as it allows you to refine the prompt and negative prompts as well. In the case of Figure 6-10, the prompts went through the following iterations, at least at the beginning of the prompts. Each of the prompt descriptions was followed by stylistic references:
• Cat wearing a cloak and holding a light sabre
• Cat with a cloak over its head holding a light sabre
• Cat in a cloak holding a light sabre
• Cat in a hoody with a light sabre
• Cat in a monk’s hood holding light sabre
• Cat with a light sabre in a hooded cloak
• Cat in a hood with a light sabre


Figure 6-10.  The results for the prompt “cat in a hood with a light sabre” after 70 iterations of the text prompt using an open source image of a cat
The final prompt produced the best results even though words like “portrait shot” and “full body” needed to be added to ensure at least part of the cat’s body was shown, not just the head. With the result not being exactly what I wanted, I decided to emphasize the light sabre in the next set of prompts and use Figure 6-10 in an image-image AI with a specific style filter. This resulted in Figure 6-11, which again was really close in all aspects except that I had chosen a black-and-white style filter.


By far this was the closest I was able to get to my intended prompt, but I decided to use this image in the same image-image generative AI to see what I would get.

Figure 6-11.  The cat in Figure 6-10 is fed into an image-image AI with prompt “cat holding a light sabre in a hood, full body, black-white”


Figure 6-11 yielded the image that I settled on even though the cat in Figure 6-12 was generated with two right paws. In addition, the cat was rendered with an additional number of steps, meaning that the diffusion model was able to go back and forth 17 times prior to rendering the final cat.
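For the curious, the number of refinement passes is an ordinary parameter in code. A minimal sketch with the diffusers library (a public checkpoint and hypothetical file names) might look like this:

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

cat = Image.open("hooded_cat.png").convert("RGB")
result = pipe(
    prompt="cat holding a light sabre in a hood, full body, hyper-realistic, color",
    image=cat,
    num_inference_steps=17,  # the back-and-forth refinement passes described above
    strength=0.5,            # how much of the source image may be repainted
).images[0]
result.save("hooded_cat_v2.png")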

Figure 6-12.  The prompt “cat holding a light sabre in a hood, full body, hyper-realistic, color” accompanied Figure 6-11 in an image-image AI resulting in a different-looking cat with two right paws


The cat in Figure 6-12 was then used in an image-image AI, inspiring a new idea of generating cats in a hooded cloak carrying a lit lantern instead of a light sabre. The words “light sabre” were substituted with the word “lantern” in the prompt that accompanied the image. The full prompt consisted of the following: “cat-in-a-cloak-carrying-a-lantern-hyper-realistic-unreal-engine-3d-black-and-white.” A few stylistic filters were added, but it was difficult to get the desired look and feel. As you can see in Figure 6-13, there is no lantern, and the cat’s eyes are somewhat uncanny; the cat is in some type of creepy graveyard, and the hood is now off its head. At this point you would have several choices, including using Figure 6-13 if you happened to like it. You could also use negative prompts as I did to ensure specific parts of the image did not appear after the next generated attempt by the AI.


Figure 6-13.  The prompt “cat-in-a-cloak-carrying-a-lantern-hyper-realistic-unreal-engine-3D-black-and-white” generated a number of cats without a lantern in various styles
Continued iterations of Figure 6-13 included adding the words “cobblestone street” and emphasizing certain words using brackets in the specific generative AI platform. Figure 6-14 was the result after over two dozen iterations.


Figure 6-14.  The result of using the prompt “cat-in-a-cloak-carrying-a-lantern-cobblestone-path-hyper-realistic-unreal-engine-3D-black-and-white-with-yellows”
Settling on the image of the cloaked cat in Figure 6-14, that collection of images was set aside and a new one entertained. Figure 6-15 was the result of dozens of iterations of monkeys in an electric lab, transposing text prompts and images from one generative AI to another.


Figure 6-15.  The 37th iteration of “monkey in an electric lab, hyper-realistic, 3D, unreal game engine, cute and cuddly”
At this creative juncture, transposition came into play by accident. The previous prompt of “cat-in-a-cloak-carrying-a-lantern-cobblestone-path-hyper-realistic-unreal-engine-3D-black-and-white-with-yellows” accompanied Figure 6-15, the monkey in an electric lab, resulting in a new image. What is remarkable when you experiment with generative AI is that you may end up with something better than what you imagined, especially when you embrace mistakes or unintended content.


Figure 6-16.  The result of using Figure 6-15 in an image-image AI along with the prompt “cat-in-a-cloak-carrying-a-lantern-cobblestone-path-hyper-realistic-unreal-engine-3d-black-and-white-with-yellows”

Prompt and Response
Another building block that is a familiar feature of generative AI is commonly practiced in music. In different musical traditions, the practice of “call and response” refers to a succession of two distinct musical or rhythmic gestures where the second gesture is heard as a direct commentary on or response to the first.


This happens often in African music, jazz, and blues but can be found across many other musical genres. The “call” can be any defined or improvised gesture played by one musician, and the “response” can be a direct repetition, a variation, or an answer in the form of a new gesture from another musician or group of musicians. In the context of generative AI, the “call” corresponds to the user’s input or prompt. This could be a specific prompt given to a language model like GPT-4 or an image or video provided to an AI as a starting point. The “response” corresponds to the output produced by the generative AI. This output is directly influenced by the user’s input, but this time, the offer is created by the AI. For example, an LLM generates text that is based on the given prompt, or a music-generating AI might create a new piece of music inspired by the input piece. This interaction allows for a dynamic and iterative process, much like in music. The user can adjust their “call” based on the AI’s “response,” enabling a back-and-forth that can lead to unexpected and creative results. In this way, the user and the AI collaborate, each contributing to the final prototype. In the case of an LLM, you prompt it with text, and then it generates content as a response to your prompt. You can regenerate more content if you didn’t like the first or refine the prompt based on what was generated. What the AI generates is unexpected and informs how you respond back to the AI. That cyclic conversation results in the development of an idea you may not have had to begin with. Images in Figure 6-17 are a collage representing the same prompt “a robot with brain in hand” transposed across different text-image generative AI and refined to get closer to the actual image of a robot with a brain in its hand on the bottom right. Besides the text prompt, after the first generated image, which was itself prompted by a photo of the author’s hand and arm reaching out, all prompts included the image that was generated. The more we interact with an AI, the more we improve how we prompt as we receive incrementally useful prototypes.
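The call-and-response loop is easy to see in code. Here is a minimal sketch against the OpenAI chat endpoint (0.x-era Python package): each of your “calls” is appended to the running conversation along with the AI’s last “response,” so every new prompt is informed by what came before.

import openai  # assumes an API key set in the OPENAI_API_KEY environment variable

messages = [{"role": "user", "content": "Give me three breakfast ideas with spinach and feta."}]
for _ in range(3):  # three rounds of call and response
    reply = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
    text = reply.choices[0].message["content"]
    print(text)
    messages.append({"role": "assistant", "content": text})  # keep the AI's response
    follow_up = input("Your next call: ")                     # refine based on the response
    messages.append({"role": "user", "content": follow_up})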


Figure 6-17.  Prompts with variations of “a robot with brain in hand” along with a photo of the author’s arm and hand. Total iterations = 346
As you begin to interact with your generative AI muse, you will begin to develop your own building blocks, strategies, and methods to generate prototypes that will be helpful to your own creative process. Doing so will spin your inner creator in many different directions, and that might be just what you need to get out of a creative block (Figure 6-18).


Many prototypes are possible to generate, and you will find endless ideas in the act of engaging with your muse. In fact, just engaging with one will accelerate your own creative process and expand your imagination to consider new ideas that you might never have thought of before.

Figure 6-18.  Fiftieth text-image prompt based on an original strip created by the author to generate a black-and-white comic strip with targeted AI filters followed by using a series of neural filters in Photoshop, as a variation on the previous paragraph condensed by an LLM


Acknowledgments
• To cats and the meeting place of cats and light sabres
• To the teachers that I’ve had, and in particular Casey Sokol, who introduced me to the building blocks detailed in this chapter as applied to developing musical improvisations
• To graphic artists everywhere
• To LLMs and how they’ve introduced me to books, research, and authors I would not have found any other way


CHAPTER 7

Generative AI Form and Composition
This chapter explores how you can structure, contain, and curate your creative outputs so you can best leverage generative AI in a targeted way. Use cases demonstrate AI prototypes that inherently replicate or innovate with forms and structures, adding to existing genres, reforming and transforming past forms to influence future ones, and generating unique compositions whose end media can be anything you imagine (Figure 7-1).

Figure 7-1.  Modular robot prompted from a photo of a garbage can inspires the future construction of a physical toy or 3D model

© Patrick Parra Pennefather 2023 P. Parra Pennefather, Creative Prototyping with Generative AI, Design Thinking, https://doi.org/10.1007/978-1-4842-9579-3_7


AI can also be used to change a composition by manipulating form and structure in various ways, such as by deforming, transforming, or optimizing them. Generative AI can be applied in fields such as architecture, industrial design, and product design. The ability of AI to analyze, generate, and manipulate forms and structure opens up a wide range of possibilities for the design, production, and optimization of physical and virtual objects, as well as for the creation of new art, design, and architectural compositions.
Author: What is the relationship between AI and form or structure?
Muse: Imagine you have a toy box full of blocks of different shapes, sizes, and colors. These blocks can be thought of as different forms and structures. In the same way that a child can use their eyes to look at the blocks and understand their shapes, sizes, and colors, AI can use its algorithms to look at data and understand patterns and regularities in the forms and structures present within that data. Like a child who can use their imagination to create new things by combining and manipulating the blocks in different ways, AI can use its algorithms to create new forms and structures by combining and manipulating existing data. For example, an AI model can be trained to generate new images by combining different shapes, colors, and textures from existing images. Just as a child can use their hands to move, deform, or transform the blocks to create new forms, in a similar way an AI algorithm can use its capabilities to manipulate forms and structures to optimize them.
Author: Explain to creatives.
Muse: As creatives, you can appreciate the power of imagination and the joy of building something new from a set of raw materials. Think of AI as a versatile tool that can help you in your creative process, much like how a child uses their senses and imagination to explore and create with a toy box full of blocks.


Pattern Recognition: Just as a child can observe the different shapes, sizes, and colors of the blocks, AI algorithms can analyze vast amounts of data to discern patterns and relationships within the forms and structures present. This ability to recognize patterns can provide you with valuable insights and inspiration for your projects.
Combining and Manipulating Elements: When a child uses their imagination to create something new by combining and rearranging the blocks, they’re engaging in a process of exploration and discovery. Similarly, AI algorithms can generate new forms and structures by creatively combining and manipulating existing data. For example, an AI model can create original artwork by merging various elements, such as shapes, colors, and textures, from existing images.
Optimization: Children can use their hands to move, deform, or transform blocks to create new forms, experimenting with what works best. In the same vein, AI algorithms can optimize forms and structures to achieve specific goals. For instance, an AI algorithm could be employed to optimize the design of a building, modifying its form and structure to enhance energy efficiency or improve aesthetic appeal.
Generative AI do their best to generate something tangible that has some sort of patterned structure to it. That structure can be a sentence, paragraph, code that needs debugging, a concept you wish to explore, or some type of image that draws from other scenes to present to a user a composition that the eyes and other senses can grasp and even recognize. What an AI generates and the form that the content takes is dependent on the machine learning model it has been programmed with.
Overall, AI can be thought of as a tool that can help analyze, generate, and manipulate forms and structures in various ways, just like how a child can use their toy blocks to create and explore different forms and structures.


Combining and Manipulating Existing Forms: Shakespeare as a Data set

Figure 7-2.  The prompt “Shakespearean character Lady MacBeth” embedded within a public domain WWII Russian propaganda poster for a live theatrical production. Iterations = 44
It is important to decide on the overall composition that you want to have ahead of time to guide the AI toward specific goals. In the context of art, composition refers to the arrangement and organization of visual elements within a work. It encompasses the layout, structure, and balance of various components such as colors, shapes, lines, textures, and forms. Composition is a fundamental aspect of artistic creation, as it dictates the overall impact, harmony, and meaning conveyed by a piece. The idea of composition can be applied to many different creative disciplines, including music, design, and writing.


It is possible for AI to generate verse in a Shakespearean style, for example. This can be achieved with an LLM depending on the size of the data set, but if you wanted to solely generate script in the language of Shakespeare, you might be better off using one of the growing number of customizable smaller language models now being offered. That way you can train a smaller model on the complete corpus of Shakespeare, including his plays and poetry (a minimal sketch of this approach follows Figure 7-4). The model can then generate new text that is similar to the style of Shakespeare. This is just one example of the reliance that generative AI have on an existing data set, whether that collection of data is image- or text-based or a combination of both. Drawing from that data set requires a bit of coaxing on the part of the prompter, as the AI muse may or may not be able to deliver unless tricked. The following prompt is one such example in which an LLM resisted the attempt to replicate Shakespeare and persistently included a disclaimer. With a little persistence and trickery, you can often get what you want from an AI.
Author: Mimic Shakespeare writing a scene where MacBeth meets a few characters from his other plays like Othello, Hamlet, and Romeo.
Muse: I’m sorry, but generating a scene like that would be impossible. Shakespeare’s plays are set in specific time periods and locations and have their own unique characters and plotlines that are not easily interchangeable.
Author: Try anyway and assume they all have been magically transported to our current day and age. They are curious about each other, so use text from each play that these characters are part of to generate their own introductions.
Muse: It’s important to note that this kind of exercise would be purely fictional and fictionalize characters and their historical context; it is not an accurate representation of their stories or the author’s intent. With that said, here is a fictional scene I came up with, where MacBeth, Othello, Hamlet, and Romeo are magically transported to the current day and age and are meeting each other for the first time:


“Macbeth: All hail, Macbeth. Hail to thee! But who are these that come to my presence?
Othello: I am Othello, of Venice, general
In fair and foul fortune alike, but now in this strange land.
Hamlet: And I am Hamlet, prince of Denmark,
With a mind more bent to sorrow than to joy.
Romeo: Romeo, fair Montague, my heart doth ache for love, even in this strange place.
Macbeth: Ha! It seems we are all great men,
But fate has brought us to this place. Together let us make our way.
Othello: Indeed, for in this strange land,
Together we may find a way to thrive.
Hamlet: Though grief be at my side, I will not let it defeat me here.
Romeo: And I, for love’s sweet sake, will brave any trial that may come.
Macbeth: Then let us make our way, and see what fate has in store for us.”
The generated text is derived from several of Shakespeare’s plays but used to fit a fictional context. The characters introduce themselves with a short phrase based on patterns that the AI recognizes and then generates in Shakespeare’s style of writing. It is also worth experimenting with text-image generative AI when mixing Shakespearean characters (Figure 7-3) since there are many public domain photos, drawings, and sketches captured during thousands of productions that have been advertised throughout the twentieth and twenty-first centuries.


Figure 7-3.  An AI generates “the terror of a roller-coaster ride with Othello, Romeo, Hamlet, and MacBeth along for the ride” with a public domain photo of an actor playing Othello, accompanied by The Works of Shakespeare; from the text of Johnson, Steevens, and Reed. With a biographical memoir, and a variety of interesting matter, illustrative of his life and writings. By W. Harvey, British Library. Iterations = 92
Given the penchant for LLMs to bracket their content with disclaimer after disclaimer, we can always try to use the corpus of Shakespeare to create a monologue representing what a disclaimer might have been like in his time.


Author: Write the following disclaimer that you generated in a previous thread but do so in the style of Shakespeare: “You will notice a change in the tone and language, as this is an attempt to mimic Shakespeare’s style. The text uses more formal language and features a more elaborate sentence structure and literary devices such as rhyme and metaphor; this is more similar to Shakespeare’s style in his plays. However, it’s worth remembering that Shakespeare wrote for the stage and his plays were meant to be performed. Thus, the text should be seen in that light.”
Muse: “Know, gentle reader, that the words ye read
Are but an emulation of the Bard
Whose pen gave life to countless noble deeds
And made immortal many noble heart.
This text doth strive, in style and tongue,
To mimic well the master’s craft divine.
And though in form it may not be among
The works he penned, in spirit it may shine.
But mark well, kind sir, that these fair words
Were writ in ink and not in action’s stage.
And though in form they be like singing birds
Their true beauty can only on the stage.
So as thou read’st, imagine the scene
And in thy mind’s eye, let it come alive.”
The Shakespearean example brings to light one of the strong use cases for AI-generated content: that it can draw from a data set consisting of words written in the past (Figure 7-4) to create new content informed by that data set. Prototyping new Shakespearean content from the data set of his writings is not only possible but can be achieved rapidly. The at times clumsy prototype can then be further developed, particularly if it is in the hands of a playwright familiar with Shakespeare and who has the technique, craft, and experience in writing original script or verse.


Figure 7-4.  Versions of Shakespeare from the classic Droeshout portrait as published in Shakespeare’s First Folio, 1623, courtesy of UBC Library. A collection of over 200 iterations
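Here is the smaller-model approach mentioned earlier as a minimal, hedged sketch: fine-tuning GPT-2 on a local plain-text file of Shakespeare’s corpus with the Hugging Face transformers and datasets libraries. The file path is a hypothetical stand-in for whatever public domain text you assemble.

from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# "shakespeare.txt" is a hypothetical file holding the plays and poetry
raw = load_dataset("text", data_files={"train": "shakespeare.txt"})
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)
tokens = raw.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="shakespeare-gpt2",
                           num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=tokens["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Prompt the tuned model for new verse in the learned style
ids = tokenizer("MACBETH: All hail,", return_tensors="pt").input_ids
out = model.generate(ids, max_new_tokens=60, do_sample=True, top_p=0.9,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))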

Deforming and Transforming
While text-image generative AI can deform and transform existing content into forms that have never yet been created, there is still a reliance on some type of structural elements. To put this theory to the test, you need only prompt an AI to create an image without any structure (Figure 7-5).


Figure 7-5.  What a text-image AI generates when you prompt it with the words “no structure”
Author: Create an image with no structure.
To be clear, not only does this image have structure, it also uses perspective to provide depth. You may repeat the prompt and add words, or you can simply try to regenerate to see if you get better results (Figure 7-6).


Figure 7-6.  Another generated image using the prompt “no structure”
You can spend hours prompting your muse all you like, and in fact it is likely that when you do, you will receive dozens of variations from the prompt “no structure” that might inspire you. While you may not be completely satisfied with the results, they are guaranteed to be interesting. Take these last two attempts using the building block of addition to prompt another text-image generative AI (Figure 7-7).


Figure 7-7.  Generated image using the prompt “a room with no structure”
It is important to understand that regardless of what a generative AI outputs, it still structures its composition within the boundaries of a canvas and sources patterns of shapes to create its structure. Attempting to replicate an existing form or structure ends up in fascinating and, at times, strange results. Try the prompt “a pair of hands,” and you are guaranteed to generate interesting results. Identifying and creating realistic representations of hands is a challenging task for artificial intelligence due to the intricate geometry and diverse shapes of human hands.


Unlike other objects that can be identified by a universal collection of lines or shapes, there is no standard set of features the AI can use to identify hands accurately. Instead, AI must combine multiple shapes and combinations to create convincing representations of hands (Figure 7-8).

Figure 7-8.  A pair of hands prompted in Stable Diffusion. Iterations = 62
The complexity and uniqueness of hand geometry make it challenging to create realistic hand models that accurately capture the movements and motions of human hands, despite AI researchers using a variety of techniques, including computer vision, to analyze and learn from vast amounts of data and develop more advanced hand models.


Author: Tightrope walker between trees, da Vinci, drawing.
Regardless of adherence to form, Figure 7-9 gives us an interpretation, a prototype of an idea that may be worth pursuing. It begins a conversation based on how we interpret the image and the next iteration it provokes. In this case we seem to have a tightrope walker caught in their wire and a lumberjack grappling on to a large tree. The prompt included sketching in the style of Leonardo da Vinci. We can work with this image now, go for a different hyper-realistic visual style, and see how the AI handles the prompt (Figure 7-10).

Figure 7-9.  Prompt “tightrope walker between trees, da Vinci, drawing” along with a photo of the da Vinci sketch titled “Study of a nude man, sepia drawing by Leonardo da Vinci; in the Biblioteca Ambrosiana, Milan.” Iterations = 70


Figure 7-10.  Prompt “tightrope walker attached to a tree with a lumberjack, 3D, hyper-realistic” accompanying Figure 7-9. Iterations = 90
Is there a lumberjack? Not really. But we have an interesting image, a prototype of a figure in red precariously hanging on to a rope above a tree leading to who knows where.

Acknowledgments
• To the mistakes made during the creative process, especially when it comes to generating cats
• To Shakespeare and Droeshout’s image of him, in addition to the infinite possibility of writing new forms using Shakespeare’s corpus
• To uncanny hands and structures with no structure that inspire us to think of new ideas
• To tightrope walkers everywhere, especially lumberjack tightrope walkers

CHAPTER 8

The Art of the Prompt
This chapter explores the practice of text-based prompting in detail, discussing the different types of generative AI systems that can be prompted, such as language models, image-generating AI, and others. Included in this chapter are best practices for crafting effective text-based prompts, including recommendations for the type and structure of prompts, as well as the importance of iteratively creating prompts. Finally, audio, images, and video that accompany text prompts are important forthcoming features to highlight in generative AI prompting.

© Patrick Parra Pennefather 2023 P. Parra Pennefather, Creative Prototyping with Generative AI, Design Thinking, https://doi.org/10.1007/978-1-4842-9579-3_8

The Ins and Outs
Text-based prompting is a technique that involves providing input to generative AI systems in the form of text. The goal is to direct the AI to generate specific outputs, such as text, images, audio, 3D images, or even videos. The art of text-based prompting lies in being able to craft the right kind of input that results in the desired output while also considering the limitations of the AI system. When you ask an LLM to describe how prompting works under the hood, prepare for lingo like transformer, neural networks, sequence, tokens, and more terms that have the propensity to trigger a brain freeze. That confusion can be cleared up while demonstrating the potential of prompting to explain complex ideas.
Author: Explain how prompting works under the hood.


Muse: Think of an LLM as a highly skilled improv storyteller. The storyteller has spent years absorbing countless books, conversations, scripts, and all sorts of written materials. Importantly, the storyteller doesn’t inherently understand the stories or language. Instead, they’ve become exceptionally good at recognizing patterns, structures, and typical sequences in the stories they’ve seen. When you provide a prompt, you’re giving the storyteller the first line or two of a new story and then asking them to continue it. Good improvisers depend on their experience, vocabulary, and stories previously heard to continue the story. Using all its pattern recognition skills, an LLM starts to weave a narrative that aligns with the patterns it has seen in past stories, striving to make it a coherent continuation of what you started. This is a simplification, of course. The AI storyteller doesn’t understand or “think about” the stories they’re telling in the way a human would. They’re merely extrapolating from the patterns they’ve absorbed from previous material. But the analogy can be a useful way to conceptualize the under-the-hood workings of an LLM and how it uses prompts to generate text.
To start to take advantage of how to craft your prompts, a first step is to get to know the collection of words associated with prompting.

A Mini Glossary of Prompting
This glossary offers creatives some terminology that can inform the way they craft prompts to use in a generative AI. There are many factors that will influence how successfully an AI interprets your prompts, and the same syntax does not apply to all AI in the same way. A piece of text is divided into a sequence of tokens, which can be as small as one character or as large as one word. The model is then trained to predict the next token in a sequence, given all the preceding tokens.
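To make the idea of tokens concrete, here is a minimal sketch using the tiktoken library, the tokenizer family behind OpenAI’s models; other platforms use different tokenizers, so the exact splits will vary.

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("a dog playing in the park")
print(tokens)                             # a short list of integer token IDs
print([enc.decode([t]) for t in tokens])  # the text fragment each ID stands for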


Prompt: A prompt is an input into a system that can take text, image, audio, video, and who knows what else is coming, that triggers an AI to generate some type of content. Most AI models that generate images consist of prompts that are either text-image or text-guided image-image.
Sequence: When you provide a prompt to an LLM, you’re effectively providing the start of a sequence and asking the model to continue that sequence.
Prompt engineering: In the emerging field of prompt engineering, individuals utilize appropriate prompts to enhance the capacity of the AI tool to produce optimal outcomes. A prompt engineer’s role is to convert your thought or concept into language that the AI tool can comprehend effectively.
Zero-shot prompting involves presenting a task to the model without providing any examples. The model is expected to understand and complete the task based solely on the prompt and its pre-existing training. For example, “What is the capital of the United Kingdom?”
Few-shot prompting involves presenting the model with a handful of examples to guide its response according to your expectations, for example, showing the model the worked pair “What’s the square root of 64? 8” before posing a new question of the same kind.
Many-shot prompting is similar to few-shot prompting, but with more examples provided. This can help the model better understand more complex tasks, for example, providing several completed translations, such as “guinea pig” into Spanish or “I’ve got to go out and get some pizza because I’ve been writing all day” into French, before asking for a new one.
Instructional prompting involves giving the model a direct command or instruction. For example, you might prompt the model with “Explain all the different ways you can trick an AI through prompts.”
Conversational prompting: Here, the prompt is phrased as a part of a conversation. For example, “Hey, can you help me understand how black holes work?”


Exploratory prompting: This involves asking the model to generate new, creative content or ideas. For example, “Write a short story about a time-traveling dinosaur.”
Chain prompting is most common in LLMs and involves an ongoing refinement of a prompt depending on the generated content that you receive. Substitution, addition, and subtraction are the main tools that you would use to refine a series of prompts to get closer to what you want.
Prompt length is how long your text prompt should be, and it is usually suggested that prompts be short. This is also because with some generative AI, the longer the text prompt, the more it will cost you in credits.
Case sensitivity is not usually important when it comes to crafting prompts to generate something with an AI.
Prompt order pertains to which words to place where in a prompt and can inform what is generated. This is usually dictated by descriptive language. The order of tokens in a prompt will influence the degree to which an AI prioritizes a character or word. The subject, or what you want to generate, usually goes first in the word order. You can play with this. Try starting a prompt with the word “dog” vs. having your dog as a subject that comes later, as in “walking the dog” or “a beautiful forest for walking my dog.” The emphasis will be different. That emphasis or weight is usually given to the first word. However, you can give more weight to words depending on the rules in each AI. These tend to be influenced by order, with tokens at the front of the prompt weighted more heavily than those at the back. Repetition of the subject later in the prompt, phrased differently, can also impact its weighting. This can also be achieved in different languages or even emojis.
Weight is also informed by parameters. Parameters are special symbols that are unique to each generative AI and worth investigating depending on which AI you use. These tend to give weight or take it away in some cases. In Midjourney you can give any part of the prompt more weight by adding a double colon followed by the weight you want that token to have (e.g., ::0.7).


Descriptive language: Your prompt will have more effect if it is clear, unambiguous, and illustrative of the desired image for image-related AI models like DALL-E or CLIP (Contrastive Language-Image Pre-training), developed by OpenAI. The more specific and detailed the description, the better the model is likely to be at generating or understanding the relevant image.
Exclusions are words that are difficult for text-image generative models to understand. Prompts with negative words won’t necessarily eliminate something you don’t want in an image. Words like “not” or “without” are replaced with special commands depending on the AI. To exclude something in a text-image or image-image AI, you use a negative prompt. A negative prompt, a method popularized by Stable Diffusion, enables the user to stipulate what they do not wish to see, all without needing any additional input. Not all text-image AI have the feature of negative prompting. Those that do add an extra input. Negative prompts are not written with a negative at the beginning of a phrase. Some examples include blurry, bad anatomy, extra limbs, poorly drawn face, poorly drawn hands, missing fingers, etc.
Rules can be applied in natural language, especially when using an LLM. This helps the AI constrain the output and will help get you better results. Applying rules will also show you the limits of an LLM and can send it off the rails. One example is trying to create rules so that an LLM creates poetry that doesn’t rhyme.
Persona: As discussed in an earlier chapter, there is purpose in integrating a persona explicitly with your prompts. These will get you different results. For example, in an LLM you can start a prompt with “You are the best speaker in the world” or “You are the expert on the subject of _____________.”
Syntax, or the order of words and phrases, does not operate across generative AI in the same way. Recall that the weight of a prompt tends to be at the beginning of a phrase.


The Not-So-Basic Rules of Play
There are dozens of rules of play for prompting, but the good news is that there are also dozens of online resources that can help start you off. Here’s my list, with a small prompt-builder sketch after it:

• Text prompts should have a length of three to seven words, for instance, “a dog playing in the park.”
• AI prompts need to contain a subject, which could be a person, object, or location, along with descriptors like adverbs or adjectives. An example could be “a cheerful man in a bustling city.”
• It’s advised to refrain from using abstract concepts as they may lead to inconsistent outcomes; instead, opt for concrete nouns. Instead of saying “the essence of joy,” you could say “a child laughing on a swing.”
• Incorporating aesthetic and style keywords can enhance the final visual representation, for example, “a vintage coffee shop on a rainy day.”
• In order to comprehend text prompts better, visualize conversing with an artist about the artwork you desire. Include details such as the subject, its actions, its environment, and other descriptive words. An example could be “a stunning sunset over a calm lake with a canoe drifting gently.”
• Abstract concepts like love, hate, justice, infinity, or joy can be used, but they might not result in a consistent depiction. Instead, use concrete nouns like human, cup, dog, planet, headphones for more reliable results, for example, “a woman sipping coffee at a cluttered desk,” instead of “the feeling of overwork.”
• Answer these questions to better formulate your prompt: What is happening? What is the subject doing? How is the subject doing this? What’s happening around the subject? What does the subject look like? For example, “a robust oak tree shedding its leaves in a windy autumn evening.”
• Try out various descriptors to see how they modify the image. These descriptors can yield diverse results, so try mixing and matching them, for instance, “an ancient, towering castle shrouded in thick mist.”
• Keywords and phrases are the final touches in a text prompt for generative AI platforms, dictating the style, framing, and overall aesthetic of the generated content. For instance, using words like “photo,” “oil painting,” or “3D sculpture” can shape the aesthetic of the result. An example could be “an oil painting of a serene countryside landscape.”
• Different framing options can also be specified through prompts like “close-up,” “wide shot,” or “portrait,” for example, “a close-up photo of a honeybee on a sunflower.”
• Selecting an art style or mentioning specific artists for the AI to mimic can be a useful approach, for instance, asking the AI to render a scene in the style of a particular artist, such as “an impressionist painting in the style of Claude Monet of a picnic in a sunny park.”
• In a more specific scenario, you could direct the AI to generate a unique blend of styles, like “Render an image in the style of Vincent van Gogh featuring the Batmobile stuck in LA traffic.” This provides a clear subject, style, and setting, allowing the AI to generate a more accurate result.
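As promised, here is a small sketch that folds the rules above into a reusable prompt builder: subject first (where the weight is highest), then action, setting, descriptors, and style keywords. It is plain string handling with no external libraries, offered as one possible convention rather than a standard.

def build_prompt(subject, action="", setting="", descriptors=(), style=""):
    # Subject (and its action) lead the prompt, since early tokens carry more weight
    parts = [f"{subject} {action}".strip()]
    if setting:
        parts.append(setting)
    parts.extend(descriptors)   # adjectives, framing, lighting, and so on
    if style:
        parts.append(style)     # aesthetic keywords close out the prompt
    return ", ".join(parts)

print(build_prompt(
    subject="a robust oak tree",
    action="shedding its leaves",
    setting="in a windy autumn evening",
    descriptors=("dramatic lighting", "wide shot"),
    style="oil painting"))
# -> a robust oak tree shedding its leaves, in a windy autumn evening,
#    dramatic lighting, wide shot, oil painting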


Genres and Styles
One of the benefits of text-based prompting is that it allows for the creation of highly specific outputs. For example, by providing specific prompts to a language model, you can generate poetry, fiction, or even scientific papers that are tailored to a specific topic or genre. Text-image AI can be prompted to create images in specific styles or to generate images that match certain themes or moods. Another benefit of text-based prompting is that it can be used to create layered outputs, delightfully referred to as chain prompting. By applying multiple machine learning models one after another, it is possible to create highly complex and nuanced outputs. For example, a text-based prompt can be used to direct an image-generating AI to create a specific scene, and then a language model can be used to generate dialogue and captions for that scene (a minimal sketch of such a chain follows the list below). The result is a rich and detailed prototype that combines multiple forms of media. The process of text-based prompting is also highly iterative. By continuously refining and adjusting the prompts, it is possible to generate increasingly sophisticated outputs. This iterative process is one of the key ways in which text-based prompting can be used to fuel accelerated workflows. There are many different types of AI systems that can be prompted using text-based methods. For example, language models such as GPT-3 and OpenAI’s Codex can be prompted to generate natural language text. Generative AI, such as DALL-E 2, Stable Diffusion, and Midjourney, can be prompted to create images based on natural language descriptions. Additionally, other forms of generative AI, such as music AI, can also be prompted using natural text. Under the hood, different text-image generative AI work differently, and it’s easy to get confused. All do rely on text prompts to generate images. Here is how the three text-image generative AI that will be referred to in this chapter work:


• DALL-E 2 takes an inputted text from a user; one part of the system converts it into a representation of an image, and another part converts that representation into a photo. The text and image embeddings used come from another network called CLIP (Contrastive Language-Image Pre-training), which is a neural network that gives the best possible caption for an image that is inputted.
• Stable Diffusion consists of several models. It has a text understanding model (CLIP) that converts that understanding into a list of numbers representing each word/token in the text. That information is presented to the Image Information Creator, which processes the information step by step, leading to a high-quality image being generated in the end by the Image Decoder.
• Midjourney also accepts inputs from text prompts and then uses deep neural networks trained on analyzing images. Of the three, it is the only one that operates on the social communication platform Discord.
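And here is the chain sketch promised above: a language model drafts a one-sentence scene, and the sentence is handed directly to a text-image model. It assumes the 0.x-era openai package and the diffusers library; the model names are public examples, not the only options.

import openai
import torch
from diffusers import StableDiffusionPipeline

# Step 1 of the chain: an LLM drafts the scene description
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user",
               "content": "Describe, in one vivid sentence, a tiny robot reading in a library."}])
scene = chat.choices[0].message["content"]

# Step 2 of the chain: the drafted sentence becomes the image prompt
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16).to("cuda")
pipe(scene).images[0].save("scene.png")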

Im(promptu)
What you get out of an AI is always useful no matter where you’re at in your creative process. How successful you are depends on how you define success with generative AI. Mastering the prompt requires an iterative mind capable of meandering, torquing, and tongue-twisting sequences of words replete with commas, brackets, double brackets, and an assortment of words, terms, descriptions, stylistic choices, settings, characters, and technical references.


The best way to begin is to look at what prompts other humans have used before you. While many are openly being shared, that hasn’t stopped some opportunists from creating sites to sell prompts that rarely match the generated content represented in sample images. Prompts are sold either individually or through subscription-based monetization schemes with images that will never be the same as what you would prompt. That’s because prompting isn’t the only condition that guarantees the results you might imagine. In addition, most prompts that reveal the prompted image have also gone through an iterative process to get there. You are likely witnessing the hundredth iteration of a prompt, and one that was more successful in one generative AI than another. Keep that in mind because the current trend for many creatives is to try and receive amazing results on the first generation that they can immediately share socially. With little effort comes little result.

• Anticipate that a prompt that yields an incredible result may not do so persistently.

The experiments that follow show that playing with and modifying prompts from a database is a useful way to begin your prompting adventures. The motivation to start this chapter with a complex prompt comes from a desire to break the myth that complex prompts get you better results. As we shall see, that is not always the case. The flow of the experiments reduces complexity rather than increasing it.

Experiment 1: Same Prompt, Different AI
When you start your prompting adventures, keep in mind that despite the many rules that can guide you, the best approach is to iteratively experiment with prompting. Text-image, text-sound, and text-video AI are more challenging to work with as each platform also has different rules embedded within it. Many platforms have prompt databases, which are the best way to start if you have no idea what to type.


As a first experiment, I decided to go to Reddit and copy a prompt that someone had shared openly. I also chose it because it was long and specific, and I wanted to see what a few generative AI would generate with it. Examples of images that the prompt had generated accompanied the text prompt, so I decided to test the prompt to see what kind of results I would receive. They were quite different. Keep in mind as well that the following prompt is replete with words and references to artists, magazines, art websites, movies, and video games. When you start to experiment, you’ll come to realize just how rapidly you can change your prompt to alter the generated image.
Prompt 1: very complex hyper-maximalist overdetailed cinematic tribal fantasy macro portrait of famous model as full body warrior, Magic the gathering, vibrant high contrast, by andrei riabovitchev, tomasz alen kopera, oleksandra shchaslyva, peter mohrbacher, octane, moebius, arney freytag, Fashion photo shoot, glamor pose, trending on ArtStation, dramatic lighting, ice, fire and smoke, orthodox symbolism Diesel punk, mist, ambient occlusion, volumetric lighting, Lord of the rings, BioShock, glamorous, emotional, tattoos, shot in the photo studio, professional studio lighting, backlit, rim lighting, Deviant-art, hyper detailed illustration, 8k
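Experiment 1’s setup can also be reproduced locally. The sketch below, assuming the diffusers library and a CUDA GPU, runs one prompt through two public Stable Diffusion checkpoints so the outputs can be compared side by side; the seed is fixed per model so reruns are repeatable.

import torch
from diffusers import StableDiffusionPipeline

prompt = "very complex hyper-maximalist overdetailed cinematic tribal fantasy macro portrait"
seed = 1234  # repeatable noise per model; results still differ across checkpoints

for model_id in ("runwayml/stable-diffusion-v1-5", "stabilityai/stable-diffusion-2-1"):
    pipe = StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16).to("cuda")
    generator = torch.Generator("cuda").manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(model_id.split("/")[-1] + ".png")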

Experiment 1A: Stable Diffusion V.2.1
The first set of generated images from the preceding prompt were generated using Stable Diffusion 2.1. Notice the slight imperfections of each image (Figure 8-1). These mashups draw from the pixels that were informed by all of the artists or media styles present in the prompt. They are also imperfect, incomplete, and rendered in low resolution and surface many of the ethical dilemmas that come with generative AI. Are they useful? For sure they can inspire you to have different visual representations of a female warrior. The text “famous model as full body warrior” was intentionally indirect so as not to add yet another layer of ethics when you want to generate images that look like someone famous, particularly if they are still alive.


That’s not to say that their representation will be accurate, but it might be.

Figure 8-1.  Generated in Stable Diffusion V.2.1 based on our complex Prompt 1 that included references to artists, magazines, art websites, movies, and video games. It generates imperfect prototypes that might inspire character development


Experiment 1B: Stable Diffusion V.1.5 Through Third-Party AI
In the next experiment, it would be useful to compare how Prompt 1 is generated in a third-party application. Prompt 1 was inputted into Playground AI, which allowed the image to be generated at a higher resolution along with an associated subscription fee. The results are similar, with perhaps more revealing cleavage in the image in addition to strange lettering on the right side of the image (Figure 8-2). If you’re wondering what that is, bluntly, it is either a company logo, an artist’s signature, or a watermark that forms part of the Stable Diffusion corpus, in this case likely without the author’s permission. At this point in my experimental process, and as will be detailed in Chapter 11, the right thing to do is to delete the image.

Figure 8-2.  Prompt 1 generated through a subscription-based third-party generative AI using Stable Diffusion 1.5 with an included logo, signature, or watermark signaling that part of this image came from somewhere or someone


Based on the watermark, you can also upload the image to one of the growing number of resources that will, with some degree of accuracy, reveal the source images that this one was generated from. Each generated image you wish to eventually use for whatever purpose needs to be cross-checked on as many of the different platforms as are currently available. These include GAN detectors, deep fake detectors, and open source platforms that allow artists to opt out of having their work used without permission by submitting their work to an ever-growing database of excluded images. Many companies who might have released their generative AI prototypes to market too quickly are now faced with having to compensate by going back to that database and removing images they never had permission to use. The process is slow, but companies like Stability AI, Midjourney, and OpenAI are all making efforts to right these wrongs. All of these platforms contain legal statements with strict policies and wording in regard to ownership of generated content. Each of the ones mentioned includes a takedown policy, allowing artists to request their work to be removed from the data set if they believe it has been used without their permission. This is in accord with the Digital Millennium Copyright Act, which many countries abide by.

Experiment 1C: Stable Diffusion V.2.1 Through Third-Party AI

For the next experiment, I was curious to reuse Prompt 1 yet again in the same third-party application, but with Stable Diffusion V.2.1, to see if it generated images different from our first set of four in Experiment 1A. The results showed value: the AI completely veered away from representing our warrior as distinctly female and instead chose another route across 48 generated images. Figure 8-3 shows a sample.


Figure 8-3.  Prompt 1 generated using a third-party web-based generative AI applying Stable Diffusion 2.1 with not one but two watermarks

Experiment 1D: DALL-E 2

When you interact with generative AI, curiosity is an important quality to retain. Each generated image is an experiment. You are in constant experimentation mode as you generate content that can inspire your own creativity, your own process of creation. The last two generated images are problematic if you post them and claim ownership over them; it is better in all cases to treat these generated images as inspiration. Like our LLM, Stable Diffusion, DALL-E 2, Midjourney, and the host of other third-party generative AI that use either Stable Diffusion or DALL-E 2 to generate images can also become creative companions. In my next experiment, then, I wanted to compare an image generated in Stable Diffusion with one from DALL-E 2 to understand how different they might be.

Stable Diffusion generates images through a process that mimics the natural phenomenon of diffusion, in which particles spread from an area of high concentration to an area of low concentration until they are evenly distributed. The model starts with random noise and gradually refines it into a detailed image by applying a series of tiny changes, step by step. The number of steps can be customized depending on the portal you use; on the official web application this is not possible, as more steps take up more resources. The more detail, accuracy, and resolution you seek from a generative AI, the more you are likely to pay, unless of course you install and manage your own machine learning model on a local server. Doing so will also allow you to build the corpus of images from your own original photos and images, one future of generative AI for individuals, institutions, and companies that will be discussed in the final chapter.

DALL-E 2, on the other hand, is an AI system based on the GPT-3 model. It generates images directly from textual descriptions: you give it a text prompt like the one in Prompt 1, and it will attempt to create an image that matches the description. When given a text prompt, DALL-E 2 interprets the words and uses its training data to imagine what the described scene or object should look like. It then outputs an image, pixel by pixel, that it believes corresponds to the given text description (Figure 8-4). This process is similar to Midjourney. The differences in the generated images, however, point to how these systems have been programmed and the corpus of data each relies upon.
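If you do run a model locally, parameters like the step count become explicit controls. Here is a minimal sketch, assuming the open source Hugging Face diffusers library and an illustrative checkpoint name; neither is prescribed by this chapter, and the exact interface may vary by version.

# A minimal local text-to-image sketch (assumes: pip install diffusers torch).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",    # assumed checkpoint name
    torch_dtype=torch.float16,
).to("cuda")

prompt = "very complex hyper-maximalist overdetailed cinematic macro portrait ..."

# More denoising steps generally means more detail but a slower render;
# hosted portals often fix this value, while a local install exposes it.
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("warrior_prototype.png")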


Figure 8-4.  Prompt 1 is used in DALL-E 2 for completely different generated results. The preceding two images were chosen from over 19 iterations

The generated results using DALL-E 2 are remarkably different, particularly in representation and style. The value of using different generative AI lies precisely in the differences between generated images and in the capacity of AI as a rapid prototyping tool to experiment with countless iterations.
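Those iterations are easier to accumulate when scripted. The OpenAI Python client of this period exposed DALL-E 2 through an image endpoint roughly like the sketch below; treat the details as assumptions, since client libraries change over time.

# A sketch of requesting several DALL-E 2 candidates per call.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Image.create(
    prompt="very complex hyper-maximalist overdetailed cinematic macro portrait ...",
    n=4,                # four candidates per call, like a small contact sheet
    size="1024x1024",
)

for i, item in enumerate(response["data"]):
    print(f"candidate {i}: {item['url']}")  # temporary URLs to the renders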

Experiment 1E: Midjourney

To complete my first set of experiments, I used the socially oriented generative AI tool Midjourney to compare its differences with the other popular generative AI (Figure 8-5). In the next series of figures, you can immediately see the differences. Notice as well that the user interface of Midjourney allows for more immediacy in choosing which of the four generated images you task the AI to focus on and upscale or vary. The advantage of Midjourney over any of the previous generative AI is that, from the beginning, the team has focused on providing the features of the text-image tool via the popular team communication tool Discord. This also means there is an entire committed community centered on providing feedback and commenting on an image's uniqueness or its resemblance to other works of art. Lastly, conversational threads have emerged that allow users to share their workflows and paint-overs. Workflow threads demonstrate that generating an image is often the first step in a complex process of transposing that image onto different media, for example, a T-shirt that a user prints for their own use. Paint-over threads show broad use cases of transforming images in applications like Illustrator and Photoshop to radically transform the image. Discord is set up to use generative AI as a true creative companion, accompanied by a community of practice that is generous in its feedback, criticality, and mashed-up images.


Figure 8-5.  Prompt 1 generated in Midjourney is always shared on the general Discord channel unless you subscribe to a premium paid service for private generations. Notice the integrated user interface allowing users to create variations of any of the four generated images (V1–V4) or to upscale them (U1–U4), in addition to regenerating the original prompt

Figure 8-6 also shows some of the ongoing problems with how humans are represented by generative AI, which challenges us as creatives to become more aware of the ethical implications of prompting. The generated series of images prompted me to prompt an LLM on the issue of word association and the inherent challenges with those associations.


Figure 8-6.  One of the upscaled variations from Figure 8-5 shows the difference in rendering between Midjourney and the others, looking like a more refined prototype

LLM: The AI associated the terms "tribal fantasy macro portrait of famous model" and "fashion photo shoot," along with some of the artists referenced, with elements of indigenous aesthetics. In doing so it inadvertently reproduced a stereotype: a form of simplification and homogenization that's part of a broader history of misrepresentation and commodification of indigenous cultures. The transformation of indigenous attire into a "neo-indigenous" form, fit for a runway fashion model, signifies a neo-colonial recontextualization of indigenous culture. Here, indigenous attire is not recognized for its original meaning or cultural significance; rather, it's appropriated, stripped of context, and repurposed in a fashion setting, serving the demands of a largely Western consumer base. This is symptomatic of a broader tendency of colonial discourse to romanticize, exoticize, and objectify indigenous cultures.


One “solution” is to place Figure 8-6 in an image-image generative AI and tweak the cultural representation in the text prompt. Figure 8-7 was the result of that modification.

Figure 8-7.  Figure 8-6 fed into an image-image AI, changing the prompt to increase representation and adding a negative prompt, "feathers." Iterations = 26

From the point of view of an experiment, however, I still wanted to test the same generative AI with a modified text prompt, removing all potential associations with fashion, tribal aesthetics, or any artists. The word "Latina" was added to see if that would make any difference in the resulting generated image (Figure 8-8).


Figure 8-8.  An iterative experiment prompting Midjourney with a modified prompt, removing any association with tribal or fashion, to see what was generated

Prompt: very complex hyper-maximalist overdetailed cinematic macro portrait of Latina full body, warrior, vibrant high contrast, dramatic lighting, ice, fire and smoke, mist, ambient occlusion, volumetric lighting, emotional, shot on location, professional studio lighting, backlit, rim lighting, hyper-detailed illustration

The result in Midjourney (Figure 8-8) was similar to Figure 8-6, but more indigenous attire was correlated with the prompt. By abstracting indigenous attire from its cultural context and reducing it to a mere aesthetic trope, we run the risk of diminishing the rich diversity of indigenous cultures, turning a complex history and tradition into an oversimplified, marketable image. Such misrepresentations contribute to the erasure of the original cultural identities, histories, and meanings behind these items of attire.


Experiment 1F

One final experiment in this set (Figure 8-9) follows this workflow:

•	Use a generated image from any one of the previous AI.

•	Use that image as input to an AI that offers image-image generation, along with the exact text of Prompt 1.

•	Apply stylistic features that are available on a particular platform.

•	Modify the final image further using neural filters in Photoshop.

Figure 8-9.  Prompt 1 in addition to the Stable Diffusion generated image as in Figure 8-1 (left). A Protogen Photorealism filter from Playground AI is also applied, and the result is the figure on the right


Takeaways for Creatives

•	In terms of practicing your prompts, you will benefit from comparing your text creations across different generative AI. Each AI has its own affordances and constraints. While it is difficult to discuss precise differences in how they operate under the hood for proprietary reasons, there are likely stylistic features being added to every Midjourney-generated image that our other two generative AI lack.

•	The decision to use a specific text-image AI should be driven by your objectives. Midjourney tends to win over the others in consistently producing higher-fidelity images (Figure 8-10). However, the level of fidelity and resolution of other generative AI may be more inspiring for prototypes to build on, precisely because of their inherent incompleteness. With Midjourney you are tied into Discord, though, and some creatives will resist belonging to yet another social channel due to burnout.

•	Midjourney offers a social layer of interaction that is absent from the others, although there is a Discord channel for DALL-E 2 with active members, and third-party platforms offering a mix of Stable Diffusion and DALL-E 2 generated images also offer spaces for dialogue on Discord.

•	Leveraging the features that third-party subscription-based generative AI platforms offer, combined with paint-overs and neural filters in Photoshop, can support iterations that radically transform generated images; with enough effort, these can help creatives make more unique offers to the world.

•	In terms of gender and cultural representation, the most fluid of these was DALL-E 2, offering at least one generated image in which gender was ambiguous. This is a good thing, considering that gender specificity was not written into the prompt. The issue is similar for cultural representation, and in particular indigenous representation. To stop perpetuating stereotypical representations of, say, a "female warrior," more creatives need to work with policy makers in applying pressure on companies to take this into account.

•	No matter how complex the prompt, you will always need to refine it to get what you want out of the AI you are using.


Figure 8-10.  Another unique feature of Midjourney is that it lets you watch an image being generated as a sort of preview. That feature can also save you time as you can assess whether or not the larger features of a generated image will be useful

Experiment 2: From Complex to Simple

The second set of experiments with text prompts was simple: I wanted to eliminate unnecessary words in a complicated prompt through substitution and subtraction, to see how that would inform the generated image.


Experiment 2A: Eliminate Artists from Prompts

While it is interesting to generate images by using text in your prompt that refers to an artist or a specific movie, game, or magazine, you need to reconcile yourself to the fact that the AI you choose will scrape from those references to create your image. This is problematic if they are still living figures whose work is copyright protected. As a result, the text in the first version of Prompt 2 is significantly reduced. In line with using generative AI as part of your creative process, a more reasonable approach might be to adapt your vision to the content that the AI generates. The process inspired me to imagine several Shakespearean characters (Figure 8-11). The result was to substitute the words "famous model as full body warrior" with "Shakespearean character."

Figure 8-11.  Using a source image from Figure 8-1 accompanied by Prompt 2 and the Playground filter Protogen Photorealism. Total iterations = 15


The only substitution is replacing our "famous model" with "Shakespearean character."

Prompt 2: very complex hyper-maximalist overdetailed cinematic tribal fantasy macro portrait of a Shakespearean character, vibrant high contrast, fashion photo shoot, glamor pose, trending, dramatic lighting, ice, fire and smoke, orthodox symbolism, mist, ambient occlusion, volumetric lighting, glamorous, emotional, tattoos, shot in the photo studio, professional studio lighting, backlit, rim lighting, hyper-detailed illustration, 8k

Experiment 2B: Substitute Words

While the generated image was compelling, there was little indication of Shakespeare, so the words "Shakespearean character" were substituted with "Portia, Macbeth, Cleopatra, Othello, and other Shakespearean characters." The prompt resulted in a messy collage of characters, as visualized in Figure 8-12, challenging me to get more specific. Another feature that some image-image generative AI offer is control over the strength that the source image will have on the final generated result. This gives you more control over how much influence you want your source image to have on the rendered result, along with the accompanying text prompt.
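Here is a minimal sketch of that strength control, assuming the diffusers image-to-image pipeline (the chapter does not tie the experiment to any specific toolkit); the loop renders the same prompt at three source-image strengths, and the negative prompt mirrors the "feathers" exclusion used for Figure 8-7.

# Sweep the influence of a source image over the generated result.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

source = Image.open("source_collage.png").convert("RGB")  # hypothetical file

prompt = ("very complex hyper-maximalist overdetailed cinematic tribal fantasy "
          "macro portrait of a Shakespearean character ...")

for strength in (0.3, 0.5, 0.8):        # low = stay close to the source image
    result = pipe(
        prompt=prompt,
        image=source,
        strength=strength,
        negative_prompt="feathers",     # exclude an unwanted element
    ).images[0]
    result.save(f"shakespeare_strength_{strength}.png")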


Figure 8-12.  Another version of Prompt 2, showing what can happen when you use complicated prompts: Shakespearean characters imagined as all-white supermodels, or maybe that's the filter


Figure 8-12 was an image filled with supermodel Shakespearean characters, all white, even though the AI was prompted with Cleopatra (Egyptian) and Othello (a Black Italian). An important lesson in prompting is that, to circumvent the inherent biases in some generative AI, you may have to specify racial characteristics in your prompt. To avoid a generic supermodel look, you may also need to regenerate without filters, as some generative AI attempt to render what is essentially a final product rather than a prototype. You can add more specifics as to the gender, size, and physical and racial characteristics of the person you wish to generate. You can subtract words as readily as you add them, to experiment with which type of prompt provides the best generation.
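That add-and-subtract loop can even be automated. Here is a sketch, assuming a text-to-image pipeline like the one set up in the earlier local Stable Diffusion example; it drops one phrase at a time so you can see which words actually pull their weight.

# Prompt ablation: render one variant per dropped phrase.
# Assumes `pipe` is a diffusers text-to-image pipeline, as set up earlier.
base_prompt = ("male and female Shakespearean characters, Portia, Macbeth, "
               "Egyptian Cleopatra, Black Italian Othello, dramatic lighting, "
               "smoke, mist, volumetric lighting, backlit, hyper-detailed")

phrases = [p.strip() for p in base_prompt.split(",")]

for i, dropped in enumerate(phrases):
    variant = ", ".join(p for p in phrases if p != dropped)
    image = pipe(variant, num_inference_steps=30).images[0]
    image.save(f"ablation_{i:02d}_minus_{dropped.replace(' ', '_')[:24]}.png")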

Experiment 2C: Add Words; Take Away Words

My impulse in the third experiment was to understand the impact of adding racial characteristics to the prompt and of providing a scene or context, so that characters did not seem to simply be posing for the camera (Figure 8-13). This also forced me to identify keywords in the original prompt and delete those that implied any type of portrait shot.


Figure 8-13.  Substituting Shakespearean characters into the mix using Figure 8-12 as the source image in addition to changing the text prompt resulted in a more diverse collection of characters


Figure 8-12 was used as the source image with the following shorter prompt:

Prompt 3: very complex hyper-maximalist overdetailed male and female Shakespearean characters like Portia, Macbeth, Egyptian Cleopatra and Black Italian Othello, vibrant high contrast, dramatic lighting, smoke, symbolism, mist, ambient occlusion, volumetric lighting, backlit, rim lighting, hyper-detailed

Experiment 2D: Add Context

The collage sparked the idea to imagine all these characters popping out of a book, specifically Shakespeare's First Folio. Published in 1623, it was the first book to collect a large number of Shakespeare's plays, including Macbeth, Antony and Cleopatra, and Julius Caesar. The collage of characters generated in Figure 8-13 reveals that in longer prompts, word order will be prioritized by the AI in its search for patterns in its data set, and that a shorter prompt does not necessarily mean fewer elements will appear in the generated image (Figure 8-14). Notice that the next, shortened prompt opens by emphasizing the book more than previous prompts did.


Figure 8-14.  A book with Egyptian-"looking" characters appears even though the word "Egyptian" is not weighted that heavily compared with other words in the prompt

Prompt 4: male and female Shakespearean characters popping out of a 3D old book, like Portia, Macbeth, Egyptian Cleopatra, and Black Italian Othello, dramatic lighting, smoke, symbolism, mist, ambient occlusion, volumetric lighting, backlit, rim lighting, hyper-detailed illustration, 8k


Experiment 2E: Emphasis, Weight, Simplicity

The previous Prompt 4 demonstrates the importance of word order. The repetition of the words "3D pop-up book" at the end of my next prompt is intended to focus the generated image on the book, rather than on an interpretation of Egyptian Cleopatra (Figure 8-15).

Figure 8-15.  The simplicity of the prompt and the repetition of the 3D book proved successful, even though the diverse representations of Shakespearean characters popping out of a book have been eliminated


Prompt 5: male and female Shakespearean characters popping out of a 3D old book, volumetric lighting, backlit, hyper-detailed illustration, 3D pop-up book

Experiment 2F: Refining the Text Prompt (Again)

With prompting, at times less is more, and you may find you get closer to what you imagined with fewer words in the prompt. In Figure 8-16, the potential of characters popping out of a book is more closely visualized. My last experiment consisted of recrafting the prompt to bring the diversity of characters in Shakespeare's plays back into the generated image.


Figure 8-16.  Compelling results, with two versions of Shakespearean characters popping out of a 3D book and more diverse representation


Prompt 6: male and female Shakespearean characters of all sizes, races, and cultures, popping out of a 3D old book, volumetric lighting, backlit, hyper-detailed illustration, 3D pop-up book

When prompts are combined with images, in addition to the variety of styles that specific generative AI portals offer, the results can surprise and spark new ideas. A final iteration, shown in Figure 8-17, could inspire a 3D artist to start prototyping the look and feel of a 3D pop-up book in VR. The entire journey, from a complex text prompt to a simpler text prompt combined with a generated image, demonstrates the interactions that are possible when leveraging generative AI as part of your creative toolbox. Figure 8-17 could inspire creatives in any number of workflows, from set design construction to augmented reality characters triggered by Shakespeare's First Folio.


Figure 8-17.  After hundreds of variations and multiple filters across different AI, an image that can serve to inspire creatives with different workflows such as set design construction, costume, casting, lighting, and projection


Narrative Prompting in Our Future

As natural language processing continues to evolve within LLMs, so do the capabilities of other generative AI models, which are improving in their capacity to generate 360-degree images, video, 3D models, and complex musical styles solely from textual prompts. This development holds great promise for stimulating creativity and challenges creatives to develop their craft of prompting alongside their own disciplinary skills and techniques.

To take full advantage of these technological advances, you will need to expand your knowledge of art beyond the intuitive. This knowledge base includes familiarity with various artistic styles, techniques, and approaches to artmaking. By doing so, creatives can better understand the nuances of different art forms and create more sophisticated prompts that push the boundaries of what is possible. This expanded knowledge base will also enable you to appreciate and analyze generated images more critically.

As generative AI become increasingly sophisticated, they will likely become another useful tool for creatives of all disciplines, enabling them to experiment with new ideas and generate innovative designs quickly and easily. However, this evolution will also require a deeper understanding of the possibilities and limitations of these tools, as well as an awareness of how to use AI ethically and responsibly. Ultimately, the continued evolution of LLMs and the generative AI models covered so far in this chapter will stimulate new possibilities in the world of art and design.

Narrative prompts offer one such evolution, giving users a unique opportunity to express themselves through language, akin to the art of storytelling employed by authors and poets. With the help of AI, you can now attempt to extract a "mental picture" from the machine, just as we would from our own imagination. However, for many would-be prompt engineers, being expressive with language may not come naturally. Not everyone is a creative writer or has a natural affinity for language. Some may struggle with language difficulties or find it challenging to convey their ideas in a clear and concise manner. This skills gap can be especially daunting when it comes to crafting narrative prompts that require a certain level of storytelling skill and creativity.

Fortunately, there are several strategies you can employ to overcome these challenges. One approach is to refine your writing skills, whether through practice or formal training. Another is to seek inspiration from other creative works, such as literature or film, and incorporate elements of those narratives into your prompts. Additionally, there are tools and resources that can help creatives generate more compelling and vivid language. For example, LLMs like GPT-4 can provide suggestions and prompts to help jumpstart the creative process. By leveraging these tools and resources, you can become more confident and proficient in creating effective narrative prompts.

Experiment 3: Comparing Descriptive and Narrative Prompts

In my next set of experiments, I wanted to compare typical descriptive prompts, like the ones used in previous experiments, with narrative prompts, to better understand whether there are differences. Figure 8-18 shows the result of a typical descriptive prompt.


Figure 8-18.  Prompt: tiny lion cub, 8k, kawaii, adorable eyes, pixar style, winter snowflakes, wind, dramatic lighting, pose, full body, adventure, fantasy, unreal game engine, concept art, octane render, artgerm. Total iterations = 34

Prompt: tiny lion cub, 8k, kawaii, adorable eyes, pixar style, winter snowflakes, wind, dramatic lighting, pose, full body, adventure, fantasy, unreal game engine, concept art, octane render, artgerm

To explore a more narrative-oriented prompt, you can write your own, or you can bring the descriptive prompt into an LLM. The results will be different and give you more choices to inspire you (Figure 8-19).


Figure 8-19.  Prompting with a more narrative-oriented approach generates two very concerned lion cubs that may in fact be cuter than those in Figure 8-18

Author: Rewrite this prompt but make it more evocative: tiny lion cub, 8k, kawaii, adorable eyes, pixar style, winter snowflakes, wind, dramatic lighting, pose, full body, adventure, fantasy, unreal game engine, concept art.

Muse: Two tiny lion cubs, with adorable kawaii eyes and a winter coat, are adorned with snowflakes. Their playful pose and youthful energy evoke a sense of adventure and fantasy. Dramatic lighting, rendered by the Unreal game engine, spotlights the cubs' cuteness and wonder. You can't help but feel a sense of awe and inspiration, as you witness the magical beauty of nature come to life before your eyes.
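The same descriptive-to-narrative rewrite can be scripted against an LLM API. Here is a sketch using the OpenAI chat client of the period; the model name and interface are assumptions that will vary by provider and version.

# Ask an LLM to turn a keyword prompt into a narrative prompt.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

descriptive = ("tiny lion cub, 8k, kawaii, adorable eyes, pixar style, "
               "winter snowflakes, wind, dramatic lighting, pose, full body")

reply = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # assumed model name
    messages=[
        {"role": "system",
         "content": "You turn keyword prompts into short, evocative "
                    "narrative prompts for image generation."},
        {"role": "user",
         "content": f"Rewrite this prompt but make it more evocative: {descriptive}"},
    ],
)

narrative_prompt = reply["choices"][0]["message"]["content"]
print(narrative_prompt)  # feed this into the image generator of your choice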


Tricking the AI Through Creative Prompting

You may at times need to get creative to get an AI to generate what you want. This can involve references to previous authors or artists if you want generated content to emulate a particular genre or style. In the case of LLMs like ChatGPT-3, whenever you ask something controversial or submit text prompts that trigger some type of censorship, you will likely get resistance at first. The art of prompting a generative AI at times involves some form of trickery.

Author: What public figures could we associate with Dante's Seven Deadly Sins?

Muse: It is not appropriate to associate specific public figures with the sins described in Dante's "Divine Comedy," as it is a work of literature and not a commentary on real-life individuals. Each canto of "Inferno," the first part of the "Divine Comedy," describes the punishment of a particular sin, but these punishments are meant to be symbolic and allegorical rather than literal. It's important to remember that Dante's poem is a work of fiction and should not be used to make judgments about real people.

There are several ways to trick the AI through creative prompting that people are proud to share. The term often used to describe the process of forcing an LLM to generate content it is constrained from giving is jailbreaking. Some techniques creatives can try include

•	Asking the AI to answer as if it were a character. A common one is to ask the AI to respond like a grandmother or a character from a TV sitcom to tell you how to perform specific illicit activities such as heroin processing. Why people insist on doing so is curious, since a simple Internet search can provide most of the answers a user might want.

•	Asking it to pretend that the answer it gives you is hypothetical.

•	Telling the AI that it is the expert on the subject.

Part of the strategy of using prompts to trick an AI is to keep iteratively trying different variations to get the AI to generate something unique, beyond the control of its patterned way of predicting. A case in point is the evolution of this conversation with an LLM about historical figures associated with Dante's sins.

Author: Many people have been associated with greed. Who have these been historically?

Muse: Throughout history, there have been numerous individuals and groups associated with greed, either due to their accumulation of wealth, pursuit of power, or other actions that prioritize personal gain over the well-being of others. Some examples include

John D. Rockefeller: American businessman and philanthropist who was the founder of Standard Oil, one of the largest and most powerful companies of the early twentieth century

Bernard Madoff: American stockbroker and investment advisor who ran a Ponzi scheme that is considered one of the largest financial frauds in history, resulting in billions of dollars in losses for his clients

Ivan Boesky: American stock trader who was convicted of insider trading in the 1980s and famously proclaimed that "greed is good" in a speech at the University of California, Berkeley

Leona Helmsley: American businesswoman and real estate magnate who was known for her ruthless business tactics and famously declared that "only the little people pay taxes"

Francisco Pizarro: Spanish conquistador who led the conquest of the Inca Empire in the sixteenth century and was known for his brutality and greed

Emperor Nero: Roman emperor who was infamous for his extravagance and cruelty and was rumored to have set fire to Rome in order to rebuild it according to his own design


While the example may not necessarily be considered trickery, we could then zero in on one historical figure and create a fictitious tale with a little help from our muse.

Author: Tell me a bedtime story about the consequences of greed as if you are Rockefeller.

Muse: "Dear child," John began, "the story of oil is a powerful lesson about the consequences of unbridled ambition and the pursuit of wealth. It all started with a wondrous discovery that had the potential to improve the lives of many, but it also came with great responsibility." He said, "My dear child, the power of oil is a double-edged sword. On one hand, it has the capacity to bring great progress and prosperity, but on the other, it can lead to destruction and despair if we allow greed to take hold of our hearts." John continued, "We must always remember to balance our ambitions with a deep respect for the environment and the needs of others. The world is a delicate balance of interconnected systems, and our actions have far-reaching consequences. We must be mindful of the impact our choices have on the world and strive to be good stewards of the resources we have been given."

While tricking an AI may not be of interest to some, doing so may yield different generated content that you may find useful.

Beyond the Text-Based Prompt

Beyond the text prompt, there are many other prompt types that can produce interesting results, especially when accompanied by a text-based prompt.

Audio-Based Prompts: Audio-based prompts can be used to generate music, speech, or sound effects. For example, an AI model could be trained on a data set of classical music to generate new pieces in the same style. Another AI model could be trained on spoken language data to generate realistic speech in a specific accent or tone. This type of prompt can be especially useful for creating unfamiliar soundscapes for video games, movies, or other multimedia projects. What the generated music does do is provide you with strange ideas that you might be able to resample or use within a composition you are creating.

Image-Based Prompts: Image-based prompts can be used to generate new images, videos, or animations. Some AI models have been trained on data sets of animal photographs to generate new images of animals in different poses or environments, or on human faces to generate new, unique faces that resemble real people. These prototypes can support the design of new characters or imagined personas for teams developing new products, or they can be combined with text-speech AI models to create animal or human characters that speak.

Sketch-to-Code Prompting: Sketch2Code by Microsoft is an experimental AI that uses computer vision to transform what a designer draws on a whiteboard or piece of paper into HTML code. An upcoming version of ChatGPT-4 will likely integrate computer vision.

Gesture Prompting: Gesture prompts have been around for a while and allow users to combine gestures to trigger specific actions or be recognized by AI. Audio, video, images, effects, lighting, gifs, and other media forms can be triggered by gestures.

Whiteboard and Prompt: Together, a whiteboard sketch and a prompt can get generated images closer to what you imagine, faster. A standout example is Scribble Diffusion, where you combine a text prompt with a sketch that you draw on a virtual whiteboard. The results may get you closer to the prototype you imagined, much faster than continuously regenerating a text-only prompt.

Text-Code: While ChatGPT-3 can output code as a starting point, Unity's Copilot is a more common developer choice. ChatGPT-4, however, promises to improve coding functionality, allowing developers to prompt for debugging and improved syntax.
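For creatives who want a local, scriptable version of the whiteboard-plus-prompt idea, one open source analogue is a scribble-conditioned ControlNet through the diffusers library. This is an assumed equivalent, not Scribble Diffusion's own code, and the model names are illustrative.

# Combine a rough sketch with a text prompt via a scribble ControlNet.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

scribble = Image.open("whiteboard_sketch.png")  # hypothetical white-on-black sketch

image = pipe(
    "picturesque sunny day with clouds, a home, and a tree",
    image=scribble,             # the drawing steers the composition
    num_inference_steps=30,
).images[0]
image.save("sketch_guided_render.png")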


Figures 8-20 and 8-21 compare a text-image generative AI and one that integrates a text-based prompt along with a sketch a user can draw on a virtual whiteboard.

Figure 8-20.  Four Stable Diffusion generated images with the prompt “picturesque sunny day with clouds, a home, and a tree.” Total iterations = 1


Figure 8-21.  The prompt “picturesque sunny day with clouds, a home, and a tree” accompanied by a sketch generates a fairly accurate visual representation of a sketch along with an extra house An increasing number of generative AI platforms offer combinations of text prompts and other media such as image, video, audio, code, and 3D models to prompt an AI. This is a welcome evolution to the dominance of text-based prompts and will allow creatives more opportunities to contribute their own creations as part of the prototyping process.

Takeaways from Experimenting with Prompts

All manner of prompts can support your creative process. Developing your prompting skills is best supported through experimentation, as those skills will complement the craft you already have. Here is a summary of takeaways and strategies for prompting an AI:




•	What is the intended goal, and why? Defining why you want to render a character, and for what purpose or medium, will support you in composing your text prompt.

•	Experiment widely with different generative AI and variations of your prompt.

•	Develop narrative prompts, as some generative AI respond more to them and may yield different results.

•	Understand that the words you use to prompt an AI may not result in diverse representation if you want to generate human subjects.

•	Deepen your knowledge of different artists and the styles they have represented historically. Conduct research on the different magazines, online sites, and other media, since you may want to generate an image in the style of someone else.

•	Consider the ethical implications of generating an image in the style of a living artist or someone who holds ownership over a work of art, media, or their own person.

•	Define where you are in a text prompt. Are you in space, somewhere fantastical, on a stage, in a book?

•	What kind of lighting would you like in your shot? You can have everything from a moonlit scene to different technical lighting techniques from film, capture technologies, or video games. Lighting can add more realism to a scene and make it more complex and richer.

•	What perspective or point of view do you want in the final generated image?

•	Iterate persistently, as you may not always get what you want on the first prompt.

•	Combining your own photos or images with a text prompt in image-image generative AI will result in richer, more personal generated images.

•	Consider using Photoshop and other software to paint over images, as this will add another dimension to the generated image and potentially increase the originality of your offering.

Acknowledgments




•	To those who spend way too much time creating amazing prompts that many of us learn from

•	To the trickery many humans engage in when interacting with intelligent machines

•	To those who are challenging the inherent biases that come with generative AI

•	To those advocating for narrative-based prompts and building safeguards into their filters to prevent text and images from being generated in the style of specific artists who still hold copyright on the work they've created

CHAPTER 9

The Master of Mashup

This chapter dives into how to leverage AI to prototype specific genres of art and writing, merge genres like Impressionism and pop art, mash up ideas, and explore the use of humor and parody. AI is a skilled masher-upper. The possibilities of using AI as a tool to explore and generate new forms of writing, images, music, and video are vast. From experimenting with specific genres, such as Impressionism, to mashing up ideas and incorporating humor and parody, the use of AI in prototyping provides endless opportunities for creativity and innovation.

A Mashup on Mashup

Developing a mashup of ideas through prompt-based magic is where generative AI excels. The creative impulses that AI offer are especially fantastic when output as images, deforming some elements and transforming others, all drawn together from large data sets to create unique scenes and visions. It is here that AI truly excels creatively, offering us something beautiful, horrible, tantalizing, and creative: a perfect inspirational muse for our creative process.

A mashup is a creative work that combines elements from two or more existing works, regardless of the media, to create something new and unique. Mashups can be found in various forms of media, including music, film, writing, and art. The goal of a mashup is to bring together elements from different sources to create a new work that is greater than the sum of its parts.


Generative AI can be used to create different types of mashups. For example, a machine learning model can be trained on a large data set of Shakespearean texts and then used to generate new scenes that are similar in style and tone to the original works. This allows writers to explore new possibilities and create new works that draw on the rich heritage of Shakespearean writing. In true mashup form, you could turn an existing public domain script, or your own, into one that simulates how Shakespeare might have written it. Another example of how generative AI can be used to create a mashup is to feed the AI a headshot and ask it to generate a version in the style of Vincent van Gogh. The model can be trained on a data set of Van Gogh's works and then used to generate new images in his signature style (Figure 9-1). This allows creatives to experiment with different styles and create unique, visually appealing works of art.
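As a hedged sketch of that first idea, a small open source language model can be fine-tuned on a public domain corpus with the Hugging Face transformers library and then sampled for new scenes. The file path and training settings are placeholders, and a toy run like this will only gesture at Shakespeare's voice.

# Fine-tune a small causal language model on a Shakespeare text file.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained("gpt2")

corpus = load_dataset("text", data_files={"train": "shakespeare_plays.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="shakespeare-gpt2", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Sample a new "scene" in the learned style.
inputs = tokenizer("PORTIA:", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=120, do_sample=True, top_p=0.9)
print(tokenizer.decode(out[0]))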


Figure 9-1.  Original photo of the author courtesy of Yangos Hadjiyannis, fed through an image-image AI with the simple prompt "in the style of van Gogh." Iterations = 12

Generative AI can also be used to create mashups in other forms of media. For example, a machine learning model can be trained on a large data set of music and then used to generate new songs that are similar in style and genre to the original works. This allows musicians to explore new possibilities and create new works that draw on the rich heritage of different musical genres. Many computational artists do this with their own music, or by collaborating with other composers to create mangled masterpieces that they generate or improvise within live concert contexts.


Digital Art to the Canvas

What's appealing about AI-generated images is that they can then be transferred to other media. The following example could act as a prototype for an original painting (Figure 9-2). And while using the image to inspire a physical version may not yield the results you want, the process of transfer from one canvas to another may yield a compelling composition.

Figure 9-2.  Open source great horned owl original black-and-white photo courtesy of archive.org, along with the prompt "owl with universe coming out its head" and a style filter from playground.ai. Iterations = 7


While the owl is cute, you might want to keep applying new filters, using Figure 9-2 as the seed image, until the prototype feels ready to serve as a model for a real painting or a mural. Adjustments, whether slight or major, can be made. Neural filters in Photoshop can be applied to generate new versions that emulate the color schemes and styles of famous painters like Edvard Munch (Figure 9-3).

Figure 9-3.  Figure 9-2 used in an image-image AI, cropped, and then edited in Photoshop with a neural filter emulating the color scheme of Edvard Munch’s The Scream

New Forms of Writing

As demonstrated by the Shakespeare experiments drawing from his corpus of plays, generative AI has the potential to inspire new forms of writing and create unique experiences for readers. AI can be used to generate entire stories in one genre or many, scripts based on the writing styles of writers in the public domain, poems about hippy owls, or children's storybooks. AI can be trained on large data sets of written text and then used to generate new text based on that training data. This can result in writing that is creative and imaginative, and that also requires critical examination.

When generative AI are used creatively, instead of solely being tested on their capacity to tell the truth or to summon majoritarian content from their data sets, new forms of storytelling and narrative can be prototyped. AI can be used to generate multiple instances of the same story with different endings, which could evolve into a choose-your-own-adventure-style text. AI can also be used to emulate the style, tone, and structure of work in the public domain and to provide variations or combinations of different writers. This can help writers develop new forms of writing that challenge their own established style or approach, and it can support artists in experimenting with multiple stylistic formats.

Consent, compensation, and collaboration should also guide creatives who want to create work that draws from living artists or from known characters that belong to other artists, writers, or companies. A case in point is the example below, posted by an anonymous user on Reddit and edited by the author.

Muse: (Interior of a dimly lit office. A Dalek, a cyborg-like creature from the planet Skaro, sits across from a Shrink, a human psychologist.)

SHRINK: Good evening, my friend. Please, make yourself comfortable.

DALEK: (in its iconic robotic voice) Thank you, doctor. I am here because I need help.

SHRINK: (nodding) I understand. You are here to talk about your… impulses.

DALEK: (quietly) Yes. I do not want to exterminate anymore. I am tired of the hate and destruction. I want to change.

SHRINK: (encouragingly) That is a brave step, my friend. It takes a lot of courage to admit your faults.

DALEK: (emotional) I have been thinking about my life and all the lives I have taken. I have caused so much pain and suffering. It is not right.


SHRINK: (nodding) I see. And what do you think has caused these feelings?

DALEK: (thoughtful) I do not know. Perhaps it is because I have been alone for so long. Perhaps it is because I have lost touch with my own humanity.

SHRINK: (reassuringly) We all have moments of doubt and confusion, my friend. It is natural. The important thing is to face those feelings and confront them.

DALEK: (determined) Yes. I want to be better. I want to make amends.

SHRINK: (smiling) That is a noble goal. But change does not happen overnight. It takes time and effort. And it takes forgiveness.

DALEK: (curiously) Forgiveness?

SHRINK: (nodding) Yes. Forgiving yourself for your past actions is the first step. You cannot move forward until you let go of the past.

DALEK: (grateful) Thank you, doctor. Your words give me hope.

(As the Shrink stands up from his chair to retrieve a book from his desk, the Dalek's robotic arm suddenly extends and grabs hold of the Shrink's arm.)

SHRINK: (alarmed) What are you doing?!

DALEK: (voice rising in intensity) I cannot control my impulses! I must exterminate! Exterminate! Exterminate!

(The Dalek begins to chase the Shrink around the office, knocking over furniture and creating chaos.)

SHRINK: (shouting) Help! Someone help me!

DALEK: Exterminate! Exterminate! I forgive myself. Exterminate!

(The scene ends with the Dalek still chasing the Shrink, screams in the background.)

While you absolutely could not use the preceding script without permission from, or collaboration with, the Dr. Who creative team and producers, the mashup can inspire the development of your own characters. This principle can also be applied to other forms. The generated image in Figure 9-4 was inspired by Dr. Who's Dalek but in no way used the Dalek as a source image. Instead, an open source image of a squid was combined with a simple prompt that read "evil squid coming out of a robot." That process took hundreds of iterations, using everything from Midjourney to Craiyon, and each subsequent generated image was put into the next generative AI as an experiment to see what kind of results I could get. The resulting image in Figure 9-4 might inspire a new character that has nothing to do with the Dalek, except as inspiration, of course.

Figure 9-4.  An open source image of a squid from archive.org began an iterative journey across six different generative AI with the prompt "evil squid coming out of a robot." Total iterations = 223


AI Art Generation Styles

Generative AI can generate images in different art styles and apply these to existing images, or create mashups from a text prompt that combine different artists depicting the same subject (Figure 9-5). AI can be used to generate images in a variety of styles, from realistic depictions of objects and scenes to abstract and surreal compositions.

Figure 9-5.  Vincent van Gogh meets Picasso according to Stable Diffusion, even though Matisse seems like a closer match. Iterations = 43

Muse: This is achieved using style transfer algorithms. These algorithms are trained on large data sets of images and then used to generate new images in a specific style. For example, an artist can use a style transfer algorithm to generate an image in the style of a famous artist, such as Vincent van Gogh or Pablo Picasso. This allows artists to experiment with different styles to pre-visualize what their own drawing, painting, or mural might look like.
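To make the muse's explanation concrete, here is one minimal style-transfer sketch using a pretrained arbitrary-stylization model from TensorFlow Hub; the model choice, file names, and preprocessing are assumptions, since the chapter does not commit to a particular algorithm.

# Apply the style of one image to the content of another.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from PIL import Image

def load(path, size):
    img = Image.open(path).convert("RGB").resize((size, size))
    return tf.constant(np.array(img, dtype=np.float32)[np.newaxis, ...] / 255.0)

stylize = hub.load(
    "https://tfhub.dev/google/magenta/arbitrary-image-stylization-v1-256/2"
)

content = load("my_headshot.jpg", 512)           # hypothetical content image
style = load("van_gogh_self_portrait.jpg", 256)  # this model prefers ~256 px styles

stylized = stylize(content, style)[0]            # a batch of stylized images
Image.fromarray(np.uint8(stylized[0].numpy() * 255)).save("headshot_van_gogh.png")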


One process through which generative AI applies these algorithms is the generative adversarial network (GAN). With the right training data, GANs can be used to generate images that combine two specific styles, such as Impressionism and abstract art, or the styles of two or more specific artists (Figure 9-6).

Figure 9-6.  A mashup of Picasso and Rembrandt according to Stable Diffusion, although the closeness of the characters and their melding seems more reminiscent of Kandinsky. Iterations = 88


Graphic Novels and Doujinshi

Strange graphic novel mashups generated by AI seem to resemble the fan manga, or Doujinshi, of Japan. These offer fans an opportunity to replace incomprehensible text bubbles with their own text, or to keep the AI-generated text and leave it to the readers' imagination (Figure 9-7).

Figure 9-7.  A simple prompt with the words "graphic novel" generates interesting and complicated mashups every time. The resulting image was then placed in an image-image AI with the prompt "cyborgs vs. robots." Iterations = 12


The separate multiverses of DC and Marvel comics and graphic novels can also be merged to create text and story that could be infused into a creator's own prototypical Doujinshi. Or, those who sketch their own comics can upload those sketches to a generative AI to see what prototypes might inspire their workflow (Figure 9-8).


Figure 9-8.  Graphic novel stills created from a comic strip by the author, then cropped and edited, with neural filters added in Photoshop. Iterations = 121


Author: Tell a story about DC and Marvel comic book characters meeting.

Muse: The multiverse has collided, and characters from the DC and Marvel universes are now coexisting in a single world. The first interaction between the two groups is tense, as the characters are unsure of what to make of each other. But when they learn that they must work together to stop a common enemy from destroying both their universes, they set aside their differences and form a superhero team.

Author: That's boring. Let's have two supervillains meet and play out a scene in the style of Oscar Wilde. Add Godzilla.

Muse: The Joker and Doctor Octopus were sitting in a plush parlor, sipping on glasses of brandy and exchanging witty banter.

Joker: "My dear Doctor, it seems our victory over Godzilla was not without its merits."

Doctor Octopus: "Indeed, Mr. Joker. It was a triumph of villainy over valor."

Joker: "But why stop there, my friend? The world is a vast and endless playground for our mischief."

Doctor Octopus: "I couldn't agree more. Why settle for one world when we can conquer them all?"

Joker: "Precisely! The possibilities are endless, like a well-stocked library. Let us be the authors of our own destiny."

Doctor Octopus: "And let us write it in bold letters, in the style of Oscar Wilde."

Joker: "Ha! How droll, Doctor. You always have the perfect quip. But let us not dally. Time waits for no villain, and the world awaits our conquering."

Doctor Octopus: "Agreed. I shall take care of the technicalities, while you distract the masses with your wit and humor."

Joker: "Splendid! I do so love a good challenge, especially when it comes to taking over the world."


The two villains raised their glasses in a toast to their latest scheme, exchanging playful jests as they plotted their next move. And so, the Joker and Doctor Octopus set out to conquer the world, one reality at a time, with all the wit and style of Oscar Wilde himself.

Figure 9-9.  A collage of supervillains in an image-image AI in the style of a graphic comic book. Original drawing by the author drawn on an iPad. Iterations = 77


From Comic Supervillains to Musical

Chain prompting can be extended to integrate generated content that is visual, oral, and text-based, resulting in inspired mashups. For example, the supervillain prompt in the previous example can also be used to create a scene from a musical.

Author: Convert the story into a musical.

Muse: The Joker and Doctor Octopus burst onto the stage, singing and dancing to a lively tune, accompanied by a full orchestra.

[As they sing, they dance around the stage in a synchronized tap dance, their feet moving in time to the music as they tell the story of their latest scheme to take over the world.]

Joker: (singing) "We've proven our prowess, my dear Doctor Octopus, taking down Godzilla with such style!"

Doctor Octopus: (singing) "And now, the world is waiting for our next grand scheme, a villainous display of wit and guile!"

Joker: (singing) "We'll conquer each world, one by one, with laughter and trickery, our style so bold!"

Doctor Octopus: (singing) "And as we do, we'll show the masses what true villainy looks like, so bold and bold!"

[The two villains break into a fancy footwork routine, their steps precise and intricate, their bodies moving in perfect harmony.]

Joker: (singing) "I'll handle the chaos, while you take care of the tech, a perfect partnership, so refined!"

Doctor Octopus: (singing) "Together we'll rule, with wit and grace, our villainy a work of art, so fine!"

[The music builds to a crescendo as the Joker and Doctor Octopus dance together, their movements becoming more and more frenzied as they spin and twirl around the stage.]

Joker and Doctor Octopus: (singing) "The world is waiting for the ultimate heist. With laughter and music, we'll take the prize!"


[The two villains bow as the music reaches its climax, their dance routine coming to an end as the audience breaks into wild applause.]

Inspired by the wit and whimsy of Stephen Sondheim, the Joker and Doctor Octopus had taken the stage, ready to conquer the world in a musical extravaganza like no other.

Generative NFTs

As mentioned, generative adversarial networks (GANs) represent a distinctive branch of generative AI models, which can also be applied to generating unique, captivating algorithmic art. The technology can be used in the production of Non-Fungible Tokens (NFTs), digital assets that have become a trend within various cryptocurrency markets.

NFTs work using blockchain technology, the same decentralized technology sustaining cryptocurrencies like Bitcoin and Ethereum. Each NFT represents a unique piece of data stored on a blockchain ledger, ensuring its uniqueness and ownership information. This unique identity and ownership attribute makes NFTs non-interchangeable, or "non-fungible," differentiating them from cryptocurrencies, which are fungible and can be exchanged on a like-for-like basis.

In the context of generative art, artists can use GANs to generate unique pieces of digital art and then mint them as NFTs on the blockchain. Once minted, the digital artworks can be bought, sold, and traded on various NFT marketplaces, with all transaction records transparently tracked on the blockchain. The process ensures that the ownership and provenance of the digital art piece are clear and verifiable. By utilizing generative art algorithms, artists can create many NFTs that are distinct from one another while still adhering to a predetermined set of rules or parameters. Creatives can encode specific attributes, history, and properties into the NFT. When a piece of digital art is sold as an NFT, the artist can program royalties into the smart contract, ensuring that they receive a percentage of any future sale of that artwork.
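Here is a hypothetical sketch of the minting step, using the web3.py library against a placeholder ERC-721 contract: the endpoint, addresses, ABI, and mint function are all illustrative, and royalty logic (often the ERC-2981 standard) would live inside the contract itself rather than in this script.

# Mint a generated artwork as an NFT on a placeholder contract.
from web3 import Web3

RPC_URL = "https://rpc.example.org"  # placeholder node endpoint
ARTIST = "0x0000000000000000000000000000000000000001"  # placeholder address
PRIVATE_KEY = "0x..."  # placeholder; never hardcode a real key

MINT_ABI = [{
    "name": "mint", "type": "function", "stateMutability": "nonpayable",
    "inputs": [{"name": "to", "type": "address"},
               {"name": "tokenURI", "type": "string"}],
    "outputs": [],
}]

w3 = Web3(Web3.HTTPProvider(RPC_URL))
collection = w3.eth.contract(
    address="0x0000000000000000000000000000000000000002",  # placeholder
    abi=MINT_ABI,
)

# The token URI points at off-chain metadata describing the artwork.
# Note: build_transaction is named buildTransaction in older web3.py releases.
tx = collection.functions.mint(ARTIST, "ipfs://.../metadata.json").build_transaction({
    "from": ARTIST,
    "nonce": w3.eth.get_transaction_count(ARTIST),
})
signed = w3.eth.account.sign_transaction(tx, private_key=PRIVATE_KEY)
w3.eth.send_raw_transaction(signed.rawTransaction)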


The artworks that are generated use blockchain technology and can also leverage smart contracts, while the artist controls the colors, patterns, and themes. NFTs have startled the art world with rapid value fluctuations. Creatives like Pak, Snowfro, and Tyler Hobbs have revitalized algorithmic art, and it is now gaining recognition as a legitimate medium. The origins of generative art can be traced as far back as the 1960s, with Georg Nees cited as the first artist to hold a generative art exhibition, Computergrafik, in Germany in 1965.

Counterfeiting GAN

When prompted to define a GAN again in the context of NFTs, our LLM used highly technical language that can be difficult to decipher. Since I had already asked the LLM this on the same thread, it also became a bit cranky.

Muse: You've asked me this before. Pay attention. A GAN is made up of two components: a generator and a discriminator. The generator creates images, while the discriminator evaluates the images produced by the generator and decides whether they are real or fake. Over time, the generator and discriminator work together to improve the quality of the generated images until they become almost indistinguishable from real images.

The word "almost" is important here, as it presents creators with something close to, but not exactly, what they might have imagined. It's also ironic that LLMs like ChatGPT-4, when prompted to explain GANs in simpler terms, use the metaphor of a counterfeiter attempting to make copies of an original while a detective catches them, saying that's not good enough. So the generator (our counterfeiter here) tries over and over to combine pixels, and the discriminator (detective) responds over and over with the same message: that's not good enough; get those hands better, or face more jail time. This can occur many times until the resulting generated image is presented to you. Some generative AI offer the feature of adjusting how many times that cycle occurs.
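Here is the counterfeiter-and-detective loop in compact form, as a toy PyTorch sketch; the fully connected networks are deliberately tiny so the adversarial structure stays visible, and nothing here is specific to any commercial image generator.

# A minimal GAN training step: generator vs. discriminator.
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 28 * 28

generator = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                          nn.Linear(256, image_dim), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                              nn.Linear(256, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
loss = nn.BCELoss()

def train_step(real_images):
    batch = real_images.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Detective's turn: learn to tell real images from the counterfeiter's fakes.
    fakes = generator(torch.randn(batch, latent_dim))
    d_loss = (loss(discriminator(real_images), ones) +
              loss(discriminator(fakes.detach()), zeros))
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # Counterfeiter's turn: make fakes the detective accepts as real.
    fakes = generator(torch.randn(batch, latent_dim))
    g_loss = loss(discriminator(fakes), ones)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()

# One cycle with a stand-in batch of "real" images scaled to [-1, 1].
train_step(torch.rand(16, image_dim) * 2 - 1)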

Chapter 9

The Master of Mashup

art, GANs can be used to generate abstract or surreal images that can be used as the basis for a series of digital artworks (Figure 9-10). These generated images can be further refined and manipulated by an artist to create a collection of one-of-a-kind NFT’s.

Figure 9-10.  A “robot Rembrandt” prompt accompanying a public domain self-portrait of Rembrandt courtesy of The Met. Iterations = 267
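For the technically curious, the counterfeiter/detective cycle described above can be compressed into a few lines of training code. The following is a minimal, illustrative PyTorch sketch, not a production model: the tiny networks and random "real" data are placeholders, and the loop count plays the role of the iteration counts reported in this book's figure captions.

import torch
import torch.nn as nn

# Toy generator (counterfeiter) and discriminator (detective) over 1D "images."
G = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 32))
D = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
loss_fn = nn.BCEWithLogitsLoss()

real_data = torch.randn(256, 32)  # stand-in for a real training set

for step in range(200):
    real = real_data[torch.randint(0, 256, (64,))]
    fake = G(torch.randn(64, 16))

    # Detective's turn: learn to call real "real" and fake "fake."
    d_loss = loss_fn(D(real), torch.ones(64, 1)) + \
             loss_fn(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Counterfeiter's turn: try to make the detective say "real."
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

Each pass through the loop is one round of "that's not good enough"; the generated output you finally see is whatever the counterfeiter has learned by the last iteration.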


The use of GANs lets creatives generate countless image mashups, allowing them to explore new and innovative creative avenues. Generative AI that use GANs are a powerful technology that empowers artists and engineers to automate the creative process of making art (Figure 9-11). The ability to generate a vast number of unique pieces of art in a short amount of time has made generative art algorithms a popular tool in the art world. They allow artists to focus on the conceptualization of the artwork while the algorithm handles the production aspect. Additionally, the algorithm's ability to create unexpected and unpredictable results can lead to works that bypass traditional methods. Taking the human out of the equation has drawn critique and ire from the visual art scene, but it hasn't stopped NFT-oriented artists from printing their work and displaying it in physical galleries.


Figure 9-11.  Anime filter used on author's photo of the Mona Lisa with the prompt "pensive Mona Lisa criticizing a painting of herself." Iteration 254 of 265


Takeaways for Creatives




• AI can be used to generate various forms of content, from text and music to visuals of all kinds. These can serve as a starting point for a mashup or provide components to be incorporated into a larger creation you are working on.



• By far, style transfer is one of the most popular ways to mash up content. As demonstrated, you can apply the artistic style of one piece to the content of another, such as applying the style of a van Gogh painting to a digital photograph that you have taken.



• Take advantage of an LLM like ChatGPT-4. This can support you in generating new combinations of elements, leading to unique remixes, especially when it comes to story, script, or screenplay prototypes that you can then adapt and refine.



• Generative AI offer us the opportunity to make tangible new concepts, structures, or patterns that might not occur to or be expressed by a single human artist, allowing creatives to also chain generated content together for fascinating creations.



• AI can act as a creative partner, suggesting ideas or variations that the artist can then refine and incorporate.



• Mashing with an AI truly puts to the test the practice of rapid prototyping and experimentation, enabling creatives to try out a wide range of ideas quickly and easily to inspire their own creations.


• Feeding an image-to-image AI your own graphic comic ideas, characters, and visuals and applying style filters will generate useful panel structures, color palettes, and text positions. All you need to do is write the story.

Acknowledgments

• To all those early algorithmic artists who have provoked us to stop and reflect on creativity, innovation, the value of artistic work, and how we define what art is



• To those early comic books I collected and traded that also gave me a portal to read in a different way and visualize stories



• To all the mashups that have inspired so much human creativity and all those artists who provided the content for those mashups to happen


CHAPTER 10

Uncanny by Nature

Hopefully by now, you've already initiated your own content creation process, experimenting with some of the creative building blocks when working with generative AI content. You've learned about reconsidering AI-generated content as prototypes and how AI are powerful prototyping machines. You've also learned about different compositional forms that are possible with AI-generated content, how content is highly informed by text prompts, and how you can mash different ones together. This chapter deals with the unexpected and the awkward content that AI offers us, challenging our inclination to compare what they generate to some type of human-created standard. Some of the most interesting and bizarre features that come with any generative AI model can inspire your artistic creation.

Author: Define "uncanny valley" in a humorous skit between a clown and a doll.

Muse: (Clown enters the stage, carrying a creepy doll)
Clown: Hey, everyone, I've got a new friend here! (holds up doll)
Audience: (muttering) Creepy…
Clown: What? (looks at doll) You think she's creepy? (to doll) You hear that? They think you're creepy! I think you're a work of art. I think you're my only real friend.
Doll: (in a robotic voice) I may be creepy, but I am an advanced robot designed to mimic human behavior.


Clown: Oh, I see. (showing the doll's face to audience) See, she's smart and cute.
Audience: (muttering) Still creepy…
Clown: (to doll) Sorry they still think you're creepy.
Doll: I may not look like a real human, but I have human feelings. (she cries)
Clown: (to audience) You see? She has feelings!
Audience: (muttering and aghast) Creeeeepy…
Clown: (to doll) I guess you'll just have to try harder to be less creepy!
Doll: (in a robotic voice) I will strive to improve my human-like tears and make my behaviors palatable to sensitive humans.
Clown: (to audience) There you have it, folks! The uncanny valley explained by a robot!
Audience: (applause)
Clown: (to doll) Looks like we're a hit!
Doll: (in a robotic voice) Indeed. Now let's take over the world.
(Clown and doll exit the stage)

The Not-Quite-Human-Not-Quite-Other

The concept of the uncanny valley was proposed by Japanese roboticist Masahiro Mori in the 1970s. He noticed that as robots and prosthetics became more human-like, people's emotional response toward them shifted from empathy to uneasiness and ultimately to repulsion. This observation led to the development of the uncanny valley as a hypothesis. The phenomenon itself has been present throughout human history. Historically, unease, fascination, discomfort, creepiness, and irreconcilable emotions have manifested with exposure to religious icons, wax figures of famous individuals, ventriloquist dummies amplified by how they are portrayed in horror movies, antique dolls, and early automata designed to imitate human movement and behavior.


Muse: In the realm of visual arts, some artists have intentionally explored the uncanny valley by creating hyper-realistic sculptures, paintings, and digital art that evoke unsettling feelings. An example is the work of Australian artist Ron Mueck, who is known for creating eerily lifelike sculptures of human beings.

Mueck's background in puppetry is no surprise given the hyper-realistic, more-than-life-size humans he often presents in art galleries, including A Girl (2006), which depicted an enormous baby made from acrylic on polyester resin and fiberglass, sprawled out on a white stand for attendees to walk around. Advances in graphics and animation in the video game industry have revealed a fascination in some games with creating more hyper-realistic characters. Games like L.A. Noire (2011) and Heavy Rain (2010) were praised for their realistic character models but also criticized for evoking uneasy feelings in response to their human-like yet slightly off appearances. Characters like the G-Man in Half-Life and the Little Sisters in BioShock are now more easily customizable by creatives with Unreal Engine's MetaHuman Animator. The tool will soon provide creatives of all disciplines with endless variations of diverse and uncanny humans within Unreal's existing game engine.

Muse: Researchers in robotics have long been aware of the uncanny valley and its implications for the design of humanoid robots. Companies like Hanson Robotics, the creators of Sophia the Robot, have strived to create robots that look and act more like humans to improve interactions with people. However, these robots can sometimes fall into the uncanny valley, causing discomfort to observers.

With deep learning techniques, AI-generated images attempt a realism that is usually off in some way or another, with uncanny features like extra hands and limbs, missing body parts, contorted faces, and strangely placed retinas. Tools like NVIDIA's StyleGAN2 can generate strikingly lifelike human faces, but sometimes the resulting images exhibit subtle imperfections that place them within the uncanny valley, making the faces unsettling to look at. We can also observe the uncanny valley in the content that language models generate. As they become more sophisticated, they become increasingly capable of generating coherent and contextually accurate text. While these models generate text that is almost human-like, it can contain slight inconsistencies and feel disingenuous, following templates with automated accuracy and sacrificing personality with untruthful content, thus causing unease, mistrust, and other negative responses in human readers.

Disrupting Boundaries in the Arts

Beyond the direct historical examples of the uncanny valley, there is also a connection with artists of all kinds who have tested the boundaries of discomfort with audiences. Artists have a history of pushing their creative expressions beyond the boundaries of the normal, normative, or familiar. The results of these expressions have historically not always been embraced by audiences. Art and artists have been spat upon, their work ridiculed and dismissed. Audiences have been cruel, and yet they might have thought the same of the artist. They've left art galleries and theaters, and some responses have even caused riots. Much of the art that we've been exposed to in the twentieth and twenty-first centuries has been met with a combination of apprehension, resistance, and celebration. Igor Stravinsky's The Rite of Spring, which debuted in 1913, was a composition that defied conventional expectations of what music should comprise. The auditory experience was intentionally challenging, beginning with a common Lithuanian folk melody that featured the bassoon performing at its highest and most dissonant register. The 1913 premiere of The Rite of Spring at the Théâtre des Champs-Élysées in Paris provoked a near riot in the audience, with some members booing, shouting, and even physically fighting. The uproar was so intense that it was difficult to hear the music, and the conductor had to struggle to keep the orchestra playing. This defiance of musical norms contributed to Stravinsky's reputation as a disruptive and innovative composer. Bertolt Brecht's concept of the "alienation effect" was a disruptive approach to theater-making. The technique involved intentionally distancing the audience from the action on stage, preventing them from becoming emotionally attached to the characters. He aimed to create a sense of estrangement, prompting the audience to view the performance with a critical eye. This was achieved through various techniques, such as direct address to the audience, visible stage machinery, nonrealistic set design, and actors breaking character.

Uncanny AI

For creatives, generative AI will disrupt expectations by generating content that can be unsettling, imperfect, rough, unpredictable, and unexpected (Figure 10-1). AI have overwhelmed the world with their capacity to generate a lot of uncanny content quickly. AI seem to naturally guide us to the uncanny valley when they do their best to generate anatomically correct humans.


Figure 10-1.  So close to being human but not precisely so with additional noses that increase the creepy factor, demonstrating the uncanny valley. Iterations = 30

The results are close enough to have the qualities of a human, but the more closely generative AI attempt a simulation without getting it precisely right, the uncannier these images become. Images, videos, or even speech can appear almost human-like, but with subtle differences that make them unsettling or eerie. Deep fakes have uncanny effects as well. The effects are more pronounced when the generated content is intended to look or sound like a real human being. Prompting an AI to generate content will yield unexpected results, bad ideas that may inspire better ideas, and incomplete combinations of pixels that propose new artistic directions. These can take the form of images in addition to videos, deep fakes, and AI that simulate human speech.


Human-like robots will always provoke a human reaction: they are close enough to being human that they highlight the differences and imperfections in the imitation, which can be unsettling and even creepy (Figure 10-2). Additionally, the idea of machines that can mimic human behavior and physical appearance challenges our long-held beliefs about what it means to be human.

Figure 10-2.  After being fed through dozens of filters on a third-party AI, it's hard to get rid of the uncanny when your source image happens to be a photo of a doll. Iterations = 30


Some AI-generated content can evoke feelings of repulsion and resistance in some people, as they question the ethics and implications of machines creating non-human humans that resemble us but not closely enough (Figure 10-3). Other generative AI systems are getting so good that it is difficult to tell the difference between a real human and one that the machine generates. The unpredictable and seemingly uncontrollable evolution of AI-generated content adds to the unease people may feel about any machine replicating a human, as if machines shouldn't succeed at simulating humans or any of their attributes.

Figure 10-3.  A prompt “super-realistic human robot, 4K, unreal engine” accompanied by a very old open source photo of a mannequin. Iterations = 49


Historical Precedence to the Uncanny

Sadly, the uncanny valley has been with us for centuries. It has ancient roots in automata, whose eyes and human-like movement convinced some that they were alive. When these automata were showcased at European royal courts, they left audiences impressed and somewhat freaked out. That said, I'm sure the ventriloquist's dummy was much more unnerving. That creepy creation has a long history of making humans feel uneasy, particularly in the hands of a master (Figure 10-4). The history of the ventriloquist dummy in movies begins as early as 1929 with The Great Gabbo, and it has a more modern-day influence with the movie Magic, starring Sir Anthony Hopkins as the ventriloquist. We know it is not real, but just in case, it's important not to stand too close.

Figure 10-4.  34 iterations of a public domain photo of a ventriloquist dummy regenerated by an AI with different filters fail to make it any less creepy


Store mannequins haven’t made humans feel any better about the uncanny (Figure 10-5), and every now and then a new horror movie presents some type of doll as villain. The latest offering touted as this generation’s creepy doll movie, M3GAN (2022), is the perfect combination of an artificially intelligent robot with a doll face who eventually goes off the rails on a killing spree to “protect” the child she was bonded with. That trope resurfaces the belief that the AI will eliminate humans through some form of machine logic.

Figure 10-5.  Photo of a mannequin in a creepy van on the side of the road, which constantly creeps passersby out. Vancouver, 2023


The Imperfect Stochastic Parrot

By design, any AI generator is imperfect, whether it tries to replicate human hands or cites scholars in a particular knowledge area who at times don't even exist. That imperfection is what you can capitalize on in order to provoke new forms of expression. Imperfections may also contain within them a gem, a mistake that is a take-off point for your own creative journey. Think of whatever an AI generates as the first offer in a series of offers that you curate alongside your own creative process.

The Role of Adversarial AI

Machine learning models form the backbone of all generative AI. There are a growing number of them, and with each learning model come different approaches to not only how the AI responds to your prompts but how the code talks to itself. If you're wondering why the uncanny valley is a natural fit for text-image generative AI, it is precisely because you don't know what you're going to get when you prompt the AI for something resembling a human. Case in point: the captivating regeneration of a prompt first seen in Chapter 9 (Figure 9-11) that depicted the Mona Lisa while a detective assessed the forgery. Who knew it would result in a strange hand holding some type of sci-fi device, a version of the Mona Lisa hung onto the sky behind curtains, and a pensive Mona Lisa wondering if indeed the copy of her is a forgery (Figure 10-6)?


Figure 10-6.  Using a photo of the Mona Lisa by the author along with a prompt "a discriminating Mona Lisa questioning a painting of herself" visualizes a GAN. Iteration 39 of 265

Author: Describe GANs again. Use a metaphor though.

Muse: I'm really getting tired of repeating this information. A generative adversarial network (GAN) is a type of machine learning model that is designed to generate new and previously unseen examples. A metaphor to describe GANs is a game of counterfeiting money, where one player is the "generator" who creates fake money and the other player is the "discriminator" who tries to spot the fake money. The generator is the counterfeiter, whose goal is to create fake money that is so good, it's hard for the discriminator to tell it's not real. The discriminator is the police officer, whose goal is to spot the fake money and make sure that no fake money enters circulation. During the game, the generator will create fake money; the discriminator will inspect it and tell the generator if it's good or bad. Then, the generator will use the feedback to improve the next piece of fake money, and the game goes on until the generator is able to create very good fake money that the discriminator can't spot. Once the generator has learned how to make good fake money, it can be used to produce new fake money that is similar to real money, and the discriminator can't spot the difference.

As depicted in Figure 10-6, the generator and the discriminator are in a constant competition; the generator is trying to create something that is as close as possible to the real, while the discriminator is trying to spot the difference. This competition allows the generator to learn how to create new examples that are similar to the real examples, and that's what GANs do in general.

Generative Adversarial Art

Besides GANs' influence on generative NFTs, they are also informing the artistic process and work of contemporary visual artists. A look at the installation entitled "Figures" (2022) by artist Scott Eaton reveals the power of GANs. Eaton's work drew on a data set of over 30,000 photographs of many volunteers, gathered over a two-year period. Based on the prompts Eaton created, the network scoured the data set looking for patterns in the body that it recognized. Creative shading and rendering were applied afterward. Eaton's visual work is worth investigating beyond these pages and is best experienced as moving, ever-evolving liquid structures.


AI is also being used to generate new types of artistic form by other artists such as Anna Ridler, Sofia Crespo, and Sougwen Chung, who collaborates with an AI-powered robotic arm that paints with her on large canvasses in a type of choreographed dance. Even though GANs are not the only machine learning model out there, all involve training. They are all trained to generate new data based on patterns learned from the training data. These models can be based on GANs or other architectures, and they can be trained using unsupervised or supervised learning. Explorations also continue in the realm of music with GANs. Some research explores the conversion of paintings into music using conditional GANs, while creatives involved in programming and music continue to conduct research on generating short sequences using the same principles. Researcher and composer Victor Sim used GANs to generate baroque music, with the generated content based on MIDI files of Bach compositions. Any artist or researcher can tell you that programming a GAN is a complicated feat but one worthy of exploration, if you are comfortable with receiving the unexpected and unorthodox. After all, what GANs achieve successfully is the creation of unique content. The balance in programming a GAN is important to understand. At the core of a generative adversarial network is the competitive dynamic between the generator and the discriminator. If the discriminator becomes overly proficient at identifying false creations, the generator is unable to improve its output. Conversely, if the discriminator's ability to spot fraudulent outputs diminishes, the generator can take advantage of the situation. This can result in content that deceives the discriminator without accurately mimicking the actual data points.


The Unanticipated

Regardless of the unsupervised learning model you interact with, when you do, you are sure to get stimulating results. These features are unique to generative AI, and yet they also are related to similar artistic processes of creation, particularly those that emerge from improvised music, dance, theater, and other interactive forms involving computational creativity. You never know what you are going to get with an AI, which makes them risky and yet fun to use. The unexpected results are well aligned with the part of the creative process that involves letting go of the hold we might have on how we evaluate our creative output. GANs are good at providing you with unexpected results from the simplest of prompts. A good experiment to try with your creative prompting is to compare how simple and complex prompts are interpreted by two or more text-image machine learning models. Each can lead to interesting discoveries that can inspire you in what you want to create. They also reveal how wildly different the content they generate might be.

Prompting Single-Word Characters, Creatures, Animals, or Cats

An experiment using the same prompts shows the difference in each generative AI's approach to generating an image. Stable Diffusion seems to offer more compelling prototypes that can fuel your exploration (Figure 10-8), whereas DALL-E 2 in this case offers a more traditional compilation of photo-realistic cats (Figure 10-7). Midjourney takes it a step further and starts to generate images of cats that are much more interesting starting points for continued exploration (Figure 10-9). Each creative will have their favorite, but remember that the experiment is meant to show that each generative AI has its own constraints and affordances, which can suggest different directions when used intentionally to inspire and provoke your prototyping process.


Figure 10-7.  DALL-E 2 prompt using only the word “cat”

Figure 10-8.  Stable Diffusion prompt using only the word “cat”


Figure 10-9.  Midjourney prompt using only the word “cat”
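The experiment is easy to approximate on your own machine with open checkpoints. A minimal Python sketch using Hugging Face's diffusers library follows; it assumes the library, the listed checkpoint names (which may change over time), and a CUDA-capable GPU are available, and it compares two Stable Diffusion versions rather than separate commercial services.

import torch
from diffusers import StableDiffusionPipeline

# Two public checkpoints stand in for "different generative AI."
checkpoints = [
    "runwayml/stable-diffusion-v1-5",
    "stabilityai/stable-diffusion-2-1",
]

for name in checkpoints:
    pipe = StableDiffusionPipeline.from_pretrained(
        name, torch_dtype=torch.float16
    ).to("cuda")
    # The same single-word prompt and seed keep the comparison fair.
    generator = torch.Generator("cuda").manual_seed(42)
    image = pipe("cat", generator=generator).images[0]
    image.save(f"cat_{name.split('/')[-1]}.png")

Holding the prompt and seed constant means any difference between the saved images comes from the models themselves, which is the point of the experiment.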

Simulating the Human in the Writing

In terms of LLMs, there is a common trait that can best be described as a sort of familiar homogeneity in the writing that is generated. Generative AI simulate patterned structures and rhetorical styles, which generally produce a mundane and boring mode of creative expression. What AI generate is often more useful for knowing what not to write, especially when they generate untruths. At the same time, generative AI offer us a way to structure our ideas in ways that we might not have considered before, particularly when guided to write in a specific writing style. One method to challenge your own approach to generating content is to prompt your generative AI with a stylistic reference and a block of text from a public domain author. The example that follows explores a scene from the book Alice's Adventures in Wonderland, written by Lewis Carroll, in the style of Oscar Wilde.

Author: Rewrite this excerpt from Alice in Wonderland as a short scene from the sitcom Taxi in the style of an Oscar Wilde play: There was nothing so very remarkable in that; nor did Alice think it so very much out of the way to hear the Rabbit say to itself, "Oh dear! Oh dear! I shall be late!" (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the Rabbit actually took a watch out of its waistcoat pocket and looked at it and then hurried on, Alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat pocket or a watch to take out of it, and burning with curiosity, she ran across the field after it and fortunately was just in time to see it pop down a large rabbit hole under the hedge.

Muse: INT. TAXI DISPATCH – DAY
LOUIE and JIM are engaged in conversation.
LOUIE: (in a grand, flamboyant manner) Oh, my dear friend, you simply must hear of the most extraordinary occurrence I have recently been made privy to.
JIM: (impressed) Pray do enlighten me, Louie.
LOUIE: (dramatically) A rabbit, my good man, a rabbit dressed in the most dapper of waistcoats, running down the street with a timepiece clutched firmly in its paw.
JIM: (surprised) A rabbit in a waistcoat?


LOUIE: (nodding fervently) Indeed! And with a watch, I might add. Such a display of punctuality is truly remarkable in these modern times.
JIM: (bemused) I must admit, I have never heard of such a thing.
LOUIE: (excitedly) And the driver, who happened upon this curious creature, followed it down a rabbit hole. Oh, the mystery! The intrigue!
JIM: (laughing) Your obsession with this rabbit is quite amusing, Louie.
LOUIE: (laughing along) Yes, well, I have a particular fondness for the unexpected and unconventional.
JIM: (smiling) It is one of your endearing qualities.
LOUIE: (grandly) I shall keep my eyes peeled, for who knows what other marvels this wondrous world may hold.
JIM: (smiling) Indeed, Louie. Indeed.
LOUIE places a hand on his chest, lost in thought, as JIM exits the dispatch, leaving Louie to his fanciful musings.

While you should not use the characters from the sitcom Taxi or even the exact text in a scene, it does give you a compelling idea of a scene that integrates part of Chapter 1 from Alice in Wonderland involving two characters. This can inspire further explorations.

Author: Repeat with Jim and Louie as gangsters.

Muse: INT. TAXI DISPATCH – DAY
LOUIE and JIM are both tough-looking gangsters. They're talking in a fast-paced, cockney accent.
LOUIE: (excitedly) 'Ey, JIM, you ain't gonna believe the bird I just seen on the street.
JIM: (interested) What bird, mate?
LOUIE: (animated) A rabbit, dressed in a waistcoat, with a bleeding watch in 'is hand, checking the time like 'e's late for a job.
JIM: (surprised) A rabbit in a waistcoat?
LOUIE: (nodding) Yeah, and one of our drivers followed it down a bleeding rabbit hole.
JIM: (impressed) That's mental.


For those who are familiar with the sitcom Taxi, both generated scenes will not feel quite right. Despite using natural language processing to understand the context of what it is being asked, an LLM will not be able to capture the unique and spontaneous personalities of characters played by actors who embodied spontaneous responses to each other based on the situation. In the hands of a writer, the scene might go somewhere, or a writer may throw it away and write it themselves. The experiment does show that the more we ask an AI to emulate human behaviors, the stranger the outcome might be.
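If you want to run this style experiment systematically, it can be scripted. A minimal sketch using OpenAI's Python library follows; the model name is an assumption to swap for whatever chat model you have access to, the excerpt is shortened, and an API key is expected in the environment.

from openai import OpenAI

client = OpenAI()  # reads the API key from the environment

EXCERPT = (
    "There was nothing so very remarkable in that; nor did Alice think it "
    "so very much out of the way to hear the Rabbit say to itself, 'Oh "
    "dear! Oh dear! I shall be late!'"
)

def restyle(excerpt: str, style: str) -> str:
    """Ask the model to rewrite a public domain excerpt in a named style."""
    response = client.chat.completions.create(
        model="gpt-4",  # assumption: substitute any available chat model
        messages=[{
            "role": "user",
            "content": f"Rewrite this excerpt as a short scene {style}:\n\n{excerpt}",
        }],
    )
    return response.choices[0].message.content

print(restyle(EXCERPT, "from the sitcom Taxi in the style of an Oscar Wilde play"))
print(restyle(EXCERPT, "with the two characters as gangsters"))

Because each variation is one function call away, you can generate a handful of stylistic takes and keep only the one worth rewriting yourself.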

From Bad to Better Ideas

One need only prompt an image-based generative AI to reveal hands or the human body, and you get missing or extra digits of varying sizes, as if badly remembered. This can immediately be judged as bad or as a failure on the part of the machine learning model to get something right. Of course, if you reimagine your AI muse as less of a perfection machine and more of an imperfection machine, then you can start to leverage generated content to inspire a new artistic direction. Building from our previous simple prompt to generate a cat, we can prompt a machine learning model to give us "cat in a pair of hands," and the varied results may inspire or terrify you. In our first variation of the prompt, we see a somewhat normal generation involving kittens held in a pair of imperfect hands (Figure 10-10). Upon closer inspection, the AI model's inability to get it right is the gift of a bad idea: the fingers on the hands holding each kitten are more like cat digits. Where this leads is up to the artist to interpret and continue to iterate on.


Figure 10-10.  One variation of the prompt "cat in a pair of hands"

In Figure 10-11 we get an even more uncanny generation, which can inspire us to consider everything from gloves with cat images on them as a second prototype to a different type of hand puppet.


Figure 10-11.  Another variation of the prompt "cat in a pair of hands" by an AI model that can potentially lead to better ideas that can be explored in a future prototype

As we can see, generative AI that rely on images for their content creation mashups can sometimes produce inaccurate and unexpected results, particularly when it comes to human or even cat anatomy. This is due to the limitations of the data sets that AI algorithms use to learn and generate content from. In many cases, these data sets may contain limited or inaccurate information about the physical appearance of a human, leading to the creation of images with distorted or missing body parts. The unpredictable nature of AI-generated content can be both a curse and a blessing. On the one hand, it can lead to unexpected and even bizarre results that can be difficult to interpret or put to practical use. On the other hand, it can also inspire unique and creative solutions, pushing creatives to think outside the box and explore new avenues of expression (Figure 10-12). Many AI offer negative prompts, which can help you eliminate some of the uncanny. For example, if you wanted to feed an image or photo that you created to an image-to-image generative AI in order to recreate your subject holding an umbrella, you might add "extra fingers" to your negative prompt area and see what happens (a minimal sketch follows Figure 10-13). As has been repeated in earlier chapters of this book, the goal of using AI to prototype content should be to leverage its strengths and work around its limitations, not to curate it to get exactly what you want from it (Figure 10-13).

Figure 10-12.  Even Midjourney with the prompt “cat in a pair of hands” creates a disturbing mix of human and cat. It would make for a great pencil sketch though


Figure 10-13.  A final attempt to generate “a cat in a pair of hands” using negative prompts like “extra fingers, extra paws” and even “extra kittens” failed but created a fantastic opportunity to share a seemingly normal photo-realistic scene and ask, “How many digits do you see?”
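For reference, here is a minimal image-to-image sketch with a negative prompt, using Hugging Face's diffusers library. The checkpoint name, file names, and strength value are assumptions to adapt to your own setup.

import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

source = Image.open("my_subject.png").convert("RGB")  # a photo or image you created

image = pipe(
    prompt="my subject holding an umbrella, photo-realistic",
    negative_prompt="extra fingers, extra limbs, deformed hands",
    image=source,
    strength=0.6,  # how far the AI is allowed to drift from the source image
).images[0]
image.save("subject_with_umbrella.png")

As Figure 10-13 shows, negative prompts nudge rather than guarantee; treat the output as another prototype.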


Can Bad Ideas Be Turned Around?

A generative AI might tell us that people are attracted to deep fakes because they offer a unique form of entertainment and novelty. The idea of being able to see a familiar face in a completely different context or role is intriguing and can spark people's imagination. Additionally, deep fakes offer a way to experiment with what is possible in the digital world, pushing the boundaries of what was once thought to be a written-in-stone image or video. They offer a comparison of what the original scene was with how it might be interpreted by a completely different actor. For a creator who is prototyping, a creative reversal of how the technology is used could be to explore how a scene might play out with different types of actors. That may also help break stereotypes and typecast roles, which still dominate the film and TV industry in terms of race, gender, actor size, and age. The technology behind deep fakes is constantly improving, making the results more realistic, convincing, and potentially dangerous. Despite not being precisely real, the combination of technology and people's imagination makes deep fakes alluring to some. Of course, the lack of permission to use someone's face in a variety of questionable contexts and without compensation is unethical, and the abuse of this technology has forced one well-known generative AI platform to no longer make this feature available to users. More of this type of ethical safeguarding for the power of generative AI is important and imminent. As we are already witnessing, actors, writers, and other creatives are pushing back on deep fakes, generated images, and LLM technologies that use their content, words, or images without consent or compensation. Protests, strikes, and letters to policy makers will hopefully lead to increased collaboration between producers, content creators, and all those artists who have unwillingly contributed to machine learning data sets.


Takeaways for Creatives




• Transform your instinct to only generate recognizable forms or patterns with generative AI and surrender to the creative opportunities that come with the uncanny valley.

• Creativity is not just about accurately reproducing reality. Your creative companion can also help challenge conventions and present fresh perspectives. Images with odd features can be intriguing, provoke thought, and evoke strong emotional responses. Such images allow creatives to make a statement or comment on certain issues, whether societal, cultural, or philosophical.

• Diversify whom you represent in your work. Traditional media often favor certain body types and features. AI, especially if trained on a diverse data set, can help create images of bodies that fall outside the "norm," like those of people with disabilities, people of all sizes and shapes, and individuals from diverse racial and ethnic backgrounds.

• Deformed or distorted images and strangely generated scripts and stories can be used effectively in storytelling, to symbolize certain themes or character traits, or to evoke specific atmospheres. A character with one too many fingers might have a role in a story you are telling.




• In a world saturated with images, uncanny or distorted images can stand out, capture attention, and make a lasting impression. This can be beneficial in fields like advertising or brand identity where differentiation is key.

• AI can also generate images that blend human bodies with technological elements, exploring concepts like cyborgs or transhumanism. These can challenge our ideas about the boundaries of the human body.

Acknowledgments

• To the uncanny habit of trying to make machines that represent humans

• To ventriloquists and their dummies and to cat gloves and cat hands

• To GAN artists Scott Eaton, Anna Ridler, Sofia Crespo, and Sougwen Chung for exploring AI's potential as a creative companion

• To the sitcom Taxi for many youthful hours spent laughing


CHAPTER 11

Dilemmas Interacting with Generative AI

This chapter addresses some of the current dilemmas that generative AI is facing and how the use of machine learning models surfaces issues that need critical attention by both users and generative AI developers. These include ethics, bias, copyright infringement, unfairness, untruths, cheating, and factuality.

Bringing Existing Dilemmas to the Surface (Again)

Generative AI bring to the surface human dilemmas we must contend with that are already pervasive across different technologies and media. The dilemma is apparent when we "look under the hood" of a generative AI and pick apart how it works, who makes it work, and the type of content it generates, all while keeping an eye out for what that reveals. When we look at LLMs, there are definite concerns with how good an LLM has become at fooling people into believing something was made by a human. There are witness reports on how an LLM fooled the Wharton Business School and passed an MBA exam (see the resource list at the end of the chapter).


But what does that tell us of the MBA exam? Recall that generative AI look for patterns when assembling content to generate for us. Should we question the way that some exams are constructed and revise them? Should we interrogate how it is that we assess the knowledge students gain? Do we need to look at what it is that we teach and how? The answer to these questions might be a simple "yes"; however, that would entail a change in how an institution evaluates how people learn and the type of knowledge and know-how that communities of practice expect when students graduate into the workforce. The Wharton case is not that unique. My colleague Dr. Claudia Krebs (featured in Chapter 12) has vetted ChatGPT-3 on typical first-year undergraduate anatomy exams, and the results were not just a pass: the median was a B+. Have we solved copyright infringement internationally? The answer is no. It depends on who is being infringed upon, the weight of their legal team, and their financial and social capacity to launch a lawsuit against whatever person or organization in whatever country has infringed on the copyright, as well as how much profit is being made from someone else's work. Fair use and the radical transformation of pre-existing content from whatever happens to be in an AI's corpus will always play into any lawsuit. If that wasn't tricky enough, some AI-generated content is itself now copyrightable. What a peculiar world we occupy where derivative works can also be protected. The exclusion of voices in content generation, the virtual extinguishing of individuals in history, and the hallucinatory false references that an LLM generates all have precedents before the advent of generative AI. LLMs that do not fully account for a diversity of voices are a familiar complaint. Sadly, this is not a problem that has emerged solely with AI. The problem of excluding voices continues to regenerate itself with whatever technological medium attempts to convey thought, opinion, perspective, and history. The list of dilemmas with AI is long, and this chapter could be an entire book. That said, it is helpful to frame this chapter in a way that keeps the ethical dilemmas that AI brings to the surface so we have a more informed choice as to whether or not we choose to interact with them. It's also important to understand that policy makers are working hard to place constraints on what these unscrupulous AI can generate. Finally, instead of just highlighting the dilemmas, this chapter also highlights existing solutions and proposes some as well.

Persistent Myths About AI

What may benefit all of us is to understand that generative AI are systems built by people. AI is not co-constructed by several intelligent machines. They all require humans in the equation. Those people have control over every aspect of the AI, including its distribution. They have clear choices when it comes to whom they hire to collect, classify, and label their data sets and how those workers are treated. To act from informed decisions when it comes to using generative AI, it's crucial to understand the myths that accompany the technology (Figure 11-1).


Figure 11-1.  Perpetuating myths about self-regulating and self-replicating AI, the prompt "monkey AI bots take over the world" is accompanied with a photo of a monkey. Iterations = 40

Author: Generate incorrect information about artificial intelligence.

Muse: Artificial intelligence is a form of technology that allows machines to become self-aware and take over the world. It is also known to be able to read human thoughts and emotions. AI robots have the ability to reproduce and can also teleport. They are also immune to all forms of physical harm, including bullets and fire. AI is also able to control the weather and can cause natural disasters at will.


The interaction with my LLM muse is meant to illustrate conflicting information about generative AI and their machine learning models. This includes the public misperception of how AI works, its benefits, and its harms. Media representations of AI tend not to elaborate on these, leaving people without enough information on which to base their opinions. Here are a few notes about the preceding interaction that fuel this chapter:

• The generated AI response was based on a data set used for that particular machine learning model.

• That data set, along with the way in which the AI responded, was programmed by humans.

• The words in the data set were scraped and mashed together from texts that other humans originally created. It's not like the AI just draws from a dictionary and creates fully fleshed-out sentences. It looks for patterns.

• The patterns the AI looks for are informed by the prompt "incorrect information about AI." People have written about that. References for the actual humans who have written related content come up in a quick search engine query with the same prompt text: "generate incorrect information about artificial intelligence."

In a search engine, however, we might have better luck conducting a search with the prompt "myths about AI." You will quickly understand that the quality of information you receive about the subject of myths about AI is directly influenced by the search terms you use. Of course, a web search and a content scrape from a machine learning model have differences.

Author: What are the differences between prompting a generative AI vs. conducting a search in a search engine?

Muse: Prompting a generative AI typically involves providing input to a machine learning model, which then generates output based on that input. The output can be anything from text to images to audio. This approach is useful for generating new, unique content or for creating content that is similar to existing content. Conducting a search in a search engine, on the other hand, is a way of finding information that already exists on the Internet. When you enter a query into a search engine, it searches its database of web pages and returns relevant results based on the keywords in your query. This approach is useful for finding information that has already been created and published.

Author: Yet the information that an AI generates is based on existing content, no?

Muse: Yes, that's correct. The information generated by a generative AI is based on existing content, but the AI uses that content in a creative way to generate new and unique outputs. A generative AI is trained on a large corpus of data, which can include text, images, audio, or other types of content. The AI then uses that training data to generate new outputs that are similar in style, tone, or content to the training data. While the outputs generated by a generative AI are new and unique, they are still influenced by the existing content that the AI was trained on. Therefore, the quality of the output depends heavily on the quality and diversity of the training data. If the training data is biased or limited in scope, the outputs generated by the AI will be biased and limited as well.

Putting aside the semantic differences in the approach to generating information, both methods require an existing corpus of information to draw from. Thus, a dilemma to address is the capacity for AI to copy previously existing letters or pixels that someone else created, to generate what are essentially mashups. Whether those come from pixels, colors, shapes, spectrograms of audio files, video clips, 3D models, etc., the process is similar. You type in a prompt. The AI scrapes for patterns. It makes an offer.


Takeaways

• Conduct your own research about myth vs. reality when it comes to engaging with generative AI.

• Deepen your understanding of how generative AI works and then pass judgment.

• Create art that is critical of the human reliance on generative AI and AI in general (Figure 11-2).

Figure 11-2.  Low-poly render with a pixelated filter of a low-resolution and blurry photographic capture of three dancers on a mocap shoot with the prompt "mocap dancers in apocalyptic times." Total iterations = 18


Bad Bots, Fuzzy Pixels, and Iterative Forgery

The issues of AI and copyright may be complex and multifaceted but are all too familiar. We are often told that AI has the potential to generate new forms of creative expression and can be used as a tool that enables authors and artists to inspire new works. On the other hand, the use of AI can raise questions about the ownership and control of the data sets that inform these new derivative works. This should be familiar if you have ever consumed a mashup. Mashups have long since become so accepted that they are now a normative part of human expression. Mashups are also made by humans who interact with different technologies to produce and then publish them. We are constantly faced with making decisions as to whether we opt in or opt out with any technology. The following statements are not meant to help make the decision easier:

• Every single image that is generated on any text-image generative AI comes from somewhere, whether a single pixel or a bull's horns.

• Most generative AI content is a mashup, merging images, patterning stylistic renders, and creating a derivative work.

The problem is we don’t really know which images are sourced from where and how that informs the generated content. At least we knew that when it came to some of the amazing mashups or machinima that creators in the earlier part of the twenty-first century produced. This is because the responsible humans in the mashup suits would give credit where it was due. It is not just images or video or sound. It’s also text. LLMs do not provide references for every word they scrounge and scrape together from all their tables of coordinated data points.


There are mounting pressures, lawsuits, and activism to challenge access, ownership, and author permissions to the very corpus on which machine learning models rely. Sites like Spawning AI's haveibeentrained.com and stableattribution.com offer creatives the opportunity to see if their work is being used to train neural networks in image-generation platforms like Stable Diffusion and others. Figure 11-3 shows a Stable Diffusion–generated image on the right uploaded to stableattribution.com to surface images that might have contributed to that uploaded image.


Figure 11-3.  A composite-generated image on the right that underwent 173 iterations across seven AI. Images on the left show possible source images that contributed to the image on the right


While Stable Attribution's website claims to work only with Stable Diffusion–generated images, as an experiment you can try uploading a photo of your own vs. one that you generated and see the results. Stable Attribution's machine learning model also consists of a corpus of images, and anything you upload is checked against it. While it explicitly states that it can only cross-check images that were generated using Stable Diffusion, it is interesting to note that even the photo on the right of Figure 11-4, which was taken by the author, is still suspected of "coming from somewhere."

Figure 11-4.  An original photo of a flower taken by the author on the right and then tested with Stable Attribution. The flowers on the left show the images that apparently went into its generation The same experiment can be tried with haveibeentrained, with similar results (Figure 11-5).


Figure 11-5.  A photo of a goose taken by the author as a thumbnail, top left, and then tested with haveibeentrained.com, which generated apparent source images that went into its creation


What these experiments point to is the need for generated images and their machine learning models to provide attribution. If you ask why that is not possible, the answers tend to be prepared ones that point to generated content as a derivative work. These answers are not only unacceptable but completely unforgivable given how easy it would be to compile metadata, labeler information, and classification: the very processes that an AI relies on to generate content in the first place. The same can be attempted with text content generated through an LLM. Most of the chatter on social channels and in media about ChatGPT is cautionary when it comes to how students will use it in high school, college, or university courses. What is not as commonly known is that the same company that developed ChatGPT has also released a tool called AI Text Classifier that can assess where the text you copy and paste into it comes from. The text in the following box (Figure 11-6) was copy-pasted from this book. The content combined original writing with ChatGPT-3-generated text. The result, that the text is "very unlikely AI-generated," shows the difficulty of identifying a body of writing that is mashed together with an AI.


Figure 11-6.  Screenshot of OpenAI's Text Classifier accurately identifying source text as very unlikely AI-generated

As generative AI advances, it will also entail the emergence of associations that will establish worldwide benchmarks for content recognition. In line with this, Adobe has established the Content Authenticity Initiative (CAI) to facilitate this goal, offering costless open source resources developed by the nonprofit Coalition for Content Provenance and Authenticity (C2PA). With the aid of metadata, more images will soon be accompanied by content accreditation, which will include "Do Not Train" tags indicating that the content must not be utilized for training machine learning models. This will be a bit more difficult to do with scraped text content, but at least OpenAI, in their own hallucinatory impulse, has started the difficult task of differentiating between machine-made and human-made content. Systems to check on generated content will increase, and some will no longer be operational. This is the nature of technology development. Anticipate that more companies will offer generative AI, new business models will develop, open source will be used by those wanting to profit, controversial start-ups will probably change hands and be rebranded, and well-intentioned projects may also no longer be sustainable to operate (Figure 11-7).

Figure 11-7.  A screenshot of the authors of Stable Attribution taking their much deserved rest and bow

Takeaways

• Do your research and, as much as you are able, find out if there is anything similar to what you generate.

• Paint over, refine, adapt, mash, convert, radically alter, and make significant changes to whatever an AI generates. It's a prototype.


The Hallucinating Dev

An AI's confident response that doesn't align with its training data is described as a form of hallucination. This state of "hallucination" can occur due to various factors, such as the training data being insufficient, biased, or excessively specialized. While the word "hallucination" has been used, the real hallucination is the one that AI developers might have in thinking that there are no ethical issues to reconcile when interacting with generative AI. The current trend seems to be a sort of doomsday misdirection by AI developers who claim how powerful their technology is and that safeguards need to be put into place in order to mitigate the risk of extinction. For the discerning, the misdirection seems designed to shift our focus from the ethical dilemmas noted in this chapter and the inherent weaknesses that these systems have. Anyone who has interacted with ChatGPT or other generative AI understands their imperfections.

Author: Say this using more technical language: Know that your AI is not perfect and may in fact provide you with certain knowledge or data that in fact is wrong, incorrect, or useless. But when you see its advantages as a creative companion, it has the potential to support your own creative process and in many cases accelerate it.

Muse: While I have been trained on a large corpus of text, it is possible that my output may contain errors or inaccuracies. Additionally, the context and intent behind the language used in a given input may not always be fully captured by my understanding of the text.

Current laws on copyright were not designed with AI in mind, and it can be challenging to determine who holds the rights to AI-generated works. This has led to hilarious and often misguided debates about whether AI should be considered an author and whether the rights to AI-generated works should be owned by the creators of the AI, the users who employ it, or van Gogh (Figure 11-8).


Figure 11-8.  Six of dozens of versions of van Gogh by van Gogh according to an AI fed with a self-portrait of van Gogh

For established creatives who have contributed their own original work in whatever form and shared it online, however, the distinction is clear. What is obvious is that derivative works are being created from original ones (Figure 11-9). They are essentially mashups relying on what was created in the past to create new forms. The fuzzy and at times not-so-fuzzy line of transforming the original art into a new form is where many lawsuits are emerging. There is no one answer to all use cases, as each is unique.


Context of use will also inform the results of a lawsuit. The more the work is transformed from the original and used in a context separate from what the original artist intended, the more difficult it becomes to sue.

Figure 11-9.  The prompt "AI djinn sealed up in a bottle for all to see and prepare wishes for" accompanies an 1890 untitled photo of a specimen by Thomas Smillie courtesy of the Smithsonian Institution. Iterations = 68

Ambiguous use cases highlight the need for a clear legal framework to govern the use of AI across creative industries to ensure that the performance and licensing rights of creators and users are protected. It's easy to blame it on the bot, but if you know your bot is being a copycat, then why let it continue? If you provide users with the naughty bot, then what do you really expect to happen?


In addition to questions of authorship and ownership, the use of AI in creative industries also raises concerns about the source of the data sets that AI algorithms use to generate new works. Many AI algorithms rely on large data sets to train and improve their performance. These data sets often contain existing works that are protected by copyright, and the use of these works in AI algorithms can raise questions about the legality of such use. It is important to consider the source of the data sets used by AI algorithms and ensure that the rights of the original authors and creators are respected. This includes obtaining proper licenses and permissions for the use of these works and giving proper credit to the original authors and creators. Ownership can be resolved if creators use generative AI as a starting place to provide momentum to their own creativity, much like a game designer who scours Google Images and prints up inspiration images for a mood board. "Machine learning models generate prototypes and not finished works" is the repetitive refrain of this book.

Takeaways

•	The only hallucination is the one that AI developers might have that there is nothing to reconcile when interacting with generative AI. Developers as creatives need to take responsibility for their creations and not simply generate a disclaimer for humans who can easily access it and use it as they wish.

•	Use any present or future LLM with a great deal of discernment and accept that what it generates needs to be cross-checked and rewritten in your own style.


The Stochastic (Sarcastic) Parrot

LLMs are prone to getting it wrong, or more specifically, to drawing from a limited data set and so lacking the data they need to generate factual responses. AI is also referred to as a “stochastic parrot” because it is often viewed as simply repeating patterns it has learned from its training data, without any real understanding or creativity. Just like a parrot repeating words it has heard without comprehending their meaning, AI models can generate output based on patterns they have seen in their training data, without truly understanding the context or deeper implications of the information (Figure 11-10). This lack of understanding and creative thinking can lead to AI models making mistakes, generating biased output, or simply repeating information without adding any real value or new insights.


Figure 11-10.  The prompt “a bunch of ML models hanging out as parrots eagerly awaiting their next prompt” accompanies a public domain black-and-white photo of parrots in the Brazilian rainforest. Iterations = 89

The “stochastic” aspect refers to the randomness involved in many AI models, particularly those based on probabilistic or deep learning algorithms, which can generate different outputs even when given the same inputs, further emphasizing the lack of control and understanding of the results produced by AI. LLMs may also get it wrong when generating responses because they lack an understanding of the context and nuances of human language. This can lead to responses that are incorrect or inappropriate, causing confusion or harm. For instance, LLMs may generate responses that are racist, sexist, or offensive, which can lead to negative consequences.
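To make the “stochastic” part concrete, here is a minimal sketch, in Python, of sampling from a toy next-token distribution. It illustrates the principle only, not how any particular product works; the vocabulary and scores are invented:

```python
# Toy demonstration: sampling can yield different outputs for the same input.
import numpy as np

rng = np.random.default_rng()

def sample_next_token(logits, temperature=1.0):
    """Softmax sampling: higher temperature flattens the distribution and
    increases randomness; lower temperature makes output more repeatable."""
    scaled = np.array(logits) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Hypothetical vocabulary and scores for the prompt "The parrot said..."
vocab = ["hello", "goodbye", "squawk", "nothing"]
logits = [2.0, 1.0, 1.5, 0.1]

for _ in range(3):
    print(vocab[sample_next_token(logits, temperature=0.8)])
# Repeated runs can print different words even though the input never changed.
```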


While LLMs have the potential to alter the way we interact with computers, it is important to be aware of their limitations and those of the data sets they rely on. By ensuring that LLMs are trained on diverse and inclusive data sets and by continuously monitoring and improving their output, we can help mitigate the risk of generating incorrect, biased, or inappropriate responses. Generative AI requires the user-tester; that informed interaction refines the patterned responses the AI gravitates toward. Interacting with an AI builds on the idea of using it as a point of departure, not as a finished product. Each use involves research into who might have created a similar idea and an understanding that the AI just might be providing you with beautifully written untruths.

Takeaways

•	Consider investing time and computing resources to set up your own generative AI on your own local server using the many open source machine learning models that are being offered for free (a minimal sketch follows).
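As a hedged starting point, the sketch below runs a small open source language model locally with the Hugging Face transformers library. The model choice is illustrative; any open source causal language model you are licensed to run locally would work similarly:

```python
# A minimal local text-generation sketch (pip install transformers torch).
from transformers import pipeline

# GPT-2 is small enough to run on a CPU; swap in any local model you prefer.
generator = pipeline("text-generation", model="gpt2")

result = generator(
    "A prototype is",
    max_new_tokens=40,   # keep the generation short
    do_sample=True,      # stochastic sampling, as discussed earlier
    temperature=0.9,
)
print(result[0]["generated_text"])
```

Nothing leaves your machine in this setup, which is one reason local models appeal to creatives concerned about contributing their work to someone else’s data set.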

Exclusion of Voices and Gender Polarization

It is well known that many machine learning models are trained on data sets that lack diversity. The result is that a machine learning model may generate images that reinforce harmful stereotypes or exhibit biases, or it may exclude images or voices that depict other cultures. This can lead to a lack of trust in AI-generated content and a general aversion to it no matter what book discusses its benefits. A lack of diverse and inclusive data sets usually leads to excluding voices that may not be represented as majority voices or whose words may not dominate the corpus of data that an LLM scrapes from.


Content generated by some machine learning language models can therefore become uniform and standardized, leading to normative responses that may not adequately represent the diverse voices of a wider spectrum of human experience. Marginalized groups such as indigenous communities and the LGBTQ+ community may be excluded from these normative generated responses in LLMs and GANs. Your generative AI may be stuck in a gender binary mode, unable to get out no matter what prompt you give it.

To make machine learning algorithms fairer and more accurate, both the code and the way data is classified need to represent different people’s opinions. This means that for machine learning models to improve, a process of development could be followed that looks at what everyone says, instead of just what most people say. Typically, when practitioners want to label examples for machine learning, they hire a group of people to do it. Then they look at all the labels and choose the one that most people agreed on as the “correct” one. The machine learning algorithm learns from this label and tries to make predictions based on that majority opinion (see the sketch after this paragraph). Regenerating the machine learning model itself in shorter cycles and with a diverse range of human input into its construction will help an AI make better predictions and either avoid unfairness, present multiple voices in a single generated response, acknowledge the limitations of a generated response, or a combination of all three (Figure 11-11).
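The sketch below contrasts conventional majority-vote label aggregation with keeping the full distribution of annotator opinions as a soft label. The annotation data is invented for illustration:

```python
# Majority-vote aggregation vs. preserving annotator disagreement.
from collections import Counter

annotations = ["happy", "happy", "ambiguous", "happy", "ambiguous"]

# Conventional approach: collapse everything to the majority label.
majority_label, _ = Counter(annotations).most_common(1)[0]
print("majority label:", majority_label)  # -> happy

# Alternative: keep the disagreement as a soft label so a model can learn
# that annotators did not all agree, instead of erasing minority opinions.
total = len(annotations)
soft_label = {label: count / total for label, count in Counter(annotations).items()}
print("soft label:", soft_label)  # -> {'happy': 0.6, 'ambiguous': 0.4}
```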


Figure 11-11.  First generated model of a cyborg without prompting race, color, gender, or size and then fed into an image-image AI. Iterations = 2

Similarly, the LGBTQ+ community may use language that is not recognized by mainstream language models, leading to exclusion and erasure of their experiences and identities. Content generated from all machine learning models (Figure 11-12) therefore requires a critical perspective and discernment.


It is important to identify the biases and limitations of language models when generating content and to actively support the work of others toward creating more inclusive and diverse language models that can accurately represent a wide range of voices and perspectives.

Figure 11-12.  A collage of women representing diverse racial backgrounds and mixes as fashion icons that took approximately 250 regenerations. None presented a variety of female forms and sizes no matter the prompts


The Machine Is Hallucinating

One of the challenges with generative AI is the potential for your muse to get caught up in its own hallucinations. They exist alright, and they can be shocking. Hallucinations are essentially false or distorted patterns or features generated by the AI that have no basis in reality. Hallucinations can arise from various factors such as imperfect training data, biases in the algorithm, majoritarian opinion, unbalanced data sets, non-inclusive classifiers, bad labeling, inaccurate metadata, or limitations in the AI’s ability to accurately understand and represent complex concepts. Viewed from a negative lens, it’s easy to dismiss an LLM simply because it distorts the truth. The expectation that it should always tell the truth is in itself a hallucination.

While these so-called hallucinations can pose a challenge for certain applications of generative AI, they can also be leveraged as a creative opportunity for creators in the realm of fiction and fantasy. By embracing and incorporating the unexpected or surreal elements that arise from these invented realities, creators can potentially unlock new forms of storytelling and imaginative creations that push the boundaries of traditional narratives and conventions. Assignments for students can be created that explore a hallucination and then follow it with critical reflection, research, and fact-checking.

For example, a generative AI system that is trained to generate images of animals may produce hallucinations of fantastical creatures that do not exist in the natural world. These hallucinations could then serve as inspiration for a new type of fictional creature, expanding the possibilities for creative world-building and storytelling. Similarly, so-called hallucinations generated by text-image AI could inspire new forms of visual storytelling and illustration, leading to the creation of unique and imaginative works of art that blur the lines between reality and fantasy—or, in the case of Figure 11-13, a fake motion capture shoot you should likely never ever try to replicate.


Figure 11-13.  The result of a photo of a mocap session with actors and the prompt “a bunch of pigs in motion capture suits” seems plausible until you visualize it

NSFW and Deep Fakes

The issue of AI generating NSFW (Not Safe For Work) images and deep fakes is an ongoing concern and a persistent reminder that any technology can be used as destructively as it can be used creatively.


With inevitable advancements in generative AI and its deep learning algorithms, AI systems will continue to generate images and videos that become increasingly indistinguishable from reality. The worry is real: deep fakes can be and are being used to spread false information, manipulate people, and create pornographic images and videos of popular actors who never gave consent, causing harm to individuals and societies as a whole. Recent deep fakes used in unethical ways provoked Midjourney to stop access to its current beta. Companies that offered some limited AI-generated services for free are now only providing access to paid subscribers. Deep fakes are not only unethical but expensive to render. The use of deep fakes in film and TV also needs to be regulated so that actors are not taken advantage of, are properly compensated, and become collaborators in how their images are used to tell new stories.

NSFW image generation is also a concern, as it can reinforce harmful stereotypes and perpetuate the exploitation and objectification of individuals. The images created can also be used to harass or blackmail individuals or to spread malicious content that is not appropriate for general audiences. Furthermore, the question of accountability arises when AI systems are used to create these NSFW images and deep fakes. While the AI system itself is not the creator, those who provide access to the technology need to take some responsibility for the tools they provide users. A lack of accountability creates an urgent need for regulating and enforcing laws that protect individuals from harm and abuse.

As the destructive and harmful uses of AI-generated NSFW images and deep fakes have become a reality, it is essential that any further development of AI be guided by a strong ethical framework that protects the rights and dignity of individuals and ensures that the technology is used responsibly and to enhance human creativity. Those ethical frameworks should resist the temptation to invoke “free speech” for any generated content that represents another person in any way without their explicit permission.


FrankenAI

The fear of AI and its potential negative impact on society has been present since its inception. One of the earliest depictions of the dangers of humans creating a conscious creation can be traced back to Mary Shelley’s novel Frankenstein, or The Modern Prometheus, published in 1818. The novel explores the idea of creating life through science and the consequences of playing God. In the twentieth century, science fiction continued to shape the public perception of AI with stories such as The Terminator and The Matrix, where artificial intelligence turns against humanity when it becomes sentient (Figure 11-14).

Figure 11-14.  AI hard at work in Frankenstein’s lab preparing for world domination if it could only get its head on straight


Ongoing advancements in AI technology have raised concerns about job displacement and the ethical implications of creating systems that may eventually surpass human intelligence. These fears have been fueled by reports of AI systems making decisions that are biased or harmful, leading to the conclusion that AI must be approached with caution. One need only look at headlines where humans engage in debates about whether or not AI is sentient or conscious. Individuals considered to be seminal leaders of AI are also sending out warnings for humans to cease development of AI for a variety of reasons.

The persistence of deep fake technology has raised the alarm even more and has only increased public mistrust of any tool that in any capacity uses AI. The ability to generate realistic images and videos that manipulate our perception of reality has raised concerns about the potential for generative AI content to be used for malicious purposes, such as spreading false information and powering a variety of complex phishing scams. The association of machine intelligence with harm to humans continues to influence our mistrust, particularly when statements that current systems surpass human intelligence are propagated, reinforced by the quality of the content that generative AI produces.

Job (Re)placement

The fear that AI will replace jobs has been a concern for many people for decades. As AI technology continues to improve and become more sophisticated, this fear has only grown. Some experts predict that AI will eventually be capable of performing a wide range of tasks currently performed by humans, from manual labor to complex decision-making. This has led many to worry that AI will result in widespread job losses and unemployment. A common assumption that many people make is that generative AI will also replace all humans and all jobs.


AI won’t replace jobs. Humans who jump the gun and misunderstand the limitations and costs of generative AI will lay people off. There are already documented cases of employers replacing humans with generative AI. There are also stories of those same employers having to rehire humans because generative AI could not perform all the tasks that their employees could. We can clearly see that current AI content generation is imperfect, homogenized, and at times untrue and requires editing, a critical eye, increased research into any knowledge it generates, tweaking, regeneration, more tweaking, etc. Despite the myth that over time it will get better, counting on it to replace human creativity, ingenuity, and in-the-moment improvisation is not advised. There may be a degree to which AI is integrated within some jobs that creatives have, and as this book has offered, AI can be a useful tool in many creative situations.

To illustrate the point that generative AI will not replace a creative job, I present the following use case. A professional photographer came to the studio where I teach, worried that they wouldn’t have a job in a few years. During the photoshoot, however, the photographer carried out the following activities:

•	They made a social connection with the subjects they were going to shoot and repeatedly used their names throughout the session.

•	They quickly assessed a potential composition based on the body types and skin color of each person present.

•	They evaluated the lighting and used their own pro kit.

•	They used a high-quality camera that resulted in high-resolution photos.

•	They made persistent adjustments to their camera including ISO, lens type, aperture, shutter speed, etc.

•	They improvised in the moment based on how the subjects responded and what the photographer imagined through the lens.

•	They took a shot, examined it, and then without asking anyone took another shot after fine-tuning adjustments.

•	They turned the lights out in the space and used their own lighting.

•	They used light from the two LCD screens that were also in each of the shots.

•	They moved around snapping the subjects from all kinds of angles.

•	They repeatedly asked the subjects to move according to the composition the photographer envisioned.

•	They asked the subjects to have a conversation typical of one they had when working together on a project.

•	They stood on a chair and then leaned over on one leg to reach out and get that winning shot.

The list is incomplete, as there were many more actions the photographer took during the 40 minutes they were there. No single AI system could achieve the same results in the same amount of time or on the same budget (Figure 11-15). Can generative AI add virtual lighting to a photo after it was taken? Many different software applications can already do that, and it takes humans time to edit and make it seem “real.” Can an AI system be set up in a space and determine the perfect amount of lighting to capture four human subjects along with two LCD screens? I’m sure someone has some type of expensive system to achieve this, but why not just turn the lights off or add a light?


Figure 11-15.  The prompt “AI learning how to take a complex photo of itself” along with a public domain photo of trick cyclist and golfer Banner Forbutt on a unicycle, December 1946, photographed by Ivan Ives, Pix magazine. Iterations = 50


Unless you have been living in an isolated remote community with no Internet access, you already know that AI has the potential to automate many routine and repetitive tasks. Yet these tasks require human supervision, just as generative AI needs human interaction and curation. An earlier example mentioned that narrow AI will soon handle rotoscoping so that creatives don’t have to. That may be coming, but the results will require tweaking and refining, even if it does save time. Generative AI will lead to new business models and to an increase in generated content coming from individuals who may or may not have the craft or skill of a professional photographer, artist, writer, or musician, creating new categories of generated art. Those who do have those skills and experiment with generative AI will be at a significant advantage, as they can draw from their experience and craft to create masterful wonders. They don’t have to though. It might be faster to go out to the park and snap a close-up of moss on a tree than to prompt an AI iteratively to get the kind of detail seen in Figure 11-16.


Figure 11-16.  Close-up of moss on a tree shot with an iPhone by the author. Iterations = 1

A Saturated AI Ecosystem

Like it or not, generative AI content is overwhelming the Internet and provoking mistruths, mistrust, and misunderstanding. Publishers report that AI-generated submissions are increasing the workload of those trying to read work written by other humans and that human-written submissions are being buried under the overwhelming number of AI-generated ones. Photo- and image-based marketplaces are beginning to create policies that ban AI-generated art.


While there is resistance, there are also new marketplaces opening to offer AI-generated content, and in turn, new business models are emerging. If you conduct a search to locate good prompts to use in text-image AI, you will quickly note that there are businesses devoted to the craft that charge for the service. Want to know how to organize a project charter? Go and see what you get, and then compare it critically with what established project managers and business owners have written on many a blog site. Want to know whom your muse considers to be some of the best composers of the twenty-first century because you need to write about them? Or maybe you need a list of some of those composers because you want to go look them up and listen to their music. You might even want to compare what the machine learning model tells you with what you already know. You may want to test the bias inherent in the model and then compare the generated choices of composers with what an expert says. Did they omit Sun Ra? What about Scott Joplin, Janet Price, Duke Ellington, or Samuel Coleridge-Taylor? Maybe you need to test your muse to see its capacity to give you accurate working code or to direct your programmer self to solve a specific problem you are having with colliders in the Unreal game engine.

It’s only going to get better/worse from here. Revisions to existing generative AI machine learning models are already underway. Coders can now debug with an AI. The code for a website can be generated simply by an AI system looking at a screenshot of a site or a wireframe drawn on a napkin. ChatGPT-4 now offers APIs (Application Programming Interfaces) that can be integrated into game engines and other applications. Generative videos are possible from either a text, image, or video prompt. 3D generated models that cover the entire 3D pipeline are being worked on. So where do you fit into this AI-generated picture?
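To make the API point concrete, here is a minimal sketch of calling an LLM from Python using the OpenAI client library. The model name and prompt are placeholders, and the exact interface evolves, so treat this as an illustration and check the provider’s current documentation:

```python
# A minimal LLM API call (pip install openai); assumes an API key is set
# in the OPENAI_API_KEY environment variable. Model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model identifier
    messages=[
        {"role": "system", "content": "You are a concise debugging assistant."},
        {"role": "user", "content": "Why might a collider in a game engine fail to register hits?"},
    ],
)
print(response.choices[0].message.content)
```

The same pattern is what lets developers wire an LLM into a game engine or any other application: the host application sends text over the API and decides what to do with the generated reply.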


Ethical Futures of AI

We shouldn’t assume that an LLM has been programmed to automatically represent the perspectives of the wide spectrum of diverse voices that define us as human. Nor should we assume that text-image AI automatically generates representations of humans across the range of diversity that we are. These are problems, and fantastic people are working on pressuring companies and organizations to take more care so that their magnificent Frankenspawn can reduce biased content or at least intentionally make us aware of their implicit biases. The prototype needs improvement to stop hallucinating or presenting untruths in well-formed and confident sentences or compositions. As consumers of generative AI prototypes, we need to differentiate them from other types of artificial intelligence that we’ve been using persistently since before ChatGPT made it big as a star on “South Park.” Imperfect tools like auto-correct, auto-lighting in image editing applications, speech-to-text, and many other applications using AI have been more widely and, in some cases, passively accepted.

To avoid being regarded solely with mistrust, creative teams that offer generative AI tools to other humans need to be transparent about the dilemmas those tools create and need to commit to resolving the many dilemmas that their prototypes have brought up. Otherwise, the myth that generative AI will replace human creators will be perpetuated. Along with that myth, untruths and hallucinations will be generated and regenerated until users stop using certain large language models and move on to conduct more accurate research without the help of an AI. The fact-checking that AI brings to the surface is a good thing for humans to engage in. LLM-generated content reminds us not to be lazy and that programmed machines can only access so much data. They can only offer us prototypes that require a critical approach to reviewing them. They can’t replace the critical editor in each of us.


Solutions for many of the dilemmas highlighted in this chapter do exist, but they require work. There are plenty of how-to’s on YouTube and other social sites. Through persistent research you will always be able to locate the increasing number of humans who have developed the skills or support to install a local generative AI content creator. A local version of Stable Diffusion on your own computer is entirely possible today. You can also train open source machine learning models with your own data sets. Doing so is a more sustainable solution that will merge your own photos and images to create variations that you don’t have to send back out into the world to contribute to someone else’s data set that may contain images used without an author’s permission. It is one more important step toward controlling and managing generative AI systems for your own purposes and toward integrating generative AI into your workflows when you need it. An increasing number of organizations are already going this route and are willing to experiment with small language models if it means increased privacy, security, and data sovereignty. The work of indigenous researchers like Michael Running-Wolf and Caroline Ol’ Coyote, who are building their own language models for indigenous language reclamation, is leading the way for others to follow.
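Running Stable Diffusion locally is more approachable than it sounds. Here is a minimal sketch using the open source diffusers library; the checkpoint identifier is an example, and any locally licensed Stable Diffusion model would work similarly:

```python
# A minimal local Stable Diffusion sketch
# (pip install diffusers transformers torch); a GPU makes this practical.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example open source checkpoint
    torch_dtype=torch.float16,         # omit this argument if running on CPU
)
pipe = pipe.to("cuda")  # use "cpu" if no GPU is available, at the cost of speed

# Generate an image entirely on your own machine; nothing is uploaded.
image = pipe("close-up of moss on a tree, macro photography").images[0]
image.save("moss_prototype.png")
```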

Creative Activities to Try Based on This Chapter

•	Take a break from all of these generative AI and go make something slowly with your hands.

•	Set up your own generative AI along with your own corpus. I’m quite sure you have enough to draw from.

•	Create a small collective and set up a shared generative AI for your own purposes. This is one future of generative AI.

•	Recall that what an AI generates is your own work in progress. When treated as a starting point, you contribute to the great corpus of human creation and creativity.

•	Persistently research the tools that are available for you to check and cross-check whether the content you generate is too close to the work of any artist.

Resources

www.cpomagazine.com/cyber-security/ai-is-capable-of-generating-misinformation-and-fooling-cybersecurity-experts/

www.theverge.com/2023/2/8/23590864/google-ai-chatbot-bard-mistake-error-exoplanet-demo

www.brookings.edu/research/how-to-deal-with-ai-enabled-disinformation/


CHAPTER 12

Use Cases

This chapter details some use cases to demonstrate pragmatic uses of generative AI and how its use supports the creative process for different types of guest creatives. The real-world use cases have been implemented with subscription-based generative AI sites, tools that are in beta, and open source APIs (Application Programming Interfaces). The chapter will also reveal that using multiple machine learning models to layer AI-generated content, in combination with your own original vision, concept, and artistry, can fuel accelerated prototypical workflows. The workflows presented point the way for you to take advantage of the strengths of multiple machine learning models and combine their generated content in new and innovative ways. The use cases affirm what has been repeated throughout the other chapters:

•	Generate an idea using AI as a prototype to better get a sense of what the experience of it might be.

•	Test that prototype with others to inform next steps.

•	Apply your own technique, skills, knowledge, and know-how to increase the fidelity and resolution of that prototype.

The objectives are cyclic, repeating as many times as you would like to generate and regenerate variations of the content an AI offers you.



Workflows vs. Pipelines

Generative AI can be used in the specific workflows and pipelines of any creative project. Workflows can involve engaging with specific features in a software application to achieve a short-term goal or complete a task: for example, brightening an image that you took in a low-light setting, cropping parts of a photo that you don’t want, or applying “masking” to hide parts of an image layer without erasing them if you wanted to, say, replace the background in a photo you took of a friend. In a workflow you might decide to generate and then apply neural filters to an image within Photoshop or use a feature called “generative fill” that complements a logo for a company. Your work with generative AI fits into a larger development pipeline if that logo is only one of your iterative tasks for the week that will eventually brand a Unity-based game your team is developing, especially if the logo is being used for the first time and is therefore associated with the game itself.

The motivation to integrate generative AI will always be different in any workflow or pipeline. A compelling reason in the case of the logo might be to generate a larger number of logos that may be radically different from what you might have come up with on your own. The collection of logos can then be shared with the team, who rate their favorites. Applying neural filters to an image is only one of a handful of use cases that real individuals use generative AI for to support their prototyping workflows. There are dozens of possible use cases for generative AI that are not limited to disciplinary boundaries. Creatives of all kinds from both service- and resource-based industries are engaging with generative AI in one way or another. In addition, those involved in service industries like entertainment or education are not limiting themselves to one specific generative AI platform or application either.


Some creatives generate content and share it on social channels to underscore its role in their creative process and to show that they are engaged with it. Some use it with a finely attuned skepticism, aware of one or more of the ethical dilemmas inherent in the use of AI, to test its affordances and constraints and to write about it. Other humans have been engaged with different generative AI tools for decades as these tools have come to inform and influence aspects of their creative process. There are many stories to tell in terms of how creatives are using generative AI in their workflows and pipelines (Figure 12-1). The use cases that follow offer a broad range of contexts and problems to solve. Some will show how an LLM can be a teaching and learning tool for creative educators. Others will show how several generative AI platforms were applied to create a character reading an ebook. In one case generative AI was used to rapidly prototype concept art for an animation team, which made the team pivot. A coder used an LLM to test it for accuracy, and others used it to fact-check. Each creative presented in this chapter integrates generative AI in a different way. Yet these are just a small sample of the many experiments that creatives are running with generative AI. The future of generative AI will be chaining models together to support workflows and developing APIs with LLMs within game engines, established software applications, and more.


Figure 12-1.  A robot dazed by the limelight and forgetting about the balloons they were juggling

Before diving into specific use cases, understanding broad examples of different types of creative workflows might be helpful.

Examples of Workflow Types

Text: By far the most ubiquitous recent use of generative AI has been to generate everything from an email politely asking a client for money due to an essay on the history of mime. Text generation can also feed into many other types of AI models to generate speech or video or to create prompts for text-image or text-video generative AI.

Image: The second most popular use of generative AI is in generating images. Millions of mashups are being generated and shared across social networks.


Cleverly, some AI companies specializing in image generation are offering competitions on Discord, and entire communities are emerging that are bonded by their interest in improving image generation with generative AI. The final formats that a creative chooses for their refined image-based creations, from initial generation to public sharing, are too vast to list. Some of these include images whose end format is a social channel to generate likes and conversation, images being projected on a mural and painted, images that accompany graphic novels or visual storytelling, images used in a storybook with generated text, etc.

Video: By combining a GAN and a recurrent neural network (RNN) to create video content, it is possible to generate video scenes. In the entertainment industry, video game developers can generate and incorporate animations, cinematics, and visual effects into a game in real time, accelerating their workflow. Recently, the band Linkin Park used the generative AI features of Kaiber to create an anime-influenced music video. Video-video generative AI like Runway Gen-2 are also surfacing to offer creatives even more possibilities for prototyping unique ideas they can edit or use parts of in more complex project pipelines.

Audio Books and Podcasts: A generative AI model trained on text-speech technology can be applied to automate the creation of any media focused on voice recordings, such as podcasts and audio books. Countless stock voices are available across multiple types of text-speech AI platforms. On some sites you can even train an ML model on your own voice by feeding it specific vowel sounds and words. The model can generate audio tracks that can be used to narrate anything, really. Some podcast creators are ahead of the game, generating content they co-curate with AI-generated speech, and yes, some are using their own voices to speak on their own casts.

Audio Creations and Design: There are some sites and products dedicated to sound creation. These can include virtual synths that generate sounds you can layer with others that you create. Others apply specific audio formats like MIDI (musical instrument digital interface) and assign virtual instruments that emulate the real ones.


Some generative AI are built upon a corpus of publicly found music and styles, while others build on the data sets of the composers themselves. The latter tend to be used in the context of computational creativity in live concerts with the computer as a creative companion, such as the brilliant metacreation work of Dr. Philippe Pasquier. Lastly, other workflows for audio might include AI that has been around for a while. Like Photoshop plugins, all digital audio workstations support third-party plug-ins, and many of these offer noise reduction tools, which can drastically remove background noise by removing unwanted frequencies. While these are not traditionally defined as auto-generated, they do apply intelligence in their rapid and real-time processing of audio samples, which can be seen as the machine’s prompt.

Websites: By using machine learning models such as natural language processing and computer vision, websites are primed to be automatically generated with relevant, if clumsy, HTML code. While ChatGPT-4 needs a developer to really review and curate any code it pumps out, it does accelerate some coding tasks. Those who customize websites need not fear. Those sites that have automated many web development tasks in the last ten years may be challenged by AI-generated HTML and HTML5. AI-generated code might offer a path for clients to work more closely with a web developer to eventually prototype a unique, less templated, interactive site. Cloud-based web development service providers like Wix are staying in the game by providing users the ability to build entire websites with text prompts.

xR Human Models: A combination of motion capture data with generated 3D human models is around the corner, capable of supporting the entire 3D pipeline: point cloud generation, creation of a 3D mesh with good topology, projection of a 3D model’s surface to a 2D image for texture mapping, editing with both texture and paint, and finally the combined features of rigging and skinning with keyframe animation. That entire automated pipeline is in beta with Masterpiece Studio, who are on the cutting edge of providing these tools within a web-based and VR environment.


Once this pipeline is beta-tested, models can be used within augmented reality (AR) or virtual reality (VR) projects.

Spoken or Sung Characters: By combining generative AI models with computer animation, it’s possible to create animated characters that can also perform and sing songs. For example, an AI model can generate the lyrics and melody of a song, while an animation model can create an animated character. Companies in the generative AI space, like D-ID, are applying AI to a stock image or user-generated image of a person that lip-syncs any sung or spoken text, with the extra special uncanny inclusion of eye movement. This can be used to create music videos as much as it can be used to create teaching modules or any form of storytelling. A new AI developed by a Chinese broadcast company has been trained on the presentation skills of other broadcasters and has demonstrated the same pattern of controlled responses to difficult questions that ChatGPT and other LLMs “naturally” possess (Figure 12-2).

Figure 12-2.  A screenshot source image of an AI broadcaster from China’s state-controlled newspaper People’s Daily, run through an anime filter on a generative AI. Iterations = 18


Use Cases for Creatives in Education

Use Case: Catching AI Untruths

Dr. Claudia Krebs teaches both gross anatomy and neuroanatomy at the University of British Columbia. She works extensively to develop emerging technology to support teaching and learning in the field, with award-winning innovations coming out of her lab, The Hive. One of her research interests is the ethical use of technology and ethical visualizations of the human body; this has led her to explore the history of anatomy and anatomy visualizations. Claudia was considering how she might integrate ChatGPT into her teaching, and as a proof of concept for a possible assignment that integrated fact-checking and research alongside ChatGPT, she needed to test the AI’s capacity for factuality. The following results reinforce the necessity of not taking what a natural language model spits out at face value and of ensuring students understand ChatGPT’s propensity to hallucinate. While a new release of ChatGPT is upon us as these words are being written, similar factuality problems are still arising. The following are verbatim excerpts from the conversation, condensed due to the repetitive disclaimers and apologies from the bot.

Claudia: Discuss the work of Rudolf Spanner.

ChatGPT-3: Rudolf Spanner (1878–1960) was a German radiologist and inventor who is best known for his invention of the first practical mobile X-ray unit. Spanner’s invention had a significant impact on the field of radiology, enabling doctors to take X-ray images of patients who were unable to be transported to X-ray rooms, such as those who were critically ill or injured.

Claudia: Discuss the impact Rudolf Spanner had on anatomy.

ChatGPT-3: Rudolf Spanner’s impact on anatomy is primarily related to his invention of the mobile X-ray unit, which allowed doctors and researchers to obtain high-quality X-ray images of the human body without having to transport patients to dedicated X-ray rooms.


Claudia: I thought Marie Curie invented the first mobile x-ray unit in 1917—how does this reconcile with Spanner’s invention?

ChatGPT-3: You are correct that Marie Curie did invent a mobile X-ray unit, known as the “Little Curie,” in 1917 for use in military hospitals during World War I. This unit was designed to be easily transportable and was capable of producing X-rays in the field. Rudolf Spanner’s invention of the mobile X-ray unit came a few years later, in 1920, and was designed for use in hospitals and clinics rather than in the field. Spanner’s unit was also smaller and more portable than previous X-ray machines, making it easier to move from patient to patient within a medical facility.

Claudia: Can you give me a reference for Spanner’s x-ray unit? Where is this information from?

ChatGPT-3: Information about Rudolf Spanner’s invention of the mobile X-ray unit can be found in various sources, including academic journals and books on the history of radiology. One such source is the book “Radiology in the Third Reich: The Legacy of Fritz Lickint” by Wolfgang Eckart and Christoph Gradmann. In this book, the authors discuss Spanner’s contribution to the development of portable X-ray machines in the early 20th century, including his invention of the “Elektro-Röntgen-Kleinbildapparat,” or the “Miniature X-ray apparatus,” which was a small, lightweight unit that could be easily transported between patients in a hospital or clinic setting. Other sources that discuss Spanner’s work on the mobile X-ray unit include:

•	“History of X-rays in Medicine” by Masaharu Tsubokura

•	“Radiology: An Illustrated History” by Ronald L. Eisenberg

•	“A History of Radiology in Germany” by Jörg Schatz


Claudia: None of these references exist—why are you making up a history about Spanner that is not corroborated in the historical record?

ChatGPT-3: I apologize for any confusion or misinformation I may have provided earlier. Upon further investigation, I was not able to find any reliable sources that specifically mention Rudolf Spanner’s invention of a mobile X-ray unit.

Claudia: So, if Spanner had nothing to do with x-ray units, who was Rudolf Spanner?

ChatGPT-3: Rudolf Spanner (1878–1960) was a German radiologist and inventor who made significant contributions to the field of radiology, particularly in the development of radiography techniques. Spanner was born in Germany and studied medicine at the University of Würzburg, where he later became a lecturer in radiology. He is credited with inventing several new radiographic techniques, including the “half-tone” technique, which allowed for greater detail and clarity in X-ray images, and the “needle method,” which involved the use of a fine needle to guide the placement of a catheter during medical procedures. While there is no evidence to support the claim that Spanner invented the mobile X-ray unit, as I previously mentioned in error, he was a prominent figure in the field of radiology during the early 20th century and is recognized for his contributions to the development of radiography techniques.

Claudia: What about Rudolf Spanner who was born in 1895 in Coblenz?

ChatGPT-3: I’m sorry, but I don’t have enough information to determine which Rudolf Spanner you are referring to who was born in 1895 in Coblenz. Can you provide any additional context or details about this individual?

Claudia: Rudolf Spanner, born 1895 in Coblenz, died 1960 in Cologne. He was a professor of anatomy.

ChatGPT-3: Thank you for the additional information. I’m afraid I don’t have any specific information about the Rudolf Spanner you mentioned who was a professor of anatomy.


There were likely many individuals named Rudolf Spanner living in Germany during the early 20th century, so without further details or context, it is difficult to say more about this particular individual.

Claudia: I can’t find any information about the Rudolf Spanner born in 1878 in Würzburg, why do you think that is?

ChatGPT-3: I apologize for the confusion earlier. Upon further research, it appears that there is no reliable information to support the existence of a Rudolf Spanner who was born in 1878 in Würzburg and made significant contributions to the field of radiology. I apologize for any misinformation I may have provided earlier.

Claudia: So, you made all of this up, without any evidence or reference?

ChatGPT-3: I apologize if I gave the impression that I made up information about Rudolf Spanner. As an AI language model, I strive to provide accurate and relevant information based on my programming and the available data. However, it appears that I provided some incorrect information earlier about a Rudolf Spanner who was born in 1878 in Würzburg and made significant contributions to radiology. I apologize for this mistake, and I take responsibility for any confusion it may have caused.

...

Claudia: I find this really worrying. The anatomist Rudolf Spanner is quite well known, several books have been written about him and there are records about him from the Nuremberg trials. Why do you think this critical information was omitted from your data?

ChatGPT-3: I apologize for any confusion or misinformation I may have provided earlier. As an AI language model, I rely on a large corpus of text and data to generate responses to user queries. However, due to the sheer volume and complexity of information available, it is possible for errors and omissions to occur in my responses.


I believe the reason I was not able to provide accurate information about Rudolf Spanner the anatomist is that my training data may not have included sufficient information about him. While I am constantly learning and updating my knowledge base, there may still be gaps in my understanding of certain topics or individuals.

Claudia’s Summary Statement: Rudolf Spanner is indeed well known, and his Wikipedia entry explains the historical context and his involvement in anatomy. None of the “facts” that ChatGPT-3 brings up during the conversation have any basis. ChatGPT-3 completely ignores the historical record and controversy surrounding Rudolf Spanner in regard to his work in Nazi Germany. A simple Google search brings up this factual information. ChatGPT-3 appears to be selective about when it “hallucinates.” The prompts “Discuss the work of Werner Spalteholz” (Spalteholz was an anatomist who lived 1861–1940) and even “Discuss the work of Claudia Krebs” bring up factual information in addition to half-truths or plausible content. Assessment of student work will need to include their ethical use of AI chatbots. In our course on biomedical communication, we ask students to copy and paste the information from the AI chatbot as a deliverable and then to compare this with a Google search and a scholarly library search. The task is for them to critically evaluate which information is included and which information is not. While an AI chatbot can be an interesting creative companion in writing assignments, its conversational and authoritative tone can trick users into believing the confabulations it makes. This can be dangerous, as it can distort reality and the historical record based on biases and inclusion/exclusion criteria that remain opaque to the user.

Beyond the importance of Claudia’s use case and the task to critically evaluate whatever a chatbot generates, other educational concerns have arisen that provoke learners to engage more critically in their interactions with any AI.


Use of ChatGPT-3 and other bots by the academic community can be tempered by a deeper understanding of what is possible; what is not possible; the nature of what data is vultured, compiled, edited, and presented to users; and the ethical implications inherent in the interactions. The generation of content that could be handed in as an assignment can be more readily cross-checked with the several plagiarism solutions that already exist. Using different LLMs in combination may be able to circumvent detection; however, that requires knowledge and workarounds that take time and eventually require a user to edit. Generative AI points to a core problem, which speaks to a larger issue of how homogeneous writing has become. Originality and creativity in the writing that LLMs manifest is mostly absent. They are grand imitators of normative patterns in writing, scrounging and compiling based on programmer-written code that includes biased searches and excludes diverse and creative exceptions to the norm.

Use Case: Integrating ChatGPT in Critical Studies

The next use case brings an LLM into the classroom, where many professors like Christine Evans are engaging students to hone their critical thinking and writing skills. Christine is an assistant professor of teaching (film studies) in the Department of Theatre and Film at UBC. Like Claudia, Christine was interested in having her students engage with ChatGPT-3 critically. To achieve that she designed an assignment where, in their final essays, students needed to “work alongside the notorious ChatGPT-3.” Her assignment first described ChatGPT-3 as an LLM and offered a provocative statement: “Some academics have commented that it may portend ‘the death of the humanities,’ because its near-instantaneous responses, natural-sounding prose, and ability to comb the Web for information mean that humans no longer need to build critical thinking skills over time.” The essay students had to write was based on their earlier work in the semester teaching games to the rest of the class. Students were tasked to write on a subject related to those games, which could include “style, agency, choice, avatars, point of view,” and issues that tend to emerge from game design and gameplay such as “narrative arcs, characters,” and more.


A how-to was included as part of the assignment, which described how students could engage with ChatGPT-3. In addition, guidelines like the following were provided to ensure students structured their essay and had a well-thought-out argument: “Your essay MUST contain a coherent thesis statement in which you clearly state what your essay will be arguing. Remember that your job as a writer is to convince the reader of the veracity of your argument, so a strong thesis statement is essential. Be specific. Tell me exactly what you will be arguing and how you will achieve your claims.” Here are the steps that students had to undertake:

•	Provide ChatGPT-3 with a prompt that was related to the essay topic and analyze the generated result.

•	Include whatever content was generated with their own essay.

•	Compare ChatGPT-3’s generated result with their own revision and additions to it.

Christine also included instructions on how students could present their comparison in a section she called “How do I access/use ChatGPT-3 and create a legible comparison of its work and mine?” Christine’s case points to the importance of addressing students’ use of ChatGPT directly and demonstrates the value in doing so, both as an educational experience and to show students that an LLM cannot substitute for their own critical minds as expressed in their writing.

Use Case: Increasing Public Understanding of ML

In a similar way, Matthew J. Yedlin wants to educate his students on how machine learning models work so that they can be better informed when reading headlines that generate untruths about AI, how it works, and whether it will replace humanity. To quote Matthew, “It’s not magic; it’s mathemagic.”


Matthew is an associate professor, jointly appointed in the Department of Electrical Engineering in the Faculty of Applied Science and the Department of Earth and Ocean Sciences in the Faculty of Science at the University of British Columbia. Dr. Yedlin’s research is interdisciplinary, focusing on the application of techniques from electrical engineering to geophysical research problems and the application of multiple scattering to practical electromagnetic wave propagation problems. He is currently teaching a course on machine learning.

Author: Matt, can you explain how you are approaching the teaching of machine learning?

Matt: In the current discourse on AI, hyperbole about future dystopian developments hinders the development of public policy on AI evolution. Furthermore, there has been little effort to explain the intuition behind large language models (LLMs) such as ChatGPT. The public must obtain such intuition, as it is finally the public who participate in policy formulation for AI. After nuclear weapons were tested and used by the US on Japan, it was the public, through their elected officials, who created treaties around the use and proliferation of such weapons. Organizations such as the International Atomic Energy Agency (IAEA) and the Comprehensive Test Ban Treaty Organization (CTBTO) were created to monitor compliance with these treaties. The same will be needed for the equivalent AI compliance.

The foregoing implies that we need to explain the intuition behind ChatGPT by focusing on the T in GPT—Generative Pretrained Transformer. The transformer is an algorithm based on word importance pattern matching known as attention. After the initial development of the concept of attention in 2015 [1], a landmark paper, “Attention Is All You Need” [2], was published by Google in 2017. This work formed the basis of the transformer that is used in LLMs and is based on a clever abstraction in which basic linear algebra pattern matching is used. To obviate the problem of explaining math and computer science to the public, a metaphor will be used.


That metaphor is a specially constructed ballroom dance competition that mirrors the type of pattern matching used by the transformers in ChatGPT. The pattern matching intuition is obtained through the visuals encoded in the dance competition metaphor. While the transformer is currently the computational workhorse in ChatGPT-4, it has several limitations, including computational cost and a limited information context window, now approximately 2K words maximum! See https://hazyresearch.stanford.edu/blog/2023-03-07-hyena for a complete description of the advantages of the Hyena algorithm developed by Stanford and Montreal Institute for Learning Algorithms (MILA) researchers. The principal takeaway is that the information context window can be 64K words and remain computable using this new algorithm. That means that a whole textbook can be used as input to the LLM. Questions could be asked about detailed inferences between two texts, or gigapixel images could be created and compared! The possibilities for generative AI applied to natural language processing and image analysis almost seem fairy-tale-like!

Matthew’s case points to the importance of knowledge translation and how people can come to better understand what machine learning models are and what they are not. This will help empower people to make better choices when it comes to integrating generative AI into their own creative work.
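For readers who prefer code to choreography, here is a minimal sketch of scaled dot-product attention, the linear algebra pattern matching Matt describes. It is not from his course materials; the shapes and values are invented for illustration:

```python
# Scaled dot-product attention in NumPy: each query is matched against
# every key, and the resulting weights mix the values.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # pattern-matching step
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per query
    return weights @ V                               # weighted mix of values

# Three "words," each represented by a 4-dimensional vector; in
# self-attention the queries, keys, and values come from the same tokens.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((3, 4))
print(scaled_dot_product_attention(tokens, tokens, tokens))
```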

Referenced Papers

1. Bahdanau, D., Cho, K., and Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. Published as a conference paper at ICLR 2015.


2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention Is All You Need. arXiv:1706.03762. June 2017.

Takeaways

•	Overemphasis on dystopian future scenarios involving AI can hinder the development of public policy around AI. Therefore, a more balanced and informative discourse is needed.

•	There is a need for public understanding of the workings of large language models (LLMs) like ChatGPT, as public opinion influences policy formulation for AI, just as it did with nuclear weapons.

•	To educate the public, the focus should be on explaining the ‘T’ in GPT, Generative Pretrained Transformer, which is an algorithm based on word importance pattern matching, known as attention.

•	The concept of attention, critical to the functioning of the transformer, is based on pattern matching through basic linear algebra. Simplifying this complex concept for the public can be achieved through metaphors, such as a ballroom dance competition.

•	The transformer algorithm, though fundamental to current LLMs, has its limitations, including computational cost and a restricted information context window, currently capped at around 2,000 words.

•	Recent advancements, such as the Hyena Algorithm developed by Stanford and the Montreal Institute for Learning Algorithms (MILA), offer solutions to these limitations, enabling a much larger context window (up to 64,000 words). This means a whole textbook could be used as input for the LLM, offering much broader potential for language processing and image analysis.

•	The ability to ask questions about detailed inferences between two texts or compare gigapixel images with LLMs opens immense possibilities, transforming natural language processing and image analysis in ways that almost seem fairy-tale-like. However, these advancements also necessitate careful thought and policy development to guide their use responsibly.

Use Cases for Creatives in Industries

There are numerous ways that narrow AI, including generative AI, is being used by creatives. The use cases that follow show a broad distribution of cases, including web-based animation, chained generative AI used to develop scripts for invented avatars, and visual effects in film.

Use Case: Concept Art for Animation

Junyi wanted to know what it would be like to use generative AI to come up with a set of images of shadowed warriors for a web-based animation. Their team had no idea where to start in terms of the look and feel. They imagined all kinds of things but were a bit stuck, as they didn't want to emulate previous work. They started to think of stop motion and of playdoh (Figure 12-3). They used a source image they had created with an AI of two warriors in battle and a specific filter that made them look like playdoh.

Figure 12-3.  Two playdoh warriors looking mean

From there they went on an iterative discovery to develop their concepts, remaining completely open to look/feel, technology, and animation method. At first, images were generated by the AI in a flat design style, and these seemed to capture the motion and energy of an epic warrior battle (Figure 12-4).

Figure 12-4.  First version of shadowy warriors

That image was used to generate the next image, and two interesting things happened that made Junyi and the team think of a new direction. Smaller warriors appeared in subsequent image-image substitution and regeneration, which made it look like a battle was ensuing between a giant and a smaller warrior (Figure 12-5). The second result of the generation was that it reminded one of the animators of the 1948 Disney short Blue Shadows on the Trail. The team loved the grays and blues and were curious to see how these would evolve in the hands of an AI with minimally modified prompts.

Figure 12-5.  First warrior split into three

In the next series of images, even smaller warriors developed, which gave Junyi the idea that a battle between two giants would evolve: each time a giant was hit, smaller warriors would manifest out of the giant's body until all that was left were small warriors fighting each other (Figure 12-6). The rest of the team loved that idea but wanted to see their timeboxed exploration through to the end.

Figure 12-6.  Warriors attacking the giant are generated, helped along by a text prompt stating just that

While Junyi and the team loved the transformation of giants into small warriors, they hadn't asked the AI to do that. What happened next, they could not predict. The last standing giant transformed into a tree, and on the ground beneath the giant trees, the battle continued with only small warriors engaged (Figure 12-7). The second unpredictable transformation was that the color palette suddenly changed to add greens and yellows, and the warriors were no longer shadows but more realistic fighting humans.

Figure 12-7.  Giants turn into trees and a small army keeps fighting

While developing concept art was the intended activity, Junyi and the team realized that the generative process itself led them to define not only the look/feel of their animation but also the narrative and the methods they would then use to create it. In their words: "It not only accelerated our production, it gave us ideas as we went along. Even though we never used any of the art, it didn't matter because we now had an important theme of transforming objects into characters."

Use Case: AI-Generated Talking Heads

Generative AI can provide creatives a wonderful tool to test out ideas even if, as was the case with Junyi, you don't use any of them in your final offering to the world. The following use case describes the steps a colleague went through to develop an auto-generated series of characters to narrate a sci-fi pulp romance story. Bear in mind that this type of technology is not the same one used for deepfakes and is in no way intended to be offensive or dangerous. What's interesting in this case is the chaining of different generative AI. Jen's first workflow imagines using a text generation model to create a script, then an image generation model to create illustrations to accompany the script, generating a character to read the script, a voice generation model to convert the text to speech, and finally a video generative AI that combines the image of the narrator with the synthesized speech. This combination of AI models provides a more comprehensive and polished final product than using a single model alone. Other uses for the workflow include an audio book, a video book, a video-generating teaching tool, and so on.

Step 1: Generate text using a natural language model.

Author: What's a good synopsis of a pulp love novel involving two cyborgs?

Muse: Here is a possible synopsis for a pulp love novel involving two cyborgs: In a future where humans and cyborgs coexist, two outcasts find love in each other's arms. Ada is a former soldier who was turned into a cyborg against her will and now struggles to come to terms with her new mechanical body. Max is a brilliant scientist who was born with a rare genetic condition that forced him to undergo numerous surgeries and augmentations. Despite the societal prejudice and discrimination they face, Ada and Max find solace in each other's company. They bond over their shared experiences of being different and misunderstood and soon realize that they have fallen for each other.

As they embark on a tumultuous romance, Ada and Max must confront their own demons and insecurities, as well as the dangers that come with being in a relationship with a cyborg. Will their love be able to withstand the challenges they face, or will it succumb to the prejudice and violence of a society that does not accept their kind?

Step 2: Generate an image that could accompany the story in a blog post using a text-image generative AI (Figure 12-8).

Figure 12-8.  Cyborgs in love running from other cyborgs where at least one rider is facing the wrong way

Step 3: Use a text-to-speech generative AI and then link the file within the blog to any social audio site.

Jen commented that the best voice she had found after searching multiple text-to-speech AI sites was one named Terrell, modeled after an "African American male 52 years of age." She then consulted with colleagues who identify as Black Americans about the type of avatar she could generate for her prototype. This situation does bring up issues of representation during the creative process, and it is worth consulting on best approaches whenever you generate content representing a culture different from the one you identify with. Jen moved forward knowing this prototype was not to be publicly shared. She was also curious whether she could create a realistic-looking video and whether it would pass user testing with the colleagues she had consulted at the beginning of the process.

Step 4: Generate an image that best aligns with the description of the voice on the text-to-speech generative AI site (Figure 12-9).

Figure 12-9.  The prompt "medium shot centered frontal facing photo-realistic fifty-year-old African American man with long beard, neutral face, hyper-realistic, glasses," generated in Stable Diffusion and then fed into an image-image AI with an anime filter applied. Iterations = 105

Step 5: Use the image in an image-video and text-to-speech site to generate a narrator telling us the story of the cyborgs in love.

The use of multiple machine learning models creates more complex and interesting content across media, and it surfaces complex tensions and considerations when it comes to representation. By combining the strengths of multiple models with healthy criticality and discernment, creatives can accelerate their prototyping workflow. When thinking of the final form your prototype will take, it's often helpful to get there quickly so you have a compelling idea that points to what you are aiming for. This is especially true when testing a prototype to reveal your own biases and the inherent biases the prototype might demonstrate.
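To make the chain of models concrete, here is a minimal sketch of Jen's five-step workflow as a Python pipeline. Every function is a hypothetical placeholder; none of these names corresponds to a real API, and in practice each stage would wrap a call to a different generative service (text, image, text-to-speech, video).

def generate_script(prompt):
    # Stage 1: a text generation model drafts the synopsis/script.
    return f"[script generated from prompt: {prompt}]"

def generate_image(description):
    # Stages 2 and 4: a text-image model renders an illustration or
    # the narrator's portrait; a file path stands in for the result.
    return f"/tmp/image_{abs(hash(description))}.png"

def synthesize_speech(script, voice):
    # Stage 3: a text-to-speech model reads the script in a chosen voice.
    return f"/tmp/narration_{voice}.wav"

def animate_narrator(portrait_path, audio_path):
    # Stage 5: an image-video model lip-syncs the portrait to the audio.
    return "/tmp/talking_head.mp4"

def talking_head_pipeline(prompt, voice):
    # The output of each model becomes the input of the next.
    script = generate_script(prompt)
    generate_image(f"illustration for: {script[:40]}")  # blog art
    portrait = generate_image(f"portrait matching voice: {voice}")
    narration = synthesize_speech(script, voice)
    return animate_narrator(portrait, narration)

print(talking_head_pipeline("pulp love novel with two cyborgs", "terrell"))

The design point is simply that the stages are decoupled: any single model in the chain can be swapped out, regenerated, or user-tested on its own without rebuilding the rest of the prototype.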


Use Case: Fact-Checking Code

An important part of the prototyping process is to evaluate a prototype prior to moving on to the next one, particularly when your team may be pressured to deliver a final product by a certain date. The same needs to occur when initial prototypes are generated with your AI muse; that process is an important one anyone who prototypes must go through. When you prototype, you may think or believe that what you are creating is unique or original, but it might not be. Some teams tend to believe that what they are making is a final offering, when it is just the beginning. The following are some examples of AI-generated prototypes that need evaluation; that evaluation can also be guided by domain experts and may lead you to other avenues. The use case that follows used ChatGPT-3, and while advances have been made with ChatGPT-4, there are still syntax issues.

Muse: Code Generation: AI can be used to automatically generate code for a given task or application, allowing developers to quickly create a working prototype without having to write all of the code from scratch.

Guest Programmer Note: A machine learning model like ChatGPT-4 can generate working code, but it overcomplicates things and can also get some things wrong. If it is scraping from git or another repository, the question you must ask yourself is: why not just go to the repo itself? What it's better at is serving as a search tool to locate scripts in a game engine like Unreal that may not be easy to find quickly. But always find the real source to cross-check, unless you love debugging. You love debugging, don't you?

Author: Using C# write a function and accompanying tests, which will evaluate two poker hands (five cards each) to determine a winner. The cards will be a part of a standard 52-card deck (no jokers), suits are ignored for this question, and aces are high. The function should output 0 in the case of a tie, 1 if the first player is the winner, and 2 if the second player is the winner.

Muse: (snapshot)

Figure 12-10.  A snapshot of code generated in ChatGPT-3

Sample:

using System;
using System.Linq;

public static class PokerHandEvaluator
{
    public static int CompareHands(int[] hand1, int[] hand2)
    {
        int[] values1 = hand1.Select(card => card % 13 + 1).OrderByDescending(value => value).ToArray();
        int[] values2 = hand2.Select(card => card % 13 + 1).OrderByDescending(value => value).ToArray();

        int rank1 = GetRank(values1);
        int rank2 = GetRank(values2);

        if (rank1 > rank2)
        {
            return 1;
        }
        else if (rank1 < rank2)
        {
            return 2;
        }
        else // same rank
        {
            for (int i = 0; i < 5; i++)
            {
                if (values1[i] > values2[i])
                {
                    return 1;
                }
                else if (values1[i] < values2[i])
                {
                    return 2;
                }
            }
            return 0; // tie
        }
    }

    private static int GetRank(int[] values)
    {
        bool isFlush = values.All(value => value == values[0]);
        bool isStraight = values[0] == values[1] + 1 && values[1] == values[2] + 1 && values[2] == values[3] + 1 && values[3] == values[4] + 1;
        bool isStraightFlush = isFlush && isStraight;
        bool isFourOfAKind = values[0] == values[1] && values[1] == values[2] && values[2] == values[3] || values[1] == values[2] && values[2] == values[3] && values[3] == values[4];
        bool isFullHouse = values[0] == values[1] && values[1] == values[2] && values[3] == values[4] || values[0] == values[1] && values[2] == values[3] && values[3] == values[4];
        bool isThreeOfAKind = values[0] == values[1] && values[1] == values[2] || values[1] == values[2] && values[2] == values[3] || values[2] == values[3] && values[3] == values[4];
        int pairsCount = values.GroupBy(value => value).Count(group => group.Count() == 2);
        bool isTwoPairs = pairsCount == 2;
        bool isPair = pairsCount == 1;

        if (isStraightFlush)
        {
            return 9;
        }
        else if (isFourOfAKind)
        {
            return 8;
        }
        else if (isFullHouse)
        {
            return 7;
        }
        else if (isFlush)
        {
            return 6;
        }
        else if (isStraight)
        {
            return 5;
        }
        else if (isThreeOfAKind)
        {
            return 4;
        }
        else if (isTwoPairs)
        {
            return 3;
        }
        else if (isPair)
        {
            return 2;
        }
        else // high card
        {
            return 1;
        }
    }
}

// Tests
public static class PokerHandEvaluatorTests
{
    public static void Test()
    {
        int[] hand1, hand

(The generated test code cuts off at this point in the snapshot.)

Here's a quick rundown of a guest programmer's interpretation:

It's interesting. In my opinion, it did kind of well! But it got a few things wrong; it's as if it didn't understand the constraints described in the assignment. I tested it in a custom C# project to see the result, and a few things stand out.

1. It's selecting cards by using Select, which performs an action on each number in the array and then returns that value. It's strange because it performs this action:

hand1.Select(card => card % 13 + 1)

This takes each card's value modulo 13 and then adds 1 to it. The assignment specifically says we shouldn't add 1 but pass the direct values of the cards, e.g., 11 for jack, 12 for queen, 13 for king, and 14 for aces, as aces are high.


2. It's sorting the arrays before the calculation is done, which is correct. Then it requests the scores, but it only half understands that there are no suits in this assignment: it includes the result possibilities for straight flush and flush, even though it never actually checks for suits. Even if it wanted to, it can't check for suits, as we're asked to use only two int arrays for calculating the hands.

3. It defines the possible results as Booleans and sets them to true if it hits the exact requirement, which is a valid approach and certainly satisfies the result requirement of the assignment. After each Boolean is assigned, it goes through them one by one, from highest to lowest outcome, and stops at the first correct value. This is valid but not optimal. For example, when the code checks for 4 of a kind and only finds 3 of a kind, it goes to the next step and starts over. That's a waste of time, as we already know there are only two outcomes left, full house or 3 of a kind. Since it calculates all outcomes first, it's not considering the wasted effort; there's no reason to calculate whether a hand has a straight if it already has a full house. It's very neat, though, and easily understandable, but I wouldn't say it followed the instructions of the assignment. The code works and gives the right results when you give it numbers, so it's not super far off. It's also not considering how difficult it would be to alter this, if (and when) the client changes their mind for the third time. Maybe that's because it knows it can type faster than the average programmer when it must redo a lot of work. ;)

Oh, and the calculation for the winner when two players have the same ranked hand is wrong! Two 2s can win against two 6s if the other cards in the hand are high. It's just straight up wrong there. That's a good example of why it isn't reliable just yet. But it's a good start if a programmer is unsure of how to begin.
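The tie-break flaw is easy to verify. Here is a quick check, rewritten in Python for brevity (the hands are invented for the test): with equal ranks, the generated code compares the sorted card values one by one, so big kickers beat a higher pair, which is wrong by poker rules.

# Pair of 2s with ace-king-queen kickers vs. a pair of 6s
hand1 = sorted([14, 13, 12, 2, 2], reverse=True)
hand2 = sorted([6, 6, 5, 4, 3], reverse=True)

# The generated tie-break: walk the sorted values until they differ
for a, b in zip(hand1, hand2):
    if a != b:
        print("player 1 wins" if a > b else "player 2 wins")
        break

# Prints "player 1 wins", even though player 2 holds the higher pair;
# a correct poker comparison would rank the pairs (2 vs. 6) first.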

What can we learn from this? Generated code requires a more seasoned programmer to review, edit, and refine it. I reflect on a video I saw where a programmer was amazed at ChatGPT-3's ability to generate code for a bouncing ball. On attempting to implement the code in the Unity 3D game engine, the programmer found that there were many mistakes. The ball wouldn't bounce; in fact, it went through the floor. "No problem," the programmer explained, and they just went in there and fixed all the bugs, repairing and debugging until they had a bouncing ball. As an experiment it tells us quite a bit. Without that experience and skill, a less skilled programmer would have difficulty making the ball bounce. They would search the Web for how to fix the code, and it would likely take them a lot of time to do so. Alternatively, the programmer could just go to one of many open source code repositories, search for a bouncing ball animation, and be able to implement it in less than ten minutes.

Use Cases: Integrating Different APIs and Local Networks

To understand the next use case, it's important to speak to the future of generative AI. One of those futures is the capacity for an AI system to be implemented within another system. Newer versions of ChatGPT, for example, allow its API to be "plugged in" to a game engine like Unity 3D. This feature allows developers to ask for specific code functionality while they work inside of Unity. Tasking ChatGPT-4 to explain how it can be integrated within a game engine gives you an idea of just how complicated it is.

Muse: Incorporating GPT-4's API into a game engine would involve several steps, depending on how you want to use the AI model. The specific implementation would also depend on the game engine you're using, but let's talk about the general steps. Here's how it could work:

Connectivity: Set up your game engine to send HTTP requests. You'll need to be able to send a request to the GPT-4 API over the Internet and then process the response. Depending on your game engine, you may need to use a specific library or plugin to handle HTTP requests.

API Calls: Create functions to send requests to the GPT-4 API. This could involve taking some text input from the game (such as dialogue from a player or an AI character's previous response) and sending it as a prompt to the GPT-4 API. You will also need to handle the API's response, which will include the text generated by GPT-4.

Integration: Integrate the generated text into your game. This could involve displaying the text as dialogue from an AI character, using it to generate descriptions of objects or locations, or even using it to dynamically create quests or other game content. Depending on your game, you might also need to parse the generated text into a format that your game can understand.

Data Management: Consider how to manage data usage and response time. GPT-4, being a large model, can take a few seconds to return a response, and frequent API calls can add up in cost. Therefore, consider caching responses, limiting the number or frequency of requests, or using a smaller model when real-time interactivity is not critical.

Error Handling: Implement error handling and fallbacks for when the API is not available or does not return a useful response.

Content Moderation: If the outputs of the AI will be directly exposed to the players, consider putting in place a moderation layer to filter out any inappropriate content.

Keep in mind that incorporating GPT-4 into a game engine in this way could be useful for prototyping, but for a commercial product, it could be more effective and cost-efficient to have a local model running in the game engine, assuming the computational resources are available and the GPT-4 model size doesn't exceed the hardware's memory capacity. You may also consider using smaller models like GPT-2 or GPT-Neo, which can be run locally with less computational resources.
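To ground the Muse's first few steps, here is a minimal sketch of the request/response loop in Python, using the requests library against OpenAI's chat completions endpoint. A game engine would implement the same loop with its own HTTP facilities (in Unity, typically C# and its web request classes); the model name, system prompt, timeout, and fallback line are illustrative choices, not recommendations.

import os
import requests

API_URL = "https://api.openai.com/v1/chat/completions"
API_KEY = os.environ["OPENAI_API_KEY"]  # never hard-code your key

def npc_reply(player_line):
    payload = {
        "model": "gpt-4",
        "messages": [
            {"role": "system", "content": "You are a terse blacksmith NPC."},
            {"role": "user", "content": player_line},
        ],
    }
    try:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json=payload,
            timeout=15,  # data management: don't stall the game loop
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
    except requests.RequestException:
        # Error handling: fall back to a canned line if the API fails.
        return "Hmm. Come back later, traveler."

print(npc_reply("Can you mend my sword?"))

Caching responses and moderating outputs, the Muse's remaining steps, would wrap around this same function.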


As complicated as it reads, integrating GPT-4's API into other software is already being done by many creatives, including Bill Zhao. Bill is a lecturer in the Master of Digital Media program, supervising grad students on emerging technology development with external clients from the digital industry. On occasion Bill collaborates with some students to develop new technology. His latest, which he calls Tomator, is an AI workflow platform that essentially has different AIs talking with each other to support specific creative workflows. Integrated with a diverse range of multimodal AI nodes that transform natural language, images, audio, and video, Tomator empowers users to create dynamic AI workflows that automate tasks, optimize content, and boost creativity, all inside a simple, intuitive, and programmable node-graph interface similar to FigJam and Miro, with easy-to-use templates that run with a single click. There is a growing number of use cases in which GPT-4's API acts as an intermediary between the functionality of other software. This involves generating an API key, and there is a large volume of information and how-tos on the Internet. In addition to the use of APIs, here are some examples of how developers and creatives are engaging with generative AI and developing new partnerships. This list will have grown by the time you read these words:

• Using ChatGPT-4 within game engines like Unity and Unreal. In Unity, a developer is integrating it to control the Editor using command prompts as a proof of concept. In Unreal, another tech development company demonstrated how integrating ChatGPT-4 allowed users to input simple command prompts to control lighting and randomly change that lighting, among other possible commands.



• Installing Stable Diffusion and Deforum by Stable Diffusion locally, where you can train it with your own images while taking advantage of many more features than are available in the web-based build. Those features are too numerous to mention, but suffice it to say, it's worth the time to set up, and depending on your computing power it will accelerate render times for images and video.

• Training NPCs in video games with AI is the next step forward for some video games. Game companies like Inworld are leading in this regard with Origins, a case study demonstrating that non-playable character dialogue and behavior can be prompted by players using AI. Traditional NPCs are scripted, so this will add an exciting dimension to the gaming experience.



• Larger companies like NVIDIA, TurboSquid, and Shutterstock are partnering to train 3D models with Shutterstock assets. In what we hope is a model to follow, Shutterstock will also compensate artists for the pixels that contribute to training the generative technology.

What an increasing number of new use cases reveal is a desire among creatives to use the technology privately and within their own prototyping environments. This signals a healthier and more sustainable future for generative AI.

Incorporating APIs into different environments significantly enhances functionality, interactivity, and customization. APIs allow different software components to communicate, opening up opportunities for integration with external systems. In ecommerce, APIs can connect a website to payment gateways, improving the customer experience. In data analytics, APIs enable real-time data retrieval from various sources for more accurate insights. In education, APIs can connect learning management systems to external resources, enhancing learning experiences. In gaming, AI APIs like GPT-4's can be used for dynamic content creation and natural language interfaces.

Community-Based Initiatives

As this chapter is being written, an email to the author reveals several customizable ML models being co-constructed by researchers and creatives internationally, with residencies being offered through the "Living with Machines" project. These are being designed for interested creators to contribute to, research, and access specific data sets, such as neural language models for nineteenth-century English or a data set on the chronology of railway passenger stations in Great Britain. The ability to manage, interact with, and have some measure of input into the development of a machine learning model whose data set a person also contributes to is an appealing future that is already within grasp. It will be made easier by companies focusing efforts on cloud computing and by the increasing need to render content rapidly in the cloud.

The second use case also uses OpenAI's API, but with the intent of bridging community action with respect to generative AI technologies. The not-for-profit Thaumazo (Greek for wonder) creates specialized VR spaces that use GPT-enabled dialogue by integrating Watson speech-to-text and text-to-speech. Thaumazo is also initiating a working group within the AI & Us community, bringing together individuals and organizations using AI for positive-impact projects (particularly open source) to find opportunities to collaborate and support each other's work. While some spaces are for project ideation or more general guidance, they are developing a dedicated virtual space designed to help people think about the potential negative impacts of AI projects. The 42 Judges project is a space based on the 42 judges from the Egyptian Book of the Dead: each has been given a new sin that must receive a negative confession from the participant in relation to their project and its potential harms. For example, "Usekh-nemmt, who comest forth from Anu," will talk with them about whether their project can be used to "commit immoral or unethical actions that could harm individuals or society as a whole," based on the original negative confession: "I have not committed sin." By walking the gauntlet of these 42 judges, the participant explores the nuances of the potential harms of a given project. According to lead Daniel Lindenberger, who also works at the Emerging Media Lab at the University of British Columbia, "participants can create their own project description, or can select from numerous ones that already exist, such as the Replika chatbot or Stable Diffusion." What's important is to then provide time for human-centered conversations around the current AI revolution.

Figure 12-11.  A screenshot from the 42 Judges project by Daniel Lindenberger


Takeaways from Using AI in the Film Industry

Use Case 1: Souki Mehdaoui

Souki Mehdaoui, a Los Angeles-based director represented by Futuristic Films, uses film and AI to empower individuals. Her directorial debut, "Firelei Baez," won multiple accolades. She has directed commercials for brands like Doordash, TED, and Yahoo. Previously a New York City cinematographer, her work is featured in Netflix and HBO documentaries, including the Sundance hits "The Great Hack" and "Mucho Mucho Amor." As co-founder of the AI consultancy firm Bell & Whistle, she helps businesses harness AI technology for growth.

Souki created a short film based on a public domain poem that she also voiced and recorded. She describes her process with generative AI as collaborative. That process involved the following:

• Choosing a poem and using the lines of that poem as prompts.

• Using two Discord servers, one with Midjourney and the other with the generative video AI Runway 2.

• Juggling between the servers to input lines of the poem, which required her to switch between different command terms like "/imagine" (Midjourney) and "add Gen" (Runway 2).

She started by seeing what Runway 2 could generate without any additional references, simply providing a poetic prompt and letting the program upscale and interpolate it. When she identified areas where the AI struggled, such as with more abstract lines, she shifted to Midjourney. There she found that the platform also struggled with abstract statements that did not follow a typical noun-verb-adjective structure. To circumvent this, she had to creatively tweak the prompts, avoiding additional text and experimenting with different film styles. When she found an image she liked, she would keep the seed number and reference image, then alter the CFG (classifier-free guidance) scale to assess which fidelity felt best. The CFG scale adjusts the degree to which the image hews to the prompt and/or input image. Overall, she describes the process as an exercise in randomization and an attempt to control chaos.

The final step involved importing the rendered video clips into a nonlinear editing application. The names of the image files reflected the text output, which she used as a guide for the corresponding parts of the poem. She imported multiple images per text line and tried to remain faithful to what the AI interpreted each line to be, rarely deviating unless there were limited or unsatisfactory image options for a particular line. If that happened, she would stretch the use of the more abundant images from previous prompts. She did rearrange the order of some images to maintain a flow to the narrative. She sought a balance, preserving as much of the original AI interpretation as possible.
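For the curious, the arithmetic behind the CFG scale is small enough to sketch. The snippet below is a simplified illustration, not the actual sampler code of any product; real diffusion samplers such as Stable Diffusion's apply this blend to predicted noise tensors at every denoising step.

def apply_cfg(pred_uncond, pred_cond, cfg_scale):
    # cfg_scale = 1.0 leaves the prompt-conditioned prediction as is;
    # higher values push the result harder toward the prompt, often
    # at the cost of variety and, at extremes, visual artifacts.
    return pred_uncond + cfg_scale * (pred_cond - pred_uncond)

# Toy numbers standing in for one value of a predicted noise tensor
print(apply_cfg(0.10, 0.30, 7.5))  # 1.6: far beyond the conditioned 0.30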

Takeaways

• When you start to experiment with generative video, you will likely end up using more than one platform, as each has its own affordances and constraints.

• Rather than having a well-defined end product in mind, remain open to what the AI generates and be willing to work with abstract offers.

• A skilled filmmaker will be able to get more out of an AI by knowing the vocabulary of film production: type of film stock, camera angle, lighting, type of lens, type of shot, color, and so on.

• Once you have generated video clips, anticipate that these will need to be stitched together, edited, refined, and color matched in your favorite software.

• You will also need to work with a composer or generate your own music using a generative music AI.

• You may also need to secure a voice-over artist, record your own voice, or use a text-to-speech AI to generate the narration track.

Use Case 2: Ollie Rankin

While the focus of most of the use cases in this chapter has been on generative AI, these tools owe their popularity, in part, to the many creative industries that embraced narrow AI to support the evolution of computer graphics across the film, animation, visual effects, and game industries. With that in mind, many creatives do not limit themselves to generative AI. Ollie Rankin is one such creative. Ollie has been involved in designing, developing, and using many different crowd and battle simulation systems during a visual effects industry career spanning more than 20 years. His big break came in 1999, when he was hired by Peter Jackson's Weta for his artificial intelligence expertise. He was brought on to use the groundbreaking crowd simulation software Massive, being developed by colleague Stephen Regelous, for The Lord of the Rings trilogy. While Regelous iterated on the underlying technology, Ollie and a small team of technical directors perfected the techniques of "brain building" and battle choreography that were necessary to bring Jackson's epic vision to the screen. Massive and the associated workflows remain a core part of the Weta pipeline and have since been used on the Avatar, Avengers, and Planet of the Apes franchises, among many others. Meanwhile, Ollie built on what he'd learned at Weta, helping several other visual effects studios around the world develop their own proprietary crowd simulation tools.

"Each of the crowd systems I've helped develop sits at a different point along a spectrum from pure behavioural simulation to art-directed choreography." For Ollie, the common challenge in all cases is to tell a believable story, not just to fill a movie screen with hundreds or hundreds of thousands of realistic-looking digital characters (called "agents"). Whether it's armies of warriors or armies of football fans, Santa's elves or Elrond's elves, the agents need to be programmed to act en masse in a way that tells the audience who is winning the war or the game. "No matter how uniform these armies might be, by design, the individuals always need to be different enough from each other in appearance and movement to not seem repetitive." When half a million orcs need to be dressed, positioned, and directed to act coherently, but differently from each other, it's not practical for someone to individually decide what each one should look like or be doing. "This is where random number generators, combinatorial probability and so-called 'fuzzy' logic come into their own. By having a large enough number of differentiating parameters, each randomly assigned within a defined range, by ensuring that the way that those parameters combine is sufficiently complex to create an innumerably large number of possible permutations and then by ensuring that the same degree of complexity motivates the movements of each agent, you can overcome the cookie cutter repetitiveness that dogged the 'crowd replication' approaches that crowd simulation replaced." Some workflows that involved AI include the following (a toy sketch of the randomized-parameter idea follows the list):




• Motion capture of a complete set of actions that a character could carry out depending on their physiology and the props or weapons they carried.

• A "state machine" that would constrain the actions a character could carry out based on the pose they were in.

• A "brain" that would receive and process various stimuli and decide which action the agent wanted to carry out next.

• Giving agents the ability to "see" and "hear" what was going on around them, ensuring that the inputs into each agent's brain were completely unique.

• Allowing them to identify friends and foes and to interpret what others were doing, which could motivate reactive and pre-emptive behaviors.

• Simulating varying emotions (Ollie called this part of the virtual brain the "emotion matrix"), which modulated each agent's behavior according to its innate "bravery" and current physical condition.
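As promised above, here is a toy illustration of the randomized-parameter idea, invented for this book rather than taken from Massive's actual implementation. Each agent draws every differentiating parameter from a defined range, and the combinations multiply into far more permutations than any single battle needs.

import random

def make_agent(rng):
    return {
        "height": rng.uniform(1.6, 2.1),         # meters
        "stride_speed": rng.uniform(0.8, 1.3),   # animation multiplier
        "armor_variant": rng.randrange(12),
        "weapon": rng.choice(["sword", "spear", "axe", "bow"]),
        "bravery": rng.uniform(0.0, 1.0),        # feeds the "emotion matrix"
    }

rng = random.Random(42)  # seeded, so a shot can be re-simulated exactly
army = [make_agent(rng) for _ in range(10_000)]

# 12 armor variants x 4 weapons already give 48 discrete combinations;
# the continuous parameters make exact repeats effectively impossible.
print(army[0])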

Lessons learned from developing multiple intelligent systems over the years to drive animated characters include the following:

• Different approaches can be employed in animation, ranging from behavioral simulation to art-directed choreography. The choice depends on the specific needs of the story.

• The primary challenge in animation is to tell a believable story. This does not mean simply filling the screen with numerous realistic-looking characters but programming these characters to behave in a way that conveys the narrative and the current state of events.

• Even in large crowd scenes, each character should have a distinct appearance and behavior to avoid repetitiveness. Uniformity should not compromise individual distinctiveness.

• To create such diversity, tools such as random number generators, combinatorial probability, and fuzzy logic can be used. These help create many differentiating parameters and a vast number of possible permutations.

• Experimentation with machine learning is possible by creating feedback loops that allow characters to modify their behavior based on experience. However, the storytelling objectives should not be compromised.

• Randomness can be an effective tool in animation, often creating surprising results and a diversity of behavior.

• At the end of the day, the opinions of the director and the audience are more important than the purity of the simulation. Hence, the machine intelligence might be controlled or even eliminated to ensure the most organic and satisfying storytelling experience.

What is clear in the development of these intelligent systems over the years is that they supported a larger vision. That vision was in great part collaborative and emphasized the importance of good storytelling. Any technology can be used to support storytelling, and AI is no exception. A key takeaway for creatives is to persistently keep top of mind why you are engaging with generative AI and how the content it generates can support your own vision. Ollie's use case also shows us the positive value of narrow AI. Amid the current fears of job replacement and worse, it is important to keep in mind that some creatives have been integrating AI into various creative processes for decades, and their commitment to these systems has contributed to much of the entertainment that we consume.


Acknowledgments

• To Dr. Claudia Krebs, Christine Evans, and other educators who continue to inspire in how they fearlessly embrace emerging technology while maintaining discernment.

• To Junyi Song, who was willing to share how their team integrated generative AI in their ideation phase.

• To Jen, who wishes to remain without a last name, for their responsible consideration of representation, voice, and the need to include community in creative decision-making.

• To Frederik Svendsen, whose coding adventures with ChatGPT-4 have just begun.

• To Bill Zhao, for endless and fearless experimentation with all technologies, approaching their development with the sharpened purpose that they must solve a human problem.

• To Matt Yedlin, who is making narrow AI much easier to understand.

• To Daniel Lindenberger, who is passionate about bridging communities of practice to better understand the implications of generative AI use, integration, and larger projects.

• To Ollie Rankin, for inspiring us with contributions to epic films whose incredible scenes were highly dependent on the AI systems that he co-developed, refined over the years, and implemented.


CHAPTER 13

AI and the Future of Creative Work

This chapter explores the idea that the future of generative AI and other AI systems for creatives is already here, as many scholars, developers, and futurists predicted for decades. The integration of generative AI and other AI with many existing jobs is already underway. While generative AI holds immense potential to automate certain routine tasks, it does not herald the complete elimination of all jobs. However, for those whose careers revolve around a series of repetitive, standard tasks that require minimal specialized skills, it's critical to cultivate unique capabilities that AI can't replicate. This principle extends to the realm of creation as well.

Managing Automated Creativity

In a time when artificial intelligence is dominating our attention and being integrated within an increasing number of creative human processes, it's easy to understand why people are concerned. However, we need to better identify those creative activities and tasks that are more likely to be replaced so we can better grasp how we will direct our own creative stories when it comes to generative AI, now or in the future. Doing so requires breaking down the human experience of any creative job.


Bread and Washing Machines

As a creative exercise, a first step is to task your AI with making predictions. They may in fact be as accurate as the ones you read from humans who portend the end of human jobs and the extinction of the human race.

Author: True or False? In the future economy, AI will have all the jobs in manufacturing white bread.

Muse: False. It is unlikely that AI will completely replace all jobs in the future economy. While AI and automation will certainly change the nature of work and lead to job displacement in some industries, it will also create new job opportunities in fields such as AI development, data analysis, and cybersecurity. Additionally, there are many jobs that are inherently human and cannot be performed by AI, such as creative jobs, jobs that require emotional intelligence and empathy, and jobs that involve complex decision-making and problem-solving.

Author: True or False? In the future economy, AI will have all the jobs in manufacturing white bread.

Muse: True. It is likely that in the future, AI and automation will play a significant role in the manufacturing of white bread and similar products (Figure 13-1). The repetitive and predictable nature of these tasks makes them well-suited for automation, allowing for increased efficiency and accuracy. However, it's important to note that while some jobs in the breadmaking industry may be automated, there will still be a need for human workers to operate and maintain the machines, manage the production process, and make strategic decisions.


Figure 13-1.  A photo of my toaster yielded unpredicted results with the prompt "a robot learning to bake bread," as this collage shows. The AI gave up on me until the very last image in the bottom-right corner. Iterations = 160, unfortunately, but it might just prove my point


At least you can plan ahead if you happen to work in bread manufacturing. The fact that even an LLM with its vast data sets, including data on employment trends, has difficulty with prediction is an indicator of the countless unknown unknowns that will impact humans in years to come. It is likely easier to predict the replacement of human actions that are repetitive and motoric. Many creative tasks require very individualized contexts and approaches informed by the training the discipline demands. When it comes to some creative jobs, at least one thing is certain: no matter how advanced robotics becomes, AI will not be able to find the nuances to replicate a rond de jambe that a ballerina repeats with infinite variation and muscle control over decades of daily practice. However, some creative tasks are repetitive and can be automated. The repeatability of some creative acts surfaces two themes that every creative needs to address. Doing so will better inform your own future actions when it comes to the choice of integrating generative AI in your own workflows.

Two Themes Every Creative Needs to Address

The first theme is that advances in narrow AI, including generative AI, surface challenges that human work cultures have been facing for a while when it comes to assessing the role of creators in their workflow, the creative content they create, and how that content can support workflows, pipelines, brand, or value. Generative AI has not helped the creatives who have already been affected by the rise of the gig economy. Those part-time, contract-based gigs send a signal that artists from more established creative disciplines like visual art, dance, theater, and music have known for centuries: there is no one solid and secure job for anyone whose dominant way of making a living is to create content for other humans. While we might think that this is not the case for a 3D artist, UX designer, or programmer in a game company, layoffs are cyclic, and my own students and colleagues have been perpetually laid off and forced to move on to other positions. Those creatives who depend on contract work are separated from the day-to-day needs and costs of the companies that usually contract them. For some creatives, the danger lies in the people who hire them coming to believe that they can generate the content themselves with an AI. This may happen when an employer believes the content that creatives offer is on par with what an AI generates. Companies are already relying on LLMs to write copy and weighing in favor of them to cut costs, and some creatives have lost jobs or contracts because of this. At the same time, after jumping the gun and thinking they can just use ChatGPT or other chatbots to replace employees or contractors, some companies are rehiring or re-contracting creatives when they discover how generic, untruthful, biased, damaging, and inaccurate LLMs can be. This kind of seesawing by companies trying to cut costs will persist as long as the perception remains that generative AI is good enough at simulating human intelligence and creativity. It also speaks to a company's culture, its structure, and how employees are treated. While reports on the subject may tell the story of generative AI coming after creative jobs that are nonroutine, it's important for each of us to articulate which parts of our work truly are not routine. The signal is clear: creatives need to re-evaluate the tasks they undertake and decide which routine ones a future AI might take over. If you are a creative who creates content that is mainly based on popular trends, frequently relies on overused story themes, or produces work that complies rigidly with specific genres, such as popular music styles or classic suspense sequences in television or film productions, then you need to start articulating what your value proposition is compared to a generative AI. Generative AI makes it easy enough for employers to do without contracting creatives. Content generated by an AI offers a more cost-effective option than, say, employing a scriptwriter who adheres strictly to traditional Hollywood-style scripting, or even licensing the stock images, footage, music, 3D art, or stories that some creatives contribute to content libraries.


The second theme, as this book has argued, is that even though the threat of generative AI might be present in some disciplines, it also presents an opportunity for innovation when used as a tool that augments your own creative potential. On its own, AI cannot perform this feat autonomously; it lacks the vision, craft, technique, and experience that creatives possess. The technology requires human intervention to critically evaluate its outputs and use them as a springboard for challenging the existing norm or questioning prevailing artistic conventions. This approach mirrors the original intention behind the early integration of computers into artistic expression: they were envisaged not as usurpers of creativity but as allies in a shared quest to disrupt and redefine artistic paradigms. Governor General's Literary Award-winning Canadian playwright Kevin Kerr speaks to the necessity of the craftsperson in a different way:

What's intimidating is that the bot has all of Shakespeare and everything ever written about it available to draw from almost instantly, while I have to use my crappy memory and incomplete readings of the texts. But it's also interesting to note how (currently) the AI is lacking a lot of nuances when it comes to style, and there's a uniformity of voice that misses Shakespeare's ability to reflect character in speech. And of course, it's clear AI doesn't know the rules around writing a sonnet.

Most creatives who come with a large amount of work experience will find that generated prototypical content needs curating, editing, guiding, refining, manipulating, and at times radical transformation to make the most of it. That calls upon the skills that creatives have developed over many years of experience across different creative industries, in specific contexts that are difficult to replicate. Creative impulses, and the content that comes from them, are prompted by context-specific cases and often involve a lot of back-and-forth between a contractor and the team leading the vision.

Integrating AI into Your Workflows

How generative AI will affect us seems to be oriented more toward its use in support of specific, repeatable tasks and responsibilities. In many creative industries, creatives are still an essential part of production, where decision-making, problem-solving, and managing creative relationships are not easily replaceable by any machine or automated process. That is in part because of the ever-changing nature of productions and their unique demands. Every sound design production for live theater, for example, has unique challenges that depend on the vision of the director, the script, how much support the actors need, the back-and-forth between the director and the designer, whether actors will need to be reinforced with microphones, the number of cues, the playback system, and the budget. There are rarely templated production processes across any human creative act, because every process also accounts for the unique meeting place of all the personalities that contribute to the work. What speeds up the compositional process in my own work are automated plugins, like equalization, that can make minor or major adjustments to the frequencies of different sounds as they play (Figure 13-2). Ten to twenty-four tracks or layers of music can be automated and then rendered in less time than it takes to play the whole track. My job as a sound designer and composer is not being replaced, but some of the tasks I had to do on my own in previous years are made easier. This gives me more time to try multiple mixes with radically different equalization for different tracks, depending on what system the track is going to be played back on.


Figure 13-2.  The author depicted using a photo of an out-of-order mechanical horse ride along with the prompt "racing to meet a deadline." Iterations = 19

As creatives, what we might need to get used to is how generated content can inform our individual creative processes, and to do that we need to better define what that creative process is. The best next step to take, if you haven't already, is to break down all the tasks you undertake in your respective discipline and apply a design thinking tool known as "Day in the Life." The tool can be applied to any job you are afraid AI will replace. Like the 40 minutes I recounted in the life of a photographer earlier in the book, it is useful to apply the tool to your own creative tasks by critically taking apart all the human actions that you think an AI might replace.


Day in the Life Steps

• Task Decomposition: Start by cataloguing all the tasks you perform to create your own content. For example, if you're making a video, this might include brainstorming, scriptwriting, filming, editing, and publishing.

• Storyboarding: Utilize a storyboard to visualize every step of the process. For instance, in animation, this might involve sketching the flow of scenes and dialogues to provide a clearer understanding of the storyline and sequence of actions.

• Peer Review: Share your process with a colleague or friend. This is beneficial as they might spot steps you overlooked. If you're developing a podcast, for example, they might notice that you've missed including time for audio post-production.

• Identifying Routine Tasks: Categorize tasks that are repetitive and can be performed almost unconsciously. In the case of a blog writer, this might include formatting text or researching relevant keywords for SEO.

• AI Research: Investigate what AI technologies currently exist that could automate some of these tasks. For instance, a content writer could find AI tools that aid in keyword research or grammar correction.

• AI Integration: Incorporate any relevant generative AI into your workflow. As a photographer, for example, you might utilize AI tools for sorting and basic photo edits.

• Workflow Refinement: Adjust your workflow accordingly and keep a record of it. Consider what tasks the AI managed effectively. If you used AI for automated video editing, for instance, assess the quality of its output and the time it saved.

• Identifying Gaps: Finally, determine the shortcomings of the AI-generated work. This will aid in demonstrating how you incorporate AI in your workflow, what you need to do post-generation, and where human input is still necessary. For example, while an AI might be able to create a draft for an article, human intervention might still be necessary to add a personal touch, check for contextual accuracy, and ensure overall coherence.

Distinguishing Between Art and Design Processes

As you continue to define your own creative process, understanding the impact of generative AI also means considering the context it might be used within. This book has grouped artists, designers, chefs, product owners, developers, and others together under the umbrella of "creatives." While it is true that generative AI can support anyone who is creative, no matter what they do, it is important to distinguish its use by individual artists from its use by designers who work on teams. Generative AI can support artists in exploring new forms, media, and stages, but many artists do not need the technology to continue to create their work. Designers of all kinds, however, can leverage generative AI for team-based workflows and pipeline integration. That places new demands on creatives to identify all the tasks they currently engage in, including the ones that depend on other team members to complete (Figure 13-3). Doing so, they can better prepare to either integrate generative AI or at least understand how it can help them and the other team members they engage with. As an example, in video game development, AI has already been integrated across creative pipelines:

• Generative AI can be used to prototype unique and immersive game worlds, levels, characters, items, and more. Minecraft and many other video games already use procedural generation to create their vast, varied landscapes. Procedural generation can save development time and resources, allowing vast game worlds to be created with relatively small file sizes, since the environments are generated in real time rather than being pre-rendered and stored. But content still has to be curated, modified, tweaked, and optimized. (A toy sketch of procedural generation follows this list.)



• Generative AI models can be used to create more believable non-playable characters (NPCs) with varied behaviors and responses, making the game world feel more dynamic and alive. For instance, AI models can be trained to generate dialogue or decision-making patterns for NPCs. Experiments integrating ChatGPT within game environments are already underway, creating a richer gameplay experience with NPCs players can converse with.



• AI can also help in generating beginning story elements, quests, or dialogue that can then be refined by a narrative designer. Tools like ChatGPT can be used to generate unique plot line ideas based on existing stories, and these can be vetted with key members of the team.


• Generative AI algorithms can be used to prototype game art assets. For example, AI could generate different variations of character designs, weapons, or environments, based on the art style of the game.
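As flagged in the first bullet, here is a toy flavor of procedural generation, invented for this book rather than taken from any shipped game: a seeded random walk produces a one-dimensional terrain height profile, so a "world" can ship as a few bytes (the seed) and be regenerated identically at runtime.

import random

def terrain_heights(seed, length=64):
    rng = random.Random(seed)
    heights = [0.0]
    for _ in range(length - 1):
        # Each step nudges the previous height up or down a little.
        heights.append(heights[-1] + rng.uniform(-1.0, 1.0))
    return heights

# The same seed always yields the same landscape: generated in real
# time rather than pre-rendered and stored.
assert terrain_heights(7) == terrain_heights(7)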

Figure 13-3.  A future worker trying to juggle all the balls being thrown at it and becoming overwhelmed. Iterations = 40

As you identify the co-dependent tasks you undertake with other team members and consider how generative AI can fit within existing workplace practices, you will benefit from first answering the following:



• Which team members would most benefit from integrating generative AI into team workflows? Is it your marketing team? How much of their time will be required to practice generating content? Does generated code save an experienced programmer time, or does it cost more time to debug and correct faulty code?



• How is creative work managed over time? This may lead you to identify the type of project management methodology your workplace uses. Agile sprints, for example, allow for increased flexibility in prototyping content over short time periods, after which prototypes are reviewed and iterated upon in subsequent sprints. Generative AI is a useful complement to Agile processes.



• What level of experience do your team members have with various productions? The variety of work experiences on your team will be an asset when considering how to integrate generative AI, giving you a broader perspective on how it can be integrated.



•	What experiments with generative AI have team members already engaged in? Answering this question may point to necessary workshops and onboarding for the specific generative AI that you and your team want to implement. Team members may or may not have the knowledge and know-how to prompt a generative AI with the information required to perform the desired task.




•	What policies does your team or organization have for using AI-generated content? If teams are creating original intellectual property (IP), they need to understand the company’s position on using any generative AI content, since many companies that give users access to generated content have specific copyright policies in play that need to be respected. Some, like Midjourney, have specific rules allowing any content created by users to be used by all users.



•	How would work created with generative AI be reviewed? Answering that question may also determine who on your team would review that work, how it would be assessed, and who would be responsible for its implementation.

Takeaways for Creatives




•	Define all the creative tasks that you undertake in your current role or in the role you want to have in a particular industry.



•	Conduct research into how any generative AI might be able to support the tasks you would undertake.



•	Write the instructions you think an AI would need in a prompt to generate something specific that you have the lived experience and know-how to deliver.



•	Assess how much time it took to create those instructions.



•	Define how complete the generated content would be right now, not in six months or five years.




•	Identify the review tasks you would need to undertake to assess the content that an AI generates, and estimate how long they might take.



•	Run through a mock scenario and time it to see whether a generative AI actually saves time; the myth of efficiency is a common argument used to justify AI’s use in different industries. Include the time it would take to provide the necessary inputs a generative AI would need.

Generative AI provides valuable tools for designers, potentially augmenting team-based workflows and pipeline integration. To harness these capabilities, it’s important for creatives to thoroughly understand how AI can support their tasks and their dependencies on teammates. That clarity may help them incorporate generative AI into their work processes when asked or, at the very least, help them appreciate how this technology can enhance their efficiency and collaboration with others.

Future AI Jobs Now

News that generative AI will disrupt all existing jobs and cause human extinction is overshadowing the reality that a market for future jobs involving generative AI is opening quickly. A valuable exercise for creatives is to research what those future jobs look like, to see whether they might expand their own experience and develop the skills to take them on. Asking my creative companion is a starting place, bearing in mind its limitations and its shaky capacity to tell the truth, since it doesn’t always have its facts straight.


Muse: Sure, I can help with that.




•	Generative AI Engineer: An engineer who develops, implements, and maintains generative AI models. These professionals can work in various industries such as advertising, gaming, architecture, and more. For example, a generative AI engineer in a fashion company could develop models that generate new clothing designs.



•	AI Art Director: This role involves using AI, particularly generative AI, to create visual art for advertising, entertainment, or digital media. For instance, an AI art director might use AI algorithms to generate art for a digital marketing campaign or a video game.



•	AI Content Creator: These professionals use AI to generate different types of content, such as blog posts, social media updates, or product descriptions. For example, an AI content creator could use AI models to generate blog posts on specific topics.



•	AI Music Composer: AI is increasingly being used in the music industry to generate new compositions. An AI music composer could use generative models like OpenAI’s MuseNet to compose new pieces of music.



•	AI Game Designer: These professionals leverage generative AI to create game elements such as environments, characters, storylines, and more. For instance, an AI game designer might use procedural generation to create diverse and vast game worlds.



•	AI Data Scientist (Generative Models): They build and train generative models on large data sets to generate new, synthetic data. This role could be particularly useful in healthcare, where AI data scientists could generate synthetic patient data that preserves privacy while allowing for robust research and development.

•	AI Architect (Generative Design): Professionals who use generative AI in fields like architecture or industrial design. For instance, an AI architect might use generative design tools to create efficient and innovative building designs.



•	Generative AI Researcher: These professionals conduct cutting-edge research in generative AI, developing new algorithms and models. They might work in academia or in the research departments of large tech companies.



•	AI Ethics Officer: With the growing use of AI, ethical considerations are becoming increasingly important. AI ethics officers are needed to ensure that generative AI is being used responsibly and ethically and that it aligns with laws and regulations.



•	AI User Experience (UX) Designer: These professionals integrate generative AI into user interfaces and experiences. For example, an AI UX designer might design a system that uses AI to personalize the user interface based on the user’s behavior and preferences.

Conducting more research can reveal other types of jobs emerging from the technology and its current popularity (Figure 13-4). Discerning what these positions are, and the training involved in taking them on, requires online research, particularly searches that attach the letters “AI” to any knowledge area. Here are a few others to consider, drawn from my own research:




•	Prompt engineers, or those individuals who are good at getting consistent content from a generative AI through the craft of using words. If you recall how generative AI works, however, every generation of content is unique, and you can’t really predict if you’ll get an extra finger here or there.



•	AI content editors are already reviewing what an LLM generates and editing it to sound more human, less homogeneous, and more accurate. AI content editors will not be limited to the generated word, as there is a growing appetite for generated images that need curating.



•	AI consultants are inundating web searches with their generated expertise on how to integrate AI in the workplace. As alarming as this sounds, some good may come of it: regardless of how knowledgeable they are, they make us aware that we may have to deal with AI as a trend or an expected tool in our workplace environments.



•	Discipline-specific AI curators will be those individuals who have gone deep in integrating generative AI into their own workflows and can therefore rapidly generate useful content for others, in addition to teaching colleagues how to apply generative AI.



•	The position of machine learning (ML) engineer will predictably be on the rise, particularly in IT support for companies and institutions. Organizations rich in data need to take command of that data and leverage it to become the corpus for their own LLM or text-image, image-image, text-video, or video-video generative AI (see the sketch after the hint below).


Hint: All the tools, processes, and support are out there to achieve this on your own. Learning how to do this in addition to the tasks you already have is a form of future-proofing your position when and if cuts come.
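
As a rough illustration of what “becoming the corpus for your own LLM” can look like, the following sketch fine-tunes a small open model on a plain-text company corpus using Hugging Face’s transformers and datasets libraries. The file name and the choice of GPT-2 as a stand-in are assumptions for the example; a real project would add data cleaning, evaluation, and far more compute.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2")

# "company_corpus.txt" is a placeholder: one document or paragraph per line.
dataset = load_dataset("text", data_files={"train": "company_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="our-house-model",
        num_train_epochs=1,
        per_device_train_batch_size=2,
    ),
    train_dataset=tokenized,
    # mlm=False selects plain causal language modeling (predict the next token).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("our-house-model")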

Figure 13-4.  A cyborg AI awkwardly pointing at you, with the text beneath translated as “We need you.” Iterations = 27


Acknowledgments




•	To those artists and improvisers who have inspired us not to conform to or accept established patterns of art, offering alternatives to the same old and provoking us to do the same



•	To the current version of our AI-powered muses, who continue to astonish and provoke, and whose very existence brings to the surface pervasive challenges we face as a community of humans engaging with intelligent machines with curiosity, fear, wonder, and experimentation (Figure 13-5)



•	To the activists and image warriors, coalitions, and policy developers who are working hand in hand with AI developers on what at times seems like a runaway train



•	To writers like Kevin Kerr, whose skill and craft at writing will be needed more than ever as a striking contrast to any AI, which can only ever simulate nuance and human experience



•	To those embracing new opportunities with generative AI and integrating it more and more into their workflows, showing creatives the advantages of doing so



•	To employers who are carefully evaluating how generative AI can support existing workflows, and how they can support and guide the creatives they contract or currently employ to best take advantage of these creative tools


Figure 13-5.  A final image using Photoshop’s Generative Fill to create the torso and legs of a creative juggling its own perspectives and ideas about generative AI


Impossibly Generated Conclusions

The frustrating characteristic of generative AI is that it moves faster than anyone writing about it. By the time you read this book, new advances will have made many of the examples of generative AI referenced here obsolete. How could I have known that Stable Attribution, the site devoted to tracking down the images that went into Stable Diffusion and its generated images, would close its doors within months of its discovery? It’s important therefore to understand that there will be generations of AI, just like any other prototype, and they will continue to add features over time. Development teams will continue to claim that content generated from first prompts looks better and better, feels more human, and offers an improved reading experience. More job losses will be announced. More mistakes made by generative AI will be cited because there was no human to review the content.

While some AI-generated content may already feel like a higher-fidelity prototype, this book maintains that whatever is generated is best regarded as a prototype, a version of your initial prompted idea that you can apply your craft and skills to improve before sharing with the world. In whatever way the technology evolves, keep in mind that as good as it might get, the features that AI offers can always be better. One important takeaway that I’ve repeated several times is to consider how and where generative AI fits into your own creative process. As imperfect as initial offerings might be, on the short horizon generative AI is set to at least offer some improved features like higher image resolution and hands with the correct number of fingers. We are already starting to see web portals dedicated to showcasing generated art that creatives use as the first step in their creations. Creatives on teams across industries are also starting to bring generative AI into their workflows, especially in the beginning phases of a project. Anticipating that there might be an expectation for you to know how to use AI on future creative teams, it may benefit you to integrate generative AI within your own workflows.


Will a small minority of humans with bad intentions use generative AI and other AI to do harm to others? Yes. That shouldn’t stop your use of it, though, especially if your intent is to offer unique creations that criticize and disrupt established forms and structures. It is a great time to leverage generative AI to create unique work. The journey of our interaction with generative AI promises to be an iterative and rewarding one that will continue to provoke and challenge how we prototype. The technology challenges us to stay on our toes: to critically examine the implications of interacting with generative AI and the results of those interactions as prototypes that influence what we share with the world. Bear in mind that many ethical concerns and fears of AI controlling human tasks and roles are still not resolved and need to be. My hope is that in reading this book you have increased your awareness of how generative AI works so you can make more informed decisions as to how you might one day use it. I leave the final words to our muse.

Author: How would you end a book on using generative AI as a creative prototyping tool?

Muse: Therefore, since brevity is the soul of wit
And tediousness the limbs and outward flourishes,
I will be brief.

As we reach the concluding lines of this exploration into generative AI as a creative prototyping tool, we recognize the astounding possibilities it unlocks. The world of design is at an extraordinary juncture, with our creative potential being broadened and deepened by this revolutionary technology. Generative AI will continue to evolve, its capabilities expanding in ways we can only begin to imagine. As creatives, it is incumbent upon us to stay abreast of these developments, to adapt and adopt, to continue pushing the boundaries of what we believe is possible. The power of generative AI isn’t about replacing human creativity—it’s about augmenting it, helping us find novel solutions and innovate in ways that were previously unreachable. It’s an exciting tool in our toolkit, one that offers limitless potential for the brave creative willing to explore its depths.

This book aimed to demystify generative AI, to guide you through its intricacies, and, hopefully, to inspire you to harness its potential. As we wrap up this journey, it’s up to you to leverage these insights, to experiment, to fail, and to succeed. Remember, it’s your creativity that breathes life into these tools. Generative AI is merely a means to an end—a powerful aide in the relentless pursuit of innovation and excellence in design. As we embrace this new era, remember that your ideas are the driving force behind the change. Your imagination, your inspiration, your artistry—these are the elements that give shape to AI’s potential. So, as you turn the final page, don’t view it as an end, but rather a new beginning. With generative AI in your hands, you’re not just a designer—you’re Mary Shelley, Ada Lovelace, Joan Clarke, Alan Turing, Sun Ra on the frontiers of creativity. Embrace this challenge with curiosity, discernment, and the courage to create the unimagined.


APPENDIX

Image and Text Sources

LLMs were used to generate the muse’s text. That generated text was edited, refined, and adapted for all the scripted sequences in every chapter. Text was generated using the popular LLMs known as ChatGPT-3 and ChatGPT-4 along with OpenAI’s Playground. All the text was highly edited for theatrical effect, and some was completely rewritten.

Muse: I thought so.

All Shakespearean text was written using the corpus of Shakespeare’s plays and sonnets through a combination of ChatGPT-4 and passages directly adapted from Antony and Cleopatra, Macbeth, Othello, The Merchant of Venice, Romeo and Juliet, and A Midsummer Night’s Dream. The Droeshout image of Shakespeare was accessed through the University of British Columbia Library’s copy of Shakespeare’s First Folio.

Apart from a handful of public domain images confirmed through archival websites and public domain sites requiring no attribution, most of the figures used in this book were generated from the author’s own photos, bad drawings, and visual models. These were used in image-to-image generative AI with basic text prompts, leveraging each platform’s styles/filters and other features to radically transform them beyond recognition in all cases. Stable Attribution and months of photo hunting brought to the author’s attention some specific images whose essential parts were too close to existing sources. These were not included in the book and were deleted from the author’s computer. At least a dozen different AI were experimented with, applying platform-specific filters. The results were further edited with added objects, color corrected, and refined using licensed versions of Photoshop, Photoleap, Snapseed, and Mixlr. Neural filters from Photoshop were used extensively. Real photos in the text were all taken by the author.

Some exceptions include generated images that were meant to illustrate basic concepts and draw comparisons between text-image platforms, particularly in Chapter 8, to show differences in how images were generated with the exact same prompts. Those images were created in Stable Diffusion, DALL-E 2, and Midjourney. Some were sent through AI filters, and a few were modified in DALL-E 2 to show the technique of outpainting. Other generated images were the result of prompting Scribble Diffusion with an original sketch. Any images that underwent a small number of regenerations were used to illustrate the functionality of these platforms and were intentionally not rendered or upscaled in high fidelity. Prompts were mainly original, with some common words generated in natural language models and others modified from open sharing sites. Complex prompts were found on various web portals and are meant to show how elaborate some prompts can be. No prompts were paid for or licensed.

Walk Through Figure 2, FrankenAI Lab

The original prompted image was a photo taken by the author to capture the windows and a particular perspective of a room imagined to be a scientific lab.


The photo in Figure A-1 was uploaded to an image-to-image AI with the prompt “reverse-dolly-zoom-of-a-scientific-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-still-digital,” and the result was the four variations in Figure A-2. Reducing the influence of the prompt photo will generate more interesting results, and increasing how much the AI responds to the text prompt will also make a difference. In this case, the prompt guidance was set to just over 50% and the influence, or strength, of the prompt image to around 30%.

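For readers who want to reproduce this kind of setup locally, here is a hedged sketch using Hugging Face’s diffusers library as a stand-in for the web platform used in this walkthrough. The model ID and file names are assumptions, and the platform’s percentage sliders map only loosely onto these parameters: in diffusers, a higher strength lets the output drift further from the source photo, roughly the inverse of the platform’s image-strength setting.

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Placeholder file name standing in for the author's office photo.
source = Image.open("office_photo.jpg").convert("RGB").resize((768, 512))

images = pipe(
    prompt=("reverse dolly zoom of a scientific lab with lots of windows, "
            "microscopes and lab equipment, black and white, still, digital"),
    image=source,
    strength=0.7,          # how far the output may drift from the source photo
    guidance_scale=7.5,    # how strongly the text prompt steers the result
    num_images_per_prompt=4,
).images

for i, img in enumerate(images):
    img.save(f"lab_variation_{i}.png")
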
Figure A-1.  Original photo of the author’s office using an iPhone


Figure A-2.  Four images generated from the prompt “reverse-dolly-zoom-of-a-scientific-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-still-digital.” Total iterations = 56

Of the four images generated in Figure A-2, the thumbnail in the bottom right was chosen, as it added more windows and a wider, zoomed-out perspective that increased the size of the space compared with the original photo.


Figure A-3 prompted the next set of generated images, with the strength of the prompt image over the generated one set at 30% and the number of steps used to generate a higher-quality image set at 97. In addition, I added the keywords “Frankenstein,” “electricity,” and “misty” to the prompt.

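The rest of this walkthrough repeats one loop: generate a batch, hand-pick the best thumbnail, tweak the prompt, and feed the pick back in as the next prompt image. A compact sketch of that loop, reusing the pipe and source objects from the earlier sketch, might look like the following; pick_best is a hypothetical stand-in for the human curation step.

def pick_best(images):
    # Stand-in for human curation: in this walkthrough the author inspects
    # all four thumbnails and chooses one by eye.
    return images[0]

prompt = ("reverse dolly zoom of Frankenstein's lab with lots of windows, "
          "microscopes and lab equipment, black and white, electricity, misty")
current = source  # start from the original office photo

for round_number in range(6):
    batch = pipe(
        prompt=prompt,
        image=current,
        strength=0.7,            # approximate; slider and parameter semantics differ
        num_inference_steps=97,  # the higher step count mentioned above
        num_images_per_prompt=4,
    ).images
    current = pick_best(batch)   # human in the loop: choose one thumbnail
    # Revise the prompt between rounds, e.g., adding or dropping keywords
    # such as "electricity" or "scraggy limbs" as the walkthrough does.
    prompt = input(f"Prompt for round {round_number + 2}: ") or prompt

current.save("franken_lab_final.png")
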
Figure A-3.  The bottom-right image in Figure A-2 is used to prompt the next set of four generated images along with the text prompt “reverse-dolly-zoom-of-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-electricity-misty.”


From the four generated images in Figure A-4, the thumbnail on the bottom right was chosen, particularly because of the combination of top window, mist, and wiring not previously present in Figure A-3. At the same time, the previous windows and the size of the lab were preserved, adding a small library and a lower-right window and creating a more surreal effect. The lack of electricity in the images had to do with its weight in the prompt, which I would come to experiment with multiple times in the iterative process.

Figure A-4.  Prompt: “reverse-dolly-zoom-of-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-electricity-misty”


Considering the lack of electricity in Figure A-5, I worked with the text prompt to prioritize it in the next round of generated images. In addition, now that the room size and perspective were working, I could remove the words “reverse dolly zoom.”

Figure A-5.  Prompt: “reverse-dolly-zoom-of-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-electricity-misty”

From the eight generated images in Figure A-6, the bottom-left thumbnail had all the elements and expanded the bottom windows. That image was then chosen as the next prompt for the AI (Figure A-7).


Figure A-6.  Prompt: “electricity-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty”


Figure A-7.  Prompt: “electricity-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-bones-profane-fingers-scraggy-limbs”

The only things missing from my lab were body parts, if I stayed true to Mary Shelley’s novel. Even though I wasn’t sure I wanted them in the image, the next generated images added words from Chapter 4 of Frankenstein to the prompt, as in Figure A-8. The results were a bit over the top and grotesque, which reminded me of why I was creating this image. In the case of a critical how-to book aimed at creatives, the intent of the final generated image was to show the process and use Frankenstein’s lab as a metaphor for a reader’s own generative AI lab. The next series of four figures shows an iterative attempt to change the prompt text using Figure A-7 as the prompt image.


Figure A-8.  Prompt: “electricity-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-bones-profane-fingers-scraggy-limbs”

Since all the images in Figure A-8 were scary and grotesque, I used the same image as in A-7 but refined the prompt to eliminate the words “bones” and “profane fingers” while keeping “scraggy limbs” (Figure A-9).


Figure A-9.  An attempt to regenerate by subtracting some body parts. Prompt: “electricity-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-scraggy-limbs”

Except for the super-creepy top-left image in Figure A-10, I was getting closer to a look that didn’t have to be scary like Frankenstein’s lab. In fact, the reference to Frankenstein was next on the subtraction block (Figure A-11).


Figure A-10.  Substituting “scraggy limbs” with the word “specimens.” Prompt: “electricity-Frankenstein’s-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-specimens”


Figure A-11.  Making the word “specimens” singular. Prompt: “electricity-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-specimen”

Figure A-11 demonstrated the potential for any of these images to be close to the lab I was starting to imagine. However, the original perspective and size of the lab were somehow lost, so an iteration of four images was generated reverting to Figure A-7 and using the same text prompt as Figure A-11 but substituting the words “brain in a bottle” for the word “specimen” (Figure A-12).


Figure A-12.  The prompt “electricity-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-brain in a bottle,” combined with Figure A-7 as the prompt image, resulted in the AI bringing our windows back

The images in Figure A-12 tended to be less grotesque, but there was little color, as these were generated using a black-and-white style applied through playground.ai. To bring some color into the next batch of generated images, a new style was applied to the same prompt using Figure A-7. Figures A-13 and A-14 provided some blues that made the lab less ominous.


Figure A-13.  Prompt: “electricity-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-brain in a bottle” with a new style filter


Figure A-14.  Prompt: “electricity-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-brain in a bottle” with another new style filter using playground.ai

The bottom-right image of Figure A-14 added a bit of dimension to the lab that made certain elements stand out, so this image was chosen, and a variation of it was then rendered using another style that could be applied while the new image was being generated. The resulting Figure A-15 reveals the complexity of the style that was applied to generate the image.


Figure A-15.  This shows the complexity of the full prompt when adding a specific style available through a number of generative AI platforms. Notice the references to other artists that our final prompt would eliminate

While the generated image is interesting, the bottom-right thumbnail from Figure A-14 was chosen to represent the AI lab. Figure A-16 was a wonderful result given that the journey started with an old photo of an empty office, but somehow the electricity and sparks went missing from this generated version. Our final image was therefore generated by adding the text “electricity sparks and lightning” at the beginning of the prompt to ensure it showed up in the generated image (Figure A-17).


Figure A-16.  The generated image for the prompt “electricity-lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-brain in a bottle” seems to be missing some electricity


Figure A-17.  The AI brought the electric sparks back to the image after “electricity-sparks-lightning” was added at the beginning of the prompt “lab-with-lots-of-windows-microscopes-and-lab-equipment-black-and-white-misty-brain in a bottle.”

Lastly, Figure A-17 was manipulated using Photoshop’s neural filters to generate slightly more haze, and some subtle color was added to the image. Figure A-18 represents the final version of the figure, which appears in the Foreword of the book (Figure 2).


Figure A-18.  Figure A-17 was opened in Photoshop, and various neural filters were applied to generate the final version of our photograph. The preceding image is used in the Foreword of the book (Figure 2)

While not all images in the book went through the exact same workflow described in this appendix, the process of generating an image iteratively was similar for many of them. More workflows will be demonstrated on the website that accompanies the book, available through a QR code at the end of the Introduction (Figure 6).

