Research Exposed: How Empirical Social Science Gets Done in the Digital Age 9780231548007

Research Exposed offers in-depth, behind-the-scenes accounts of doing empirical social science in the era of digital com

English Pages [286] Year 2020

RESEARCH EXPOSED

Research Exposed HOW EMPIRICAL SOCIAL SCIENCE GETS DONE IN THE DIGITAL AGE

Edited by Eszter Hargittai

Columbia University Press New York

Columbia University Press
Publishers Since 1893
New York Chichester, West Sussex
cup.columbia.edu

Copyright © 2021 Columbia University Press
All rights reserved

Library of Congress Cataloging-in-Publication Data
Names: Hargittai, Eszter, 1973– editor.
Title: Research exposed : how empirical social science gets done in the digital age / edited by Eszter Hargittai.
Description: New York : Columbia University Press, [2020] | Includes bibliographical references and index.
Identifiers: LCCN 2020018243 (print) | LCCN 2020018244 (ebook) | ISBN 9780231188760 (hardback) | ISBN 9780231188777 (paperback) | ISBN 9780231548007 (ebook)
Subjects: LCSH: Social sciences--Methodology. | Internet research. | Social media--Research. | Digital media--Research.
Classification: LCC H61 .R4647 2020 (print) | LCC H61 (ebook) | DDC 300.72--dc23
LC record available at https://lccn.loc.gov/2020018243
LC ebook record available at https://lccn.loc.gov/2020018244

Columbia University Press books are printed on permanent and durable acid-free paper.
Printed in the United States of America
Cover design: Lisa Hamm

To my parents for instilling in me a love of books and showing the way for thorough work.

CONTENTS

Introduction
ESZTER HARGITTAI

Chapter One When Social Media Data Disappear
DEEN FREELON

Chapter Two The Needle in the Haystack: Finding Social Bots on Twitter
TOBIAS R. KELLER AND ULRIKE KLINGER

Chapter Three Meeting Youth Where They Are: Challenges and Lessons Learned from Social Media Recruitment for Sexual and Gender Minority Youth
ERIN FORDYCE, MICHAEL J. STERN, AND MELISSA HEIM VIOX

Chapter Four Qualitative Sampling and Internet Research
LEE HUMPHREYS

Chapter Five Behind the Red Lights: Methods for Investigating the Digital Security and Privacy Experiences of Sex Workers
ELISSA M. REDMILES

Chapter Six Using Unexpected Data to Study Up: Washington Political Journalism (and the Case of the Missing Press Pass)
NIKKI USHER

Chapter Seven Social Media and Ethnographic Relationships
JEFFREY LANE

Chapter Eight Ethnographic Research with People Experiencing Homelessness in the Digital Age
WILL MARLER

Chapter Nine Going Rural: Personal Notes from a Mixed-Methods Project on Digital Media in Remote Communities
TERESA CORREA AND ISABEL PAVEZ

Chapter Ten Stitching Data: A Multimodal Approach to Learning About Independent Artists’ Social Media Use
ERIN FLYNN KLAWITTER

Chapter Eleven A Measurement Burst Study of Media Use and Well-Being Among Older Adults: Logistically Challenging at Best
MATTHIAS HOFER

Chapter Twelve Community-Based Intervention Research Strategies: Digital Inclusion for Marginalized Populations
HYUNJIN SEO

Contributors

Index

INTRODUCTION ESZTER HARGITTAI

Digital media bring social scientists many exciting opportunities, both methodologically and as an object of study.1 They offer the potential for processing data more quickly, for helping find patterns that are less obvious to the naked eye, for accessing information about people’s everyday behavior, for unearthing connections that require significant processing power, and the list goes on. Nonetheless, traditional methodological considerations such as sampling bias remain important aspects of research given that more data do not necessarily mean better data. More is sometimes more of the same bad information and can result in mistaken conclusions if the large data sets represent sampling biases.2 The mere existence of more data is also not helpful if much of the data are not visible to researchers, either because social interactions heretofore accessible to researchers conducting in-person observations on digital media are now restricted or because the data are in the hands of private companies. Indeed, opportunities often come hand in hand with challenges ranging from data quality to data access and ethical considerations.

This book brings together essays from scholars doing cutting-edge research both using digital media to study social science questions and asking social science questions about digital media’s increasing importance in everyday life. Its aim is to offer researchers realistic examples (with solutions!) of how empirical social science gets done in the twenty-first century
when so much of what people do concerns mediated communication through digital devices. Contributors explicitly reflect on the behind-the-scenes realities of their experiences, sharing details that are rarely included in the methods sections of project write-ups even though they are essential for understanding how research actually gets done.

Included here are vivid firsthand accounts of original empirical social science research as it is being done in the digital age. Using these innovative research projects, this volume presents a wide range of methods, some completely novel and others more traditional, in the digital context. Project methods range from data scraping (chapters 1, 2, and 10) to ethnographic observations (chapters 6, 7, and 8), from interviews (chapters 4, 5, and 10) to focus groups (chapters 9 and 12), and from survey recruitment (chapter 3) to ecological momentary assessment (chapter 11), with chapters 9, 10, 11, and 12 reporting on mixed-methods projects.

Although books on research methods abound, few of them address the sometimes brutal and often undisclosed realities of collecting and analyzing empirical evidence, whether that concerns tracking down deleted Twitter data (chapter 1) or trekking in the Chilean mountains after a flood to connect with respondents in rural villages (chapter 9). By drawing on lessons learned from over a dozen scholars’ cutting-edge research, this volume addresses methodological challenges researchers face in our digital era. By focusing attention on the concrete details seldom discussed in final project write-ups or traditional research guides, it equips both junior scholars and seasoned academics with essential information that is all too often left on the cutting-room floor.

Methods books rarely read like detective novels, but the chapters in this volume can resemble the genre. For example, the authors track down tweets no longer available on the platform (chapter 1), step inside the U.S.
Senate Press Gallery to locate a press pass (chapter 6), familiarize themselves with the red-light districts of Zurich to find sex workers willing to talk to them (chapter 5), figure out the social media preferences of sexual and gender minority youth in order to reach them for a survey (chapter 3), knock on the doors of rural village residents to find participants for a focus group study (chapter 9), travel across the U.S. Midwest visiting arts and crafts shows to develop a defensible sampling method for a study of the economic benefits of selling creative goods online (chapter 10), and reach early adopters of technology even while living in a rural college town
(chapter 4). These are just some of the puzzles the authors in this volume solve, showing how research can benefit from creativity, resilience, and patience in addition to thoughtful and careful planning. In some chapters, the biggest challenges concern identifying, approaching, and gaining the trust of potential study participants when simply turning to an existing list of possible respondents to contact en masse is not an option. Such is the case when wanting to interview sex workers to study their security and privacy practices (chapter 5), when attempting to reach sexual and gender minority youth to learn about their health practices (chapter 3), and when figuring out how to get to know homeless people on the streets of Chicago to understand the role of digital media in their lives (chapter 8). In other studies, the main challenge is making sense of large amounts of data (think millions of observations) when forces are working directly counter to your efforts (chapter 1) or determining what counts as a relevant case when there is no established consensus on the criteria (chapter 2). But small data can also offer plenty of challenges, as shown when a deep dive into a single Facebook post by a study participant supports the continued importance of in-person observational context when analyzing online content (chapter 7). Chapters 6, 7, and 8 use ethnographic methods to highlight what it means to observe people in person when many of their interactions with others happen on devices whose content may not be as readily accessible to the in-person observer as was traditionally the case. In chapter 8, in-person observations and conversations with people on the streets of Chicago supplement online interactions, while in chapter 6, unexpected data on sites that are extremely difficult for researchers to access, such as political venues in Washington, DC, are shown to be important. 
Chapter 7 highlights the other side of digital media’s challenge to ethnographers: in-person observations still contribute to a project even when some of its focus is on people’s mediated communication. For example, a researcher interpreting a single social media post can benefit greatly from spending time with respondents in person to get to know their milieu and everyday interactions.

Several chapters reflect on the complexities of relying on multiple methods within one study, which is an excellent way to triangulate data but can multiply the number of challenges researchers face. The authors use both surveys and interviews (chapter 9); rely on surveys, interviews, and
automated social media data scraping (chapter 10); use surveys, automated data collection of participants’ media environment, and focus groups (chapter 11); and rely on observations, interviews, and content analysis (chapter 12). Many of the chapters reflect on how important it is to think about case selection carefully, with chapter 4 making this the central focus in its examination of qualitative methods when studying people’s use of mobile media.

Because the writing of a chapter for such a volume is rather unusual compared to much of the academic writing scholars do, most of the chapters went through heavy editing. For their patience and openness to addressing my many questions and comments, I am deeply grateful to the contributors to this volume. I suspect some of them did not fully internalize what they were signing up for when they agreed to contribute, but all were great sports about engaging with my feedback. Thanks to this exceptional back-and-forth, the volume offers a wealth of extremely accessible and engaging information about how to conduct numerous types of research studies, from collecting automated bursts of media consumption (chapter 11) to interviewing homeless people (chapter 8) and sex workers (chapter 5), from thinking through sampling for qualitative studies more generally (chapter 4) and for the study of crafters in particular (chapter 10) to reaching marginalized teens on social media (chapter 3), from collecting in-person data in remote rural villages (chapter 9) to offering a computer and internet training program to low-income African American adults (chapter 12), and from solving the mysteries of Twitter bots (chapters 1 and 2) to learning about journalists’ practices (chapter 6) and Harlem youths’ daily lives (chapter 7) in the digital age.
Whether or not they relate directly to the reader’s research topic or methods, the chapters offer important insights into how empirical social science research can be, indeed should be, both innovative and rigorous when dealing with the opportunities and challenges offered by digital media.

The chapters are organized as follows. Chapters 1 and 2 concern large-scale analyses of Twitter data, although the focus in both is more on identifying the relevant data than on describing the analyses themselves. Chapter 3 also focuses on social media but from the angle of using it as a tool for recruiting participants in a survey study. Chapter 4 is concerned with sampling and how traditional efforts to avoid biases must still be made in qualitative studies of digital phenomena. This is followed by chapter 5, which details efforts to recruit sex workers for a study about privacy and security
concerns without biasing the selection against or toward participants who are particularly anxious about or, conversely, agnostic about their privacy and security. Chapters 6, 7, and 8 report on ethnographies, while chapters 9, 10, 11, and 12 present mixed-methods studies.

While it is unlikely that any one researcher will use all of the methods covered in this volume, it is essential that all researchers understand how studies using different methods get done and what quality considerations must be kept in mind. If we are to be able to converse with our colleagues and evaluate papers using methods different from our own, we must appreciate both the opportunities and the challenges of the myriad of approaches available to scholars studying digital media.

NOTES

1. Christian Sandvig and Eszter Hargittai, “How to Think About Digital Research,” in Digital Research Confidential: The Secrets of Studying Behavior Online, ed. Eszter Hargittai and Christian Sandvig (Cambridge, MA: MIT Press, 2015), 1–28.
2. Eszter Hargittai, “Potential Biases in Big Data: Omitted Voices on Social Media,” Social Science Computer Review 38, no. 1 (2018): 10–24, https://doi.org/10.1177/0894439318788322;

Chapter One

WHEN SOCIAL MEDIA DATA DISAPPEAR DEEN FREELON

After well over a decade of social scientific research on social media, a set of standard methods for collecting data has evolved. Researchers typically either purchase or extract data from application programming interfaces (APIs), using keywords, time spans, and other criteria to identify their desired content. The quality of the resulting data sets may vary greatly based on the robustness of the collection mechanism, the popularity of the topic being discussed, the state of the internet connection during the collection period, and the quality of the software used to collect the data, among other factors. But if the desired data are online and publicly available, methods are generally available to retrieve at least some of them.1 All of this is business as usual for social media researchers. This chapter is not about business as usual. Rather, it explores the question of how to collect data that have been erased from their primary locations on the Web, a category I call absent data.2 In such situations, the standard methods of data collection discussed above cannot be applied; indeed, in some cases it may not be possible to obtain absent data at all. But the four techniques I describe here are at least worth exploring when the standard methods fail. I illustrate these techniques through an extended case study of a Russian government disinformation campaign on Twitter to which I applied all four.

I begin here by defining what I mean by absent data and discussing some of the reasons people might want to study such data. I then describe the current case study data as an instance of absent data and introduce the four methods I used to gather those data. The subsequent four sections are devoted to describing these methods in detail and identifying general recommendations along the way. The final section offers a few remarks on why data will be increasingly likely to go absent in the future.

WHAT ARE ABSENT DATA AND WHY SHOULD WE CARE?

The internet is, to understate the situation, overflowing with far more public data than anyone could ever analyze on nearly every topic imaginable. So why do absent data matter? Before answering that question, let me first clarify what I mean by the term. Absent data in this context are defined as data that have been removed from their primary posting locations. A data point’s primary posting location is simply the first place it is posted, as distinct from the many places it could be archived. In social media contexts, removing a message from its primary posting location simply means deleting the original message from that location. On Facebook and Twitter (and presumably other platforms as well), doing so also removes downstream shares and retweets of the deleted content. But removal from the primary posting location is not sufficient to destroy public access to a data point, as it may be archived elsewhere in a variety of formats for a variety of reasons. Indeed, secondary archiving is an essential component of all four of the absent data collection methods I will discuss later.

Reasons not to bother with absent data are easy to enumerate: they are difficult to access, doing so may violate the platform’s terms of service (TOS), and in many cases, it will be impossible to ascertain the completeness of the data. I’ll discuss the first and third issues later, but right now I’ll briefly address the second. While in general researchers should avoid violating platforms’ TOS when possible, we should not allow corporations to define our empirical horizons. In other words, when theoretical and pressing social concerns conflict with TOS, the former should take precedence whenever permitted by law. Following Twitter’s TOS to the letter would make it impossible to study any activity that violates these terms because the company summarily deletes transgressive tweets as soon as it detects them. Given the social and political relevance of some forms of TOS-violating
activity, such as terrorism and clandestine propaganda, there is a strong argument to be made that violating Twitter’s TOS for research purposes is justified in at least some cases. Such arguments should be invoked sparingly and only when all TOS-compliant alternatives have been exhausted. As noted briefly earlier, absent data are especially important for socially sensitive research areas. Lies, disclosures of illegal activity, expressions of politically unpopular views, discussions of mental health, and details of sexual orientation or gender identity are just a few of the topics for which data may be especially likely to vanish. Since one of the main reasons for these data disappearances is the potential for embarrassment or harm, researchers pursuing studies that raise such risks should obtain institutional review board approval before moving forward. But the mere fact that certain data points have been removed from public view does not, as a practical or ethical matter, automatically eliminate them from empirical consideration. There are many circumstances in which the urgency of the research results may outweigh users’ interests in not having their social media content examined. One hopefully noncontroversial example is politicians’ and other public figures’ deleted posts, which are important components of their public records.3 Others include situations when the absent data may chronicle events of historical significance4 or address health issues that are difficult to study due to social desirability concerns.5 These studies and others exemplify how to balance user privacy and safety concerns with the potential benefits of their results. The foregoing explanation is intended only to demonstrate that there are, in some cases, good reasons to seek and analyze absent data. The remainder of this chapter focuses on the practical issues involved in collecting absent data, using an ongoing case study as an example. 
Those interested in further exploring the ethics of analyzing absent data can find brief discussions in several existing studies,6 although few seem to have addressed the topic extensively.

THE CASE OF THE DISAPPEARING DISINFORMATION

I will discuss my absent-data collection methods through the empirical case of the Internet Research Agency (IRA). The IRA is a disinformation operation, or “troll farm,” funded by the Russian government that has infiltrated political conversations in multiple countries using local languages. Much of
what we know about the IRA’s operations in the United States comes from various investigative reports7 and the U.S. Justice Department’s indictment of thirteen individuals associated with the IRA.8 The organization’s U.S.-focused division was well funded and professionally coordinated, set clear goals for workers, and rigorously evaluated social media responses to its communications. It organized hundreds of individual workers into purpose-specific departments and held the explicit aim of conducting “information warfare against the United States of America.”9 The IRA’s U.S.-focused division made special efforts to ensure its workers were familiar with American culture by, for example, noting holidays and important events in the election and becoming familiar with the major themes and ideas of partisan politics and social movements such as Black Lives Matter.10 Its disinformation campaign included standard social media content posted under false identities, paid advertisements, direct digital communications with activists (especially pro-Trump volunteers), and the organization of political rallies.11 And whereas initial discussion of the IRA’s work focused on the possibility that it overwhelmed networks with false information, more recent analyses suggest that a much greater part of its work involved the amplification of storylines that either had some basis in fact or were primarily expressions of opinion.12

The first English-language article on the IRA was published in 2014,13 but only a few of the organization’s social media accounts were widely known until more than three years later. On November 1, 2017, the U.S. House Intelligence Committee published a PDF file containing, among other information, the screen names of 2,752 suspected IRA Twitter accounts. The committee received these screen names from Twitter itself, which determined their provenance using as yet undisclosed methods. That said, the company apparently convinced the U.S.
Congress that the accounts were indeed associated with a Russian disinformation campaign, and multiple media outlets corroborated their claims.14 Unfortunately for disinformation researchers, Twitter removed all IRA-related content from its platform before the screen names were made public. All of the tweets, multimedia content, and profile pages associated with these accounts were replaced with the company’s standard “Account suspended” message. Aside from a small number of screenshots provided by the House committee, neither researchers nor concerned citizens could see what kinds of content the IRA had been spreading or how successful its attempts had been.

As a highly visible exemplar of digitally mediated state-sponsored disinformation operations targeting citizens of Western democracies, the IRA is an extremely important object of study. Thus far, such activity has been studied mostly in non-Western contexts.15 Disinformation is a form of warfare,16 and its targets have a right to know about the weapons aimed at them. And while the IRA may not have altered the outcome of the 2016 U.S. presidential election, its attempts to do so are worth studying as part of a broader effort both to counter future disinformation campaigns and to understand their potential role in past events. Indeed, top U.S. officials warned that the Russians could interfere in the 2018 U.S. midterm elections,17 perhaps adapting their techniques based on lessons learned in 2016. More generally, disinformation and other deceptive content will become increasingly important to understand as tools to produce them evolve in subtlety and sophistication.18

FOUR METHODS OF PROCURING ABSENT SOCIAL MEDIA DATA

Good reasons to study a social phenomenon are important, but they do little to overcome practical obstacles to data collection. Fortunately, social media platforms are not the only places to obtain data originally posted there. In the following sections, I describe four general methods I used to obtain IRA-relevant Twitter data that were absent from Twitter at the time of my study. As I do so, I also highlight lessons, recommendations, generalizations, and limitations that will likely apply across cases.

Method 1: Plumb Through Existing Data Sets

A colleague told me about the House Intelligence Committee’s Twitter screen name dump the day it was posted: November 1, 2017. I was very excited by this development for several reasons: First, I predicted there would be a great deal of interest in the IRA generally and in these data in particular. Second, since the tweets had already been removed from Twitter at that point, I knew they would be difficult to obtain, placing researchers with access in an enviable position. Third, I suspected I might already have some of the data. In March 2017, I had purchased a data set from Twitter for a research project the Knight Foundation had commissioned me to lead. This data set included all public tweets posted during 2015 and 2016
containing at least one of over 150 hashtags pertaining to black, feminist, and Asian American cultural interests (roughly 46.9 million tweets in all).19 I would not necessarily have thought to trawl through this data set for IRA content had it not been for two news articles I had recently read. One identified a Twitter/Facebook account by the name of “Blacktivist” as a propaganda outlet linked with the Russian government. The article claimed the account’s purpose was “to amplify racial tensions during the U.S. presidential election.”20 Another, which originally appeared in the Russian-language publication RBC but was readable via Google Translate,21 identified the pro-black IRA account @crystal1johnson, whose tweets I remembered seeing in the Knight data. I wondered how many IRA accounts that had used hashtags such as #BlackLivesMatter, #IfIDieInPoliceCustody, and #StayWoke were included in my data purchase. As it happened, I was sitting on a moderate trove of IRA data. Using custom Python code of my own design, I found that 354 unique IRA accounts tweeted at least once in the Knight data. An additional 29 accounts were mentioned at least once by non-IRA users but did not appear as tweet authors themselves. The combined total of 383 tweeting and/or mentioned IRA accounts represented nearly 14 percent of the 2,752 accounts in the House committee’s file—not an astounding proportion but higher than I expected given that their inclusion was essentially an accident. Because Twitter’s numerical IDs are static, while users can change their screen names at will, I matched IRA users by ID number to extract all relevant tweets. In all, I found 21,848 tweets posted by IRA-associated accounts in the Knight data set. To calculate a liberal estimate of how much data might be missing, I computed the upper bound of the proportion of these users’ tweets that were present in my data out of all the tweets they had posted at the time of data collection. 
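The ID-based matching described above, and the upper-bound estimate introduced in the preceding sentence, can be illustrated with a short Python sketch. This is not the custom code used in the study, which is not reproduced in the chapter; the toy records, the helper names `match_by_user_id` and `upper_bound_share`, and the choice to treat the tweet with the largest ID as each user's most recent one are all assumptions made for illustration. Only the `statuses_count` field name comes from the text.

```python
def match_by_user_id(tweets, ira_ids):
    """Return the tweets whose author ID appears in ira_ids.

    Matching on the numeric ID rather than the screen name keeps accounts
    that changed their screen name after the data were collected.
    """
    ira_ids = set(ira_ids)
    return [t for t in tweets if t["user"]["id"] in ira_ids]


def upper_bound_share(matched):
    """Matched tweet count divided by the sum of each user's latest statuses_count.

    Assumption: a larger tweet ID means a more recent tweet, so the
    statuses_count attached to it is the user's most recent known total.
    """
    latest = {}  # user ID -> (tweet ID, statuses_count) of the newest tweet seen
    for t in matched:
        uid = t["user"]["id"]
        if uid not in latest or t["id"] > latest[uid][0]:
            latest[uid] = (t["id"], t["user"]["statuses_count"])
    total_posted = sum(count for _, count in latest.values())
    return len(matched) / total_posted if total_posted else 0.0


# Toy records, invented for illustration only (hypothetical IDs and counts).
tweets = [
    {"id": 1, "user": {"id": 101, "statuses_count": 40}},
    {"id": 2, "user": {"id": 999, "statuses_count": 7}},
    {"id": 3, "user": {"id": 101, "statuses_count": 50}},
    {"id": 4, "user": {"id": 102, "statuses_count": 150}},
    {"id": 5, "user": {"id": 555, "statuses_count": 12}},
]
ira_ids = {101, 102}

matched = match_by_user_id(tweets, ira_ids)
share = upper_bound_share(matched)
print(f"{len(matched)} matched tweets, at most {share:.2%} of these users' output")

# Arithmetic reported in the text: 354 tweeting plus 29 mentioned-only accounts,
# as a percentage of the 2,752 accounts in the House committee's file.
print(round(100 * (354 + 29) / 2752, 1))
```

The estimate is an upper bound because a user's last observed `statuses_count` can only lag behind, never exceed, the number of tweets that user eventually posted.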
To do so, I divided my IRA tweet count by the sum of all IRA users’ most recent statuses_count fields, each of which displays the total number of tweets posted by a specific user. This calculation revealed that I had collected at most 1.66 percent of all tweets posted by the IRA users that appeared at least once in the Knight data set. The true proportion is almost certainly smaller; since many IRA users did not appear in my data set at the end of 2016, their statuses_count values would have underestimated their true tweet totals. Such a small tweet proportion might not have been a major issue if it constituted a representative sample of IRA activity, but it did not. The
Knight data set was intended to support a research project on marginalized people’s use of Twitter, and, accordingly, the IRA users it picked up disproportionately mimicked black identities. However, it also attracted conservative-presenting IRA accounts that used hashtags like #BlackLivesMatter and #StayWoke to antagonize real black users. Thus, while the IRA data extracted from the Knight data set were not representative, they did contain some useful variations in terms of the different identities IRA agents adopted. This is a recurring issue when plumbing existing data sets: since they were created with a different purpose in mind, derivative data sets created based on them will likely suffer sampling bias defects of varying degrees of severity. That said, if they are the only means available to obtain data and the bias issues are not fatal given the particular research questions at hand, they may be analyzed productively, with the write-up of the analysis explicitly acknowledging the limitations and discussing their implications.

While the Knight data set offered me my first taste of IRA data, I was not satisfied with analyzing what I found by itself. I felt the Knight IRA subset was too skewed to support the high-impact research paper I wanted to write. To build a better data set, I would need to look elsewhere.

Method 2: Scrape the Internet Archive

My search for IRA tweets next led me to the Internet Archive,22 which maintains a permanent, public, and partial archive of the World Wide Web, in part to preserve data that would otherwise be lost when websites shut down or content is removed. The archive tends to save the most salient parts of the Web: it typically contains extensive histories for well-known web properties like Google, Amazon, Facebook, and Apple but usually far less, if anything, for most personal webpages. Aside from its better-known status as a repository of deleted internet content, two specific characteristics of the Internet Archive’s daily operations made it a useful source of IRA data. First, the archive relies on multiple automated web crawlers to collect its data. These crawlers generally operate by following hyperlinks: once they detect a given page, they can access all the other pages to which it links.23 The crawling process prioritizes pages using the same logic as Google’s original PageRank algorithm: those that are linked to most frequently tend to be crawled most often.24
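To make the link-based prioritization mentioned above more concrete, the following Python sketch runs a simplified PageRank-style calculation on a tiny invented link graph and orders the pages by score. It is an illustration only: the graph, the page names, and the `pagerank` helper are hypothetical, and nothing here describes how the Internet Archive's crawlers are actually scheduled.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its score on equally to the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outbound links spreads its score evenly.
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


# Hypothetical link graph: three pages link to "hub", which links to "a".
links = {
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
    "hub": ["a"],
}

scores = pagerank(links)
crawl_order = sorted(scores, key=lambda p: -scores[p])
print(crawl_order)
```

In this toy graph the page with the most inbound links ends up first in the ordering, which is the intuition behind the claim that frequently linked pages tend to be crawled most often.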

While the correlation between inbound links and number of site visits on the open Web is fairly weak,25 every retweet and reply automatically generates a link on the Twitter platform. Thus, we might expect stronger positive correlations between inbound links and page views specifically within social media spaces, although I am not aware of any studies that have tested this proposition directly. The archive’s second helpful property, closely connected to the first, is that it stores multiple copies of many of its original web sources. Every time a page is crawled, a new version of it is stored as part of that page’s permanent collection. For example, as of July 18, 2018, the archive contained 78,271 versions of Apple.com’s front page.26 (Unfortunately, the archive does not reveal what determines how often a given page is crawled.) Fortunately for me, every additional crawl of a given user’s Twitter page saves twenty of that user’s tweets. In theory, then, the archive should contain at least twenty tweets for every IRA Twitter page it saved at least once (assuming the account posted at least twenty tweets in total). For especially popular and active accounts, I thought it could contain many more. I was encouraged to see that manual archive searches returned at least some data for a few of the best-known IRA accounts. But I had nearly three thousand accounts to check, and I wasn’t about to enter them all manually. A little Google searching turned up an open-source Python module called WayBackPack, which allows users to automatically download the Internet Archive’s complete collection for any web address. I wrote a Linux shell script using WayBackPack to check for any archived content based on the template [twitter.com/ira_username], where ira_username was an actual IRA screen name. The script retrieved collections for ninety-three IRA accounts, nearly half of which (forty-six) did not appear in the Knight data set. 
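The check described above relied on a shell script built around WayBackPack, which is not reproduced in the chapter. As a rough stand-in that uses a different technique, the Python sketch below asks the Wayback Machine's CDX search endpoint which captures exist for a given Twitter profile address and builds the corresponding snapshot URLs. The screen name `example_user`, the helper names, and the sample response are hypothetical; the network call is wrapped in a function that is defined but never invoked here, so the example runs offline.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(screen_name, limit=None):
    """Build a CDX query URL for the profile page twitter.com/<screen_name>."""
    params = {
        "url": f"twitter.com/{screen_name}",
        "output": "json",
        "fl": "timestamp,original,statuscode",
    }
    if limit:
        params["limit"] = str(limit)
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(payload):
    """Turn a CDX JSON response (header row, then one row per capture) into dicts."""
    rows = json.loads(payload) if payload.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(capture):
    """Address of one archived copy, given its capture record."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


def fetch_captures(screen_name, limit=None):
    """Query the live service (requires network access; not called in this sketch)."""
    with urlopen(cdx_query_url(screen_name, limit), timeout=30) as response:
        return parse_cdx_rows(response.read().decode("utf-8"))


# Hypothetical response for a made-up screen name, used to exercise the parser.
sample_payload = json.dumps([
    ["timestamp", "original", "statuscode"],
    ["20170101000000", "https://twitter.com/example_user", "200"],
])
captures = parse_cdx_rows(sample_payload)
print(cdx_query_url("example_user"))
print([snapshot_url(c) for c in captures])
```

Looping `fetch_captures` over a list of screen names and keeping those that return at least one capture would approximate the check described in the text, although the results need not match what WayBackPack retrieved.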
WayBackPack delivered its data as raw HTML-based webpages from which the useful data needed to be systematically extracted. The method typically used to do this, known as web scraping, has long been used in media and communication research.27 However, its use has been complicated by the increasing intricacy of contemporary webpages. It is relatively straightforward to scrape data from pages built primarily with HTML, CSS, and a little Javascript—as most were ten to fifteen years ago. Today, most social media platforms use a complex blend of code-optimizing frameworks, server-side content delivery, and HTML5 to generate pages that are much more difficult to scrape due to the intricacy of the interwoven

technologies involved. In some cases, including Twitter’s, there is a substantial discrepancy between what the browser-rendered webpage shows and the contents of the page’s source code. For studies of social media content, what the browser displays is all that matters, so converting it to a usable format is essential. Yet another obstacle is the fact that the archive saves Twitter pages in different languages, such that the tweet text remains in its original language, but all the boilerplate text around it (denoting, e.g., follower counts, tweet counts, dates, and the like) does not. Thus, for example, I could not extract retweets by simply searching for the word retweet on the page, as it might be written in a language other than English. Somewhat daunted, but nonetheless determined, I set about trying to scrape the data. Initial attempts using the Python web-scraping framework BeautifulSoup yielded some success, but some of the individual-level metadata resisted extraction due to the language issue. One day while poring over one page’s source code, I noticed that some of what I was looking for was encoded in a very long string of JSON code invisible in the browser-rendered page.28 This could be a breakthrough if, as I suspected, JSON-encoded content retained the same structure across Twitter pages encoded in different languages. I checked another page in a different language, and, sure enough, the second page’s JSON bore the same structure as the first’s. This meant I could scrape all the pages using a single script instead of customizing it by language, which would have been prohibitively difficult. At my earliest opportunity, I applied my scraping script to all ninety-three collections. The initial data set contained 24,885 rows (with each row corresponding to a tweet), more than the Knight data set had yielded. However, upon remembering that the data set might contain duplicates, I quickly removed them.
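The language-independent approach can be illustrated with a short sketch. The archived Twitter markup is not reproduced in this chapter, so the script element of type application/json and the tweets, id, and text keys below are hypothetical stand-ins for whatever the real pages contained. The point is only that a parser keyed to the JSON structure ignores the translated boilerplate, and that tweets captured by more than one crawl can then be dropped by tweet ID. The example uses Python's standard html.parser module rather than BeautifulSoup so that it stays self-contained.

```python
import json
from html.parser import HTMLParser
from typing import Dict, List


class JsonScriptCollector(HTMLParser):
    """Collect the raw text of every <script type="application/json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self.payloads: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.payloads.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.payloads[-1] += data


def tweets_from_page(html: str) -> List[Dict]:
    """Return the tweet records embedded as JSON in one archived page."""
    collector = JsonScriptCollector()
    collector.feed(html)
    collector.close()
    tweets: List[Dict] = []
    for raw in collector.payloads:
        try:
            blob = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip payloads that are not valid JSON
        if isinstance(blob, dict):
            tweets.extend(blob.get("tweets", []))  # "tweets" is an assumed key
    return tweets


def deduplicate(tweets: List[Dict]) -> List[Dict]:
    """Keep the first copy of each tweet ID; rows without an ID are kept as-is."""
    seen = set()
    unique = []
    for tweet in tweets:
        tweet_id = tweet.get("id")
        if tweet_id is not None:
            if tweet_id in seen:
                continue
            seen.add(tweet_id)
        unique.append(tweet)
    return unique
```

Because every crawl of a profile page saves that page's most recent tweets, overlapping crawls return many of the same records, which is why the deduplication step matters as much as the extraction step.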
The new data set proved highly redundant, as the deduplicated version contained only 7,243 unique tweets. Fortunately, 98 percent of the Internet Archive tweets were not present in the Knight data set, which meant the former increased my total collection of IRA tweets by roughly a third. With method 2 showing some success, I was energized to do more brainstorming about how to continue growing my data set.

Method 3: Get Replies

At this point, I had collected a total of 28,964 IRA tweets. While it was better than nothing, I still felt I needed more to support a rigorous research

study. The upper-bound analysis I had conducted in method 1 showed that I had, at most, less than 2 percent of tweets by IRA users that were active in the Knight data, which themselves constituted less than 14 percent of all known IRA users. I had reached my wit’s end trying to think of new ways to get more data, so I decided to follow a brief empirical tangent that had occurred to me while working on the preceding two methods. I realized that while the IRA’s tweets may have been removed from Twitter, many replies to them would probably still be present on the site. This is because deleting a tweet does not affect its replies, although it does remove retweets and shares. While it can sometimes be difficult to understand replies without the original tweet,29 analyzing replies to IRA tweets might offer an interesting window into how real people interact with disinformation. For example, a researcher could explore the extent to which replying users treated IRA agents as authentic U.S. citizens, the kinds of people most likely to be deceived, and how often users raised suspicions about their authenticity. The first place I looked for replies to IRA accounts was in the Knight data set, where I found a total of 8,608. However, the sampling bias issue discussed earlier returned to my mind as an objection to studying this data subset by itself. Because the Knight data set was sampled based on hashtags of relevance to black Americans, Asian Americans, and feminists, it would not necessarily include data on the full scope of the IRA’s disinformation campaign. Thus, if my research were to find (for example) that black users were highly susceptible to IRA trickery, there would be no way to know whether that degree of susceptibility was high or low relative to other users. This could result in an inaccurate picture of the IRA’s effects on its target audiences. If I wanted to understand how people reacted to the IRA on Twitter, I would need a far less biased data set. 
And since replies to deleted tweets are not automatically deleted along with the original tweet, it was theoretically possible to assemble one. Thus, the obstacles standing between my ambition and its fruition were practical as opposed to existential. The data were almost certainly out there, but the limitations restraining Twitter’s APIs were especially unfavorable for what I wanted to do. Generally, Twitter offers four basic methods of retrieving data through its various free APIs. One can do so

1. Prospectively using keywords (Streaming API);
2. Retrospectively within a rolling seven-day window using keywords (Search API);
3. Retrospectively for individual users (up to 3,200 tweets); and
4. Using data sets of tweet IDs that can be “hydrated,” or reconstituted in their original forms, if the tweet has not been deleted or protected.

Unfortunately, none of these options fit my needs. Everything I wanted was in the distant past, so the Streaming and Search APIs were no help. I did not know in advance which users had replied to IRA accounts, so I could not rely on option 3, and even had I known, its 3,200-tweet maximum probably would not have sufficed for high-volume tweeters.30 I did not have any relevant tweet IDs for option 4, and I knew of no public data sets consisting of IRA replies. Had this been a funded project, I might have explored a fifth option—buying the reply data directly from Twitter, as it allows customers to buy replies to particular users—but it was not, so I could not. Either way, Twitter reserves the right to refuse any purchase request it does not like, so there is no guarantee that the company would have sold the data to me even if I had the money. I was at a standstill about how to proceed for some time until I happened to read a thread on a certain listserv about Twitter data collection. One of the messages in this thread mentioned a Twitter data collection program I had not heard of that could access types of data previously considered unattainable outside of a direct purchase from Twitter. This Python module, called Twint, uses advanced programming techniques to extract metadata from Twitter very quickly and without using any of the platform’s APIs. This allows Twint to circumvent two of the Twitter APIs’ most vexing limitations: first, it is not subject to any time-based data collection limits, so it can run continuously until a given task is complete, and, second, and more importantly for my purposes, it can extract tweets posted at any time, even years in the past. Making Twint perform to my specifications was an involved process that I will discuss in detail in a moment. But before I delve into such logistical details, I want to address the fact that Twint’s methods of data extraction violate Twitter’s TOS. 
Thus, the decision to use it or not was, if not strictly fitting the definition of ethical, at least possibly fraught with legal implications. As I mentioned earlier, if for no other reason than to keep oneself out of trouble, it is generally a good idea to use TOS-compliant methods when available. Unfortunately, in many cases (including the current one) they are not available, so the question then becomes what to do in those situations.

I cannot answer that question definitively in these pages, but I can explain how I arrived at my decision in this case. My reasons for using Twint for this project are as follows:

• Studying disinformation is in the public interest due to its potential to exert a negative influence on public opinion, election outcomes, and public perceptions of electoral integrity.
• Disinformation-related online content is difficult to obtain because many major web platforms remove it upon discovery.
• Generally, a web platform’s TOS should not be allowed to dictate what topics researchers can and cannot study.
• When appropriate privacy protections are taken, such as refraining from publishing real people’s names and messages, the odds of harm ensuing from such research are minimal.
• I am willing to accept the consequences of breaking Twitter’s TOS.

These arguments will not apply in all cases, but I believe they fit my current case well. I feel it is important to have them ready and clearly articulated in case anyone ever asks me to justify my use of Twint. With my decision made, I was ready to start retrieving any replies to IRA accounts I could find. Preliminary tests revealed three problems I needed to overcome. First, Twint sometimes encounters a fatal error before finishing a job. One of the most common errors is triggered by a momentary lapse of internet connectivity, the likelihood of which grows with the duration of the job. Second, Twint sometimes quits quietly and without warning before a job is finished. Jobs can be manually resumed at the point where they left off, but I wanted to avoid the tedium of having to check constantly whether a job was complete. Third, Twint offers no default options to run queries sequentially: when a query finishes, so does the program. If I wanted to pull replies to nearly three thousand accounts, I would need to write additional code that could execute a series of queries with a single command. I immediately realized that it would be difficult to do this using only Python (or any single language). The reason is that when a programming script quits—whether by design or because of an error—the “kernel,” or environment running it, quits as well. It is therefore impossible to detect whether a script has ended, much less restart it, from within the kernel. Fortunately, Linux environments offer a tool called cron that can help solve this

problem. Cron allows users to schedule jobs to run at designated intervals. These jobs can be anything: a weekly reminder email, a monthly cleanup routine to remove old files, or an hourly backup program, for example. Cron turns out to be very useful in supporting data collection jobs that may terminate unexpectedly. I set it up to run a script (called a cron job) every five minutes that would check whether my main Twint script was still running. If the latter was, the former would simply quit without further action. If not, it would restart the main script to continue the data collection. When the Twint script was completely finished, the cron job would continue to trigger it every five minutes until I returned to stop the process manually, but with no more work to do, the main script would simply exit immediately each time. But I would need more than a simple cron job to make Twint perform as needed. The program allows users to select their desired tweet collection criteria as well as a date range within which tweets matching said criteria will be retrieved (or if the latter is omitted, all matching tweets). Imagine that I start a cron-backed collection task to find all replies to @crystal1johnson, but Twint stops midexecution due to a brief network outage. Within five minutes, assuming network connectivity has resumed, the cron job restarts Twint, but it starts over from the beginning, not from where it left off. To resume the job properly, I needed a way to figure out automatically where it stopped. So, borrowing some sample code from Stack Overflow, the question-asking site for coders, I wrote a short program to load the last line of any text file quickly. Incorporating this into the main Twint script allowed it to check whether any data had already been retrieved for the current task and, if so, to update the collection job dynamically with the date of the last tweet collected prior to program termination.
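A minimal sketch of the resume step follows. It is neither the author's program nor the Stack Overflow snippet he mentions; it only illustrates the same idea of reading the end of an output file without loading the whole file. The CSV layout (an ISO-formatted date in a known column) and the function names are assumptions made for this example. Whether the recovered date should become the lower or the upper bound of the restarted query depends on the order in which the collector returns tweets.

```python
import csv
import os
from datetime import datetime
from typing import Optional


def read_last_line(path: str, chunk_size: int = 4096) -> Optional[str]:
    """Return the last non-empty line of a text file, reading backward from the end.

    Assumes the final record contains no embedded newlines.
    """
    if not os.path.exists(path):
        return None
    buffer = b""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        while position > 0:
            step = min(chunk_size, position)
            position -= step
            handle.seek(position)
            buffer = handle.read(step) + buffer
            trimmed = buffer.rstrip(b"\r\n")
            newline_at = trimmed.rfind(b"\n")
            if newline_at != -1:
                # A complete final line is now in the buffer.
                return trimmed[newline_at + 1:].decode("utf-8")
    trimmed = buffer.rstrip(b"\r\n")
    return trimmed.decode("utf-8") if trimmed else None


def last_collected_date(path: str, date_column: int = 0) -> Optional[str]:
    """Return the date of the last saved row, or None if no data row exists yet."""
    line = read_last_line(path)
    if line is None:
        return None
    row = next(csv.reader([line]))
    if len(row) <= date_column:
        return None
    try:
        # A header row (or any non-date value) fails this check and is ignored.
        datetime.strptime(row[date_column], "%Y-%m-%d")
    except ValueError:
        return None
    return row[date_column]
```

A collection script could call last_collected_date on its output file at startup and, when a date comes back, narrow the query's date range accordingly before requesting more tweets.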
The Twint job would then restart where it left off, eliminating any redundancy in data collection. As a whole, the modifications I just described turn Twint into a research-grade Twitter data-collection engine, unfettered by rate limits or network hiccups. As a test, I configured it to collect comments for 150 of the most popular IRA accounts going back to 2010 (well before the IRA’s first known U.S.-directed activity). After running continuously for over fourteen hours, it returned a total of 1,188,429 unique replies. Remember, these are replies to IRA Twitter accounts, not the content of said accounts, but nonetheless they should serve to address important research questions.
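The watchdog arrangement can be sketched as follows. This is one possible design, not the author's script: the collector's process ID is recorded in a file, and a checker triggered by cron decides whether a restart is needed. The file paths, the collector command, and the crontab line in the comment are placeholders, and the liveness check relies on POSIX signal semantics, so it is meant for Linux-like systems only.

```python
import os
import subprocess
from typing import List, Optional

# Hypothetical crontab entry that would run this checker every five minutes:
# */5 * * * * /usr/bin/python3 /home/user/watchdog.py


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this ID exists (POSIX only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def collector_is_running(pid_file: str) -> bool:
    """Read the recorded PID and report whether that process is still alive."""
    try:
        with open(pid_file, encoding="utf-8") as handle:
            pid = int(handle.read().strip())
    except (FileNotFoundError, ValueError):
        return False
    return pid_is_alive(pid)


def ensure_collector(pid_file: str, command: List[str]) -> Optional[int]:
    """Start the collector if it is not running; return the new PID, else None."""
    if collector_is_running(pid_file):
        return None
    process = subprocess.Popen(command)
    with open(pid_file, "w", encoding="utf-8") as handle:
        handle.write(str(process.pid))
    return process.pid


# Example call (placeholder names):
# ensure_collector("collector.pid", ["python3", "collect_replies.py"])
```

A PID file is a simple signal but not a perfect one, since the operating system can eventually reuse a process ID. For a long-running job, a more careful checker might also confirm the name of the process it finds.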

This data set not only is much larger than the Knight subset (which contains only 21,848 tweets) but also lacks the drastic sampling bias problems afflicting the latter.

Method 4: Find Unexpected Data Dumps

My success at collecting replies to IRA tweets was encouraging, but studying them before the original tweets felt like putting the cart before the horse. After all, it would make more sense to publish an in-depth analysis of whom the IRA was imitating and what the tweets were saying before any analysis of how people reacted to said content. But I was still as short on IRA tweets as I was before my reply-collecting detour. With no clear leads to pursue, I shifted my focus to computational analysis of the IRA tweets I had, in hopes of applying the scripts I created to a more comprehensive data set, should I ever acquire one. As fate would have it, I did not have to wait long. Between February and October 2018, three data sets containing IRA tweets were released to the public, each one higher in quality than the last. On February 14, 2018, three and a half months after the IRA PDF file was published, NBC News published a large data set of IRA tweets on its website. At the bottom of an article by Ben Popken31 sat links to two CSV files: one containing IRA tweets and the other containing user-level metadata for many IRA accounts. This data dump was completely unexpected: NBC News is not usually in the business of providing machine-readable data for its readers, and so far as I know, the company made no advance announcement that it planned on doing so. The article promised that the data set contained “more than 200,000” IRA tweets. Could this be the motherlode I had been hoping for? The initial euphoria wore off quickly as I read the article. The data set, Popken reported, had been sent to NBC News by anonymous sources—an undeniable red flag. Worse, no sampling procedures were reported, so I had no way of knowing how the data were collected or whether anything had been systematically omitted. 
Third, upon downloading and inspecting the data set, I found that some required fields were blank in some rows—most importantly, the tweet ID field, which I typically use for removing duplicates. A quick calculation revealed that 1.1 percent of the rows were missing their tweet IDs—not an astronomical number but still important to know.

Fourth, further investigation exposed 407 tweet IDs that appeared more than once, with several showing up six times each. Fixing this last issue was trivial, but it illustrates the paramount importance of always checking one’s data for duplicates. The NBC data set had some considerable flaws, but I was determined to do what I could with it. Some colleagues and I began analyzing it, but in July 2018, Darren Linvill and Patrick Warren, researchers from Clemson University, released a data set of nearly three million IRA tweets they had collected independently using Social Studio, a social media archiving platform.32 In addition to providing a data set substantially larger than the NBC data set, Linvill and Warren categorized the IRA accounts into eight useful types based on their false identities. But the Clemson data set shared some problems with its predecessor as well as having a few unique ones. First, even though almost three million tweets sounds like a large number and it logically represents a greater proportion of the full set of IRA tweets than the NBC data, we still have no way of knowing what that proportion is. Second, as with the NBC data, since the data were harvested using a proprietary social listening platform, we cannot know what biases may have skewed the data collection process. Third, the Clemson data set lacks many of the metadata fields present in the NBC data, including retweet counts, favorite counts, and whether the tweet is a reply. Fourth, a data encoding error appears to have converted all numerical tweet IDs to exponential notation and changed all but the three greatest digits in each ID to zeros (e.g., 9.06000000000e+17). This rendered the process of removing duplicates more complicated and less accurate than it would have been with the correct tweet IDs. Even after the Clemson data set had been released, our team continued to analyze the NBC data set, despite its shortcomings, because we had already invested many hours in doing so. 
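The missing-ID and duplicate checks described for the NBC file, and the damage done to the Clemson IDs, can both be illustrated briefly. The sketch below is an example under stated assumptions rather than the analysis actually run: the tweet_id field name and the sample ID values are invented, and collapse_to_three_digits merely imitates the kind of precision loss described above (by rounding, whereas the original error may have truncated) to show why distinct tweets become indistinguishable once their IDs are stored in exponential notation.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def audit_ids(rows: Iterable[Mapping[str, str]], id_field: str = "tweet_id") -> Dict[str, object]:
    """Count rows with a blank ID and list the IDs that occur more than once."""
    ids = [(row.get(id_field) or "").strip() for row in rows]
    present = [value for value in ids if value]
    counts = Counter(present)
    return {
        "rows": len(ids),
        "missing": len(ids) - len(present),
        "repeated": {value: n for value, n in counts.items() if n > 1},
    }


def collapse_to_three_digits(tweet_id: int) -> str:
    """Imitate the encoding error: keep three significant digits, zero the rest."""
    rounded = float(f"{tweet_id:.2e}")
    return format(rounded, ".11e")
```

Two different IDs that share their leading digits collapse to the same string under this transformation, so an exact-match duplicate check can no longer tell them apart. Reading ID columns as text rather than as numbers is one common way to avoid introducing such damage during one's own processing.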
Then, on October 17, 2018, Eszter Hargittai alerted me to the public release of the highest-quality IRA tweet data set yet. This one came directly from Twitter itself and purported to contain all tweets ever posted by all known IRA accounts. This data set proved to be the motherlode I had been hoping for all along: it was complete (or as complete as anyone was likely to get), all metadata fields were intact, it bore the imprimatur of the company that owns the platform, and a quick duplicate check revealed that all the tweets were unique. I was most excited about the engagement metadata (the counts of retweets, replies, and likes) because, for suspended accounts, they represented the total amount of engagement the accounts would ever receive. (For live accounts, there is always the possibility that more engagements will accrue after data collection.) A higher degree of data quality was scarcely imaginable. One minor obstacle remained before I could transfer my analytical energies from the NBC data set to the official Twitter data set. Twitter had decided to obscure the identities of all IRA users having five thousand or fewer followers at the time their accounts were suspended. This meant that only 4.6 percent of all the IRA accounts represented in the official data set had their screen names revealed. Fortunately, Twitter allows researchers to request access to a special version of the data set with all data unredacted. I applied, was granted access, and began analyzing the data. The first paper based on this data set was accepted for publication in March 2020.33

WHAT I HAVE LEARNED

The process of obtaining IRA data described in this chapter has taught me many lessons, but in the interest of brevity, I will note only the three most important ones. First, the process of finding absent data requires a great deal of time and effort. I implemented my four methods over the course of a year, and the limitations I encountered were quite frustrating. The happy ending to my empirical story was in no way inevitable—my research could have remained in a considerably weaker state. (Indeed, our empirical paper based on the NBC data was rejected by two journals primarily due to issues of data quality.) So I would advise anyone seeking absent data of the arduousness of the task as well as the uncertainty of success. As in all research projects, the potential for scholarly and social advancement should match the degree of effort required. Second, researchers should be aware that some of the methods I have described, especially web scraping, involve nonnegligible risks. Violating a platform’s TOS may jeopardize a researcher’s data access privileges or worse. While I have been unable to identify any clear cases where a researcher has been punished for violating TOS, no one wants to be the first. In my case, I hope that university tenure will protect me against some of the ill consequences that could conceivably manifest, but not all of us are so privileged. With that in mind, caveat researcher.

Third, as I have discussed elsewhere, we are now in the midst of a post-API age, in which social media data streams are either growing less accessible or disappearing altogether.34 This means that tomorrow’s social media researchers will be forced to think carefully about the level of risk tolerance they are comfortable with before beginning new projects. At best, platforms will continue to ignore all but the most egregious TOS violations by researchers, although that is hardly an ideal state of affairs. At worst, if platforms decide to get tough on rule breakers, researchers may face the unenviable choice of violating TOS at substantial personal risk to obtain relevant data or abandoning social media entirely as a data source. We as social media researchers should consider how best to avoid the latter possibility, as it poses an existential threat to our work.

THE FUTURE OF ABSENT DATA

In this chapter, I have distilled my trial-and-error-laden search for IRA tweets into four hopefully actionable lessons for future seekers of absent data. I expect that most searches for absent data will be similarly haphazard—and will probably identify more avenues for data acquisition than I have discussed here. The availability of data sets containing absent data will differ based on the data source, research topic, amount of time elapsed since the data were created, and TOS of the platform in question, among other factors. This multitude of possibilities suggests that we should get comfortable with outside-the-box thinking as we go about obtaining data that have been removed from their original homes. Indeed, as the NBC, Clemson, and Twitter data sets demonstrate, we should always remember that they may come from unexpected places. As I conclude, I want to draw attention to three types of actors that might increase the prevalence of absent data in the future: content platforms, governments, and individuals. The influence of content platforms is perhaps the most obvious of the three, as it is the relevant one for this chapter’s empirical case. Twitter made a business decision to remove the IRA tweets from public view even as it cooperated behind the scenes with the U.S. government, which released the organization’s screen names but not its tweets. Twitter eventually decided to release a partially redacted data set containing all IRA tweets and associated metadata, but it was not obligated to do so. To date, Facebook has publicly released only a few examples of

the IRA content posted to its platform, although at least one research team has received private access to more of that data.35 Researchers will increasingly find themselves at the mercy of such binding decisions, especially as platforms grow increasingly sensitive to bad public relations in the post–Cambridge Analytica age.36 When the hosting platform decides to remove content, there is little researchers can do but search elsewhere. Content platforms sometimes take more proactive approaches to data provision. Similarly to Twitter’s publication of its IRA tweets, Facebook recently decided to create a free, publicly accessible archive of all the paid political advertisements it hosts.37 It created this archive in response to the same revelations of surreptitious Russian political meddling that Twitter discovered. The company committed a commendable act of transparency by opening all political ads, including the suspicious ones, to public scrutiny without requiring special access permissions. I hope that Facebook and other platforms continue to make other public interest data sets available to the research community. Governments have also been known to make data go missing. This is not a common phenomenon in the United States, but it occurs daily in authoritarian regimes like China, Iran, and Turkey.38 These countries’ governments have a number of digital tools at their disposal to suppress dissent and other sentiments deemed contrary to national values. When a government licenses or directly controls online platforms, it can simply demand to have content deleted. When that is not possible, as with global social media networks like Facebook and Twitter, governments can block offending content using an array of automated and human-powered techniques.39 While deletion universally eliminates access to the targeted content, blocking does so only within a given country, and sophisticated users can run proxies and other tools to evade the latter.
A third way that regimes suppress unauthorized content is by punishing those who post it, whether through termination of employment, imprisonment, torture, execution, or other means.40 This creates a chilling effect for would-be dissidents thinking of expressing themselves publicly, which is a way of rendering data absent preemptively. Finally, in some jurisdictions, individuals are gaining the power to demand the removal of content either about or created by themselves. For example, Article 17 of the European Union’s General Data Protection Regulation (GDPR) specifies the conditions under which EU citizens retain “the right to obtain from [content hosts] the erasure of personal data concerning

him or her.”41 This so-called right to erasure privileges individuals’ interests in controlling data relevant to themselves over the right of the public at large to access such data. While Article 17 includes exemptions “for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes,”42 it does not define or give examples of these. Since the GDPR is interpreted and enforced by the regulatory entities responsible for data protection in individual EU countries, there is no guarantee of any EU-wide consensus on how researchers will and will not be able to study user-generated content. Thus, whether by corporate, government, or individual decree, researchers’ access to online data grows ever more tenuous. We should avoid making assumptions about how long we may be able to collect the data we want; prudent researchers are advised to do so as quickly as possible. However, as the case of the IRA on Twitter shows, sometimes data disappear before we come to see them as worth studying. The methods I detail in these pages demonstrate that all is not lost when a data set is removed from its primary posting location. I hope they inspire readers to think boldly, creatively, and, above all, ethically about how to fill in the gaps of absent data.

NOTES 1. These methods may or may not be prohibited by the terms of service of the platform hosting the content. 2. I chose this term to avoid confusion with missing data, which refers to individual data points within a larger data set that are not available. 3. L. Meeks, “Tweeted, Deleted: Theoretical, Methodological, and Ethical Considerations for Examining Politicians’ Deleted Tweets,” Information, Communication and Society 21, no. 1 (2018): 1–13, https://doi.org/10.1080/1369118X.2016.1257041. 4. A. Bruns and K. Weller, “Twitter as a First Draft of the Present: And the Challenges of Preserving It for the Future,” in WebSci ’16: Proceedings of the 8th ACM Conference on Web Science, ed. S. Staab and P. Parigi (New York: Association for Computing Machinery, 2016), 183–189, https://doi.org/10.1145/2908131.2908174. 5. S. Chancellor, Z. (Jerry) Lin, and M. De Choudhury, “ ‘This Post Will Just Get Taken Down’: Characterizing Removed Pro-eating Disorder Social Media Content,” in CHI ’16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems (New York: Association for Computing Machinery, 2016), 1157–1162, https://doi .org/10.1145/2858036.2858248. 6. H. Almuhimedi et al., “Tweets Are Forever: A Large-Scale Quantitative Analysis of Deleted Tweets,” in CSCW ’13: Proceedings of the 2013 Conference on Computer Supported Cooperative Work (New York: Association for Computing Machinery, 2013), 897–908, https://doi.org/10.1145/2441776.2441878; C. Fiesler and N. Proferes, “ ‘Participant’ Perceptions of Twitter Research Ethics,” Social Media + Society 4, no. 1

25 W H E N S O C I A L M E D I A D ATA D I S A P P E A R

(2018): 1–14, https://doi.org/10.1177/2056305118763366; M. Henderson, N. F. Johnson, and G. Auld, “Silences of Ethical Practice: Dilemmas for Researchers Using Social Media,” Educational Research and Evaluation 19, no. 6 (2013): 546–560, https://doi .org/10.1080/13803611.2013.805656; M. Zimmer, “The Twitter Archive at the Library of Congress: Challenges for Information Practice and Information Policy,” First Monday 20, no. 7 (2015). 7. B. Popken, “Russian Trolls Pushed Graphic, Racist Tweets to American Voters,” NBC News, November 30, 2017, https://www.nbcnews.com/tech/social-media/russian -trolls-pushed-graphic-racist-tweets-american-voters-n823001; P. Rusyaeva and A. Zakharov, “Investigation of RBC: How the ‘Troll Factory’ Worked in the US Elections,” RBC Magazine 11, no. 135 (October 17, 2017). 8. United States of America v. Internet Research Agency, No. 1:18-cr-00032 (D.D.C. filed February 16, 2018). 9. United States of America v. Internet Research Agency, 6. 10. Popken, “Russian Trolls.” 11. United States of America v. Internet Research Agency. 12. R. DiResta et al., The Tactics and Tropes of the Internet Research Agency (Austin, TX: New Knowledge, 2018); P. N. Howard et al., The IRA, Social Media and Political Polarization in the United States, 2012–2018 (Oxford: University of Oxford, 2018). 13. M. Seddon, “Documents Show How Russia’s Troll Army Hit America,” BuzzFeed, June 2, 2014, https://www.buzzfeed.com/maxseddon/documents-show-how-russias -troll-army-hit-america. 14. Popken, “Russian Trolls”; Rusyaeva and Zakharov, “Investigation of RBC”; C. D. Leonnig, T. Hamburger, and R. S. Helderman, “Russian Firm Tied to Pro-Kremlin Propaganda Advertised on Facebook During Election,” Washington Post, September 6, 2017; L. O’Brien, “Twitter Ignored This Russia-Controlled Account During the Election. Team Trump Did Not,” Huffington Post, November 1, 2017; D. O’Sullivan and D. 
Byers, “Exclusive: Fake Black Activist Social Media Accounts Linked to Russian Government,” CNNMoney, September 28, 2017, https://money .cnn.com/2017/09/28/media/blacktivist-russia-facebook-twitter/index.html. 15. T. P. Gerber and J. Zavisca, “Does Russian Propaganda Work?,” Washington Quarterly 39, no. 2 (2016): 79–98, https://doi.org/10.1080/0163660X.2016.1204398; S. Kalathil and T. C. Boas, “The Internet and State Control in Authoritarian Regimes: China, Cuba and the Counterrevolution,” First Monday 6, no. 8 (2001); I. Khaldarova and M. Pantti, “Fake News,” Journalism Practice 10, no. 7 (2016): 891–901, https://doi .org/10.1080/17512786.2016.1163237. 16. B. Cronin and H. Crawford, “Information Warfare: Its Application in Military and Civilian Contexts,” Information Society 15, no. 4 (1999): 257–263, https://doi .org/10.1080/019722499128420. 17. G. Corera, “Russia ‘Will Target US Mid-Term Election,’ ” BBC News, January 29, 2018, https://www.bbc.com/news/world-us-canada-42864372. 18. T. Mak, “Technologies to Create Fake Audio and Video Are Quickly Evolving,” NPR, April 2, 2018, https://www.npr.org/2018/04/02/598916380/technologies-to-create-fake -audio-and-video-are-quickly-evolving; K. Roose, “Here Come the Fake Videos, Too,” New York Times, June 8, 2018. 19. D. Freelon et al., How Black Twitter and Other Social Media Communities Interact with Mainstream News (Miami: John S. and James L. Knight Foundation, 2018). 20. O’Sullivan and Byers, “Exclusive.”
21. Rusyaeva and Zakharov, “Investigation of RBC.”
22. For an early history, see M. Kimpton and J. Ubois, “Year-by-Year: From an Archive of the Internet to an Archive on the Internet,” in Web Archiving, ed. Julien Masanès (Berlin: Springer, 2006), 201–212.
23. A major exception to this is the Alexa service, which provided most of the archive’s pre-2008 content. Alexa provides a browser plug-in that can archive any webpage the user views while using it.
24. S. Brin and L. Page, “The Anatomy of a Large-Scale Hypertextual Web Search Engine,” Computer Networks and ISDN Systems 30, no. 1–7 (1998): 107–117.
25. M. Hindman, The Myth of Digital Democracy (Princeton, NJ: Princeton University Press, 2010).
26. The current count can be viewed at https://web.archive.org/web/*/apple.com.
27. K. A. Foot and S. M. Schneider, Web Campaigning (Cambridge, MA: MIT Press, 2006); T. Quandt, “(No) News on the World Wide Web?,” Journalism Studies 9, no. 5 (2008): 717–738, https://doi.org/10.1080/14616700802207664; M. Xenos, “New Mediated Deliberation: Blog and Press Coverage of the Alito Nomination,” Journal of Computer-Mediated Communication 13, no. 2 (2008): 485–503, https://doi.org/10.1111/j.1083-6101.2008.00406.x.
28. JSON (JavaScript Object Notation) is a platform-independent data format that allows for the storage of complex data structures using key-value pairs.
29. This property does not apply on Facebook, where deleting a top-level post also deletes any replies to it.
30. I could have used the usernames I found in the Knight data set, but that would have reproduced the same bias issues I discussed in the preceding paragraph.
31. B. Popken, “Twitter deleted 200,000 Russian troll tweets. Read them here.” NBC News, February 14, 2018, https://www.nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731.
32. D. Linvill and P. L. Warren, “Troll Factories: The Internet Research Agency and State-Sponsored Agenda Building” (paper, Clemson University, Clemson, SC, 2018).
33. D. Freelon et al., “Black Trolls Matter: Racial and Ideological Asymmetries in Social Media Disinformation,” Social Science Computer Review (in press). https://journals.sagepub.com/doi/full/10.1177/0894439320914853.
34. D. Freelon, “Computational Research in the Post-API Age,” Political Communication 35, no. 4 (2018): 665–668, https://doi.org/10.1080/10584609.2018.1477506.
35. Howard et al., “The IRA.”
36. A. Hern and J. Waterson, “Facebook in ‘PR Crisis Mode’ Over Cambridge Analytica Scandal,” The Guardian, April 24, 2018.
37. Available at https://www.facebook.com/politicalcontentads/.
38. G. King, J. Pan, and M. E. Roberts, “How Censorship in China Allows Government Criticism but Silences Collective Expression,” American Political Science Review 107, no. 2 (2013): 326–343.
39. R. Deibert et al., eds., Access Denied: The Practice and Policy of Global Internet Filtering (Cambridge, MA: MIT Press, 2008).
40. Reporters Without Borders, “Violations of Press Freedom Barometer,” 2020, https://rsf.org/en/barometer?year=2020.
41. General Data Protection Regulation (GDPR), Regulation 2016/679, Official Journal of the European Union L 119 (vol. 59), May 4, 2016.
42. General Data Protection Regulation.

REFERENCES

Almuhimedi, H., S. Wilson, B. Liu, N. Sadeh, and A. Acquisti. “Tweets Are Forever: A Large-Scale Quantitative Analysis of Deleted Tweets.” In CSCW ’13: Proceedings of the 2013 Conference on Computer Supported Cooperative Work, 897–908. New York: Association for Computing Machinery, 2013. https://doi.org/10.1145/2441776.2441878.
Brin, S., and L. Page. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.” Computer Networks and ISDN Systems 30, no. 1–7 (1998): 107–117.
Bruns, A., and K. Weller. “Twitter as a First Draft of the Present: And the Challenges of Preserving It for the Future.” In WebSci ’16: Proceedings of the 8th ACM Conference on Web Science, ed. S. Staab and P. Parigi, 183–189. New York: Association for Computing Machinery, 2016. https://doi.org/10.1145/2908131.2908174.
Chancellor, S., Z. (Jerry) Lin, and M. De Choudhury. “ ‘This Post Will Just Get Taken Down’: Characterizing Removed Pro-eating Disorder Social Media Content.” In CHI ’16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 1157–1162. New York: Association for Computing Machinery, 2016. https://doi.org/10.1145/2858036.2858248.
Corera, G. “Russia ‘Will Target US Mid-Term Election.’ ” BBC News, January 29, 2018. https://www.bbc.com/news/world-us-canada-42864372.
Cronin, B., and H. Crawford. “Information Warfare: Its Application in Military and Civilian Contexts.” Information Society 15, no. 4 (1999): 257–263. https://doi.org/10.1080/019722499128420.
Deibert, R., J. Palfrey, R. Rohozinski, and J. Zittrain, eds. Access Denied: The Practice and Policy of Global Internet Filtering. Cambridge, MA: MIT Press, 2008.
DiResta, R., K. Shaffer, B. Ruppel, D. Sullivan, R. Matney, R. Fox, J. Albright, and B. Johnson. The Tactics and Tropes of the Internet Research Agency. Austin, TX: New Knowledge, 2018.
Fiesler, C., and N. Proferes. “ ‘Participant’ Perceptions of Twitter Research Ethics.” Social Media + Society 4, no. 1 (2018). https://doi.org/10.1177/2056305118763366.
Foot, K. A., and S. M. Schneider. Web Campaigning. Cambridge, MA: MIT Press, 2006.
Freelon, D. “Computational Research in the Post-API Age.” Political Communication 35, no. 4 (2018): 665–668. https://doi.org/10.1080/10584609.2018.1477506.
Freelon, D., L. Lopez, M. D. Clark, and S. J. Jackson. How Black Twitter and Other Social Media Communities Interact with Mainstream News. Miami: John S. and James L. Knight Foundation, 2018.
General Data Protection Regulation (GDPR). Regulation 2016/679, Official Journal of the European Union L 119 (vol. 59), May 4, 2016.
Gerber, T. P., and J. Zavisca. “Does Russian Propaganda Work?” Washington Quarterly 39, no. 2 (2016): 79–98. https://doi.org/10.1080/0163660X.2016.1204398.
Henderson, M., N. F. Johnson, and G. Auld. “Silences of Ethical Practice: Dilemmas for Researchers Using Social Media.” Educational Research and Evaluation 19, no. 6 (2013): 546–560. https://doi.org/10.1080/13803611.2013.805656.
Hern, A., and J. Waterson. “Facebook in ‘PR Crisis Mode’ Over Cambridge Analytica Scandal.” The Guardian, April 24, 2018.
Hindman, M. The Myth of Digital Democracy. Princeton, NJ: Princeton University Press, 2010.
Howard, P. N., B. Ganesh, D. Liotsiou, J. Kelly, and C. Francois. The IRA, Social Media and Political Polarization in the United States, 2012–2018. Oxford: University of Oxford, 2018.
Kalathil, S., and T. C. Boas. “The Internet and State Control in Authoritarian Regimes: China, Cuba and the Counterrevolution.” First Monday 6, no. 8 (2001).
Khaldarova, I., and M. Pantti. “Fake News.” Journalism Practice 10, no. 7 (2016): 891–901. https://doi.org/10.1080/17512786.2016.1163237.
Kimpton, M., and J. Ubois. “Year-by-Year: From an Archive of the Internet to an Archive on the Internet.” In Web Archiving, ed. Julien Masanès, 201–212. Berlin: Springer, 2006.
King, G., J. Pan, and M. E. Roberts. “How Censorship in China Allows Government Criticism but Silences Collective Expression.” American Political Science Review 107, no. 2 (2013): 326–343.
Leonnig, C. D., T. Hamburger, and R. S. Helderman. “Russian Firm Tied to Pro-Kremlin Propaganda Advertised on Facebook During Election.” Washington Post, September 6, 2017.
Linvill, D., and P. L. Warren. “Troll Factories: The Internet Research Agency and State-Sponsored Agenda Building.” Paper, Clemson University, Clemson, SC, 2018.
Mak, T. “Technologies to Create Fake Audio and Video Are Quickly Evolving.” NPR, April 2, 2018. https://www.npr.org/2018/04/02/598916380/technologies-to-create-fake-audio-and-video-are-quickly-evolving.
Meeks, L. “Tweeted, Deleted: Theoretical, Methodological, and Ethical Considerations for Examining Politicians’ Deleted Tweets.” Information, Communication, and Society 21, no. 1 (2018): 1–13. https://doi.org/10.1080/1369118X.2016.1257041.
O’Brien, L. “Twitter Ignored This Russia-Controlled Account During the Election. Team Trump Did Not.” Huffington Post, November 1, 2017.
O’Sullivan, D., and D. Byers. “Exclusive: Fake Black Activist Social Media Accounts Linked to Russian Government.” CNN Money, September 28, 2017. https://money.cnn.com/2017/09/28/media/blacktivist-russia-facebook-twitter/index.html.
Popken, B. “Russian Trolls Pushed Graphic, Racist Tweets to American Voters.” NBC News, November 30, 2017. https://www.nbcnews.com/tech/social-media/russian-trolls-pushed-graphic-racist-tweets-american-voters-n823001.
Popken, B. “Twitter Deleted Russian Troll Tweets. So We Published More Than 200,000 of Them.” NBC News, February 14, 2018. https://www.nbcnews.com/tech/social-media/now-available-more-200-000-deleted-russian-troll-tweets-n844731.
Quandt, T. “(No) News on the World Wide Web?” Journalism Studies 9, no. 5 (2008): 717–738. https://doi.org/10.1080/14616700802207664.
Reporters Without Borders. “Violations of Press Freedom Barometer.” 2019. https://rsf.org/en/barometer?year=2019.
Roose, K. “Here Come the Fake Videos, Too.” New York Times, June 8, 2018.
Rusyaeva, P., and A. Zakharov. “Investigation of RBC: How the ‘Troll Factory’ Worked in the US Elections.” RBC Magazine 11, no. 135 (October 17, 2017).
Seddon, M. “Documents Show How Russia’s Troll Army Hit America.” BuzzFeed, June 2, 2014. https://www.buzzfeed.com/maxseddon/documents-show-how-russias-troll-army-hit-america.
United States of America v. Internet Research Agency, No. 1:18-cr-00032 (D.D.C. filed February 16, 2018).
Xenos, M. “New Mediated Deliberation: Blog and Press Coverage of the Alito Nomination.” Journal of Computer-Mediated Communication 13, no. 2 (2008): 485–503. https://doi.org/10.1111/j.1083-6101.2008.00406.x.
Zimmer, M. “The Twitter Archive at the Library of Congress: Challenges for Information Practice and Information Policy.” First Monday 20, no. 7 (2015).

Chapter Two

THE NEEDLE IN THE HAYSTACK
Finding Social Bots on Twitter

TOBIAS R. KELLER AND ULRIKE KLINGER

INTRODUCTION

Automation has become part of the agenda of political communication research,1 and with it has come a new type of actor: social bots. Social bots are “social media accounts controlled completely or in part by computer algorithms.”2 A key feature is that they mimic humans and human behavior. In recent election and referendum campaigns, both the public and social scientists have become aware of social bots.3 Bots were active in the 2016 U.S. presidential election,4 they participated in the Brexit debate (most notably on the “leave” side),5 they pushed the #macronleaks campaign before the French presidential election in 2017,6 and they have influenced a variety of debates since then.7 Bots’ presence on social media platforms is challenging for social scientists who are interested in the formation of public opinion. Once automation and automated actors join the game, it becomes increasingly unfeasible to analyze the reach and popularity of specific actors or content by counting and comparing popularity cues such as likes, shares, comments, or retweets. In the project described in this chapter, we tackled this new challenge, outlining theoretical implications and possible consequences for society. More specifically, we wondered whether social bots on Twitter posed a threat during Germany’s 2017 federal election. Spoiler alert: they did not.8

31 T H E N E E D L E I N T H E H AY S TA C K

Social bots are not a new phenomenon. How computer programs can effectively pass as humans—and potentially manipulate others in good and bad ways—is a question that computer scientists have been discussing since the early days of computers. The well-known Turing Test is the prime example of this research strand.9 In the 1950s, Alan Turing proposed this scenario: Person C chats (text only) with two others, one of which is a human being and the other a machine. If Person C cannot decide which of the other two is the human, the machine passes the Turing Test.10 The first chatterbot was Joseph Weizenbaum’s ELIZA, which chatted in the way a Rogerian psychiatrist would: repeating what the other said and encouraging them to open up.11 This technique would not have passed the Turing Test because interrogators would not have opened up; yet it still led psychiatrists to fear that such programs could replace them altogether.12 While early social bots were restricted to talking to only a few people, today’s social bots can spread their messages to thousands of people on social media platforms such as Twitter and Facebook. These platforms, along with dating sites and other forums of online human interaction, proved to be a fertile habitat for bots. Users encounter other users with their screen names and self-selected profile pictures, and it is nearly impossible to tell humans and bots apart from a mere gut feeling. GitHub, the largest online code repository, offers various tutorials on how to create and deploy social bots on Twitter.13 Creating bots does not require refined programming skills, and they are inexpensive to buy. Social bots are neither necessarily malicious nor very sophisticated, and there is always a human behind their automated behavior. Simple bots can tweet a quote by a famous person once a day, publish links to a news site, or create nonsensical stories. 
Such social bots conduct simple automated tasks and might even self-describe as bots.14 More sophisticated and potentially malicious social bots that seek to influence discussions and spread propaganda or fake news are also active on Twitter.15 They might differ from simple social bots in their degree of autonomous agency and are sometimes also referred to as cyborgs or sockpuppets. Being scholars in the field of political communication, we were interested in the potential impact of bots in election campaigns, such as the then upcoming 2017 federal election in Germany.16 In the beginning, we were both excited and cautious about this undertaking. We were, after all, not computer scientists, social bots are programmed to pass as humans, and
German parties had pledged not to use bots during the election campaign, so why even try when there were plenty of other research projects on our desks? The headwind was strong, but our curiosity was even stronger.

DATA COLLECTION: DOWNLOADING THE HAYSTACK

If you are trying to find the needle in the haystack, you have to start by gaining access to the haystack, which in this case meant downloading social media content. Having said that, access to social media data is a problem and has become increasingly difficult since we conducted our study on social bots in 2017. In the aftermath of the Cambridge Analytica scandal of 2018, many platforms have closed their gateways (application programming interfaces, or APIs) or allow only limited access to their data.17 This is all the more troublesome because Cambridge Analytica accessed private user data by offering an app via Facebook, whereas public platform APIs deliver data only from public profiles. API-based data access tools created and used by social scientists, such as Netvizz (a program for data extraction from Facebook),18 may become unavailable.19 Fortunately, we were able to buy the Twitter data we needed. In various projects, we have obtained data from data broker companies as well as directly from Twitter. If there are several waves of data collection and analysis in a project, it is advisable to retrieve data from the same source in each wave. This sounds easier than it is in reality because there is a lot of fluctuation in the market of data brokers, and companies that offer social media data come and go. In our case, the British social media company Birdsong Analytics closed down shortly after the end of our project on the German federal election of 2017. The choice of a data broker should not be driven only by pricing; rather, it is also important to check the quality and completeness of the delivered data sets. In this project, we chose to work with Birdsong Analytics because the data sets they sent were complete (as far as one can tell from outside of Twitter) and perfectly structured, meaning they did not require additional data cleaning and sorting, something we really appreciated.
We considered two possible approaches when studying bots: (1) starting with Twitter accounts to look at how many followers of an account are bots and (2) beginning with keywords or hashtags to see how many bots participate in the debate about a topic. Choosing an approach has consequences
because different types of bots will emerge depending on the path taken. For our study on social bots in the 2017 German federal election, we opted for the first approach because previous studies had primarily focused on hashtags.20 Another reason was that we wanted to compare the number of bots in February and September (the month of the election). Obviously, it is hard to forecast seven months prior to voting day which keywords and hashtags will evolve during the election campaign, so we found it more feasible to focus on social bots among the political parties’ Twitter followers. We bought data sets of all Twitter accounts that followed the main German political parties—those that were represented in the preelection parliament (CDU, CSU, Green Party, LINKE, and SPD) as well as two parties that were highly likely to enter parliament in the upcoming election (AfD and FDP), which they did. This amounted to over one million follower accounts, including information about the followers such as their screen name, their last tweet, their own followers and likes, and whether they had been active—that is, liking, retweeting, replying, or tweeting—at least once in the past three months. With this approach, the data sets contained active as well as inactive accounts, and each account was labeled accordingly in the metadata. This meant that two types of bots were included in the data: bots that actively send out messages, like, share, and comment and inactive bots that do nothing but increase the number of followers for an account by simply hanging around and making the account appear more popular than it actually is. A strategy that we employed in a subsequent project followed the other path of identifying social bots.21 We wanted to ascertain whether bots pushed a specific topic or debate so we downloaded all Twitter accounts that tweeted a certain keyword or hashtag—for example, all accounts that contributed to #freepress. 
At the time this chapter was written, Twitter offered a pricey premium API that allowed one to download this sort of data, including historical data that is older than thirty days. These data sets obviously do not contain any inactive bots whose only function is to inflate a party’s or candidate’s follower number. Collecting tweets from a keyword or hashtag includes accounts (and bots) that are not connected to a political party—and that remain under the radar in the approach that collects political parties’ Twitter followers. Comparability must also be considered during data collection. The key question when interpreting the results will be: How many bots should be
considered as many? For instance, would 5 percent bots be a normal or a high number? What about 15 percent? In his science fiction novel The Hitchhiker’s Guide to the Galaxy, Douglas Adams introduced the famous 42 theorem: humankind had asked a supercomputer to calculate the answer to “life, the universe and everything,” and the answer given was 42. Unfortunately, no one knew how to decipher this answer, although many have tried.22 The problem here is similar: How can we possibly interpret the results and make a judgment on whether we have a low, medium, or high level of bot activity and bot presence? In the absence of existing comparative studies, we opted for a longitudinal comparison of two waves of data—one long before the election campaigns during a normal time of routine communication (February 2017, 1.2 million accounts) and one directly before the election (September 2017, 1.6 million accounts). This way we could see if bot presence and bot activity increased during the election campaign.

Before proceeding to the exciting part of how to detect social bots in the haystack of our data sets, there is one more highly relevant aspect of data collection to consider: research ethics. Although we had access only to public accounts and their metadata and obtained the data sets in a legal and legitimate way, it is obvious that ethical questions nonetheless apply. None of the account owners gave their explicit informed consent to being part of this research project (however, they did accept the Twitter terms of service). The data sets must be handled with the utmost care and confidentiality, and they must be safely stored on reliable servers according to local data protection guidelines. In some cases, academic journals require authors to submit the data set for review and publish it together with the article. For this reason, we pseudonymized the published data set.
This respects and protects the privacy of the account owners in our data set, but it also impedes replication.

HOW TO FIND BOTS: CHOOSING FROM A MYRIAD OF BOT DETECTION APPROACHES

Regardless of data collection strategy, the next step is detecting social bots. Various bot detection approaches exist. Some scholars use only a single indicator—for example, searching for well-known bot farms in the URLs of Twitter accounts.23 Because bot accounts tend to follow each other, this method can unveil bot networks by snowball sampling from a small
number of bots identified from their URL. Others, such as scholars at the Oxford Internet Institute, focus on the activity of accounts and assume high or heavy automation when an account sends or shares more than fifty messages per day.24 Another single behavioral indicator is used by Schäfer and colleagues,25 who focus on near-duplicate detection of bots—that is, accounts that are sending and retweeting almost identical messages.

In addition, there are tools based on machine learning, such as Tweetbotornot26 and the more prominent Botometer,27 known as Bot Or Not before 2017. These tools are based not on single indicators of known bot characteristics or bot behavior but on a combination of hundreds of indicators. Their advantage is that they include and weigh various aspects of bot features and activities—thus being able to detect a variety of different types of social bots. But a major disadvantage is that their classifications remain a black box, at least to some extent. On the one hand, proprietary code hinders bot creators’ efforts to learn and easily circumvent detection by Botometer. On the other hand, this code inhibits full reproducibility of the bot-detection algorithm (which could, for example, help to improve the code).

In our study on the German federal election, we used Botometer for bot detection because it is a well-documented, publicly available, and free tool created and maintained by computer scientists at Indiana University. Its creators have published peer-reviewed articles in scientific journals about it, convincingly describing the tool and explaining how to use it.28 This gave us the confidence to conduct our own study with it. Botometer has been used in influential studies—showing, for example, that disinformation travels “farther, faster, and deeper” in social networks29 and that humans are to blame for this rather than bots.
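The two single-indicator heuristics mentioned above, an activity threshold and near-duplicate detection, can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not the procedure of any of the cited studies: the function names are hypothetical, the fifty-messages-per-day cutoff is the one quoted in the text, and the text-similarity measure (the standard library's `difflib.SequenceMatcher`) and its 0.9 cutoff are stand-ins chosen only for illustration.

```python
from __future__ import annotations

from difflib import SequenceMatcher

# Cutoff quoted in the text: more than fifty messages per day.
HEAVY_AUTOMATION_PER_DAY = 50


def is_heavily_automated(message_count: int, days_observed: int) -> bool:
    """Flag an account whose average daily output exceeds the cutoff."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    return message_count / days_observed > HEAVY_AUTOMATION_PER_DAY


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial edits do not hide duplicates."""
    return " ".join(text.lower().split())


def near_duplicates(messages: dict[str, str], min_ratio: float = 0.9) -> list[tuple[str, str]]:
    """Return pairs of account names whose messages are almost identical.

    min_ratio is an illustrative cutoff, not a value taken from the literature.
    """
    names = sorted(messages)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            ratio = SequenceMatcher(
                None, normalize(messages[first]), normalize(messages[second])
            ).ratio()
            if ratio >= min_ratio:
                pairs.append((first, second))
    return pairs
```

Both checks are cheap and transparent, which is their appeal, but each captures only one kind of automated behavior. The pairwise comparison also grows quadratically with the number of messages, so a sketch like this would not scale to millions of accounts without further work.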
Botometer is also a tool that can be used by a broader public, thanks to a simplified online version: anyone can manually check individual Twitter accounts on the Botometer website, botometer.iuni.iu.edu. Late in 2018, Botometer received 250,000 requests per day.30 But how does Botometer work? Although it sounds a bit like magic in the Botometer FAQ (“Botometer is a machine learning algorithm trained to classify an account as bot or human based on tens of thousands of labeled examples”), it is not. In simpler terms, the creators of Botometer used a set of Twitter accounts that they knew were bots. They trained an algorithm (a calculation, in simpler terms) to find what these accounts had in common and what their specific characteristics were in terms of a variety
of features.31 These features can be grouped into six categories: network features such as the number of connections with other Twitter users, user features such as account creation time, friends features such as descriptive statistics (for example, the median) of the number of followers, temporal features such as timing patterns of tweeting, content features such as the length of a tweet, and sentiment features such as emotions from emoticons [for example, :-)].32 By training the algorithm on bot features (and the combinations in which they tend to occur), it can assess whether another account has features similar to already identified bots and therefore has a high probability of also being a bot. This also means that Botometer can be only as good as the training data that are available. Botometer provides a website where everyone with a Twitter login can manually test Twitter accounts with the tool.33 While we could have simply used the website, it would have taken a lot of clicks to conduct a manual analysis of a few million Twitter accounts. Fortunately, Botometer also provides a Python API and a helpful tutorial with which we could conduct the analysis of accounts automatically. At least that was how we thought the project would proceed.

AUTOMATED ANALYSIS? RESTARTING SEVEN COMPUTERS A ZILLION TIMES

In order to conduct an automated analysis of Twitter accounts, we needed a Python script that rendered a neat Excel or comma-separated value table with all the results of the Botometer analysis for each Twitter account. Such a script would allow us to determine how long Botometer would need to analyze a small sample of accounts, which would, in turn, let us conclude how many accounts we could analyze. As mentioned earlier, we are not computer scientists or software engineers. With some basic knowledge of programming, we were able to write a Python script that did indeed return the results from the accounts. Unfortunately, it looked as if someone had thrown the numbers chaotically into random cells of the Excel spreadsheet: the results of each case were distributed among several lines, which made it hard to analyze further. With a lot of trial and error and a little help from supportive colleagues, we overcame the initial challenges and got a clean output of the script. We tested the script with different quantities of accounts to see if it worked and how long
the analysis would take. The script ran flawlessly for five hundred, then two thousand, and then even five thousand accounts. We were excited and estimated that with seven computers running simultaneously, we could automate the testing of over one million accounts in four weeks. The IT department delivered the machines. We were prepared. Little did we know what lay ahead. We had the data. We had the script. We had the computers. We did not want to lose any time because social media data are quite dynamic, while our data sets were not: followers set their accounts to private, and some people remove all their tweets or even delete their accounts, all of which would prevent Botometer from effectively analyzing the accounts in our data sets. The idea was to start the computers and let them do the job, checking them roughly twice a day while we tended to other projects. We were quite pleased with having managed this setup. After twenty-four hours, we checked on the machines and noted that four of the seven computers had returned an error in the automated analysis: “502 server error: Bad gateway.” This meant that the results of the analysis of eighty thousand accounts were lost and that they had to be checked again. We restarted the script and waited for five minutes. It seemed to work perfectly. We checked again after thirty minutes and were confronted with a different error: “500 internal server error.” We restarted the scripts again. But different errors kept popping up. Time passed, the data grew older, and the anxiety levels of the team members increased. The next idea was to run the analysis with batches of five thousand accounts in order to ensure an error-free analysis in a shorter time period. This worked— sometimes. But it also meant that we had to restart the script multiple times per day. This was not how we had imagined automated analysis, and it was certainly not how we wanted this study to proceed. 
This would mean that we would have to split our data set of over a million profiles into groups of five thousand, read in the batches in the Python script, start the analysis, possibly restart the analysis several times per day, and merge the results in the end. This would have taken more manual labor and time resources than we had planned or had available. We decided to ask the people who had to know what we were doing wrong and approached the computer scientists who created and still maintain Botometer. Clayton A. Davis wrote back immediately and was very helpful. Working on our concerns in his spare time and for free, he checked
the servers and looked deep into the code of Botometer. It seemed as if the errors could not be resolved quickly. After several emails back and forth, Clayton asked for our Python script, which he planned to improve. And he did! Comparing our script with his improved version clearly showed that he writes code on a very different quality level than we did. He included simple but very useful features in the script: now previous results would not be lost once an error occurred, and the program would check if an account had already been analyzed, which meant that we did not have to exclude duplicate accounts manually from the analysis anymore. Despite our hopes that the seven computers would automatically run the analysis while we worked on other projects (or enjoyed some novel free time), we ended up keeping guard over seven computers and pressing CTRL+A and CTRL+R (that is, selecting all lines in the script and then restarting it) multiple times per day. On bad days, we restarted the computers every fifteen minutes, on good days never. We wondered (only half-jokingly) whether this would still count as an automated analysis—after all, it was heavily supervised. We also installed remote desktop access tools on all computers so we at least avoided having to physically patrol all the rooms where we had set up the computers. (While writing this chapter, we still wonder why we did not simply set up virtual machines that are accessible via one’s own computer—or run Python in multiple instances on one computer. Live and learn.)

The month passed, and with a delay of a few weeks, we had analyzed all follower accounts of all main German parties. Although it had been exhausting, we learned a lot during the process. Our “first contact” with the computer scientists who created the tool was extremely positive. In addition, websites like github.com and stackoverflow.com, where people can ask for help and receive free advice, were crucial.
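The two safeguards described above (results that survive an error, and a check for accounts that have already been analyzed) can be illustrated with a short sketch. It is not the improved script itself, which is not reproduced in this chapter; it is a minimal Python illustration under stated assumptions. The names `fetch_result`, `run`, `already_done`, and `flatten`, the file layout, and the column names are all hypothetical, and `fetch_result` stands in for whatever call returns the analysis of one account as a (possibly nested) dictionary. Flattening each result into a single row is one way to avoid output that is scattered across several spreadsheet lines.

```python
import csv
import os


def flatten(result, prefix=""):
    """Turn a nested dict into a flat one with dotted keys, one value per column."""
    flat = {}
    for key, value in result.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def already_done(path):
    """Read the account names stored by earlier runs, if the file exists."""
    if not os.path.exists(path):
        return set()
    with open(path, newline="", encoding="utf-8") as handle:
        return {row["account"] for row in csv.DictReader(handle)}


def run(accounts, fetch_result, path, columns):
    """Analyze each account once and append its row to the CSV immediately."""
    done = already_done(path)
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle,
            fieldnames=["account", *columns],
            restval="",             # leave missing columns empty
            extrasaction="ignore",  # drop keys that are not listed in columns
        )
        if write_header:
            writer.writeheader()
        for account in accounts:
            if account in done:
                continue  # skips duplicates and accounts from earlier runs
            row = {"account": account, **flatten(fetch_result(account))}
            writer.writerow(row)
            handle.flush()  # keep finished rows even if a later call fails
            done.add(account)
```

With this design, restarting after a server error means calling `run` again with the same account list and the same file: rows written before the error are kept, and only the accounts that are still missing are requested.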
We consulted Botometer’s GitHub page several times, shared our experience, and received helpful comments and answers. While many may be struggling with similar issues, the solution might be the same for all. It also helps the creators of the tools to have all problems in one place and see how many people have the same issues, ultimately improving the tool. A final lesson was that an automated analysis can require a lot of time and manual labor until the script finally runs the way it should. Similar to creating a codebook for manual content analysis, the process to set up the automated analysis takes time. Additionally, even automated analysis requires manual checks. In our case, we needed to restart the script and check whether all accounts had been analyzed in the end. Thus, yes, the analysis ran automatically, but hands-on work was also needed.

WHEN IS A BOT A BOT? THE TROUBLE OF SETTING A THRESHOLD BETWEEN BOTS AND HUMANS

In our project on the German election, Botometer returned three types of errors for about half of the accounts in our data sets: some of them could not be analyzed (1) because the accounts’ privacy settings prevented access to information (if Botometer cannot access sufficient metadata, it cannot calculate a score) or (2) because the accounts had been deleted. The largest portion of errors occurred (3) because the accounts’ timelines were empty. If there are no tweets at all in an account’s timeline, Botometer cannot analyze the content coming from that account and cannot then calculate a score reflecting whether the account is human- or bot-like. To be clear, these are accounts that have never sent a single tweet or retweet. This was the case for about 40 percent of the accounts we downloaded. There were a lot of accounts with empty timelines following political parties in Germany—an interesting finding, whether they were bots or not. It is perfectly possible that there were inactive bots among these accounts, but there also may have been zombie accounts—that is, accounts that someone started at some point and then never revisited and used again. This large number of missing scores does not happen when approaching the data collection with keywords and hashtags because this latter approach collects only active accounts that have actually tweeted or retweeted something or contributed to a hashtag. So about half of the accounts following German political parties in our data sets received a Botometer score reflecting whether they had characteristics similar to those of known bot accounts. This cautious wording indicates that even the creators of the Botometer do not proclaim that the tool is 100 percent foolproof in detecting bots. In fact, they state that the tool has an overall classification accuracy of 86 percent.34 The tool does not label an account as a bot but calculates a bot-similarity score and leaves it open to interpretation. 
Our method therefore calculated a score for each account. The scores ranged between 0 and 1, where 0 meant great certainty that a human operated the account and 1 meant it was very likely that the
account was automated to some degree. Many accounts scored around 0.33, fewer around 0.5, and even fewer around 0.8. Above which score should we talk of bots? We had to take a step back and think about what we could derive from the score. Setting the threshold directly impacts how many bots we would claim to have found. It is probably the most crucial decision in every bot study based on scores. So far, there is no gold standard for setting a threshold. This is largely because Botometer and other tools based on machine learning perform best when identifying accounts that are clearly humans or clearly bots. The large grey area in between is very challenging. In addition, some accounts are cyborg-like or semiautomated, meaning that they are not completely controlled by algorithms but they are also not purely and at all times operated by human users. The most prominent example is very active Twitter users: activity is sometimes a criterion used to identify bots, and very active and organizational Twitter users are sometimes mistakenly labeled as bots, since their activity seems to be nonhuman.35 Another example concerns users who support their platform activity with tools. This might make someone who is in fact simply well organized and actively mobilizing support around a specific topic appear like a bot. Social bot research therefore faces the “ground truth” problem. As long as an account does not describe itself as a bot, at least a little doubt remains whether it is indeed a bot or not. All these considerations led to inspiring discussions with colleagues and peers. However, at the end of the day, it was up to us to make a decision. We had to acknowledge that detecting social bots, which are programmed to mimic human tweeting behavior, is a very difficult task. We wanted to keep the number of false positives as low as possible; that is, we wanted to make sure there were not many instances of us classifying human-controlled accounts as bots. 
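How strongly the reported share of bots depends on the chosen cut-off can be illustrated with a small sketch. The scores below are made up purely for illustration and are not data from the study; the function name and the three cut-offs (0.43, 0.5, and 0.76) are likewise example choices.

```python
def bot_share(scores, threshold):
    """Fraction of scored accounts whose score is strictly above the threshold."""
    if not scores:
        raise ValueError("no scores to evaluate")
    flagged = sum(1 for score in scores if score > threshold)
    return flagged / len(scores)


# Made-up scores for illustration only; they are not data from the study.
example_scores = [0.10, 0.25, 0.33, 0.33, 0.40, 0.45, 0.50, 0.62, 0.80, 0.90]

for threshold in (0.43, 0.50, 0.76):
    share = bot_share(example_scores, threshold)
    print(f"threshold {threshold:.2f}: {share:.0%} of accounts labeled as bots")
```

The same list of scores yields a smaller share of "bots" as the cut-off rises, which is why a reported bot count is only interpretable together with the threshold that produced it.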
We aimed for robust findings. After identifying the factors that would support this goal and considering the distribution of Botometer scores depicted in the density plots, we opted for a high threshold. But how high? Many studies use a threshold of Botometer scores around 0.5.36 Other studies go as low as 0.43.37 The lower the threshold is, the more “bots” a study finds; in the case of Wojcik et al., two-thirds of links to popular websites were distributed by “bots,” that is, accounts with a Botometer score over 0.43.38 The danger here is that one might easily overestimate the role of bots in public discourse if the threshold is too low, leading to unnecessary
alarmism. For this reason, it is important not just to read the number of bots a study reports to have found but also to always connect this number with the applied threshold. Reviewing the distribution of Botometer scores across a population of accounts and comparing these scores helped us set a threshold based on the patterns found in the data. In order to reduce the number of false positives, we aimed at a rather high threshold. The density plots visualizing the score distribution between 0 and 1 showed no normal distribution, but peculiarities appeared above a 0.76 score (figure 2.1). We therefore decided to interpret all accounts with a score over 0.76 as bot accounts. This led to the finding of 7.1 percent social bots before the election and 9.9 percent bots in the campaign phase among the followers of the seven main German parties. However, it is very important to note that

[Figure 2.1: density plot. x-axis: Botometer score (0.00 to 1.00); y-axis: density (0 to 3).]

FIGURE 2.1 Distribution of Botometer scores. Each line represents the Botometer score distribution of a party’s followers. For example, the lightest shaded line represents the scores of all those following the AfD party. Since the parties differed in the number of followers, the figure would not have been readable with a histogram. The density plot here normalized the distribution and made the distribution of scores among the seven parties comparable. Source: T. R. Keller and U. Klinger, “Social Bots in Election Campaigns: Theoretical, Empirical, and Methodological Implications,” Political Communication 36, no. 1 (2019): 181, https://doi.org/10.1080/10584609.2018.1526238.

almost all of the bot accounts were inactive and that, with one exception, none of the one hundred most followed bots and the one hundred most active bots tweeted about Germany’s election. Most of them were spam accounts that advertised paintings or wellness centers. To improve the reproducibility and transparency of our research, we included our density plot in every talk, discussion, and publication. This allows others to understand how the number of bots in our sample would change once we set a different threshold. We also tried to be as transparent as possible. Every researcher knows how hard it is to make such decisions, especially in newer research fields. Transparency helps other scholars understand why we made our decisions the way we did. Additionally, it gives them the option to criticize and challenge our decisions—and we can then improve our research. Had we set the threshold at 0.43, like Wojcik et al., we would have found that 37 percent of the followers of German political parties during the 2017 election campaigns were bots instead of roughly 10 percent.39 This illustrates some potential political consequences of bot research. It is not unlikely that reporting that such a high share of bots was involved in an election campaign might alarm the public and might provoke calls for regulation. And indeed, in December 2018, Ralph Brinkhaus, the leader of the German CDU parliamentary group, made a case for regulatory action against social bots, particularly in election campaigns. This happened just after Botswatch, a start-up in Berlin, claimed that a disinformation campaign around the UN Global Compact for Migration had presumably involved 28 percent social bots. 
Botswatch did not publish its report and did not provide any details about its bot detection method, claiming that the detection method was a business secret on which its business model was built.40 Brinkhaus then proposed possible paths for regulation, including an obligation that platforms label bot accounts visibly as bots. This stirred a hot debate. It would have sparked an even hotter one in the United States, where bloggers and politicians have argued that deleting and policing bot accounts is a form of censorship and violates the rights of free expression, even when they spread disinformation and lies.41 And while the German public sphere reverberated with debates on whether labeling bots was a smart move or nonsense, California passed a law that obliges any bot interacting with humans to disclose itself as a bot.42 The debate about whether and how bots can and should be regulated, whether they even have rights, and whether the right to lie will prevent platforms from deleting bots in disinformation campaigns will surely continue over the coming years. Researchers in the field of social bot detection need to be aware of and accept these responsibilities: to monitor the content, actors, and discourse dynamics on social media platforms closely without raising unnecessary alarms; to develop new and better tools for bot detection; to tackle questions beyond the mere description of bot numbers; to address questions of causality; and to reflect on the normative and political consequences of this research.

CONCLUDING REMARKS

In the project we describe in this chapter, our aim was to determine how many social bots were part of the political sphere on Twitter during Germany’s 2017 federal election campaigns and how many of these were active contributors and distributors of messages. We were poised for failure, but we tried nonetheless and jumped down the rabbit hole. We faced a myriad of obstacles that we had not encountered in other research projects, but we also learned a great deal. Bot detection is not an exact science, and it is exciting to wrestle with new methods and their (sometimes not so obvious) limitations.43 After all, this is a great privilege of academic research: pursuing questions, riddles, and projects that are both difficult and fascinating and trying to break new ground. Replication is a major problem of bot detection, in all studies and with all empirical methods we know. This is because communication on social media platforms is a moving target and Twitter accounts constantly change. With each new follower or account followed and with any comment, retweet, or like, the network structure, content, and temporality change in the metadata, resulting, for example, in a different Botometer score. From time to time, the platforms weed out suspicious and locked accounts. In mid-2018, for instance, Twitter deleted millions of fake and locked accounts. Shortly after, the company’s value dropped by 15 percent as valuation estimates seem to be closely connected to user numbers and expected growth.44 This illustrates that platform owners have limited incentives to delete bot accounts, while their transparency reports show that they nevertheless pursue this Sisyphean task.45 Interestingly, Yoel Roth, Twitter’s
head of site integrity, claimed in 2018 that “since nobody other than Twitter can see non-public, internal account data, third parties using Twitter data to identify bots are doing so based on probability, not certainty,”46 implying that Twitter can predict bots with certainty. What social bot research suggests, however, is that despite the limitations of data access and bot detection approaches, social bots still pollute Twitter with spam advertisements, engage in political discussions, and thus should be studied by independent researchers. It is important to study what kind of actors tweet and retweet what kind of content on Twitter, particularly in key democratic processes such as election or referendum campaigns. Where to from here? While all the uncertainty and work were worthwhile and rewarded with a top paper award by the Political Communication Division of the International Communication Association, social bot research is still in its infancy, and the results are far from conclusive. In a follow-up project, we have tested how different tools of bot detection render varied results for the same data set. As mentioned above, Botometer is not the only possible approach for identifying bots. Other studies count all Twitter accounts that send more than fifty messages per day as bots or, rather, as accounts with high automation.47 There are also more tools based on machine learning, such as Tweetbotornot.48 There is a dire need for innovation and further development of detection methods. Without valid and reliable bot detection tools, we cannot move forward to the really important questions—for example, what the impact of bots is, whether they really influence opinion formation, and what kind of information they spread. One thing we learned is that even without a computer science background, thanks to the generosity of people in that community, we were able to make unique contributions to research on digital challenges in political communication. 
At the same time, research on timely and critical phenomena in the field of political communication is increasingly in the public eye and on the radar of policy makers. Computational social science is a fast-moving field, and it is important to underline the fact that technology does not solve social problems: automated content analyses and bot detection algorithms are not magically capable of doing away with fake accounts, malicious automation, or disinformation. We need to learn more about these new actors and dynamics, and researchers need more and better data from social media platforms to do so.
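The activity-based heuristic mentioned above (treating accounts that send more than fifty messages per day as highly automated) can be sketched as follows. This is one possible reading, labeled here as an assumption: the sketch flags an account if it exceeds the limit on any single calendar day, whereas a study might instead average over the whole collection period. The function name and the input format of (account, timestamp) pairs are hypothetical.

```python
from collections import Counter
from datetime import datetime


def high_automation_accounts(tweets, daily_limit=50):
    """Return accounts that posted more than daily_limit messages on any one day.

    tweets is an iterable of (account, timestamp) pairs, where timestamp is a
    datetime; this input format is an assumption made for the example.
    """
    # Count messages per account and calendar day.
    per_day = Counter((account, stamp.date()) for account, stamp in tweets)
    return {account for (account, _day), count in per_day.items() if count > daily_limit}
```

Such a rule is easy to apply and to explain, but, as noted earlier in the chapter, very active human users can exceed any fixed activity limit, so the resulting label is better read as "high activity" than as proof of automation.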

NOTES

We would like to thank everyone who has supported our bot adventure so far, most importantly Mike Schäfer and Otfried Jarren for providing financial resources and constructive feedback; Adrian Rauchfleisch, Martin Wettstein, and Clayton A. Davis for helping when we struggled with technical problems; and the numerous anonymous reviewers of our papers and conference contributions for pointing us to loose ends.

1. P. N. Howard, S. Woolley, and R. Calo, “Algorithms, Bots, and Political Communication in the US 2016 Election: The Challenge of Automated Political Communication for Election Law and Administration,” Journal of Information Technology and Politics 15, no. 2 (2018): 81–93, https://doi.org/10.1080/19331681.2018.1448735.
2. K.-C. Yang et al., “Arming the Public with Artificial Intelligence to Counter Social Bots,” Human Behavior and Emerging Technologies 1, no. 1 (2019): 48–61, https://doi.org/10.1002/hbe2.115.
3. S. C. Woolley and P. N. Howard, eds., Computational Propaganda: Political Parties, Politicians, and Political Manipulation on Social Media (New York: Oxford University Press, 2018).
4. A. Bessi and E. Ferrara, “Social Bots Distort the 2016 US Presidential Election Online Discussion,” First Monday 21, no. 11 (2016), https://ssrn.com/abstract=2982233.
5. P. N. Howard and B. Kollanyi, “Bots, #StrongerIn, and #Brexit: Computational Propaganda During the UK-EU Referendum” (Working Paper 2016:1, Computational Propaganda Research Project, Oxford Internet Institute, University of Oxford, Oxford, 2016), http://arxiv.org/pdf/1606.06356v1.
6. E. Ferrara, “Disinformation and Social Bot Operations in the Run Up to the 2017 French Presidential Election,” First Monday 22, no. 8 (2017), https://doi.org/10.5210/fm.v22i8.8005.
7. For a global overview, see Woolley and Howard, Computational Propaganda.
8. T. R. Keller and U. Klinger, “Social Bots in Election Campaigns: Theoretical, Empirical, and Methodological Implications,” Political Communication 36, no. 1 (2019): 171–189, https://doi.org/10.1080/10584609.2018.1526238.
9. P. A. Saygin, I. Cicekli, and V. Akman, “Turing Test: 50 Years Later,” Minds and Machines 10, no. 4 (2000): 463–518, https://doi.org/10.1023/A:1011288000451.
10. A. M. Turing, “Computing Machinery and Intelligence,” Mind 59, no. 236 (1950): 433–460, https://doi.org/10.1093/mind/LIX.236.433.
11. J. Weizenbaum, Computer Power and Human Reason: From Judgment to Calculation (San Francisco: Freeman, 1976).
12. Weizenbaum, Computer Power and Human Reason; Saygin, Cicekli, and Akman, “Turing Test.”
13. B. Kollanyi, “Automation, Algorithms, and Politics: Where Do Bots Come From? An Analysis of Bot Codes Shared on GitHub,” International Journal of Communication 10 (2016), https://ijoc.org/index.php/ijoc/article/view/6136.
14. B. Moon, “Identifying Bots in the Australian Twittersphere,” in #SMSociety17: Proceedings of the 8th International Conference on Social Media and Society (New York: Association for Computing Machinery, 2017), 1–5, https://doi.org/10.1145/3097286.3097335.
15. C. Shao et al., “The Spread of Low-Credibility Content by Social Bots,” Nature Communications 9, no. 1 (2018): 4787, https://doi.org/10.1038/s41467-018-06930-7.
16. Keller and Klinger, “Social Bots in Election Campaigns.”
17. A. Bruns, “After the ‘APIcalypse’: Social Media Platforms and Their Fight Against Critical Scholarly Research,” Information, Communication and Society 22, no. 11 (2019): 1544–1566, https://www.tandfonline.com/doi/full/10.1080/1369118X.2019.1637447.
18. B. Rieder, “Studying Facebook Via Data Extraction: The Netvizz Application,” in WebSci ’13: Proceedings of the 5th Annual ACM Web Science Conference (New York: Association for Computing Machinery, 2013), 346–355, http://thepoliticsofsystems.net/permafiles/rieder_websci.pdf.
19. T. Hotham, “Facebook Risks Starting a War on Knowledge,” The Conversation, August 17, 2018, http://theconversation.com/facebook-risks-starting-a-war-on-knowledge-101646.
20. Howard and Kollanyi, “Bots, #StrongerIn, and #Brexit.”
21. F. Martini et al., “Social Bots and How to Find Them: Human-Machine Communication in Political Discourses on Twitter” (presentation, ICA Preconference on Human-Machine Interaction, Washington, DC, May 23, 2019).
22. P. Gill, 42: Douglas Adams’ Amazingly Accurate Answer to Life, the Universe and Everything (London: Beautiful Books, 2011).
23. S. Hegelich and D. Janetzko, “Are Social Bots on Twitter Political Actors? Empirical Evidence from a Ukrainian Social Botnet,” in Proceedings of the Tenth International AAAI Conference on Web and Social Media (Palo Alto, CA: AAAI Press, 2016), 579–582.
24. Howard and Kollanyi, “Bots, #StrongerIn, and #Brexit.”
25. F. Schäfer, S. Evert, and P. Heinrich, “Japan’s 2014 General Election: Political Bots, Right-Wing Internet Activism and PM Abe Shinzo’s Hidden Nationalist Agenda,” Big Data 5, no. 4 (2017): 294–309.
26. M. Kearney, Tweetbotornot: R Package for Detecting Twitter Bots Via Machine Learning [software package], 2018, https://github.com/mkearney/Tweetbotornot.
27. Botometer, “Botometer: An OSoMe Project,” 2019, https://botometer.iuni.iu.edu/#!/.
28. C. A. Davis et al., “BotOrNot,” in WWW ’16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, ed. J. Bourdeau et al. (Geneva: International World Wide Web Conference Steering Committee, 2016), 273–274, https://doi.org/10.1145/2872518.2889302; O. Varol et al., “Online Human-Bot Interactions: Detection, Estimation, and Characterization,” in Proceedings of the Eleventh International AAAI Conference on Web and Social Media (Palo Alto, CA: AAAI Press, 2017), https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15587/14817; Yang et al., “Arming the Public.”
29. S. Vosoughi, D. Roy, and S. Aral, “The Spread of True and False News Online,” Science 359, no. 6380 (2018): 1146.
30. Yang et al., “Arming the Public.”
31. Davis et al., “BotOrNot”; Varol et al., “Online Human-Bot Interactions.”
32. Davis et al., “BotOrNot”; Varol et al., “Online Human-Bot Interactions.”
33. Botometer, “Botometer.”
34. Varol et al., “Online Human-Bot Interactions.”
35. S. Musgrave, “‘I Get Called a Russian Bot 50 Times a Day,’” Politico, August 9, 2017, https://www.politico.com/magazine/story/2017/08/09/twitter-trump-train-maga-echo-chamber-215470.
36. D. M. J. Lazer et al., “The Science of Fake News,” Science 359, no. 6380 (2018): 1094–1096, https://doi.org/10.1126/science.aao2998; Vosoughi, Roy, and Aral, “The Spread.”
37. S. Wojcik et al., Bots in the Twittersphere (Washington, DC: Pew Research Center, 2018).
38. Wojcik et al., Bots in the Twittersphere.
39. Wojcik et al., Bots in the Twittersphere.
40. T. Gutschker and F. Haupt, “Bundestag gegen Bots. Keine Debatte mit Robotern,” Frankfurter Allgemeine Zeitung, December 16, 2018, https://www.faz.net/aktuell/politik/inland/bundestag-gegen-bots-keine-debatte-mit-robotern-15943490.html; M. Reuter, “Warum wir die panische Bot-Debatte beenden sollten,” Netzpolitik, December 17, 2018, https://netzpolitik.org/2018/warum-wir-die-panische-bot-debatte-beenden-sollten/.
41. J. Mervis, “An Internet Research Project Draws Conservative Ire,” Science 346, no. 6210 (2014): 686–687.
42. L. Sacharoff, “Do Bots Have First Amendment Rights?,” Politico, November 27, 2018, https://www.politico.com/magazine/story/2018/11/27/bots-first-amendment-rights-222689.
43. A. Rauchfleisch and J. Kaiser, “The False Positive Problem of Automatic Bot Detection in Social Science Research,” Berkman Klein Center Research Publication No. 2020-3, April 2020, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3565233.
44. R. Neate, “Twitter Stock Plunges 20 Percent in Wake of 1M User Decline,” The Guardian, July 27, 2018, https://www.theguardian.com/technology/2018/jul/27/twitter-share-price-tumbles-after-it-loses-1m-users-in-three-months.
45. Twitter, “Platform Manipulation,” https://transparency.twitter.com/en/reports/platform-manipulation.html. Accessed September 1, 2020.
46. Yoel Roth (@yoyoel), Twitter, November 2, 2018, 2:32 PM, https://twitter.com/yoyoel/status/1058471837313589248.
47. Howard and Kollanyi, “Bots, #StrongerIn, and #Brexit.”
48. Kearney, Tweetbotornot.

REFERENCES

Bessi, A., and E. Ferrara. “Social Bots Distort the 2016 US Presidential Election Online Discussion.” First Monday 21, no. 11 (2016). https://ssrn.com/abstract=2982233.
Botometer. “Botometer: An OSoMe Project.” 2019. https://botometer.iuni.iu.edu/#!/.
Bruns, A. “After the ‘APIcalypse’: Social Media Platforms and Their Fight Against Critical Scholarly Research.” Information, Communication and Society 22, no. 11 (2019): 1544–1566. https://www.tandfonline.com/doi/full/10.1080/1369118X.2019.1637447.
Davis, C. A., O. Varol, E. Ferrara, A. Flammini, and F. Menczer. “BotOrNot.” In WWW ’16 Companion: Proceedings of the 25th International Conference Companion on World Wide Web, ed. J. Bourdeau, J. A. Hendler, R. N. Nkambou, I. Horrocks, and B. Y. Zhao, 273–274. Geneva: International World Wide Web Conference Steering Committee, 2016. https://doi.org/10.1145/2872518.2889302.
Ferrara, E. “Disinformation and Social Bot Operations in the Run Up to the 2017 French Presidential Election.” First Monday 22, no. 8 (2017). https://doi.org/10.5210/fm.v22i8.8005.
Gill, P. 42: Douglas Adams’ Amazingly Accurate Answer to Life, the Universe and Everything. London: Beautiful Books, 2011.
Gutschker, T., and F. Haupt. “Bundestag gegen Bots. Keine Debatte mit Robotern.” Frankfurter Allgemeine Zeitung, December 16, 2018. https://www.faz.net/aktuell/politik/inland/bundestag-gegen-bots-keine-debatte-mit-robotern-15943490.html.
Hegelich, S., and D. Janetzko. “Are Social Bots on Twitter Political Actors? Empirical Evidence from a Ukrainian Social Botnet.” In Proceedings of the Tenth International AAAI Conference on Web and Social Media, 579–582. Palo Alto, CA: AAAI Press, 2016.
Hotham, T. “Facebook Risks Starting a War on Knowledge.” The Conversation, August 17, 2018. http://theconversation.com/facebook-risks-starting-a-war-on-knowledge-101646.
Howard, P. N., and B. Kollanyi. “Bots, #StrongerIn, and #Brexit: Computational Propaganda During the UK-EU Referendum.” Working Paper 2016:1, Computational Propaganda Research Project, Oxford Internet Institute, University of Oxford, Oxford, 2016. http://arxiv.org/pdf/1606.06356v1.
Howard, P. N., S. Woolley, and R. Calo. “Algorithms, Bots, and Political Communication in the US 2016 Election: The Challenge of Automated Political Communication for Election Law and Administration.” Journal of Information Technology and Politics 15, no. 2 (2018): 81–93. https://doi.org/10.1080/19331681.2018.1448735.
Kaiser, J., and A. Rauchfleisch. “The False Positive Problem of Automatic Bot Detection in Social Science Research.” Berkman Klein Center Research Publication No. 2020-3 (2020). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3565233.
Kearney, M. Tweetbotornot: R Package for Detecting Twitter Bots Via Machine Learning [software package]. 2018. https://github.com/mkearney/Tweetbotornot.
Keller, T. R., and U. Klinger. “Social Bots in Election Campaigns: Theoretical, Empirical, and Methodological Implications.” Political Communication 36, no. 1 (2019): 171–189. https://doi.org/10.1080/10584609.2018.1526238.
Kollanyi, B. “Automation, Algorithms, and Politics: Where Do Bots Come From? An Analysis of Bot Codes Shared on GitHub.” International Journal of Communication 10 (2016). https://ijoc.org/index.php/ijoc/article/view/6136.
Lazer, D. M. J., M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger et al. “The Science of Fake News.” Science 359, no. 6380 (2018): 1094–1096. https://doi.org/10.1126/science.aao2998.
Martini, F., P. Samula, T. R. Keller, and U. Klinger. “Social Bots and How to Find Them: Human-Machine Communication in Political Discourses on Twitter.” Presented at the ICA Preconference on Human-Machine Interaction, Washington, DC, May 23, 2019.
Mervis, J. “An Internet Research Project Draws Conservative Ire.” Science 346, no. 6210 (2014): 686–687.
Moon, B. “Identifying Bots in the Australian Twittersphere.” In #SMSociety17: Proceedings of the 8th International Conference on Social Media and Society, 1–5. New York: Association for Computing Machinery, 2017. https://doi.org/10.1145/3097286.3097335.
Musgrave, S. “‘I Get Called a Russian Bot 50 Times a Day.’” Politico, August 9, 2017. https://www.politico.com/magazine/story/2017/08/09/twitter-trump-train-maga-echo-chamber-215470.
Neate, R. “Twitter Stock Plunges 20 Percent in Wake of 1M User Decline.” The Guardian, July 27, 2018. https://www.theguardian.com/technology/2018/jul/27/twitter-share-price-tumbles-after-it-loses-1m-users-in-three-months.
Reuter, M. “Warum wir die panische Bot-Debatte beenden sollten.” Netzpolitik, December 17, 2018. https://netzpolitik.org/2018/warum-wir-die-panische-bot-debatte-beenden-sollten/.
Rieder, B. “Studying Facebook Via Data Extraction: The Netvizz Application.” In WebSci ’13: Proceedings of the 5th Annual ACM Web Science Conference, 346–355. New York: Association for Computing Machinery, 2013. http://thepoliticsofsystems.net/permafiles/rieder_websci.pdf.
Sacharoff, L. “Do Bots Have First Amendment Rights?” Politico, November 27, 2018. https://www.politico.com/magazine/story/2018/11/27/bots-first-amendment-rights-222689.
Saygin, P. A., I. Cicekli, and V. Akman. “Turing Test: 50 Years Later.” Minds and Machines 10, no. 4 (2000): 463–518. https://doi.org/10.1023/A:1011288000451.
Schäfer, F., S. Evert, and P. Heinrich. “Japan’s 2014 General Election: Political Bots, Right-Wing Internet Activism and PM Abe Shinzo’s Hidden Nationalist Agenda.” Big Data 5, no. 4 (2017): 294–309.
Shao, C., G. L. Ciampaglia, O. Varol, K.-C. Yang, A. Flammini, and F. Menczer. “The Spread of Low-Credibility Content by Social Bots.” Nature Communications 9, no. 1 (2018): 4787. https://doi.org/10.1038/s41467-018-06930-7.
Turing, A. M. “Computing Machinery and Intelligence.” Mind 59, no. 236 (1950): 433–460. https://doi.org/10.1093/mind/LIX.236.433.
Varol, O., E. Ferrara, C. Davis, F. Menczer, and A. Flammini. “Online Human-Bot Interactions: Detection, Estimation, and Characterization.” In Proceedings of the Eleventh International AAAI Conference on Web and Social Media. Palo Alto, CA: AAAI Press, 2017. https://aaai.org/ocs/index.php/ICWSM/ICWSM17/paper/view/15587/14817.
Vosoughi, S., D. Roy, and S. Aral. “The Spread of True and False News Online.” Science 359, no. 6380 (2018): 1146–1151.
Weizenbaum, J. Computer Power and Human Reason: From Judgment to Calculation. San Francisco: Freeman, 1976.
Wojcik, S., S. Messing, A. Smith, L. Rainie, and P. Hitlin. Bots in the Twittersphere. Washington, DC: Pew Research Center, 2018.
Woolley, S. C., and P. N. Howard, eds. Computational Propaganda: Political Parties, Politicians, and Political Manipulation on Social Media. New York: Oxford University Press, 2018.
Yang, K.-C., O. Varol, C. A. Davis, E. Ferrara, A. Flammini, and F. Menczer. “Arming the Public with Artificial Intelligence to Counter Social Bots.” Human Behavior and Emerging Technologies 1, no. 1 (2019): 48–61. https://doi.org/10.1002/hbe2.115.

Chapter Three

MEETING YOUTH WHERE THEY ARE
Challenges and Lessons Learned from Social Media Recruitment for Sexual and Gender Minority Youth
ERIN FORDYCE, MICHAEL J. STERN, AND MELISSA HEIM VIOX

INTRODUCTION

Social media offer the opportunity to reach young people where they are. For our study of a specific youth population, we turned to these relatively new media to recruit respondents. While they offer a helpful opportunity, they are not without their own challenges. In this chapter, we report on our approach to using social media to recruit a hard-to-access population for a health-related project. In 2015, NORC1 at the University of Chicago began working with the Division of Adolescent and School Health at the Centers for Disease Control and Prevention (CDC) on a study to inform early and effective intervention and prevention efforts in response to high rates of HIV infection among adolescent men who have sex with men (AMSM). These rates are disproportionately high among AMSM and transgender adolescents of color. Here we describe the design and process along with lessons learned as we went from an idea to data collection for the Survey of Today’s Adolescent Relationships and Transitions (START).2 START, supported by the U.S. health and human services secretary’s Minority AIDS Initiative, was designed to increase our understanding of the sexual risk and HIV prevention needs of AMSM ages thirteen to eighteen and transgender youth ages thirteen to twenty-four residing in the United States. The Gay and Lesbian Alliance Against Defamation (GLAAD) defines transgender as “an umbrella term for people whose gender identity and/or gender expression differs from what is typically associated with the sex they were assigned at birth.”3 We also targeted black and Latino youth who are at especially high risk of new HIV infection given that they face a different set of challenges compared with their white, non-Hispanic peers.4 These challenges are often linked to culture, religion, and family structure, making the coming-out process more difficult5 and possibly leading to isolation.6 Research continues to show that black and Latino men who have sex with men have insufficient health care,7 have high undiagnosed HIV/STI rates,8 and are not tested on a regular basis.9 This further emphasizes the importance of conducting research on this specific population. There are many challenges to surveying sexual and gender minority youth. For instance, research has shown that these young people may still be developing their sexual and gender identity, may not have disclosed their status to others, or may have less of a connection to a “gay” community.10 Prior data collection efforts have often used probability-based sampling methods, resulting in sexual and gender minorities being underrepresented.11 Given the difficulties of sampling sexual and gender minority youth and the limited resources available, we decided to use a social-media-based recruitment effort for a web-based survey design. Social media recruitment for teens may appear on the surface to be an efficient means of collecting survey data, but you will read here about the challenges encountered throughout this process. Although we successfully recruited our targeted population within a short period of time, the research team encountered many obstacles due to the constantly shifting and evolving social media environment.
We begin our discussion by providing background information on the study design and justification for using social media for recruitment, then dig deeper into the challenges encountered, and finally outline lessons learned. It is our hope that after reading about our experiences, you will have a better understanding of the intricacies of social media recruitment and will use these lessons learned to improve recruitment efforts further for hard-to-reach populations.

STUDY PURPOSE

The purpose of this study was to gather information on the knowledge, attitudes, and behaviors of sexual and gender minority youth regarding their acceptability of HIV risk/prevention, including risk behaviors and condoms as well as biomedical interventions such as PrEP, PEP,12 and rectal microbicides. The survey also gathered information on access, exposure, and attitudes toward sex education and HIV prevention services in school and community settings. The final outcome was to use the information gathered to improve HIV prevention efforts, to learn more about the acceptability of and adherence to HIV prevention strategies, and to translate these findings into HIV prevention tools and guidance to use with sexual and gender minority youth. We used this opportunity to incorporate social media recruitment, a non-probability-based sampling method, to assess its potential for recruiting a national sample of AMSM and transgender youth for a web survey.

Research is constantly evolving with changes in technology and the ways in which people communicate. Methods of recruitment that were successful decades or even just years ago may not be applicable today. We must adjust our approaches and be willing to compromise as we continue learning and trying to improve our methods. We will analyze the sample recruited for START and evaluate its representativeness and the opportunities available for improving social media recruitment of this population for future studies.

Why Social Media?

Social media platforms (e.g., Facebook, Instagram, Snapchat, Twitter) present a unique recruitment opportunity for survey researchers. They serve as an attractive alternative to traditional modes of survey data collection (e.g., telephone, face-to-face, mail), which continue to experience declines in response rates.13 Households are cutting ties with the landline telephone and opting for mobile devices that make screening incoming calls much easier. Households are also more closed off with secured housing units (e.g., gated communities, secured-access apartment buildings) and are less likely to trust or welcome an interviewer into their home.14 There are also social and technological changes associated with how people communicate as well as find and use information.15 Nowhere do we find this to be more evident than with the youth population. The communication style of youth is changing with the large number of social media sites being developed and increased access to the internet and mobile devices. The Pew Research Center publishes detailed reports

on social media usage in the United States. Its most recent report, in 2018, showed that of teens ages thirteen to seventeen, 51 percent use Facebook, while 69 percent use Snapchat and 72 percent use Instagram. The report also found that 95 percent of teens have access to a smartphone.16 Social media have obviously become a primary means of communication among youth, and, therefore, we wanted to explore the potential of social media recruitment to reach the subpopulation of young sexual and gender minorities. Researchers have successfully used social media for data collection efforts that are both passive (i.e., data scraping) and active (i.e., surveys).17 For example, passive data collection has been used to assess the marketing of e-cigarettes on Twitter,18 while active data collection has been used to survey respondents through ads on Facebook, Google,19 and other sites as well as to recruit participants for research trials or programs.20 National-level probability-based surveys with youth have often been conducted in household settings (e.g., National Survey of Family Growth [NSFG]) and school settings (e.g., Youth Risk Behavior Survey [YRBS]). Probability surveys involve a random selection of respondents who are representative of the population from which they were selected. This is often ideal because it allows for confidence interval estimation,21 or the ability to establish a level of confidence that the true value for an unknown population lies within a specified range. But with a targeted population of sexual and gender minority youth, it is more difficult to use probability sampling for several reasons. First, the population is relatively small. 
A 2017 report from the Williams Institute found that 0.7 percent of teens ages thirteen to seventeen identify as transgender,22 and surveys of teens estimate that 8–9 percent are gay or lesbian.23 Second, members of this population are difficult to identify due to both their young age and their still-developing sexual and gender identities. Third, there are confidentiality concerns regarding the possibility of “outing” youth, especially if parental consent is required, which is often the case with probability-based surveys. And with a targeted population starting as young as age thirteen, these youth are experiencing a still developing gender identity and sexual orientation. Social media therefore offer a means to reach not only those who are “out” and have an established gender identity and sexual orientation but also those who are still questioning or who may be less inclined to provide truthful responses in a household or school setting. Since questioning or unsure youth were of interest to the study, a method that would reach them would be ideal.

There are also significant cost implications when using probability-based sampling, especially for this population. Probability-based surveys require a much larger population from which to draw in order to get a sufficient sample of sexual and gender minority youth. In contrast, a non-probability-based survey involves respondents who are self-selected. To demonstrate this issue, Michaels and colleagues24 compared the START sample with subsamples of AMSM from two national probability-based surveys, the YRBS and NSFG, to assess the representativeness of the samples. The YRBS is designed to monitor health behaviors and experiences among high school students in the United States. The survey is administered to students in grades 9–12 attending public and private schools. A sample of 420 cases from the 2017 cycle of the YRBS was pulled (from a total sample of 14,765 males and females) for comparison with a subsample of 1,235 AMSM from the START sample. The NSFG gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and general and reproductive health. The survey is administered in households and includes men and women ages fifteen to forty-four living in the United States. When pulling a comparison sample from the NSFG, Michaels and colleagues25 combined data from 2002 to 2015 to obtain as large a sample size as possible to compare with the START sample, but they were able to pull only 395 comparable cases of AMSM from a total of 4,510 males ages fifteen to eighteen. These comparisons show that there is great value in using a recruitment method that can target a hard-to-access population, as it results in many more cases than can be obtained through probability samples. An additional consideration was that by offering the START survey online, we hoped it would give youth a sense of privacy, resulting in responses from a larger, more diverse sample of AMSM and transgender youth.
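The yield comparison above can be made concrete with a little arithmetic. The sketch below uses only the counts quoted in the text (420 of 14,765 for the YRBS, 395 of 4,510 for the NSFG, and 1,235 AMSM in START); the function names are hypothetical, and the two denominators are not strictly comparable, since the YRBS total covers males and females while the NSFG total covers males ages fifteen to eighteen only.

```python
import math


def screening_yield(eligible_cases: int, total_sample: int) -> float:
    """Share of a full probability sample that falls in the target group."""
    return eligible_cases / total_sample


def required_sample(target_cases: int, yield_rate: float) -> int:
    """Total respondents needed to expect `target_cases` eligible cases."""
    return math.ceil(target_cases / yield_rate)


# Counts quoted in the chapter text.
yrbs_yield = screening_yield(420, 14_765)
nsfg_yield = screening_yield(395, 4_510)

# Illustrative only: how large a YRBS-style sample would need to be to
# match the 1,235 AMSM cases in the START comparison subsample.
needed = required_sample(1_235, yrbs_yield)

print(f"YRBS yield: {yrbs_yield:.1%}")
print(f"NSFG yield: {nsfg_yield:.1%}")
print(f"YRBS-style sample needed for 1,235 AMSM cases: {needed:,}")
```

Under these assumptions, a school-based probability sample would need to be several times larger than the 2017 YRBS cycle to match the START subsample, which is the cost argument the paragraph makes.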
Zeng and colleagues26 describe social media as “a conversational, distributed mode of content generation, dissemination, and communication among communities.” Social media present an opportunity to reach communities or subpopulations that might otherwise go undetected. Our targeted sample of AMSM and transgender youth had developed online communities where they could connect with each other in a safer space than is usually otherwise available to them. Social media and other web-based media content are a means for youth “to explore identities, behaviors, and lifestyles that might remain inaccessible offline.”27 You will read later about the importance that these communities and associated networks played in our data collection efforts.

We recognize that social media users are not representative or inclusive of the general population and that not all social media sites reach the same population.28 Social media users tend to be younger and female and live in urban areas.29 The demographics of those with internet access vary from those without access, the demographics of social media users vary from those who do not use social media, and the demographics of those who were targeted for the survey vary from those who decided to respond.30 This last point is especially important because with such a short field period (as we describe later in this chapter), we limited the scope of potential respondents who could see the ads and participate. The demographics of our recruited respondents fit within the targeting specifications (AMSM ages thirteen to eighteen and transgender youth ages thirteen to twenty-four), but we know there were differences in key demographics from those who elected not to participate.

We conducted detailed research during the planning phase with a third-party social media consulting firm to understand which social media sites were most likely being used by our targeted population, and we selected our final social media sites based on these findings. The challenge with this approach is that social media site popularity changes frequently, and the approach we used for START likely will not be applicable months or years from now. Anderson and Jiang, authors of the Pew report mentioned earlier, found the percentage of teens that reported using Facebook decreased from 71 percent to 51 percent between 2015 and 2018, while those using Instagram increased from 52 percent to 72 percent.31 These quick shifts in site popularity are difficult to predict, as new social media sites continue to be developed. Our research also found that not all social media sites permitted ads or ad formats that were conducive to our research, thereby limiting our options.
In short, while our general approach and lessons learned should be very applicable to future studies, it will be important to assess what platforms are most popular with the particular population of interest at the time of any future study.

About the Survey

START included existing questions from high-quality surveys in addition to modified and newly developed questions. It covered such topics as sexual and gender identity, behavior, and attraction; access to sex education and other HIV prevention activities; knowledge and behaviors related to HIV prevention methods; and parental involvement. We posted survey advertisements on Facebook, Snapchat, and Instagram. We also posted a text ad using Google AdWords. After clicking the ad, respondents were taken to the survey welcome screen, which informed them of their rights as a participant, provided information about the study (e.g., description, time to complete, incentive offered upon completion), and explained that they could skip any question they did not want to answer. By clicking “Submit,” respondents started the screener, which determined eligibility for the survey. There were two formats of the survey: one for AMSM respondents and another for transgender youth. To be eligible for the AMSM survey, respondents had to be assigned male at birth, be ages thirteen to eighteen, and fit into one of the following categories:

• Report any attraction to males
• Report having sex with males
• Identify as gay, bisexual, queer, pansexual, demisexual, or questioning/unsure
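The AMSM eligibility rule listed above can be sketched as a small screener function. This is a minimal illustration, not the survey's actual programming: the field names, the `ScreenerResponse` class, and the response codings are all hypothetical, and only the criteria themselves (assigned male at birth, ages thirteen to eighteen, and at least one of the three listed categories) come from the text.

```python
from dataclasses import dataclass

# Identities named in the eligibility list; the string codings are assumed.
QUALIFYING_IDENTITIES = {
    "gay", "bisexual", "queer", "pansexual", "demisexual", "questioning/unsure",
}


@dataclass
class ScreenerResponse:
    """Hypothetical container for one respondent's screener answers."""
    sex_at_birth: str
    age: int
    attracted_to_males: bool
    had_sex_with_males: bool
    sexual_identity: str


def eligible_for_amsm_survey(r: ScreenerResponse) -> bool:
    """Apply the three-part AMSM rule described in the text."""
    if r.sex_at_birth != "male":
        return False
    if not 13 <= r.age <= 18:
        return False
    # Any one of the three listed categories is sufficient.
    return (
        r.attracted_to_males
        or r.had_sex_with_males
        or r.sexual_identity in QUALIFYING_IDENTITIES
    )


example = ScreenerResponse("male", 16, False, False, "questioning/unsure")
print(eligible_for_amsm_survey(example))
```

Writing the rule as an explicit "any of" check makes it easy to see that questioning or unsure youth qualify on identity alone, without reporting attraction or behavior.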

To be eligible for the transgender survey, respondents had to report being assigned male or female at birth, be ages thirteen to twenty-four, and currently describe their gender as different from their sex at birth. If the respondent selected a gender identity that did not correspond with their sex assigned at birth (e.g., if a respondent assigned male at birth identified as female now), they were prompted to confirm their selection before continuing. Ineligible respondents were redirected to a screen informing them of their ineligibility and thanking them for their time.

Questionnaire Design and Testing

Cognitive interviewing is a means of pretesting a questionnaire before it is used in the field. It allows interviewers to identify problematic items as they follow along with a respondent who is completing the questionnaire. We are specifically looking for a respondent’s ability to comprehend the question asked, recall information, select a response from the options listed, and make a judgment. There are many cognitive interviewing techniques, including “think aloud” interviews and structured “verbal probing” (e.g., Can you tell me more about that?), that elicit further information from the respondent. We used both techniques when conducting eight cognitive interviews in Chicago, where our team is located. We made minor wording changes to the questionnaire and, where respondents perceived redundancies, eliminated questions whenever possible. Youth who completed the

survey were asked to provide feedback on both the questionnaire and the initial set of static image ads. The ads were also shared with our Youth Community Advisory Board (YCAB), whose members were representative of the targeted population. This consultation with the board was organized and conducted by our partner, the Fenway Institute, in Boston. The qualitative data we collected proved just as valuable as our quantitative data. The cognitive interview respondents and the representative group of our targeted population provided feedback on their overall opinions and preferences and offered new ideas for ad development (e.g., use of popular memes). The START questionnaire was programmed as a web-based survey. A traditional web survey using probability-based sampling often will include a secure link that can be accessed only using a PIN or other unique identifier. For START, we used non-probability-based sampling on social media, and, therefore, the survey link was open and accessible to anyone who clicked on our ads. We were concerned with securing not only the survey data we were collecting but also the $10 Amazon gift codes offered upon completion. As each ad mentioned the offering of a gift code, we thoroughly tested the survey and our security systems to prevent respondents from fraudulently entering the survey multiple times. In addition, we knew a large portion of our respondents would access the survey via a mobile device. This required additional testing across devices to ensure the survey functioned properly. We will discuss our testing and security measures in more detail later in this chapter, as they certainly qualify as a significant challenge we faced in preparing for data collection. We conducted data collection in four phases, with each phase representing a period of time when the ads were live on social media. After each phase, we stopped the ads and reviewed the data. We then made updates as necessary based on our findings. 
Ad Development and Recruitment Strategy

The initial plan was to post only ads with an image or picture, but we expanded our plan to include video ads, as these are required for Snapchat and can be very effective on Facebook and Instagram. We purchased static images as well as audio and video files from online image sites such as iStock, Getty Images, and Shutterstock. We purchased music to accompany each video ad separately from the video itself and compiled them with the help of an independent contractor who also added text to the video ads. We designed the ads to reach our targeted population in order to increase the

likelihood they would click on the ads and participate in the survey. Specifically, we designed the ads to appeal to one of three groups: AMSM, transgender youth, and the general teen population. The general teen population was included to help recruit younger teens who might not have established their gender identity or sexual orientation but nonetheless might be eligible for the survey. In addition, we designed the ads to be inclusive of and appealing to black and Latino adolescents. Facebook, Instagram, and Snapchat were selected based on their popularity with these groups.32 We considered other social media sites, such as Twitter, Tumblr, and Kik, but we ended up not including these for several reasons. First, with a smaller number of sites, we could better monitor ad performance. Second, while Kik had larger percentages of users who were black and Latino youth, it did not allow traditional advertisements at the time of our study. Third, if advertisements underperformed on Facebook, Instagram, or Snapchat, we could have employed the other sites at that time, but their inclusion up front did not seem necessary given the reach of the other sites.33

The ads were visually designed to appeal to the targeted population. Each social media site also allowed for specific targeting options that would narrow down which users would be shown the ad. Social media sites often collect demographic information from users upon account creation, which allows for easier ad targeting. The factors used for targeting ads on Facebook and Instagram are listed in table 3.1.

TABLE 3.1 Facebook/Instagram Ad Targeting

Ad Campaign           Targeting
AMSM                  United States; 13–18 years old; men; interested in males or unspecified
Transgender           United States; 13–24 years old; all genders; additional interests: Adam Bouska, Laverne Cox, National Center for Transgender Equality, RuPaul’s Drag Race, The Trevor Project, transgenderism
General population    United States; 13–24 years old; all genders

Similar to the Facebook and Instagram campaigns, the first Snapchat ad campaign was a series of ads with the same targeting specifications and budget setup. It targeted youth ages thirteen to twenty-four, regardless of gender, who lived in the United States and who followed Snapchat users popular among the target population, as listed in table 3.2, or used “like” in response to certain categories, also in table 3.2. Later during data collection, NORC posted a second ad campaign (in addition to the first) that did not include the targeting from table 3.2, thereby expanding our reach, that is, the number of Snapchat users who would be shown the ad. The addition of a second Snapchat campaign allowed us to be more inclusive of respondents who may not have been targeted in the initial campaign and to compare the effectiveness of the different campaign strategies at recruiting participants.

Facebook, Instagram, and Snapchat do not allow targeting on sexual orientation. The closest option we could use was the “Interested in” category on Facebook and Instagram. We selected “Interested in males or unspecified” for the AMSM ad campaign.

Over one hundred ad variations were used for this study. A few example ads that were posted to social media are included here. The ad referenced in figure 3.134 was one of the most successful ads at recruiting respondents who completed the survey, including black and Hispanic respondents. The ads in figure 3.2 were less successful. However, we note that it is difficult to compare directly the performances of all ads that were posted, as each social media site utilizes different algorithms for pushing ads out to its users. Also, the approval processes for ads on Facebook and Instagram vary. If a single ad is to be posted for three campaigns, it may be approved for one campaign but not for the other two. Then the ad must be resubmitted until it is finally approved. This meant that some ads were not posted for the same period of time. We will revisit this issue later in the chapter.

TABLE 3.2 Snapchat Targeting

Snapchat users following    Snapchat categories
Zendaya                     Adventure Seekers
Bryan Yambao                Beauty Mavens
Ellen DeGeneres             Music Fans
Lady Gaga                   Fashion and Style Gurus
Laverne Cox                 High Schoolers
Kylie Jenner                Film and TV Fans
Tyler Oakley                Fitness Enthusiasts

FIGURE 3.1 Top-performing ad: Males holding hands image. Source: Stocksy.

FIGURE 3.2 Poor-performing ads: (a) Ad 1, featuring the image of a dog, was designed for general teen populations (posted under “Transgender Campaign” on Facebook*); (b) Ad 2, a Teen Beam image, was designed for general teen populations (posted under “Transgender Campaign” on Facebook*). *The ads were posted under multiple ad campaigns, with and without eligibility language. Source: iStock.

CHALLENGES ENCOUNTERED

The purpose of this chapter is not to discuss the survey results, but we will take this moment to boast that our recruitment efforts were quite successful, with over three thousand completed surveys in just under two weeks, approximately half AMSM and half transgender youth. With that said, it was not an easy or uneventful process to get to this point. Our goal with the remainder of this chapter is to discuss the challenges encountered throughout the process so that others who wish to conduct a similar study targeting hard-to-access populations can learn from our experiences and ultimately so that we as researchers can better reach this population to improve health services and outcomes where needed.

Plan Approvals

Once we developed our overall approach to data collection, the next step was to submit our plan for approval to NORC’s Institutional Review Board (IRB) and the Office of Management and Budget (OMB). The IRB, or ethics committee, is responsible for reviewing research projects in order to protect human subjects participating in a study from harm. OMB review was required under the Paperwork Reduction Act, which aims to maximize the use of information collected while also protecting the public from the burden of being sent redundant requests for survey participation. Our primary concern when submitting our plan for approval was the consent requirement for youth to participate. We wanted to recruit a broad and representative sample of sexual and gender minority youth but understood that we needed to maintain respondent privacy and protect them from any direct or indirect harm from participating in the survey. We first requested a waiver for parental consent. Seeking parental consent would

require that sexual and gender minority youth be “out” to their parents/guardians to participate, and, therefore, requiring parental consent could influence study results35 or hinder responses from youth not willing to share this sensitive information.36 We also requested a waiver of documentation of consent. Respondents completed the survey anonymously online and gave their assent/consent37 by clicking “Continue” after reading a written statement regarding their rights as a participant on the survey welcome screen (the first screen that appeared after clicking the ad). Without the documentation of consent, respondents would not be required to provide any identifiable information to participate in the survey, and, therefore, no identifiable information would be linked with respondents.

Another issue arising during the approval process was the IRB’s concern that the respondents had to provide their email address and phone number in order to receive the incentive. We wanted to maintain respondent privacy by not collecting personally identifiable information, but we also wanted to collect email addresses and phone numbers for the incentives because this served as a security check to prevent respondents from receiving multiple incentives for completing the survey multiple times. This presented a dilemma in trying to balance privacy and IRB/OMB requirements with data security. At the end of the survey, respondents could elect to receive the $10 Amazon gift code incentive by email or text message; however, they were given the gift code on the final screen regardless of whether they gave their email address and/or phone number. Therefore, they did not have to provide this information and were still eligible to receive the incentive. If they did provide this information, the IRB required that it be stored separately from the survey data.
This required us to build a separate database to store respondent email addresses and phone numbers and consequently involved additional time and resources for programming and testing. We will discuss this issue in more detail in the next section on security measures.

Security Measures

The internet is filled with frauds—there, we said it. You have probably noticed the websites that promote “Free, Paid Surveys!” This allows a large number of internet users to access your survey, many of whom may enter false information to try to become eligible to participate. We were posting

ads on social media sites with an open survey link, so we had to develop extensive security measures to address concerns about possible repeated entries by respondents in order to receive additional gift codes; this was especially a concern with respect to automated bots that could access a survey and submit a large number of completes within a short period of time. To illustrate the reality of this threat, note that in 2016, automated bots surpassed human beings as the primary users of the internet.38 At the start of data collection, the survey was programmed with Google ReCAPTCHA to ameliorate this issue. ReCAPTCHA is a free software program that asks users to click a checkbox to confirm they are not a robot. It may also require users to select a series of images (e.g., all images of a bus). We also programmed an internal check that would identify respondents using duplicate email addresses or phone numbers to receive the gift code. If someone entered the same information, the original gift code provided to that email address and/or phone number would display on the screen with a message informing the respondent that they had already completed the survey and received the code. After the first phase of data collection,39 we noticed a larger number of respondents accessing the survey than anticipated—over seven hundred within nineteen hours. Our concern was that a potential bot could have completed the survey numerous times, generating a large number of completes. Another concern was the possibility that respondents were accessing the survey multiple times using different email addresses to collect new gift codes or not providing an email address or phone number when initially completing, thereby avoiding the duplicate email address/phone number check that was programmed at the end of the survey. We paused the ads to review the data and found no fraudulent or bot-type responses. 
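The duplicate email address/phone number check described above, and the loophole it leaves, can be sketched as follows. This is a minimal illustration under stated assumptions, not NORC's implementation: the `GiftCodeIssuer` class, its method names, and the in-memory storage are hypothetical (the chapter says only that contact details were kept in a database separate from the survey data).

```python
from typing import Optional, Tuple


class GiftCodeIssuer:
    """Hypothetical sketch of issuing codes with a duplicate-contact check."""

    def __init__(self, codes):
        self._unused = list(codes)
        # Contact details are held apart from survey answers, as the IRB required.
        self._issued_by_contact = {}

    def issue(self, email: Optional[str] = None,
              phone: Optional[str] = None) -> Tuple[str, bool]:
        """Return (code, was_duplicate) for a respondent finishing the survey."""
        contacts = [c.strip().lower() for c in (email, phone) if c]
        for contact in contacts:
            if contact in self._issued_by_contact:
                # Same contact seen before: show the original code again.
                return self._issued_by_contact[contact], True
        if not self._unused:
            raise LookupError("no gift codes left to issue")
        code = self._unused.pop(0)
        for contact in contacts:
            self._issued_by_contact[contact] = code
        return code, False


issuer = GiftCodeIssuer(["CODE-A", "CODE-B", "CODE-C"])
first = issuer.issue(email="teen@example.org")
repeat = issuer.issue(email="Teen@Example.org ")
anonymous_1 = issuer.issue()
anonymous_2 = issuer.issue()
print(first, repeat, anonymous_1, anonymous_2)
```

The two anonymous calls each receive a fresh code, which mirrors the concern raised in the text: a respondent who supplies no contact details, or a different address each time, is invisible to this check.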
Although we identified no evidence of fraudulent responses, we decided to enhance our security capabilities and employed RelevantID, a proprietary digital fingerprint software owned by Imperium. Once incorporated into our survey by one of our survey programmers, it gathered a large number of data points, such as operating system version, browser version, and plug-ins, from each respondent’s computer. When a respondent first entered the survey, RelevantID assigned a score between 0 and 100 based on the data points collected. The higher the score, the more likely the respondent had accessed the survey before. When a respondent first entered the survey, their score should have been close to 0. If the respondent tried to

complete the survey again on the same device, their score should have been close to 100. We started with a cutoff score of 90, considering respondents with a higher score as likely duplicates and sending them to the ineligibility screen. We delayed the start of our second phase of data collection until we had thoroughly tested the RelevantID feature on both mobile and desktop devices. After phase 2, NORC reviewed the RelevantID scores and found them to have a bimodal distribution (peaks at 0–2 and 95–100), suggesting the system was clearly effective at identifying potential duplicate respondents. We then lowered the cutoff to 75 to widen our range of potential duplicate users. RelevantID flagged 1,353 cases as duplicate respondents attempting to reenter the survey after already completing it.

Our system was programmed to allow only a certain number of completed surveys each day depending on when we reached our gift code cap, the maximum number of codes that could be distributed each day. The purpose of the cap was to control the speed and length of the period of data collection. Once that cap was hit, respondents would receive a warning message that no additional codes would be given that day. They could continue to complete the survey but would need to provide their email address so we could send the code later. Otherwise, we provided a link for them to come back and access the survey the next day. We found that we were hitting our gift code cap early in the morning (between 12 midnight and 4 AM). The gift code count reset each day at midnight, and, therefore, most of our respondents were those who were awake and accessed the survey at very early hours in the morning. Our fear was that this was impacting our data by limiting the diversity of our respondents. We therefore changed the reset time from midnight to 3 PM, the time most youth would be getting out of school.

Ad Approval

If you asked what our least favorite part of the process was, the majority of our team would tell you it was the ad approval process. Ads to be posted on Facebook, Instagram, and Snapchat had to go through an approval process. Each time the ads were paused and then turned back on, they had to go through the approval process again. This was one feature we were not anticipating in our schedule. While seeking approval for over one hundred ad variations posted to Facebook and Instagram,40 we found that a certain ad being posted to Facebook might be approved immediately while the

same ad being posted to Instagram might not be approved or might be approved only after a delay. We had to resubmit some ads several times before they were approved. But we didn’t make substantive changes to these ads; we just tweaked something like the ad title and clicked to resubmit, and it would be approved. The rules and procedures are constantly changing, making this one of the many aspects of posting ads to social media that can be very frustrating. You may need to contact Facebook support to get assistance with ad approvals depending on the length of the delay. You can also appeal the decision if your ad continues to be disapproved. During the ad review process, Facebook will check your ad image, text, targeting, and positioning in addition to your ad’s landing page.41 The landing page is similar to a typical Facebook user page. The landing page on which ads are posted must match the service/product being promoted in the ad. In other words, we could not post ads on a team member’s Facebook page. We had to create a new Facebook account with information about START. In relation to ad format, each ad must be a certain size and comply with the specified text-to-image ratio. The image text—any text that exists on your ad image—should not take up over 20 percent of the image space. There is also an extensive list of prohibited content that cannot be included: for example, ads cannot discriminate or encourage discrimination, promote the sale of tobacco or illegal drugs, include adult content, show excessive violence, or contain content that asserts or implies personal attributes (e.g., race, age, sexual orientation, per Facebook Advertising Policies). This last prohibition resulted in the disapproval of a few of our ads because they specifically mentioned that transgender individuals or men who are attracted to men were eligible to participate. After submitting an appeal and resubmitting the ads, they were finally approved. 
Many ads were posted with this eligibility language, yet only a few were not initially approved, suggesting that the approval process can be arbitrary.

Data Collection

Social media are increasingly being used by social researchers to collect data; however, it is still a relatively new means of recruitment with a lot of unknowns. Even with all our advance preparation, there were still unexpected events that served as valuable lessons learned as we experimented with the possibility of recruiting this hard-to-reach population on social media.

We received a much larger response during the data collection period than we had anticipated early on. Our initial projections had us collecting responses over several months, but we finished data collection within two weeks (taking into account all four phases of data collection). Because we received such a large response early in the data collection period, we did not have enough gift codes loaded into our gift code database. We paused the ads to review the data but realized shortly afterward that the ads were continuing to be shared. When an ad was paused, it stopped appearing in newsfeeds; however, if a respondent had already shared the ad or if they had not scrolled all the way through their feed when the ad originally loaded, they may have seen the ad after it had been paused. The survey was still open, so anyone able to access the ads in one of these ways would still be able to access the survey. As a result, a large number of people continued to access the survey after we had run out of gift codes, and when they got to the incentive screen at the end of the survey, they were given a message saying the code was unavailable. However, since they were not required to provide an email address or phone number to receive the gift code, we were unable to send codes to these respondents after we had purchased more. Moving forward, we (1) made sure additional codes were loaded in the system and (2) closed the survey link when the ads were paused in between each phase of data collection. This prevented us from getting additional completes when we were attempting to review data quality between phases and from running out of gift codes again. We reviewed the list of email addresses provided so that we could send incentives to those who completed the survey after we ran out of gift codes and found that several respondents entered almost identical emails or ones that were clearly fake and created for the purpose of receiving a gift code. 
Because we had to keep the email addresses and phone numbers in a separate database per our agreement with OMB, we were unable to link the fake or duplicate email addresses with the questionnaire data in order to remove those data from our final data sets. This is an issue encountered in any self-administered survey where an interviewer is not present to administer the survey. There is no guarantee that the respondent completing the survey online is being truthful about their eligibility, which leads to a certain level of bias that researchers have to acknowledge while also investigating ways to prevent such fraudulent responses.
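One way to support the kind of email review described above is to flag near-identical addresses automatically before a person inspects them. The following is a minimal sketch, not the procedure used in this study: the function names, the example addresses, and the 0.9 similarity threshold are all hypothetical, and it works only on the contact list, since the contact data could not be linked to the questionnaire data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(email: str) -> str:
    """Lowercase, trim, and drop any '+tag' suffix from the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"


def near_duplicates(emails, threshold=0.9):
    """Return pairs of addresses whose normalized forms are very similar.

    The 0.9 threshold is an arbitrary illustration, not a value from the study.
    """
    flagged = []
    for a, b in combinations(emails, 2):
        ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
        if ratio >= threshold:
            flagged.append((a, b))
    return flagged


# Hypothetical addresses for illustration only
sample = [
    "alex.smith1@example.com",
    "alex.smith2@example.com",
    "Jordan+gift@example.com",
    "jordan@example.com",
    "casey@example.org",
]
pairs = near_duplicates(sample)
```

Flagged pairs would still need human review, because a high similarity score does not by itself show that two addresses belong to the same person.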

We implemented the security measures discussed earlier to the extent possible and thoroughly reviewed the data to try to identify unusual or atypical responses. For example, after phase 1, we noticed cases in which respondents completed the survey but answered “Don’t know” or “Prefer not to answer” for at least half of the questions in the survey. The survey was updated so that respondents answering the first five questions after the screener with “Don’t know” or “Prefer not to answer” were redirected to the ineligibility screen. The first five questions were selected because they were straightforward and did not ask for sensitive information; therefore, we would expect respondents to know the answer to at least one of these questions. This would preserve our gift codes for valid respondents and remove those who were straightlining through the survey to get the incentive.

We used the Facebook Ads Manager tool to set up our own ad campaigns for Facebook and Instagram. Snapchat required that we work through a third-party vendor to post ads. On the day that we posted ads for phase 2, we turned on the ads on Facebook and Instagram and notified the third-party vendor to turn on the ads on Snapchat. There was a delay in the vendor getting the ads posted on Snapchat, and within only a few hours, we reached our target number of completes for phase 2 from the Facebook and Instagram ads and needed to pause the ads for review. Therefore, Snapchat ads were not posted for phase 2. Fortunately, we did not experience this delay for phases 3 and 4.

In addition to the video and static image ads we posted to Facebook, Instagram, and Snapchat, we posted a text ad using Google AdWords. This was our attempt to reach potential respondents who were not active on social media or who were not “out” and feared clicking on an ad that specifically mentions an AMSM or transgender survey.
We received no completed surveys through the Google ad and only about eighty ad clicks, far fewer than we received for ads on the other sites. We turned off the Google ad after the second phase due to poor productivity.

Facebook and Instagram users are able to click a Share icon on most pictures, posts, and ads. Some of these data are available in the Ads Manager, but we were unable to track information if respondents copied the URL and shared it outside of Facebook/Instagram. Respondents often commented on ads with the names of other users, suggesting they complete the survey. We had one respondent who asked if they could share the URL for the Facebook ad on Twitter because “no one uses Facebook anymore.” We

provided a unique URL so we could track how many respondents accessed the survey from this ad share. All of the respondents were transgender, and when we compared them to people recruited only through ads, we found that they were older, were more likely to be nonwhite, and identified more often as queer. The results of this ad share are an indicator of the importance of social media networking among smaller subgroups or populations. They also highlight the possibility of using Twitter for future recruitment efforts of this population. We do not know if the respondent also shared the URL via email or other means outside of Twitter. That is one challenge with ad sharing. There is no way to know exactly how many respondents viewed the ad through an ad share, so we can’t make comparisons with the numbers who viewed it through their newsfeed. We closely monitored our sample throughout data collection to ensure that we were getting sufficient samples of our targeted respondents (e.g., black, Hispanic, trans-female). Prior to the final phase of data collection, we noticed our sample consisted of a large number of trans-male respondents when we had hoped to get a larger number of trans-female respondents due to their higher risk of HIV infection. The survey logic was updated for phase 4 so that trans-male respondents would be sent to the ineligibility screen, thereby reserving more completed cases for the trans-female respondents we needed. With limited time and resources, we did not update our ads to reflect this change in eligibility. As expected, several trans-male respondents asked why they were unable to complete the survey. We responded to these comments by explaining the situation and our need to have more of a different subset of respondents. We also selected filter options in Facebook and Instagram at the end of data collection to display the ads on user profiles for those who were likely black. 
News of a potential Facebook data breach broke as we approached the end of data collection. We experienced delayed ad approval for phase 4 and are unsure whether this was directly related to the data breach, since we assumed that Facebook was concurrently in the process of adjusting its ad management protocols. The data breach, in conjunction with the supposed fake news being posted to social media during the 2016 presidential election, could have ramifications for anyone wanting to post ads to social media in the future, as sites continue to enhance their security and regulations for advertisers.
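The two survey-logic changes described in this section, the first-five-questions rule added after phase 1 and the phase 4 eligibility change, can be expressed as a small routing function. The sketch below is illustrative only: the function name, the answer and identity labels, and the return values are hypothetical and do not reflect the survey software actually used.

```python
# Answer labels treated as nonsubstantive (wording taken from the chapter)
NONSUBSTANTIVE = {"Don't know", "Prefer not to answer"}


def route_respondent(first_five_answers, gender_identity, phase):
    """Return 'continue' or 'ineligible' for a respondent who passed the screener.

    first_five_answers: answers to the first five post-screener questions
    gender_identity:    hypothetical label such as 'trans-male' or 'trans-female'
    phase:              data collection phase, 1 through 4
    """
    # Rule introduced after phase 1: screen out respondents who gave a
    # nonsubstantive answer to every one of the first five questions.
    all_nonsubstantive = len(first_five_answers) == 5 and all(
        answer in NONSUBSTANTIVE for answer in first_five_answers
    )
    if phase >= 2 and all_nonsubstantive:
        return "ineligible"

    # Rule introduced for phase 4: reserve remaining completes for
    # trans-female respondents by screening out trans-male respondents.
    if phase == 4 and gender_identity == "trans-male":
        return "ineligible"

    return "continue"
```

Keeping each rule tied to the phase in which it took effect makes it easier to document which respondents were subject to which version of the survey logic.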

Data Editing and Cleaning

After our brief data-collection period, we finished reviewing the data and began cleaning the final data set for delivery to our partners at the CDC. As mentioned previously, we identified a large number of cases where respondents went through the survey providing nonsubstantive responses (e.g., “Don’t Know” and “Prefer not to answer”) for a majority of questions throughout the survey. These cases were removed from our final data sets. We looked closely at open-ended responses and found that some respondents had written clearly inappropriate responses for the questions being asked, raising concerns about questionnaire validity. If the respondent provided these responses to numerous questions, we flagged them as invalid and removed them from the final data set. We also found inconsistencies across responses. For example, a respondent who was thirteen years old answered they were not currently in school. We are unsure if they misunderstood the question being asked (Are you actually in a school setting while completing the survey? versus Are you currently enrolled in school?) or if they were in fact not enrolled in school. Some respondents were also answering that they did not live with their parents but rather lived in their own home or a foster home or had other living arrangements. This could be an indication that we reached a small percentage of the homeless LGBTQ population; however, the numbers were alarmingly high for our sample size of three thousand. We continue to investigate this finding. Beyond these edits, we did not alter responses to the questionnaire. There was much back-and-forth discussion about the extent of data cleaning that would be involved. We decided to leave the data as they were in order to allow us to identify the potential for questionnaire improvements in future iterations and also not to divert from what the respondent intended. 
For example, we do not know if the thirteen-year-old respondent mentioned earlier was indeed not enrolled in school or if they misunderstood the question.

LESSONS LEARNED

This study demonstrated that social media were efficient in the recruitment of a hard-to-reach population of sexual and gender minority youth. As we

evaluated the quality of the data collected, we began to develop lessons learned and suggestions for how to improve similar efforts. Data security was a primary concern at the outset of this study, but additional security measures may be needed to ensure that researchers are getting responses from their targeted population and not fraudulent respondents. Any time an open survey link is used, there is the possibility of fraudulent respondents gaining access to the survey. And by posting the ads to social media sites or other online locations, there is the possibility of people sharing the URL with a wider audience on that same site or outside the site. RelevantID worked well at preventing duplicate respondents from accessing the survey; however, if someone were to complete the survey on a mobile device and then access it again using a different device, that would negate RelevantID. Another limitation is that RelevantID prevents two or more people from completing the survey on the same device. A potential solution is to provide instructions at the start of the web survey that inform respondents they should complete the survey on their own computer or mobile device. Researchers can also explore using other security measures in conjunction with RelevantID. We found RelevantID to be a valuable and effective security measure despite these limitations and would recommend its use for similar studies. Researchers wanting to offer a similar monetary incentive should have a sufficient number of gift codes prior to the start of data collection in anticipation of a higher-than-expected level of response. They should also be prepared to shut down the open survey link after a short period of time in order to analyze the data collected. This approach was incredibly helpful in our efforts, as we were able to identify issues and make adjustments before continuing with data collection. Social media are complex and require researchers to be agile and adaptive. 
They also require researchers to collaborate with subject-matter experts about social media and social media advertising. When working with third-party vendors to post ads, it is important to maintain communication and to ensure that they understand the fluidity of your ad campaign—that ads can be turned on and off within a short period of time. It is important to make them aware of your ad schedule and give them sufficient time to respond. When posting ads to multiple sites and if it is important for your study that these ads be posted for the same period of time, be aware that the use of third-party vendors and the ad approval

process can make it very difficult to ensure that all ads are posted for the same amount of time. Posting for identical periods is more realistic when posting only a small number of ads, where the approval process will have a smaller effect, or where ads will be posted in-house.

With any non-probability-based survey, coverage bias remains a concern. As was demonstrated with the large response we received from trans-male respondents, the representativeness of the sample recruited may be questionable. Social media and the internet in general are not fully representative of the larger population. Only a certain percentage of the population has internet access, an even smaller percentage uses social media, and different types of people choose to use the different social media platforms. In addition, we were recruiting only from those active on social media during the data collection period, leaving a small percentage excluded from participating. We mentioned the large number of completes we received in the early morning hours of the day at the start of data collection, meaning we continued to hit our daily cap so early that youth who are not awake at those hours could not participate. We also had to consider who was deciding to click the ad, how they may have differed from those who elected not to, and how this would affect coverage bias.

There are several options for reducing coverage bias. Social media recruitment can be supplemented with other modes of data collection, such as in-person interviews or mail surveys. However, these are often costly alternatives. Other solutions include recruitment on additional social media platforms, including those frequently used by sexual and gender minorities. Third-party vendors that specialize in social media can be consulted to get information on popular platforms to use (as we did) and also to post ads.

CONCLUSION

Research is a trial-and-error process. It is important to try to identify the sources of error and minimize them as much as possible. Despite the challenges we faced with recruiting members of a hard-to-reach population, the process was exciting, as we applied our knowledge and experience to an important health concern with such a group. Targeting sexual and gender minority youth introduced another layer of unknowns. NORC has used Facebook, Twitter, and Google for social media recruitment in prior work,

but Snapchat and Instagram were new endeavors for us. We continued our work with social media recruitment but with a much younger population. The poor performance of the Google ad is evidence of how recruitment approaches that were previously used for adults may not be effective for youth. Fortunately, we were pleasantly surprised with the large number of responses received with the remaining platforms.

The availability of the internet and mobile devices across the United States provides researchers with a means of directly recruiting hard-to-reach populations at potentially low costs. The privacy afforded to sexual and gender minorities allows us to gather important information regarding health care and knowledge of available resources to provide better prevention efforts for HIV infection, among many other health threats. Unfortunately, just like any other recruitment method, social media recruitment is not an error-proof means of collecting data. Researchers must be cognizant of the limitations of the research method being used and its impact on data quality and on the ability to make inferences about the larger population, and they must subsequently take steps to address these limitations. Research is a continuous process of learning and improving, and we are responsible for being aware of the limitations.

NOTES

1. NORC at the University of Chicago is an objective, non-partisan research institution that delivers reliable data and rigorous analysis to guide critical programmatic, business, and policy decisions (norc.org).
2. There were other components of the study, including designing tools for clinicians and interviewing adult stakeholders, that are beyond the scope of this chapter.
3. GLAAD, “Transgender,” in GLAAD Media Reference Guide, 10th ed. (New York: GLAAD, October 2016), https://www.glaad.org/reference/transgender.
4. C. Ryan and D. Futterman, “Social and Developmental Challenges for Lesbian, Gay and Bisexual Youth,” SIECUS Report 29, no. 4 (2001): 4–18.
5. B. Greene, “Ethnic Minority Lesbians and Gay Men,” in Ethnic and Cultural Diversity Among Lesbians and Gay Men, ed. B. Greene (Thousand Oaks, CA: SAGE, 1997).
6. B. Greene, “Ethnic-Minority Lesbians and Gay Men: Mental Health and Treatment Issues,” Journal of Consulting and Clinical Psychology 62, no. 2 (1994): 243–251.
7. S. Cahill et al., “High Rates of Access to Health Care, Disclosure of Sexuality and Gender Identity to Providers Among House and Ball Community Members in New York City,” Journal of Homosexuality 65, no. 5 (2018): 600–614, https://doi.org/10.1080/00918369.2017.1328221; T. E. Freese et al., “Real-World Strategies to Engage and Retain Racial-Ethnic Minority Young Men Who Have Sex with Men in HIV Prevention Services,” AIDS Patient Care and STDs 31, no. 6 (2017): 275–281.
8. D. A. Hickson et al., “Sexual Networks, Dyadic Characteristics, and HIV Acquisition and Transmission Behaviors Among Black Men Who Have Sex with Men in 6 US Cities,” American Journal of Epidemiology 185 (2017): 786–800; C. A. Latkin et al., “Social Network Factors as Correlates and Predictors of High Depressive Symptoms Among Black Men Who Have Sex with Men in HPTN 061,” AIDS and Behavior 21, no. 4 (2017): 1163–1170.
9. H. A. Joseph et al., “HIV Testing Among Sexually Active Hispanic/Latino MSM in Miami-Dade County and New York City: Opportunities for Increasing Acceptance and Frequency of Testing,” Health Promotion and Practice 15, no. 6 (2014): 867–880; S. B. Mannheimer et al., “Infrequent HIV Testing and Late HIV Diagnosis Are Common Among a Cohort of Black Men Who Have Sex with Men in 6 US Cities,” Journal of Acquired Immune Deficiency Syndromes 67, no. 4 (2014): 438–445.
10. M. J. Stern et al., “Social Media Recruitment for a Web Survey of Sexual Minorities: An Evaluation of Methods Used and Resulting Sample Diversity” (under review).
11. Stern et al., “Social Media Recruitment.”
12. Pre-exposure prophylaxis (PrEP) and post-exposure prophylaxis (PEP) are antiretroviral medications taken before or after exposure to HIV to prevent becoming infected.
13. J. Cantrell et al., “Recruiting and Retaining Youth and Young Adults: Challenges and Opportunities in Survey Research for Tobacco Control,” Tobacco Control 27, no. 2 (2018): 147–154; National Research Council, Nonresponse in Social Science Surveys: A Research Agenda (Washington, DC: National Academies Press, 2013); Pew Research Center, “Assessing the Representativeness of Public Opinion Surveys,” May 15, 2012, http://www.people-press.org/2012/05/15/assessing-the-representativeness-of-public-opinion-surveys/.
14. M. J. Stern, F. LeClere, and E. Fordyce, Web Surveying Design and Implementation (Thousand Oaks, CA: SAGE Research Methods, 2019), https://methods.sagepub.com/foundations/web-surveying-design-and-implementation.
15. Stern et al., “Social Media Recruitment.”
16. M. Anderson and J. Jiang, “Teens, Social Media and Technology,” Pew Research Center, May 31, 2018, http://www.pewinternet.org/2018/05/31/teens-social-media-technology-2018/.
17. M. J. Stern, “Active Social Media Surveying” (talk given at the opening plenary, FedCASIC, Bureau of Labor Statistics, Washington, DC, 2013); M. P. Couper, “The Future of Modes of Data Collection,” Public Opinion Quarterly 75 (2011): 889–908.
18. J. Huang et al., “A Cross-Sectional Examination of Marketing of Electronic Cigarettes on Twitter,” Tobacco Control 23, no. S3 (2014): iii26–iii30.
19. M. J. Stern et al., “Effective Sampling from Social Media Sites and Search Engines for Web Surveys: Demographic and Data Quality Differences in Surveys of Google and Facebook Users,” Social Science Computer Review 35, no. 6 (2017): 713–732.
20. J. S. Gordon et al., “Successful Participant Recruitment Strategies for an Online Smokeless Tobacco Cessation Program,” Nicotine and Tobacco Research 8 (2006): S35–S41; A. L. Graham et al., “Characteristics of Smokers Reached and Recruited to an Internet Smoking Cessation Trial: A Case of Denominators,” Nicotine and Tobacco Research 8 (2006): S43–S48.
21. Stern and Fordyce, “Web Surveying Design and Implementation.”
22. Williams Institute, “New Estimates Show That 150,000 Youth Ages 13 to 17 Identify as Transgender in the US,” January 17, 2017, https://williamsinstitute.law.ucla.edu/research/transgender-issues/new-estimates-show-that-150000-youth-ages-13-to-17-identify-as-transgender-in-the-us/.
23. R. Schelenz, “How the Census Overlooks the LGBTQ Community,” University of California, 2018, https://www.universityofcalifornia.edu/news/census-overlooks-lgbtq-community.
24. S. Michaels, M. Stern, and M. Zheng, Comparison of START AMSM with Comparable Samples from Two National Probability Surveys: YRBS and NSFG, internal report for the Centers for Disease Control and Prevention, Division of Adolescent and School Health, 2018.
25. Michaels, Stern, and Zheng, Comparison, 2018.
26. D. Zeng et al., “Social Media Analytics and Intelligence,” IEEE Intelligent Systems 25, no. 6 (2010).
27. S. L. Craig and L. McInroy, “You Can Form a Part of Yourself Online: The Influence of New Media on Identity Development and Coming Out for LGBTQ Youth,” Journal of Gay and Lesbian Mental Health 18, no. 1 (2014): 95. See also L. Hillier and L. Harrison, “Building Realities Less Limited than Their Own: Young People Practicing Same-Sex Attraction on the Internet,” Sexualities 10, no. 1 (2007): 82–100; C. J. Pascoe, “Resource and Risk: Youth Sexuality and New Media Use,” Sexuality Research and Social Policy 8 (2011): 5–17.
28. D. J. Solomon, “Conducting Web-Based Surveys,” Practical Assessment Research and Evaluation 7, art. 19 (2001), https://doi.org/10.7275/404h-z428; E. Hargittai, “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” ANNALS of the American Academy of Political and Social Science 659, no. 1 (2015): 63–76, https://doi.org/10.1177/0002716215570866.
29. M. Duggan and A. Smith, “Social Media Update 2013,” Pew Research Center, 2013, http://pewinternet.org/Reports/2013/Social-Media-Update.aspx.
30. Stern, “Active Social Media Surveying.”
31. Anderson and Jiang, “Teens, Social Media and Technology.”
32. Anderson and Jiang, “Teens, Social Media and Technology.”
33. Stern et al., “Social Media Recruitment.”
34. Also posted were these two videos: “2 Cross-Dressing Men Getting Dressed as Women & Putting on Make-Up UK—April, 2016,” Helen Fields Hotelfoxtrot, https://www.shutterstock.com/video/clip-16902733-4k-2-cross-dressing-men-getting-dressed-women; “People Clap, Cheer and Hold Wave Rainbow Flags as Gay Pride Parade Marchers Walk Stock Video,” VideoPowWow, iStock by Getty Images, accessed 3/9/2020, https://www.istockphoto.com/video/people-clap-cheer-and-hold-wave-rainbow-flags-as-gay-pride-parade-marchers-walk-gm658669598-123314141.
35. B. Mustanski, “Ethical and Regulatory Issues with Conducting Sexuality Research with LGBT Adolescents: A Call to Action for a Scientifically Informed Approach,” Archives of Sexual Behavior 40, no. 4 (2011): 673–686, https://doi.org/10.1007/s10508-011-9745-1.
36. T. Buskirk, J. Joseph, and K. Nylund, “Surveying Teens: Issues Related to Data Collection in Substance Abuse Surveys,” American Association for Public Research (2002): 343–345; L. D. Johnston and P. M. O’Malley, “Issues of Validity and Population Coverage in Student Surveys of Drug Use,” in Self-Report Methods of Estimating Drug Use: Meeting Current Challenges to Validity, ed. B. A. Rouse, N. J. Kozel, and L. G. Richards (Rockville, MD: National Institute on Drug Abuse, 1985), 31–54.
37. Consent is the authorization or approval from a child’s parent or legal guardian to participate in research, while assent is the child’s agreement to participate where he or she may not be legally authorized or lack sufficient understanding to give consent competently (Levy, Larcher, and Kurz, 2003).
38. A. LaFrance, “The Internet Is Mostly Bots,” The Atlantic, January 31, 2017, https://www.theatlantic.com/technology/archive/2017/01/bots-bots-bots/515043/.
39. Data collection was conducted in four phases. A phase is a period of time when the ads were live on social media. We paused the ads after each phase to review the data and implement changes as needed. Phases 1 and 2 were less than twenty-four hours each, while phases 3 and 4 were one week each.
40. Facebook owns Instagram, and, therefore, the Facebook Ads Manager allows you to manage ads on Instagram as well. Ads on Facebook and Instagram go through the same review process.
41. Facebook, “Advertising Policies,” accessed May 22, 2019,

REFERENCES

Anderson, M., and J. Jiang. “Teens, Social Media and Technology 2018.” Pew Internet, 2018. http://www.pewinternet.org/2018/05/31/teens-social-media-technology-2018/.
Buskirk, T., J. Joseph, and K. Nylund. “Surveying Teens: Issues Related to Data Collection in Substance Abuse Surveys.” American Association for Public Research (2002): 343–345.
Cahill, S., S. Trieweiler, J. Guidry, N. Rash, L. Stamper, K. Conron, N. Turcotte, I. Gratch, and P. Lowery. “High Rates of Access to Health Care, Disclosure of Sexuality and Gender Identity to Providers Among House and Ball Community Members in New York City.” Journal of Homosexuality 65, no. 5 (2017). https://doi.org/10.1080/00918369.2017.1328221.
Cantrell, J., E. C. Hair, A. Smith, M. Bennett, J. M. Rath, R. K. Thomas, M. Fahimi, J. M. Dennis, and D. Vallone. “Recruiting and Retaining Youth and Young Adults: Challenges and Opportunities in Survey Research for Tobacco Control.” Tobacco Control 27, no. 2 (2018): 147–154.
Couper, M. P. “The Future of Modes of Data Collection.” Public Opinion Quarterly 75 (2011): 889–908.
Craig, S. L., and L. McInroy. “You Can Form a Part of Yourself Online: The Influence of New Media on Identity Development and Coming Out for LGBTQ Youth.” Journal of Gay and Lesbian Mental Health 18, no. 1 (2014): 95–109.
Duggan, M., and A. Smith. “Social Media Update 2013.” Pew Research Center, 2013. http://pewinternet.org/Reports/2013/Social-Media-Update.aspx.
Facebook. “Advertising Policies.” Accessed May 22, 2019. https://www.facebook.com/policies/ads#.
Freese, T. E., H. Padwa, B. T. Oeser, B. A. Rutkowski, and M. T. Schulte. “Real-World Strategies to Engage and Retain Racial-Ethnic Minority Young Men Who Have Sex with Men in HIV Prevention Services.” AIDS Patient Care and STDs 31, no. 6 (2017): 275–281.
GLAAD. “Transgender.” In GLAAD Media Reference Guide. 10th ed. New York: GLAAD, October 2016. https://www.glaad.org/reference/transgender.
Gordon, J. S., L. Akers, H. H. Severson, B. G. Danaher, and S. M. Boles. “Successful Participant Recruitment Strategies for an Online Smokeless Tobacco Cessation Program.” Nicotine and Tobacco Research 8 (2006): S35–S41.
Graham, A. L., B. C. Bock, N. K. Cobb, R. Niaura, and D. B. Abrams. “Characteristics of Smokers Reached and Recruited to an Internet Smoking Cessation Trial: A Case of Denominators.” Nicotine and Tobacco Research 8 (2006): S43–S48.
Greene, B. “Ethnic Minority Lesbians and Gay Men.” In Ethnic and Cultural Diversity Among Lesbians and Gay Men, ed. B. Greene. Thousand Oaks, CA: SAGE, 1997.
Greene, B. “Ethnic-Minority Lesbians and Gay Men: Mental Health and Treatment Issues.” Journal of Consulting and Clinical Psychology 62, no. 2 (1994): 243–251.
Hargittai, E. “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites.” Annals of the American Academy of Political and Social Science 659, no. 1 (2015): 63–76. https://doi.org/10.1177/0002716215570866.
Hickson, D. A., L. A. Mena, L. Wilton, H. V. Tieu, B. A. Koblin, V. Cummings, C. Latkin, and K. H. Mayer. “Sexual Networks, Dyadic Characteristics, and HIV Acquisition and Transmission Behaviors Among Black Men Who Have Sex with Men in 6 US Cities.” American Journal of Epidemiology 185 (2017): 786–800.
Hillier, L., and L. Harrison. “Building Realities Less Limited than Their Own: Young People Practicing Same-Sex Attraction on the Internet.” Sexualities 10, no. 1 (2007): 82–100.
Huang, J., R. Kornfield, G. Szczypka, and S. L. Emery. “A Cross-Sectional Examination of Marketing of Electronic Cigarettes on Twitter.” Tobacco Control 23, no. S3 (2014): iii26–iii30.
Johnston, L. D., and P. M. O’Malley. “Issues of Validity and Population Coverage in Student Surveys of Drug Use.” In Self-Report Methods of Estimating Drug Use: Meeting Current Challenges to Validity, ed. B. A. Rouse, N. J. Kozel, and L. G. Richards, 31–54. NIDA Research Monograph 57. Rockville, MD: National Institute on Drug Abuse, 1985.
Joseph, H. A., L. Belcher, L. O’Donell, M. I. Fernandez, P. S. Spikes, and S. A. Flores. “HIV Testing Among Sexually Active Hispanic/Latino MSM in Miami-Dade County and New York City: Opportunities for Increasing Acceptance and Frequency of Testing.” Health Promotion and Practice 15, no. 6 (2014): 867–880.
LaFrance, A. “The Internet Is Mostly Bots.” The Atlantic, January 31, 2017. https://www.theatlantic.com/technology/archive/2017/01/bots-bots-bots/515043/.
Latkin, C. A., H. Van Tieu, S. Fields, B. S. Hanscom, M. Connor, B. Hanscom, S. A. Hussen et al. “Social Network Factors as Correlates and Predictors of High Depressive Symptoms Among Black Men Who Have Sex with Men in HPTN 061.” AIDS and Behavior 21, no. 4 (2017): 1163–1170.
Levy, M. D. L., V. Larcher, and R. Kurz. “Informed consent/assent in children. Statement of the Ethics Working Group of the Confederation of European Specialists in Paediatrics (CESP).” European Journal of Pediatrics 162, no. 9 (2003): 629–633.
Mannheimer, S. B., L. Wang, L. Wilton, H. Van Tieu, C. del Rio, S. Buchbinder, S. Fields et al. “Infrequent HIV Testing and Late HIV Diagnosis Are Common Among a Cohort of Black Men Who Have Sex with Men in 6 US Cities.” Journal of Acquired Immune Deficiency Syndromes 67, no. 4 (2014): 438–445.
Michaels, S., M. Stern, and M. Zheng. Comparison of START AMSM with Comparable Samples from Two National Probability Surveys: YRBS and NSFG. Report Prepared for the Centers for Disease Control and Prevention, Division of Adolescent and School Health, 2018.
Mustanski, B. “Ethical and Regulatory Issues with Conducting Sexuality Research with LGBT Adolescents: A Call to Action for a Scientifically Informed Approach.” Archives of Sexual Behavior 40 (2011): 673–686. https://doi.org/10.1007/s10508-011-9745-1.
National Research Council. Nonresponse in Social Science Surveys: A Research Agenda. Washington, DC: National Academies Press, 2013.
Pascoe, C. J. “Resource and Risk: Youth Sexuality and New Media Use.” Sexuality Research and Social Policy 8 (2011): 5–17.
Pew Research Center. “Assessing the Representativeness of Public Opinion Surveys.” May 15, 2012. http://www.people-press.org/2012/05/15/assessing-the-representativeness-of-public-opinion-surveys/.
Ryan, C., and D. Futterman. “Social and Developmental Challenges for Lesbian, Gay and Bisexual Youth.” SIECUS Report 29, no. 4 (2001): 4–18.
Schelenz, R. “How the Census Overlooks the LGBTQ Community.” University of California, 2018. https://www.universityofcalifornia.edu/news/census-overlooks-lgbtq-community.
Solomon, D. J. “Conducting Web-Based Surveys.” Practical Assessment Research and Evaluation 7, art. 19 (2001). https://doi.org/10.7275/404h-z428.
Stern, M. J. “Active Social Media Surveying.” Talk given at the opening plenary at FedCASIC, Bureau of Labor Statistics, Washington, DC, 2013.
Stern, M. J., I. Bilgen, C. McClain, and B. Hunscher. “Effective Sampling from Social Media Sites and Search Engines for Web Surveys: Demographic and Data Quality Differences in Surveys of Google and Facebook Users.” Social Science Computer Review 35, no. 6 (2017): 713–732.
Stern, M. J., and E. Fordyce. Web Survey Design and Administration. Quantitative Applications in the Social Sciences Series. Thousand Oaks, CA: SAGE (forthcoming).
Stern, M. J., E. Fordyce, M. Michaels, A. Schlissel, C. Hansen, S. Avripas, M. Heim Viox, R. Dunville, C. Harper, and M. Johns. “Social Media Recruitment for a Web Survey of Sexual Minorities: An Evaluation of Methods Used and Resulting Sample Diversity” (forthcoming).
Williams Institute. “New Estimates Show That 150,000 Youth Ages 13 to 17 Identify as Transgender in the US.” January 17, 2017. https://williamsinstitute.law.ucla.edu/research/transgender-issues/new-estimates-show-that-150000-youth-ages-13-to-17-identify-as-transgender-in-the-us/.
Zeng, D., H. Chen, R. Lusch, and S. H. Li. “Social Media Analytics and Intelligence.” IEEE Intelligent Systems 25, no. 6 (2010): 13–16.

Chapter Four

QUALITATIVE SAMPLING AND INTERNET RESEARCH

LEE HUMPHREYS

The social studies of the internet, and of new media more broadly, include various methodological approaches. Within what might be categorized as qualitative research, there is still a variety of methods used to explore the adoption, use, impact, meaning, affordances, and (infra)structures of communication technologies. In this chapter, I use the term qualitative research to describe an approach that systematically and rigorously analyses social phenomena using interpretive perspectives and practices.1 Typical methods within such research often include participant observation, in-depth interviews, field observation, focus groups, and textual analysis. One of the great things about working in the field of internet studies is that scholars approach it using a variety of methods, including surveys, experiments, historiography, network analysis, and data science, among others. As such, however, we need to make our methods and research choices as transparent and explicit as possible given that the people reading our work may not be practitioners of our methods. Methodological transparency is slightly different from the move toward “open science,” which involves the preregistration of hypotheses and has largely come in response to the “replication crisis” within the field of psychology.2 Transparency is a hallmark of rigorous research using qualitative methods,3 but sometimes researchers assume that their audiences share their methodological training and epistemologies, and as a result, they leave key decisions and choices


unarticulated to save space for more “findings.” Sometimes researchers do not engage in careful and rigorous qualitative methodological discussion and reflection, making it harder for colleagues to understand the parameters of their study and to learn from it. If we want our work to be read and considered by researchers across the methodological spectrum, then I argue that we need great methodological description and reflection. The point is not to fetishize method to make our research feel “more scientific.” Instead, it is to help both qualitative and quantitative researchers evaluate the quality of our research.

I am someone who primarily uses qualitative methods, but I have training in quantitative social science and work in an interdisciplinary department with people who do not all have training in qualitative methods. I write this chapter explicitly for researchers who, like me, work among and with others who may not conduct research using qualitative methods and who want those colleagues to understand and appreciate our scholarship.

The social phenomena that qualitative researchers tend to study are complex and typically impossible to examine in their entirety; therefore, we must sample4 (i.e., we must restrict our observation to certain cases rather than gathering data about all existing occurrences). Throughout this chapter, I describe a few of the most prominent forms of sampling within qualitative internet research. I also describe some challenges that can emerge for researchers employing them. Using some of my own research on mobile and social media, I describe several examples that demonstrate how and why different sampling techniques might be appropriate for different kinds of internet studies. I conclude with a discussion of some novel methodological opportunities for qualitative internet scholars.

SAMPLING: A CONTESTED TERM

Within qualitative methods, there are varying perspectives on the term sampling itself. For example, Emmel laments its use in qualitative research, arguing that it brings assumptions that there are predefined populations or that everyone within such a population has a measurable or nonzero chance of participating in a research study.5 Instead, he prefers thinking about choosing cases within qualitative research, as do many other qualitative scholars.6 Yet Emmel resigns himself to using sampling because “this is


the way most writers on qualitative research methods talk about selecting units to be included in research.”7

I am not resigned to using the term sampling. I strategically employ it because I think it is very important for qualitative researchers working in interdisciplinary areas such as internet studies, media studies, and communication to describe their research processes in language that lets others interpret and evaluate the quality of the research. I use terms like sampling and validity to enhance the ability of others in our field to read, understand, and evaluate my research. Without shared social science methodological language, I think we create and widen unnecessary divides between qualitative and quantitative work.

That said, I do not employ such language in all of my research publications. Depending on the publication outlet, the audience, and the kind of study being discussed, I will be more or less explicit in my sampling processes. One of the clearest distinctions for me was when I wrote my book on the ways people use media technologies to document their lives and share them with others.8 In writing that book, I drew on insights across a variety of my empirical research studies and then developed new cases that exemplified the key arguments in the book. This was a very different research and writing process than that which I had used for my peer-reviewed research articles, where I followed a more traditional social science research paradigm and format and was much more explicit about my methodological choices.

Sampling as Process

Whether you decide to use the terminology of sampling or not, the process of selecting units for study in qualitative research is never just one discrete choice or decision on the part of the researcher at the start of a project, despite what the narrative of a research article might suggest. Instead, sampling in qualitative research can best be described as a process that involves many choices over time. There is often some flexibility to qualitative sampling because of the steps necessary in identifying units of study.9 This process frequently begins with a theoretical or empirical question. For example, in my research I wanted to study how the use of mobile phones changes social interaction. Sometimes sampling involves gaining access to a particular community or population, which may need


to be explicitly attained or negotiated in order to collect data.10 I specifically decided to look at the use of mobile phones in urban public spaces in part because it was a more accessible site of study than more private spaces, such as homes or cars. Recruitment of participants may also become a sampling issue if people do not want to talk to you. For example, when I conducted intercept interviews of mobile phone users at the 30th Street Station in Philadelphia, I quickly realized people who had longer waits for their trains were more likely to agree to participate than those who were rushing to catch a train. Therefore, I tended not to interview daily commuters who had efficient travel routes with little wait time and tended to interview either tourists or people who were making a trip that was out of the ordinary for them. The train station proved a fruitful site to conduct field observations and interviews due to the high volume of mobile phone users, but they were not equally accessible to me.

In many ways, sampling can be thought of as the framework guiding decisions regarding access and recruitment. At the same time, access and recruitment can shape our sampling choices regarding relevance and importance of data. Thoughtful and reflective choices about our sampling, access, and recruitment can all enhance the validity of our research. One might think of validity as an assessment of “Are you actually measuring what you think you’re measuring?” (or at least that is what we often teach our undergraduate students). However, that definition does not apply to a method where there is no standard of measurement (such as weight or height). Therefore, validity that is appropriate for qualitative research would be defined as “the degree to which the finding is interpreted in the correct way.”11 Validity in qualitative work can be thought of as a kind of accuracy or correctness of our findings: “Am I right in my interpretations?” Sometimes the more important and productive question for qualitative researchers is “How might I be wrong?”12 In the example study of how mobile phone use changes social interactions in public, I needed to sample different kinds of public spaces, not just train stations, to see how people were using their phones in the presence of others. Indeed, I found the transitory nature of train travel seemed to encourage mobile phone use more than was the case with dyads at a café. Sampling strategies can help a researcher reflect on questions of validity and enhance the quality of their research.


COMMON SAMPLING APPROACHES IN QUALITATIVE WORK

Sampling in qualitative research is what would be called nonprobability sampling.13 Unlike in some study designs where everyone within a population has a certain probability of being chosen to participate in the sample, nonprobability sampling uses nonstatistical criteria for identifying potential participants or units to study. That said, within qualitative methods there are different kinds of nonprobability sampling depending on the particular epistemological or theoretical approach you take. In this section, I identify two common kinds of sampling: theoretical and purposive.

Theoretical Sampling

Theoretical sampling is the method of data collection most closely related to grounded theory.14 It is a particularly open, flexible, and inductive sampling strategy. One of the key aspects of theoretical sampling is that sampling and data analysis are interactive and nonlinear. That is, analysis does not come after data collection is complete. Instead, theoretical sampling is part of a cyclical pattern of theoretical sampling, data collection, and data analysis and then further theoretical sampling, data collection, and data analysis, and so on.

Another key aspect of theoretical sampling is that sampling is based on the identification of important concepts or units, not necessarily people.15 Therefore, data collection must come before and after data analysis such that the researcher uses the analysis to inform the next stage of data collection. Further, data collection enables the researcher to fill in theoretical gaps in their understanding of a phenomenon. Sometimes this means collecting new data; sometimes it means going back to previously analyzed data but examining it through a different lens. “If analysts are not able to collect additional data to fill in gaps, then the gaps become part of the limitation of the study.”16 Ideally, theoretical sampling is complete when all new data are accounted for within the emergent and integrated framework of the phenomenon.

Challenges in theoretical sampling. Logistically, at least in the United States, such uncertainty and openness in the sampling and recruitment stages do not align with priorities and even requirements of ethics or institutional review boards (IRBs), and sometimes of faculty advisors or collaborators.


Typically, applications to these reviewing bodies require recruitment procedures and materials as well as clear identification of a target population before any data collection can start. My own institution, for example, also requires the wording of recruitment emails or postings as part of our institutional review to ensure we are not being deceptive or coercive in our recruitment practices. While finding participants for studies can sometimes be challenging, active participation in research should always be voluntary, and these institutional mechanisms help to ensure that. The iterative nature of theoretical sampling, however, implies that you may not be able to clearly identify all of your recruitment strategies or sources at the start of the project, which is when IRBs ask for them. Therefore, I am a big proponent of submitting amendments if and when sampling shifts away from the original conception, and I frequently do so. For example, when we realized after collecting some data that people were using a different platform to engage in the behavior we were studying, we submitted an IRB amendment to recruit users from the new platform to better address our research questions. The nice thing about amendments is that they tend to be reviewed much more quickly than original applications; at least this is the case at the research institutions at which I have worked.

One of the critiques of theoretical sampling, and grounded theory more broadly, is that it calls for a very inductive approach, one that is not informed by previous theoretical frameworks because they can impinge on one’s ability to discover new phenomena. For the majority of graduate students and junior faculty, a “true” grounded theory study as originally described by Glaser and Strauss is not feasible.17 We always “stand on the shoulders of giants,”18 or we certainly should.
Even though researchers of internet studies often explore new communication technologies that may have never been studied before, we bring to these phenomena larger questions fundamental to the social sciences, such as social interaction, identity, politics, health, inequality, and political economy. Theoretical sampling therefore needs to be informed enough by previous research that it does not duplicate previous findings but not to the point that it is so shackled by established theory that novel or contrasting insights cannot emerge. Another potential challenge of theoretical sampling is the amount of time and resources needed to inductively and iteratively sample various experiences, artifacts, or situations. However, I have found two particularly


helpful concepts from theoretical sampling that can enhance all kinds of sampling. In particular, Glaser and Strauss’s notions of minimizing and maximizing difference are helpful strategies in making choices about recruiting or collecting samples for inquiry.19 Because qualitative samples tend to be smaller than quantitative samples (typically because of the depth of data collected), minimizing differences in your sample is generally a recommended course of action. “Minimizing differences among comparison groups also helps establish a definite set of conditions under which a category exists, either to a particular degree or as a type.”20 Often the goal of qualitative research is to understand and explain an experience or phenomenon and the set of conditions under which it occurs. Minimizing difference within a sample along categories or demographics that you think might matter is important because one typically cannot make categorical comparisons with small samples in qualitative work. For example, if you are doing an interview study on dating apps, you might want to minimize the differences in gender or sexuality among your participants. Men and women or people in their twenties and fifties might have very different strategies and understandings of a dating app. How do you know that any differences you find are necessarily attributed to gender or age? Unless you are collecting a lot of interview data (I think anything over thirty in-depth interviews is a lot of data to include), minimizing difference is a common strategy to ensure richness of data such that themes or categories both emerge and are substantiated. That said, sometimes maximizing differences in your sample can also be helpful. 
More specifically, maximizing difference can help researchers establish the boundaries of their research claims by bringing “out the widest possible coverage on ranges, continua, degrees, types,” and so on.21 Lindlof and Taylor also recommend maximizing difference as a sampling technique.22 For example, in a study of the dating app Grindr, the authors argued that such apps are used differently based on population density.23 So while they minimized differences in their sample based on gender, sexuality, and app (i.e., by only studying men who have sex with men and use Grindr), they maximized difference by recruiting Grindr users in a large city and in a rural college town. The authors found that the practices surrounding gay visibility are very different depending on those locations, in part due to differences in population density. In this example, the rural/urban difference generated significant insights for the researchers.


Examples of theoretical sampling. I have used theoretical sampling in two different projects, but I thought of sampling differently across the projects. In my first research project (and as my master’s thesis, it was my first qualitative study), I examined mobile phone use in public spaces.24 It was an observational field study where I was exploring how mobile phone use in public impacted social interactions. I began observing public spaces in Philadelphia quite broadly. As I began to analyze my field notes, however, I began to focus and shift my observations in two important ways. First, I was originally looking at mobile phone use in general, watching phone use in public by people who were alone as well as those who were with other people. Over time, however, I became increasingly interested in dyadic interactions—that is, situations where people were using cell phones while with another person. Second, at the beginning of the study, I sought out public spaces where I thought mobile phone use would be highly prevalent. More specifically, I sought out public spaces where mobile phones were socially regulated, such as certain trains, classrooms, libraries, and theaters, and then observed spaces just beyond these regulated locales, such as lobbies, entryways, and hallways. But as my interest in dyads emerged, I began to shift focus to different kinds of public spaces, including cafés, restaurants, and bars, where mobile phone regulation was not necessarily so strict. This enabled me to explore the ambiguities regarding emerging cell phone norms in these places, which actually led to richer and more interesting findings. The second example of my use of theoretical sampling in a project was an extension of my dissertation (yes, you never really leave your dissertation). 
For my dissertation in the mid-2000s, I conducted a multi-case-study analysis of three mobile social networks: Dodgeball, SMS.ac, and BEDD.25 Based on the findings from those analyses, I sampled two additional platforms, Socialight and Twitter, to see if the categories of mobile social network use that emerged in my dissertation applied to these two new mobile platforms. This was my way of maximizing difference by selecting different kinds of apps. In total, I sampled five different mobile social network services over the course of five years to develop and refine the theoretical framework.26 My theoretical sampling for this analysis emerged by conceptualizing physical and social space among typical users of mobile platforms (see figure 4.1). For example, the app BEDD used Bluetooth to enable

[Figure 4.1. Axis labels: outer space (physically distant to physically close) and inner space (socially close to socially distant). Cases plotted: SMS.ac, Dodgeball, Twitter, Socialight, and BEDD.]

FIGURE 4.1 Theoretical sampling of cases along inner and outer spatial characteristics.

strangers to meet and message one another, so they were physically proximate but socially distant, whereas Dodgeball enabled friends to share their locations, so they were socially close but physically distant. Socialight enabled users to leave GPS-based messages for friends, so people knew each other and were in the same space but just not at the same time. Twitter enables both socially distant and close people and physically distant and close people to communicate. SMS.ac was kind of like a mobile-based chat forum where people could leave and respond to messages with other users from all over the world. Throughout those five years, I published substantive theoretical articles based on single case studies,27 but it was not until I was able to compare across cases that I developed a more formal grounded theory of mobile social networks.28 In my research examples, theoretical sampling was used at different levels. In the first study of mobile phones in public, I collected initial observational data, conducted analyses, and then refined where I collected data next. My data collection occurred over the course of one year, which gave me time to engage in an iterative process of data collection and analysis. In the second example, I collected data regarding mobile social network use from three different mobile social networks. Based on the analyses


from those, I then sought out two additional, different cases for further comparison. In this second example, theoretical sampling occurred at a more abstract or theoretical level based on the presumed physical and social distance of users. Despite these differences, both examples involved an iterative process of collecting and analyzing data and then collecting and analyzing additional data.

Purposive Sampling

Joseph Maxwell describes purposive sampling as a strategy where “particular settings, persons or activities are selected deliberately to provide information that is particularly relevant to your questions and goals, and that can’t be gotten as well from other choices.”29 Lindlof and Taylor describe a similar sampling strategy, which they call criterion sampling.30 They define it as the development of a set of criteria that determines whether someone or something will be included in a potential sample. Often in internet studies, purposive sampling is based on the criterion of technology use; thus, in our study of augmented reality (AR), we recruited users of Layar, one of the first commercially available AR mobile applications.31 In Gonzales and colleagues’ study of medical crowdfunding, they recruited users from Fundly.com and YouCaring.com.32 Interestingly, these researchers note that they tried to recruit users of GoFundMe.com, a larger crowdfunding website, but the administrators blocked their invitations. Therefore, they contacted administrators of Fundly.com and YouCaring.com before sending out recruitment messages.

This points to the distinction between recruiting through the website or service itself versus recruiting through other means for users of a particular technology. For example, we are conducting a study about gender norms and experiences on the mobile dating app Bumble. We are recruiting initially for focus groups via our university’s online research participant pool and then following up with social media recruitment via Twitter and Facebook for in-depth interviews. Therefore, while we are recruiting based on Bumble use, we are not recruiting users via Bumble itself. These are sampling choices with tradeoffs. On the one hand, recruiting via the platform that researchers are studying enables them to reach the widest variety of users; in addition, any implicit or explicit endorsement from the platform itself may lend credibility to recruitment requests.
On the other hand, often recruitment can be a violation of


the terms of service, and platforms or web administrators may not be keen to allow recruitment for research that does not directly benefit their organization. Additionally, recruiting through the platform itself may introduce bias toward certain types of users—e.g., the more engaged ones—which may be problematic for the study depending on the research questions of interest. When sampling via technology use, the relevant practices of participants may extend beyond the one service or technology used as a sampling criterion. For example, we conducted a study of small business owners’ social media use.33 Originally, the study was focused on Foursquare use, and that was our sampling criterion; however, when we finally talked to participants, it was clear that they did not conceptualize Foursquare as unique. Instead, it was just one of the myriad of social media platforms they had to manage as part of their business. Therefore, in our final write-up of the study, we discussed social media more broadly, rather than just Foursquare, which was the inclusion criterion.

CHALLENGES IN PURPOSIVE SAMPLING

The biggest challenge in purposive sampling is that the criteria identified and used for selection may be too broad (e.g., users of Facebook or Twitter) such that purposive sampling may easily turn into convenience sampling, which is seldom methodologically ideal. For example, studying Facebook users by studying college students in 2020 would not make much sense unless the researcher was explicitly interested in this demographic. Similarly, in the example of our Bumble study, the initial focus groups were really just a convenience sample of college students, which can be fine for generating initial questions and areas for further exploration; however, the second stage, using online strategies for recruitment, aimed at interviewing broader demographics of Bumble users. Another way convenience sampling manifests is in the geographic boundedness of the researcher. For most qualitative research, colocation between researcher and participants is helpful because in-person data collection helps to build trust and rapport and allows observation of people’s nonverbal and visual cues in their environment.34 There are always tradeoffs in research, and very often in qualitative research, we choose richness of face-to-face data collection over geographical breadth of data collection. Nevertheless, explicitly articulating where and when data are collected


helps others, including future readers, situate the research findings. Generally speaking, I think non–North American researchers are better at not taking for granted the geographic boundedness of their research. A strategy I have used to ensure that my purposive sample is not a convenience sample is to forgo the richness of face-to-face data collection, recruit participants online, and conduct phone interviews. This is actually pretty important for someone like myself who lives in a rural college town in the United States and often studies early adopters. In some of my studies, I have recruited users through user forums or Facebook pages. These are dedicated webpages where users of a particular platform discuss and share tips online. I have used these forums/pages to recruit for several reasons. First, people on these sites have already demonstrated they like to talk about the technology, which makes them potentially willing to talk about it in an interview. Second, people on these sites are probably active users of the technology who are likely knowledgeable about the technology because of their use of it. While their experiences may not be the experiences of average users, forum participants are also already engaging in metacommunication and reflection about the technology, which is often what I ask of my research participants in interviews. Thus, these users represent an active and reflective group of technology users who have a plethora of experiences and insights from which to learn. For example, in our study of iPhoneography—that is, photographers’ use of the iPhone as their artistic medium of choice35—we recruited through a variety of websites where people associated with iPhoneography and where people talked about and engaged in their artistic practice, such as the Hipstamatic Facebook page and Flickr group. 
In another study, we were examining the competitive features of Foursquare and recruited active users through a variety of websites where users would interact about Foursquare.36 We defined an active user as having been a Foursquare mayor, meaning that they had checked into a location more times than anyone else in the past sixty days. “Thus our participants were not only active on Foursquare, but also actively discussing the service outside of Foursquare itself.”37

Purposive and Then Snowball Sampling

It is not uncommon in qualitative methods to combine sampling techniques. Most frequently, I begin with purposive sampling and then engage


in snowball sampling. Snowball sampling is a strategy of asking potential or current research participants if they know of or could recommend other people who share characteristics or experiences relevant to the study.38 This has manifested in my work in two different ways. First, if I am sending an individual a recruitment message directly, I will ask if they will participate and if they know someone else who fits the criteria of the study who might be a good person to include in the study. This is useful, as it enables the recipient to help in my research not just by directly participating in the study but also by recommending someone else. This can be considered a face-saving technique, as the respondent who declines to participate can still help if they recommend someone else. The second way to use snowball sampling is to ask at the end of an interview for recommendations for other people to interview. One of the challenges is that people tend not to know others’ contact information off the top of their head, so it is helpful to follow up with an email thanking them for their participation and asking for the names and contact information of other potential participants. Snowball sampling should be used with care if a research topic is related to sensitive personal information or a stigmatized behavior, since eligibility in a study can make known personal information as soon as people recommend one another. For example, if you were studying online harassment, as soon as a participant recommends another potential recruit, they have disclosed to you personal information about that person. In this case, it may be better to ask the participant to pass along your contact information to others and have them contact you if they are eligible rather than asking for contact information of potential eligible recruits. 
Snowball sampling can work if the topic is relatively innocuous or professionally related or if the behaviors, experiences, or technology use of interest are already widely known and public. Relying on existing participants to recruit others into the study can act as a kind of credibility voucher for a researcher. For example, I used purposive sampling followed by snowball sampling to find early adopters and developers of a mobile app in Singapore and Indonesia.39 I was able to find and interview different kinds of users and marketers of the technology to whom I never would have had access without a recommendation from the CEO, whom I interviewed first. When using a dual-process strategy of purposive followed by snowball sampling, it is important to report the number of participants recruited through each process in the final write-up. There is a general homophily

within social networks,40 and when you use snowball sampling, you will likely be introduced to people similar to the recommender. This can be good if you are recruiting a hard-to-find population, but it may limit the sample’s diversity of perspectives and experiences regarding the phenomenon of interest in ways that could be problematic. To that end, it is essential to snowball sample from more than just one or two initial respondents. Providing as much detail as possible about the participants and how you found and recruited them enhances the transparency of qualitative research.41

Sampling Different Kinds of Data

Many people who have written about qualitative research methods advocate for sampling different kinds of data to answer a research question.42 For example, multiple kinds of data regarding identity and social media could be gathered through interviews, participant observation, and textual analyses of participants’ internet use. There are various metaphors used to describe this process of studying a phenomenon by combining different kinds of data. Glaser and Strauss use a pie metaphor: “different kinds of data give the analyst different views or vantage points from which to understand a category and to develop its properties; these different views we have called slices of data.”43 The idea is that the more slices of data you have, the fuller the pie or picture of the phenomenon of interest will be. Another metaphor for understanding social phenomena is that of the crystal or prism. Markham argues the crystal metaphor is powerful “because it values both interior and exterior aspects of the research process, giving credence to the fact that all research is situated and personal—a thoroughly human endeavor. Yet order and rigor are necessary to preserve the integrity of the outcome.”44 Social phenomena and technology, much like crystals, grow and change but are nevertheless solid and hard. It is important to approach the crystal from a variety of angles because what you see looks different depending on your angle. Different kinds of data allow researchers to get a fuller picture of the phenomenon or crystal from different angles. Triangulation is another key term used to describe the process of sampling different kinds of data about a particular phenomenon. Maxwell uses it to describe the process of employing different methods “as a check on one another, seeing if methods with different strengths and limitations all support

a single conclusion.”45 In this way, triangulation allows you to check your findings against each other. For example, one of the weaknesses of observational work is that a researcher cannot determine intent or personal meaning from an observable behavior.46 In my research, I observed mobile phone use in public; however, I could not observe what people thought or felt about it. Therefore, I conducted intercept interviews with people about mobile phone use in public to examine how they understood the behaviors I observed. Participants in intercept interviews are recruited in public spaces like malls or train stations, and the interviews are typically conducted on the spot. These interviews tend to be short (i.e., less than thirty minutes) but enable a recruitment pool that may be more diverse. Coupling the different data sources, I became more confident about my claims regarding how mobile phones were used in public.47 When I have collected different kinds of data, I have tended to rely on and prioritize certain data over others within a study. Sometimes this is because I have considerably more data collected through one means than another. For example, in the study of mobile phones in public, I had more observational data collected than interviews conducted, and I spent much of my analysis on the observed behavioral data.48 However, in my experience, observational data are more complicated to analyze than interview data because observational data are not always as clean and comparable as semistructured interview transcripts. Sometimes I have prioritized interviews because those data tell a richer story and are relatively easy to analyze. I typically find that a good interview, in which a respondent talks about the phenomenon of interest using their own words, is remarkably evocative, much more so than a behavioral data point.
For example, we found that the personal and professional uses of social media were blurred for some small business owners.49 In our final article, we included this quote: “I post what’s going on with my life and I just put myself out there, but yeah  . . . I mean I think I can tone it down a little bit. But at the same time, I think it has brought a lot of people to our business because I do interact with customers.”50 This quote not only conveys the blurring of personal and professional uses of social media but also richly acknowledges tensions in such blurring, particularly in the interviewee’s statement that he could “tone it down.” I also typically engage in some kind of participant observation of the technology I am studying. This enables me to have a basic understanding so that I can better interpret the experiences of my participants, but I seldom

use my own experiences of the technology in research articles, prioritizing instead the experiences of my participants. For example, despite conducting over a year of participant observation of Socialight, the mobile virtual sticky note service, we did not draw on it explicitly in our write-up of the findings but instead relied on the quotes and experiences of interview participants.51 This was in part because my use of Socialight was not embedded within a larger social network of users but was mostly me experimenting with how and where the technology worked. One place where I did draw on my own technology use explicitly was in my book,52 but I did so for a very logistical reason: I wanted to show examples with screenshots of social media posts, and for copyright and privacy reasons, it was easier to use my own posts than those of research participants.

SAMPLING IN DIGITAL RESEARCH

Recent methodological contributions have articulated several interesting approaches for qualitative internet researchers to consider. I highlight only a few of them here that are particularly relevant to the discussion of sampling multiple kinds of data (i.e., triangulation), although methods comprise a growing area of research and development within qualitative research and internet studies. For example, Mobile Media and Communication devoted its May 2018 issue (vol. 6, no. 2) to mobile methods. One of the most helpful methodological aspects of doing internet research is that internet and mobile technologies produce traces. Here I use the term trace to denote the text messages, phone calls, emails, map histories, photos, social media posts and likes, browsing histories, calendars, and mobile phone and computer settings that by default store and save what we do with our communication technologies. While computational social science and data science are fundamentally reliant on digital traces (e.g., clickstreams, hashtags, search queries, networked links), qualitative researchers can also use and think about how different kinds of trace data may be relevant to their research questions. In particular, individual participants’ trace data can be used to help triangulate various kinds of behaviors or experiences in which internet researchers may be interested. Kaufmann outlines a mobile media elicitation technique for qualitative interviewing that uses mobile phone log data as well as content that was produced by the user to aid in memory recall and ground the interview in mediated

practice.53 She shows how such techniques are helpful in her studies of mobile shopping practices as well as refugee smartphone use. Mobile log data can also become rich fodder on which to base qualitative interviews.54 Depending on the project, it may be necessary to have a special app to collect log data.55 However, calendars, smartphone photos, and text messages can often help remind users of specific events or experiences with which to ground their answers. One challenge to incorporating a device into an interview is that during the interview, participants may receive notifications, texts, or calls, to which they feel the need to respond, thereby being distracted from the study. This can disrupt the flow of the interview. Therefore, I typically begin the interview without the device, ask a series of questions first, and then ask participants to take out their phone or to pull up their account on a computer. This sequential interviewing means that participants are looking at and paying attention only to me at the start of the interview, which can help to establish rapport and set the tone for the interview. It is important to recognize that interviews are rather unusual and perhaps even awkward experiences for many people, so establishing rapport is essential to a rich interview. Rather than relying on found traces on phones or profiles, we can use photo elicitation to help people discuss and reflect on more abstract or mundane aspects of a particular phenomenon, such as the meaning of a mobile check-in. In this method, participants are explicitly asked to take photos related to the research question.56 When I have used this technique, I would conduct the interview first, and then at the end of the interview, once participants had a sense of what was of interest to me, I would make the ask.
For example, when we were studying Foursquare use, after conducting the interview, we would ask participants to send us, within the following week, five photos, each with a description of what the photo shows, why they took it, and what happened immediately before and after they took it.57 The textual description is a kind of participant reflection that aids the researcher in interpreting the still images. Because photos are created for the research project, this kind of elicitation is different from the kinds described by Kaufmann58 and by Ørmen and Thorhauge,59 where interviewees reflect on photos already available from their prior experiences. As such, photo elicitation can be very creative and abstract, enabling participants to reflect visually on the emotions and motivations they may feel. It can also be very concrete and evidentiary depending on the researcher’s directions or prompts. For example, when researchers are

trying to gauge healthy behaviors through mobile health apps, photos of meals can become important sources of data regarding what the portion sizes are and whether there’s cheese and bacon on that “salad.”60 Logistically, it is helpful to direct participants explicitly not to include people in their photos due to privacy concerns and lack of consent from others to participate in the study. Drawing more on science and technology studies, media technologies can also be examined through their interface design. Light, Burgess, and Duguay describe the walkthrough method, which “involves the step-by-step observation and documentation of an app’s screens, features and flows of activity—slowing down the mundane actions and interactions that form part of normal app use in order to make them salient and therefore available for critical analysis.”61 By analyzing the design features, choices, icons, default settings, etc. on an app, the walkthrough method aims to combine science and technology studies with cultural studies in order to reveal the embedded cultural meanings of the app as well as its intended or imagined users. The walkthrough method can be thought of as producing another slice of data to be sampled when examining mobile technology.

CONCLUSION

Sampling in qualitative research can take on several different aspects regarding what is sampled. The researcher must carefully consider issues of access, recruitment, and kinds of data collected in the sampling of qualitative research. Theoretical sampling and purposive sampling are two of the most common techniques used, with each having its particular strengths and weaknesses. Most qualitative methodologists would recommend sampling different kinds of data as a way to enhance the validity or credibility of a research project.62 Luckily, for many internet and new media studies scholars, the media technologies that we want to study provide many layers and means through which to interrogate their adoption, use, and effects. Coupling or triangulating different kinds of data regarding a phenomenon enhances a researcher’s ability to feel confident that their findings are credible and valid. Similarly, prolonged engagement and persistent observation can further strengthen a qualitative study.63 All research has strengths and weaknesses. A way to strengthen a research study is to make transparent the methodological details and choices. It is incredibly helpful to both reviewers and readers if the researcher articulates

by whom, from whom, when, where, and how the data were collected and why the sampling choices are well reasoned and thoughtful. Presenting a qualitative internet study in a methodologically careful, thoughtful, and reflective way enhances its readability by those who use qualitative methods as well as by those who do not, increasing the chances that the researcher is speaking not only to a like-minded group of scholars but also to a wide spectrum of fellow scholars working on similar research questions.

NOTES

1. N. K. Denzin and Y. S. Lincoln, “Introduction: The Discipline and Practice of Qualitative Research,” in Handbook of Qualitative Research, 2nd ed., ed. N. K. Denzin and Y. S. Lincoln (Thousand Oaks, CA: SAGE, 2000), 1–28. 2. S. E. Maxwell, M. Y. Lau, and G. S. Howard, “Is Psychology Suffering from a Replication Crisis? What Does ‘Failure to Replicate’ Really Mean?,” American Psychologist 70, no. 6 (2015): 487–498. 3. Y. S. Lincoln and E. G. Guba, Naturalistic Inquiry (Beverly Hills, CA: SAGE, 1985). 4. J. A. Maxwell, Qualitative Research Design: An Interactive Approach (Thousand Oaks, CA: SAGE, 2013). 5. N. Emmel, Sampling and Choosing Cases in Qualitative Research: A Realist Approach (Thousand Oaks, CA: SAGE, 2014). 6. T. Boellstorff et al., Ethnography and Virtual Worlds: A Handbook of Method (Princeton, NJ: Princeton University Press, 2012); T. R. Lindlof and B. C. Taylor, Qualitative Communication Research Methods, 4th ed. (Thousand Oaks, CA: SAGE, 2019). 7. Emmel, Sampling and Choosing Cases, 2. 8. L. Humphreys, The Qualified Self: Social Media and the Accounting of Everyday Life (Cambridge, MA: MIT Press, 2018). 9. Lindlof and Taylor, Qualitative Communication Research Methods; Maxwell, Qualitative Research Design. 10. Lindlof and Taylor, Qualitative Communication Research Methods. 11. J. Kirk and M. L. Miller, Reliability and Validity in Qualitative Research (Newbury Park, CA: SAGE, 1986), 20. 12. Maxwell, Qualitative Research Design, 121. 13. G. M. A. Higginbottom, “Sampling Issues in Qualitative Research,” Nurse Researcher 12, no. 1 (2004): 7–19; Maxwell, Qualitative Research Design. 14. J. Corbin and A. Strauss, Basics of Qualitative Research, 4th ed. (Thousand Oaks, CA: SAGE, 2015); B. G. Glaser and A. L. Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research (New York: Aldine de Gruyter, 1967). 15. Corbin and Strauss, Basics of Qualitative Research. 16. Corbin and Strauss, Basics of Qualitative Research, 141. 17. 
Glaser and Strauss, The Discovery of Grounded Theory. 18. R. K. Merton, On the Shoulders of Giants: The Post-Italianate Edition (Chicago: University of Chicago Press, 1993). 19. Glaser and Strauss, The Discovery of Grounded Theory.

20. Glaser and Strauss, The Discovery of Grounded Theory, 56. 21. Glaser and Strauss, The Discovery of Grounded Theory, 57. 22. Lindlof and Taylor, Qualitative Communication Research Methods. 23. C. Blackwell, J. Birnholtz, and C. Abbott, “Seeing and Being Seen: Co-situation and Impression Formation Using Grindr, a Location-Aware Gay Dating App,” New Media and Society 17, no. 7 (2014): 1117–1136, https://doi.org/10.1177/1461444814521595. 24. L. Humphreys, “Cellphones in Public: Social Interactions in a Wireless Era,” New Media and Society 6, no. 8 (2005): 813–836. 25. L. Humphreys, “Connecting, Coordinating, Cataloguing: Communicative Practices on Mobile Social Networks,” Journal of Broadcasting and Electronic Media 56, no. 4 (2012): 494–510, https://doi.org/10.1080/08838151.2012.732144. 26. Humphreys, “Connecting, Coordinating, Cataloguing.” 27. L. Humphreys, “Mobile Social Networks and Social Practice: A Case Study of Dodgeball,” Journal of Computer Mediated Communication 13, no. 1 (2007): 341–360; L. Humphreys, “Mobile Social Networks and Urban Public Space,” New Media and Society 12, no. 5 (2010): 763–778; L. Humphreys, “Who’s Watching Whom? A Study of Interactive Technology and Surveillance,” Journal of Communication 61 (2011): 575–595; L. Humphreys and T. Barker, “Modernity and the Mobile: Exploring Tensions About Dating and Sex in Indonesia,” Media/Culture 10, no. 7 (2007), http://journal.media-culture.org.au/0703/06-humphreys-barker.php; L. Humphreys and T. Liao, “Mobile Geotagging: Reexamining Our Interactions with Urban Space,” Journal of Computer Mediated Communication 16, no. 3 (2011): 407–423. 28. Humphreys, “Connecting, Coordinating, Cataloguing.” 29. Maxwell, Qualitative Research Design, 97. 30. Lindlof and Taylor, Qualitative Communication Research Methods. 31. Humphreys and Liao, “Mobile Geotagging.” 32. A. L. 
Gonzales et al., “Better Everyone Should Know Our Business Than We Lose Our House: Costs and Benefits of Medical Crowdfunding for Support, Privacy, and Identity,” New Media and Society 20, no. 2 (2018): 641–658. 33. L. Humphreys and R. Wilken, “Social Media, Small Businesses, and the Control of Information,” Information, Communication and Society 18, no. 3 (2015): 295–309. 34. Of course, within qualitative internet studies, there is a whole realm of online ethnography that is not geographically bounded (Boellstorff et al., Ethnography and Virtual Worlds; C. Hine, ed., Virtual Methods: Issues in Social Research on the Internet [Oxford: Berg, 2005]). But as soon as the researcher engages in offline data collection, geography and place very often matter methodologically again (C. Hine, “How Can Qualitative Internet Researchers Define the Boundaries of Their Projects?,” in Internet Inquiry: Conversations About Methods, ed. A. N. Markham and N. K. Baym [Los Angeles: SAGE, 2008], 1–20; S. Orgad, “How Can Researchers Make Sense of Issues Involved in Collecting and Interpreting Online and Offline Data?,” in Internet Inquiry: Conversations About Method, ed. A. Markham and N. K. Baym [Thousand Oaks, CA: SAGE, 2008], 33–52). 35. M. Halpern and L. Humphreys, “Iphoneography as an Emergent Art World,” New Media and Society 18, no. 1 (2016): 62–81. 36. Humphreys and Liao, “Mobile Geotagging.” 37. Humphreys and Liao, “Mobile Geotagging,” “Methodology,” para. 4. 38. Lindlof and Taylor, Qualitative Communication Research Methods.

39. Humphreys and Barker, “Modernity and the Mobile.” 40. M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a Feather: Homophily in Social Networks,” Annual Review of Sociology 27 (2001): 415–444. 41. Lincoln and Guba, Naturalistic Inquiry. 42. Glaser and Strauss, The Discovery of Grounded Theory; Lincoln and Guba, Naturalistic Inquiry; Kirk and Miller, Reliability and Validity; J. Lofland et al., Analyzing Social Settings: A Guide to Qualitative Observation and Analysis, 4th ed. (Belmont, CA: Wadsworth, 2006); Maxwell, Qualitative Research Design. 43. Glaser and Strauss, The Discovery of Grounded Theory, 65. 44. A. N. Markham, “What Constitutes Quality in Qualitative Internet Research? A Response to Nancy Baym,” in Internet Inquiry: Conversations About Method, ed. A. N. Markham and N. K. Baym (Thousand Oaks, CA: SAGE, 2008), 192. 45. Maxwell, Qualitative Research Design, 102. 46. P. A. Adler and P. Adler, “Observational Techniques,” in Handbook of Qualitative Research, ed. N. K. Denzin and Y. S. Lincoln (Thousand Oaks, CA: SAGE, 1994), 377–391. 47. Humphreys, “Cellphones in Public.” 48. Humphreys, “Cellphones in Public.” 49. Humphreys and Wilken, “Social Media, Small Businesses.” 50. Humphreys and Wilken, “Social Media, Small Businesses,” 299. 51. Humphreys and Liao, “Mobile Geotagging.” 52. Humphreys, The Qualified Self. 53. K. Kaufmann, “The Smartphone as a Snapshot of Its Use: Mobile Media Elicitation in Qualitative Interviews,” Mobile Media and Communication 6, no. 2 (2018): 233–246, https://doi.org/10.1177/2050157917743782. 54. J. Ørmen and A. M. Thorhauge, “Smartphone Log Data in a Qualitative Perspective,” Mobile Media and Communication 3, no. 3 (2015): 335–350, https://doi.org/10.1177/2050157914565845. 55. E.g., J. Boase, “Implications of Software-Based Mobile Media for Social Research,” Mobile Media and Communication 1, no. 1 (2013): 57–62. 56. T. M. 
Beckley et al., “Snapshots of What Matters Most: Using Resident-Employed Photography to Articulate Attachment to Place,” Society and Natural Resources 20, no. 10 (2007): 913–929. 57. R. Wilken and L. Humphreys, “Constructing the Check-In: Reflections on Photo-Taking Among Foursquare Users,” Communication and the Public 4, no. 2 (2019): 100–117. 58. Kaufmann, “The Smartphone as a Snapshot.” 59. Ørmen and Thorhauge, “Smartphone Log Data.” 60. J. P. Pollak et al., “It’s Time to Eat! Using Mobile Games to Promote Healthy Eating,” IEEE Pervasive Computing 9, no. 3 (2010): 21–27, https://doi.org/10.1109/MPRV.2010.41. 61. B. Light, J. Burgess, and S. Duguay, “The Walkthrough Method: An Approach to the Study of Apps,” New Media and Society 20, no. 3 (2018): 882, https://doi.org/10.1177/1461444816675438. 62. Glaser and Strauss, The Discovery of Grounded Theory; Lincoln and Guba, Naturalistic Inquiry; Lindlof and Taylor, Qualitative Communication Research Methods. 63. Lincoln and Guba, Naturalistic Inquiry.

REFERENCES

Adler, P. A., and P. Adler. “Observational Techniques.” In Handbook of Qualitative Research, ed. N. K. Denzin and Y. S. Lincoln, 377–391. Thousand Oaks, CA: SAGE, 1994. Beckley, T. M., R. C. Stedman, S. M. Wallace, and M. Ambard. “Snapshots of What Matters Most: Using Resident-Employed Photography to Articulate Attachment to Place.” Society and Natural Resources 20, no. 10 (2007): 913–929. Blackwell, C., J. Birnholtz, and C. Abbott. “Seeing and Being Seen: Co-situation and Impression Formation Using Grindr, a Location-Aware Gay Dating App.” New Media and Society 17, no. 7 (2014): 1117–1136. https://doi.org/10.1177/1461444814521595. Boase, J. “Implications of Software-Based Mobile Media for Social Research.” Mobile Media and Communication 1, no. 1 (2013): 57–62. Boellstorff, T., B. Nardi, C. Pearce, and T. L. Taylor. Ethnography and Virtual Worlds: A Handbook of Method. Princeton, NJ: Princeton University Press, 2012. Corbin, J., and A. Strauss. Basics of Qualitative Research. 4th ed. Thousand Oaks, CA: SAGE, 2015. Denzin, N. K., and Y. S. Lincoln. “Introduction: The Discipline and Practice of Qualitative Research.” In Handbook of Qualitative Research, 2nd ed., ed. N. K. Denzin and Y. S. Lincoln, 1–28. Thousand Oaks, CA: SAGE, 2000. Emmel, N. Sampling and Choosing Cases in Qualitative Research: A Realist Approach. Thousand Oaks, CA: SAGE, 2014. Glaser, B. G., and A. L. Strauss. The Discovery of Grounded Theory: Strategies for Qualitative Research. New York: Aldine de Gruyter, 1967. Gonzales, A. L., E. Y. Kwon, T. Lynch, and N. Fritz. “ ‘Better Everyone Should Know Our Business Than We Lose Our House’: Costs and Benefits of Medical Crowdfunding for Support, Privacy, and Identity.” New Media and Society 20, no. 2 (2018): 641–658. Halpern, M., and L. Humphreys. “Iphoneography as an Emergent Art World.” New Media and Society 18, no. 1 (2016): 62–81. Higginbottom, G. M. A. “Sampling Issues in Qualitative Research.” Nurse Researcher 12, no. 1 (2004): 7–19. 
Hine, C. “How Can Qualitative Internet Researchers Define the Boundaries of Their Projects?” In Internet Inquiry: Conversations About Methods, ed. A. N. Markham and N. K. Baym, 1–20. Los Angeles: SAGE, 2008. Hine, C., ed. Virtual Methods: Issues in Social Research on the Internet. Oxford: Berg, 2005. Humphreys, L. “Cellphones in Public: Social Interactions in a Wireless Era.” New Media and Society 6, no. 8 (2005): 813–836. Humphreys, L. “Connecting, Coordinating, Cataloguing: Communicative Practices on Mobile Social Networks.” Journal of Broadcasting and Electronic Media 56, no. 4 (2012): 494–510. https://doi.org/10.1080/08838151.2012.732144. Humphreys, L. “Mobile Social Networks and Social Practice: A Case Study of Dodgeball.” Journal of Computer Mediated Communication 13, no. 1 (2007): 341–360. Humphreys, L. “Mobile Social Networks and Urban Public Space.” New Media and Society 12, no. 5 (2010): 763–778. Humphreys, L. The Qualified Self: Social Media and the Accounting of Everyday Life. Cambridge, MA: MIT Press, 2018. Humphreys, L. “Who’s Watching Whom? A Study of Interactive Technology and Surveillance.” Journal of Communication 61 (2011): 575–595.

Humphreys, L., and T. Barker. “Modernity and the Mobile: Exploring Tensions About Dating and Sex in Indonesia.” Media/Culture 10, no. 1 (2007). http://journal.media-culture.org.au/0703/06-humphreys-barker.php. Humphreys, L., and T. Liao. “Mobile Geotagging: Reexamining Our Interactions with Urban Space.” Journal of Computer Mediated Communication 16, no. 3 (2011): 407–423. Humphreys, L., and R. Wilken. “Social Media, Small Businesses, and the Control of Information.” Information, Communication and Society 18, no. 3 (2015): 295–309. Kaufmann, K. “The Smartphone as a Snapshot of Its Use: Mobile Media Elicitation in Qualitative Interviews.” Mobile Media and Communication 6, no. 2 (2018): 233–246. https://doi.org/10.1177/2050157917743782. Kirk, J., and M. L. Miller. Reliability and Validity in Qualitative Research. Newbury Park, CA: SAGE, 1986. Light, B., J. Burgess, and S. Duguay. “The Walkthrough Method: An Approach to the Study of Apps.” New Media and Society 20, no. 3 (2018): 881–900. https://doi.org/10.1177/1461444816675438. Lincoln, Y. S., and E. G. Guba. Naturalistic Inquiry. Beverly Hills, CA: SAGE, 1985. Lindlof, T. R., and B. C. Taylor. Qualitative Communication Research Methods. 4th ed. Thousand Oaks, CA: SAGE, 2019. Lofland, J., D. Snow, L. Anderson, and L. H. Lofland. Analyzing Social Settings: A Guide to Qualitative Observation and Analysis. 4th ed. Belmont, CA: Wadsworth, 2006. Markham, A. N. “What Constitutes Quality in Qualitative Internet Research? A Response to Nancy Baym.” In Internet Inquiry: Conversations About Method, ed. A. N. Markham and N. K. Baym, 190–197. Thousand Oaks, CA: SAGE, 2008. Maxwell, J. A. Qualitative Research Design: An Interactive Approach. Thousand Oaks, CA: SAGE, 2013. Maxwell, S. E., M. Y. Lau, and G. S. Howard. “Is Psychology Suffering from a Replication Crisis? What Does ‘Failure to Replicate’ Really Mean?” American Psychologist 70, no. 6 (2015): 487–498. McPherson, M., L. Smith-Lovin, and J. M. Cook. 
“Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology 27 (2001): 415–444. Merton, R. K. On the Shoulders of Giants: The Post-Italianate Edition. Chicago: University of Chicago Press, 1993. Orgad, S. “How Can Researchers Make Sense of Issues Involved in Collecting and Interpreting Online and Offline Data?” In Internet Inquiry: Conversations About Method, ed. A. N. Markham and N. K. Baym, 33–52. Thousand Oaks, CA: SAGE, 2008. Ørmen, J., and A. M. Thorhauge. “Smartphone Log Data in a Qualitative Perspective.” Mobile Media and Communication 3, no. 3 (2015): 335–350. https://doi.org/10.1177/2050157914565845. Pollak, J. P., G. Gay, S. Byrne, E. Wagner, D. Retelny, and L. Humphreys. “It’s Time to Eat! Using Mobile Games to Promote Healthy Eating.” IEEE Pervasive Computing 9, no. 3 (2010): 21–27. https://doi.org/10.1109/MPRV.2010.41. Wilken, R., and L. Humphreys. “Constructing the Check-In: Reflections on Photo-Taking Among Foursquare Users.” Communication and the Public, Special Issue on Geomedia 4, no. 2 (2019): 100–117. https://doi.org/10.1177/2057047319853328.

Chapter Five

BEHIND THE RED LIGHTS
Methods for Investigating the Digital Security and Privacy Experiences of Sex Workers

ELISSA M. REDMILES

“There’s one!” I exclaimed, trying to be quiet. We darted down the nearby alleyway, headed for the red-and-pink-neon-lit windows. I was getting better at spotting brothels. As we ducked inside the entrance, we saw three women clustered around a high table smoking. Opening the door next to them, we stepped into a dark bar. “Wie kann ich dir helfen?” (How can I help you?) asked the sole man in the room. My colleague explained that we were researchers from the university. We were conducting an interview study. We wanted to leave a flyer to see if anyone would be interested in talking to us; it paid well, we said. He shooed us away. “My girls,” as he called them, “are only here for a month. They wouldn’t be interested.” As we walked out, I slipped a few flyers on the high table, next to the “girls” who were smoking. “Thank goodness for smoking,” I murmured to my companion. I came to Zurich with an explicit goal: to study the online safety experiences of sex workers, in an effort to understand how online security and privacy practices are shaped by risk and how online and offline experiences of threat can blend together to create a singular experience of safety.1 This research started long before I began plotting out city maps filled with brothels and dragging friends and family with me into the red-light districts of Switzerland and Germany to recruit door-to-door at brothels. When beginning research in uncharted territory—arguably an apt description of

doing research on the technology use of sex workers, and specifically on the online security and privacy practices of sex workers—we (or I, at least) often start with inductive qualitative work. In this case, four collaborators and I read hundreds of online forum posts and inductively generated a high-level framework of sex workers’ technology uses. Following months of forum reading, I began to put together a semistructured2 interview protocol, anchored by the frameworks that emerged from our forum coding. A German-speaking colleague and I then conducted twenty-seven semistructured interviews with sex workers in Switzerland and Germany. We discussed a wide range of online and offline safety-related experiences during the interviews. In this chapter, I discuss my motivations for studying this group as well as my online forum analysis methods and how this analysis helped develop the foundational understanding of technology use in sex work that informed the creation of a safety-focused interview protocol. I then detail my experiences and pitfalls in attempting to recruit participants in the insular, justifiably authority-wary community of sex workers across two countries—Switzerland and Germany—as a non-German speaker. I then discuss the process of conducting the interviews, including adapting to interviewee formality, learning to manage appropriately bearing witness to workers’ intense experiences, and maintaining consistency in data collection across two languages, two interviewers, and three interview modes (video, audio, and text). Finally, I conclude with a brief discussion of next steps and lessons learned.

MOTIVATION: WHY STUDY SEX WORKERS TO FIX CYBERSECURITY?

Computer science researchers who focus on security—security researchers—make a lot of noise about risk. If only the users understood the risks, we bemoan, then they would behave securely. But, alas, as one of my interview participants for another study, with a general population of users, told me early in my PhD work, “With computer security, I’m securing myself from threats that I don’t even know anything about . . . I know when somebody walks up with a gun that I should be worried.”3

Over the past decade, the field of usable security and privacy has focused on trying to understand what drives users to adopt, or reject, digital security and privacy practices. These efforts have led to more user-friendly privacy settings as well as improvements in warning messages and password policies. Despite these successes, many open questions remain about how and why users make security and privacy decisions and how to help them make safer choices. One particularly difficult challenge is making risks salient to users. This challenge also manifests as a methodological limitation: studying those who do not feel at risk limits the utility of surveys and laboratory studies about security conducted with general users. In my own work, I have built scalable online platforms to run highly controlled behavioral economics experiments in an effort to simulate properly the risk and cost trade-offs that users make in security situations.4 Yet little can serve as a proxy for true risk, particularly for the blur between online and offline risks that face high-risk populations, such as journalists, undocumented immigrants, and sex workers.

Sex workers use the internet to find and communicate with clients and to create and maintain a professional image, often simultaneously concealing their “real” online identity. While recent work shows that the internet has provided many benefits for sex workers,5 their online presence may also put them at increased risk of stalking, physical violence, and harassment. Thus, sex workers are a population of internet users for whom security and privacy risks are especially salient. Further, sex workers are especially at risk of digital compromise from people they know, so their experiences and protective techniques provide a unique view into a growing, yet little studied, area of online threat: compromise by people known to the user.
This form of compromise has been shown to be especially relevant in cases of domestic violence,6 yet traditional methods of security (e.g., asking for answers to personal questions) do little to defend against such threats.7 Finally, unlike more-studied groups like journalists,8 sex workers rarely receive specialized digital security and privacy training, and unlike undocumented immigrants,9 they often must maintain a potentially risky online presence in order to sustain their livelihood.

Using the methodology described in the remainder of this chapter, I aimed to collect data regarding (1) what makes a privacy or security risk salient, including contrasts between legal and physical risk; (2) how online identity among those in a marginalized group manifests online and how this online identity intersects with privacy threat models and defenses; (3) how sex workers defend themselves against a little studied, yet broadly applicable, threat model: attacks from people who know them;10 and (4) how technology can be improved to help keep sex workers and those with similar threat models (e.g., domestic violence victims) safe.

As the first of my four research questions related to the contrast between legal and physical risk, I sought to study two countries in which sex work is legal (Switzerland and Germany) in contrast to a country in which sex work is not legal (United States). As I speak a small amount of German and I have lived in both Switzerland and Germany for periods of time, I selected these countries over, e.g., the Netherlands, another country where sex work is legal. In this chapter, I discuss only the Europe portion of the study, which was conducted first.

FOUNDATIONS: UNDERSTANDING THE LANDSCAPE OF TECHNOLOGY USE IN SEX WORK

While there are an estimated forty-two million sex workers in the world (0.6 percent of the world population) who drive over $180 million in business per year,11 there has been little prior work on technology use among sex workers.12 Thus, before launching into a project about online safety among sex workers, I first needed to develop an understanding of how sex workers use technology in their work. To do so, I decided to read forums in which sex workers discussed their work and experiences. After many Google searches, I eventually identified four relatively active forums: the “sex workers only” subreddit (reddit.com/r/sexworkersonly); the SAAFE forum for UK sex workers; sex work subforums on the website FetLife, which is “the Social Network for the BDSM, Fetish & Kinky Community”; and “sexworker.at.” In the first three forums, users converse in English, while in the last, they converse in German.

I began by simply reading the English forums, immersing myself in the experiences of those who were posting. I learned about different types of sex work, became invested in debates regarding whether conversing with clients for free by text message was “giving away the cow for free,” and empathized with the concerns of those who were not sure how to begin setting boundaries with a good but quirky client. After a few days of reading the forums, I realized that the posts were rich with data and merited a more formal analysis beyond my cursory reading. I enlisted three collaborators: two other English speakers and one native German speaker who also speaks English fluently. We divided up the forum analysis, each analyzing one of the four forums. To avoid data contamination, we each performed our own inductive open coding13 on the forum data, developing a set of codes and noting exemplar quotes. Over a period of four months, we developed our codebooks by reading through posts on “our” forum. After each person felt that they had reached saturation and had gone through at least six months of posts, I consolidated the four codebooks into one high-level framework of technology use among sex workers. I included a few exemplar forum posts for each theme and then asked each of my collaborators to review the consolidated codebook for any omissions.

I found that each of us had identified the same four types of technology use: client acquisition (advertising and setting up appointments), client maintenance (conversing with clients, giving gifts, etc.), payment processing, and support seeking (looking for advice, watercooler conversation, and other support from other sex workers). We also found, especially among illegal sex workers, a high frequency of discussion about security and privacy-preserving tools such as bitcoin, Tor, and country-specific anonymous payment platforms. However, the discussions were sufficiently vague that they did not answer our main research questions but rather provided starting points for interview discussions. In our codebooks, we made note of each technology discussed and the nature of the discussion.

Finally, these months of forum reading provided me with not only a framework for thinking about technology use but also a dictionary of sex-work-relevant words and phrases, such as “full service” (everything up to and including sex); “out call” (going to a client’s home or location); “in call” (having a client come to your home or location); and “gfe” (girlfriend experience, a type of sex work that involves acting as if you are in a romantic relationship with your client).

RECRUITMENT

The portion of the project about which I was most worried was recruitment. Also, when I told anyone about the project, it was the research step that drew the most skepticism.

A Test Run: “Would You Be Willing to Talk to Us?”

To assuage my fears, I did a trial run of recruitment strategies before I was ready to begin recruitment for real. While located in Saarbrucken, Germany, Kathrin—the colleague mentioned earlier who speaks both German and English fluently—and I called a number of German brothels to ask those working there if they would be interested in being paid to talk to us about how they used the internet. We also visited a brothel in Saarbrucken that has open windows in the city center. It is located across from a playground and in between a number of bars and restaurants, so those who work there are used to talking with a variety of people who are not potential clients. In response to our inquiries, we were repeatedly told to drop off a flyer or email a letter with information about what we wanted. This trial run provided me with a basis on which to develop a recruiting plan and gave me at least some reassurance that the answer to “Would you be willing to talk to us?” would not be a flat-out no.

I planned to recruit by emailing sign-up information to brothels and sex work organizations and by going to brothels in person to drop off informational materials. Thus, I created recruitment flyers and emails in both English and German. In order to track the success of these different recruiting methods, I created a separate vanity URL (a short, customized URL; e.g., go.umd.edu/arbeit-studie-[some extension that I used for tracking origin of signups]) for each recruitment mode: street recruitment, emails to brothels, and emails to organizations. The vanity URL system I used through my university provided no privacy-sensitive information (e.g., IP addresses) about who had clicked the link but did keep a count of link clicks.

Recruiting by Emailing Brothels and Sex Work Organizations
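Before turning to the emails themselves, the per-mode tracking idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the university's vanity URL system: the `example.org` base address, the three suffixes, and the function names `tracked_link` and `tally_clicks` are all hypothetical, and the chapter does not give the real extensions. The sketch keeps nothing but a count per link, mirroring the fact that the system recorded click totals and no IP addresses.

```python
# Hypothetical base address and suffixes, chosen only for illustration;
# the real vanity URL extensions are not given in the chapter.
BASE_URL = "https://example.org/arbeit-studie"
RECRUITMENT_MODES = ("street", "brothel-email", "org-email")


def tracked_link(mode: str) -> str:
    """Return the distinct sign-up link for one recruitment mode."""
    if mode not in RECRUITMENT_MODES:
        raise ValueError(f"unknown recruitment mode: {mode}")
    return f"{BASE_URL}-{mode}"


def tally_clicks(clicked_urls: list[str]) -> dict[str, int]:
    """Count clicks per recruitment mode.

    Only the clicked URL is inspected, so no information about
    who clicked is needed or stored.
    """
    link_to_mode = {tracked_link(mode): mode for mode in RECRUITMENT_MODES}
    counts = {mode: 0 for mode in RECRUITMENT_MODES}
    for url in clicked_urls:
        mode = link_to_mode.get(url)
        if mode is not None:  # ignore links that are not part of the study
            counts[mode] += 1
    return counts


if __name__ == "__main__":
    clicks = [tracked_link("street"), tracked_link("org-email"), tracked_link("street")]
    print(tally_clicks(clicks))
```

Giving each mode its own link means the origin of a sign-up is carried by the URL itself, so the comparison between recruitment methods needs no data about individual visitors.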

Once I had my interview protocol, recruitment flyers, and recruitment emails approved by my institutions’ Ethics Review Boards, I started with email recruitment. I compiled a list of all the brothels that I could find in three cities in Switzerland (Basel, Lugano, and Zurich) and in Germany (Berlin, Saarbrucken, and Hamburg), including their phone numbers, email addresses, and/or links to online contact forms. I also compiled a list of contact information for sex work organizations and unions in both Switzerland and Germany. In the end, I had over fifty email addresses and twenty online contact forms for brothels in both countries and eleven sex work organizations.

In order to avoid sending a multitude of individual emails, I used Google Sheets to send customized emails automatically to each organization and brothel with one click. Google Sheets is Google’s version of Excel. You can link a script in Google Sheets to a Google email address (my university email address is a Google-linked email address) and then use the script to send batch emails from that email address (a tutorial is available at https://developers.google.com/apps-script/articles/sending_emails).

Recruiting on the Street
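The batch-email step described just above can be illustrated with a short sketch. The study used a script linked to Google Sheets; the Python below is a stand-in for that idea, not the script that was actually used. The contact rows, the addresses, the template wording, and the function name `build_messages` are all hypothetical. The sketch only builds one customized message per spreadsheet-style row and deliberately leaves sending out.

```python
from email.message import EmailMessage
from string import Template

# Hypothetical wording; the chapter does not reproduce the real recruitment email.
BODY_TEMPLATE = Template(
    "Dear $name,\n\n"
    "We are researchers conducting a paid interview study.\n"
    "Information and sign-up: $link\n"
)


def build_messages(rows: list[dict[str, str]], sender: str) -> list[EmailMessage]:
    """Build one customized email per contact row (name, email, link)."""
    messages = []
    for row in rows:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = row["email"]
        msg["Subject"] = "Invitation to a paid interview study"
        msg.set_content(BODY_TEMPLATE.substitute(name=row["name"], link=row["link"]))
        messages.append(msg)
    return messages


# Placeholder rows standing in for the spreadsheet of contacts.
rows = [
    {"name": "Example Organization", "email": "contact@example.org",
     "link": "https://example.org/arbeit-studie-org-email"},
    {"name": "Example Venue", "email": "info@example.com",
     "link": "https://example.org/arbeit-studie-brothel-email"},
]
messages = build_messages(rows, sender="researcher@example.edu")

if __name__ == "__main__":
    for message in messages:
        print(message["To"], "|", message["Subject"])
```

Each message could then be passed to a mail client or to `smtplib.SMTP.send_message` from the standard library; sending is omitted here so the sketch runs without credentials.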

In addition to recruiting using email, I recruited participants by directly visiting brothels—part of my purpose in being located in the countries from which I was trying to recruit. For recruitment, I created flyers advertising the study and a recruitment survey that allowed participants to sign up for interview time slots. I created the flyers and recruitment sign-up survey in both English and German because sex workers are often not from the country in which they are working and their clients are often from different countries; thus, even though people in Germany and the eastern portion of Switzerland are German-speaking, sex workers in these areas do not always speak German, and English is often the language used in the brothels.

The first time I went out to recruit directly by visiting brothels, I asked a colleague who spoke French, German, and Swiss German—a German dialect spoken in the eastern half of Switzerland—to come with me. I anticipated talking to brothel managers or those working in the brothels and was not sure if English would be the language of communication. I printed out the flyers (figure 5.1) and constructed a map of brothels in Zurich using Google searches and input from colleagues about the “hot” red-light areas, where there was a high density of erotic massage parlors, brothels, and cabaret or strip clubs and sex workers were typically standing outside brothels.

Association with Institutions or Authority Does Not Build Trust

During our evening recruiting, my colleague and I attempted to discuss the project with the (male) managers who approached us the minute we entered a brothel or the (female) workers who were standing outside the brothels smoking. We quickly found that being affiliated with a university—providing the implication of authority—did not help us build trust but rather raised suspicion, inspired immediate fear, and led to curt responses. Work in survey methodology shows that in research with more traditional populations, affiliation with a university or trustworthy organization improves response rates.14 However, for marginalized populations that may have a justifiable fear of authority,15 associating with institutions may not necessarily be effective.

FIGURE 5.1 English version of the first iteration of the recruitment flyer.

After determining that associating with authority was not helping us, my colleague and I took a less direct approach to recruitment. We slid flyers under ashtrays outside brothels and in between bars on brothel windows, we popped inside brothels and placed the flyers on cigarette dispenser machines, and we even put flyers in decorative trees outside brothel doors. Since women very, very rarely enter brothels, our gender identity and the unusual nature of our activity (dropping off flyers and leaving) led to few questions and no challenges. Throughout two hours of visiting over thirty brothels, cabarets, and erotic massage parlors, only one worker who was sitting outside smoking asked us about the flyers. Yet when I got home that night, four people had already clicked the sign-up link, and three had signed up.

As a result of this first night of recruitment, I learned that deemphasizing association with authority on the flyer (e.g., reducing the emphasis on university logos) was likely to help with recruitment. Additionally, from the questions asked by the one worker who spoke to us—What did she need to do for the study? Would the study hurt her brain? What would she get paid?—I determined that the flyers should be refined. As shown in figure 5.2, in the second iteration I significantly reduced the amount of information provided (all relevant information was still contained in the study consent form shown at the beginning of the sign-up form), emphasized the payment amount and clarified what the study entailed (“talking with a researcher”), and removed the university logos, leaving only the sentence at the bottom of the flyer stating the university sponsorship. This also conveniently made the flyer short enough that I could put both the English and the German versions on the same page.

FIGURE 5.2 English version of the second iteration of the recruitment flyer. The flyer was significantly shortened, and affiliations with authority (universities) were deemphasized.

I had asked many colleagues and a U.S. sex worker to review the original recruitment materials, but this experience shows that nothing can substitute for real-life recruitment experiences and feedback.

At this point in the recruitment process, I had observed three clicks on the sign-up form from the links I had sent to the brothels, and I had received one reply from a brothel, asking if we could do the interviews in Italian—we could not. I had also observed four clicks on the links on the flyers we had distributed door-to-door. I aimed to recruit twenty participants, so I was getting a bit worried. Given the usual rate of participant no-shows for qualitative work—around 50 percent in my experience with general populations recruited on Craigslist—I would have at most three interviews from these seven clicks despite sending all the emails and walking around Zurich for two hours in the cold.

Determined to do better with recruitment, I printed out my edited flyers and asked another friend to join me the next week to go “brothel hopping.” Once again, we spent over two hours and walked more than four miles throughout Zurich handing out flyers. Additionally, as I lived on one of the red-light-district streets, I distributed a few flyers on my street every night as I came home from work—a perk of working late, as most brothels in Zurich open around 7 PM.

Safety as a Researcher

I was only comfortable distributing flyers on my street as I came home from work, as it was in the center of Zurich and well populated enough that I could easily ask for help if needed. I always brought someone with me when recruiting in other red-light districts, as at times I was followed by clients who thought I might be working. Unfortunately, red-light districts also can come with drug dealing and violent crime due to the prevalence of sizable cash-based transactions in the brothels.16 Thus, I took care in ensuring that I was safe when doing street recruitment by bringing a companion. I also considered how to dress and eventually decided that I did not know what type of clothes would “fit in” among those working in the brothels. In the end, I chose to go with my usual work clothes: pants, a professional top, and a backpack in which I kept the flyers that we would hand out.

Respect and Etiquette for “Street” Recruitment

In addition to taking care with my own safety, I was careful to be respectful when I entered brothel-heavy areas. Workers often congregate outside convenience stores, go to eat in local cafés, and are otherwise highly visible outside of the brothels in these areas. While it is typically easy to perceive visually who is a worker, I thought it would be disrespectful to hand someone a flyer about a sex-work-related study. Thus, I always left flyers on tables and near ashtrays or cigarette dispensers, as described earlier.

Lucky Breaks with Sex Work Organizations Created a Landslide of Sign-Ups

As I continued street recruitment, clicks on the links slowly trickled in. Kathrin and I began conducting interviews in German and English, respectively. With the hope of getting more participants, we also told interviewees that if they referred friends and emailed us with the email address the friend used to sign up, we would provide them with an additional incentive of 10 CHF.

One of our first two interviewees was the public relations chair for a sex work union in Germany. She complimented us on the respectfulness of our interview and appreciated that we were trying to help those in the sex industry. She promised to pass along information about the interviews to members of her union. Through the interview, we ascertained that she had wanted to participate in an interview before passing along information to make sure that we meant well and were conducting our research respectfully.

Eight days into recruitment and interviewing we got a second lucky break: one of the sex work organizations that I had contacted emailed back enthusiastically and said that it would send out a link on its listserv. Less than five hours later there were over thirty-five sign-ups through our sign-up form. While thrilling, this led me to a late-night scramble. I stayed up until three in the morning emailing to confirm some participants and reschedule others, as Kathrin and I each had ended up with over twelve interviews a day. After this point, clicks continued to trickle in from door-to-door recruitment and the brothels I had emailed, but the bulk of our participants came through this listserv blast and the pass-alongs from the woman who was a representative for a sex work union. In total, we ended up with 164 link clicks: 12 from door-to-door recruitment, 25 from brothel emails, and 127 from sex work organizations.

CONSTRUCTING THE INTERVIEW PROTOCOL

Ultimately, using interview-based data collection, I sought to understand sex workers’ security and privacy experiences and practices. In the interviews, before digging into security and privacy experiences, I needed to gain a bit of background regarding my participants in order to anchor my questions appropriately in their personal experience. Thus, in the interviews I first asked briefly about what the respondent did for (sex) work and how long they had been in the sex industry. I next asked about nonwork technology use, including the length of technology use and typical behaviors. I then asked about technology use specifically for sex work. While I asked broadly about sex work technology use, I used the four types of sex work technology use we found in the forum-coding portion of the study as an anchoring point to guide my own prompting and to help me keep track of our conversation.

This anchoring approach turned out to be invaluable when conducting the interviews. Participants often shared multiple, disconnected anecdotes about sex work technology use. Especially when shared via a chat-based interview, this was a lot to keep track of and engage with throughout the exchange. Having a framework to organize the shared anecdotes helped me refine my follow-up questions and make sure that I had covered the full scope of technology use. The sex work technology use categories we found through the forum analysis covered all but one use of technology that emerged in the interviews: covering. Covering is the practice of telling a friend where you will be and for how long and making an agreement that if you do not text or call by a particular time, the friend will follow a series of protective steps (coming to where you are, contacting the police, etc.).

Next, I constructed a series of questions to probe security and privacy topics of interest, including persona separation between work and personal life, definitions of safety (What is safety to you as a sex worker? How do you define safety?), negative prior safety experiences (both online and offline), and support sources/learning methods for safety skills.

Respecting Participant Privacy

You may notice that in the interview I did not ask participants about their age, gender, country of origin, socioeconomic status, or other demographic information. I also did not ask these questions on the short interview sign-up form. To avoid marginalizing participants, in all questions I asked in any part of the study, I worked to be highly respectful and ask for only the bare minimum of information that I needed to answer my research questions. This offered a tangential benefit: I thought deeply about each and every question I asked and how I would use it in future analysis. Doing so helped me identify gaps where I needed to add questions and reduce information collection in places where I was trying to gather background only to better personalize the interview (e.g., technology use for nonwork activities).

Considering the minimum set of data necessary to collect from participants is important for research projects even with nonmarginalized participants. I, at least, often worry that I will not have a second chance to collect my data and thus try to include a relative kitchen sink of demographic variables “just in case.” This project, as well as increasing conversations about the ethics of online platforms that overcollect data,17 served as a good reminder for me to reconsider the necessity of each of the variables I typically include in my research.

Additionally, when designing the interview sign-up form, I was cognizant of privacy considerations. Typically, when I conduct interviews, I collect participant email addresses in order to schedule interview appointments, remind participants of their interview appointments, and send payment (either via Amazon gift cards or PayPal). Initially, I wanted to avoid collecting any personal information, including email addresses, from participants. However, I realized this would not be practical, as qualitative studies typically have a high no-show rate and removing interview reminders was only likely to make this worse. Also, providing gift card codes only during the interview with no backup would be a risky proposition. As a compromise, I asked participants to select an interview time slot as part of the recruitment survey—to minimize email interactions—and I provided information about how to create an encrypted throwaway email address just for this study using ProtonMail (protonmail.com). Three of those who signed up ended up creating ProtonMail accounts.

Finally, to ensure that the interviews themselves would be sufficiently private and that potential participants felt comfortable with the method of the interviews, I conducted all interviews using the service appear.in (since renamed whereby.com). It is an end-to-end encrypted video, phone, and chat conferencing platform. This means that no communication between two or more parties in a conference room is transferred through any central server. Instead, this information is transferred only between the two parties in the conversation, and the transfer is done in an encrypted manner; the service also allows you to create conversation “rooms” with permanent URLs (e.g., appear.in/arbeit-studie). It is a free platform, but for a small subscription fee, you can record any conversation and lock your conversation rooms, such that only you as the account owner can allow people to enter (this helps prevent accidentally having two participants enter the room at once). I provided a brief description of the appear.in privacy guarantees in the sign-up form and also provided a link to its relatively easy-to-read privacy policy.

Review by a Participant

In addition to having my draft interview protocol reviewed by five collaborators with different domains of expertise (sociology and communication, computer science, human computer interaction, and cybercrime), I had it reviewed by a sex worker to ensure that the questions and language used in the protocol were respectful and appropriate. One of my collaborators was connected with a sex worker in the United States who was willing to review the protocol. We paid this consultant for their time and got helpful feedback on rephrasing a few of the questions.

Ensuring Multilingual Equivalency

Once the interview protocol was finalized in my native language (English), we needed to translate it into German, as many of the interviews were likely to be conducted in German. Kathrin, who would be conducting the interviews in German, took great care while translating the protocol to ensure the intent and phrasing of the questions were maintained given such nuances as the German language’s distinction between feminine and masculine and the use of formal and informal language.

INTERVIEWING

In total, Kathrin and I conducted twenty-seven interviews with sex workers. Kathrin conducted sixteen interviews in German, and I conducted eleven in English. We conducted our first two interviews on the same day: one in English and one in German. After these first interviews, we realized that the interview protocol was taking us between 90 and 120 minutes to complete. Thus, I quickly started revising the protocol, making significant cuts, especially to the introductory sections, and reducing the number of prompts. The original protocol had forty-three questions (some of which had subquestions or prompts), and the revised version had twenty-four, a reduction of about 44 percent. After the protocol revision, our interviews consistently took between fifty and seventy-five minutes.

Conversational Style

In addition to noticing the interview protocol length within our first two interviews, Kathrin and I observed a high level of conversational informality. Typically, when I have conducted interview studies, I have asked questions in a conversational but relatively formal way and received relatively formal replies (no cursing, etc.). However, in this study, participants were far less formal. The first German-language participant specifically preferred using informal pronouns and addressed Kathrin that way as well. The first English-language participant wanted to do the interview via chat, and our conversation was peppered with smiley faces and colorful anecdotes such as “and then I blocked his ass so fast.” As we continued our interviews, we continued to find that participants wanted to converse very informally, both in German and in English, and we adapted our interview style accordingly.

The mode of the interview (chat, audio, or video) also altered the conversational style. Prior to conducting this study, I had done interviews only by audio, by video, and in person. As is well described in Markham’s Life Online, conducting chat-based interviews requires a large amount of patience.18 People do not read or type nearly as fast as they listen and speak. Additionally, the appear.in interface I was using did not have typing “bubbles” or any way to indicate that the other person was typing. So at the beginning, I ended up “chatting over” my participants. I quickly learned that I needed to wait an extra thirty to sixty seconds after every message they sent to ensure they were done with their thought. Chat-based interviews can also offer benefits, however. It was far easier for me to make notes of topics on which I wanted to follow up while I was waiting for my participant to respond, and I could copy and paste certain questions from the interview guide into the chat box, making the question-asking portion of the interview go more quickly.

Learning as You Interview

In addition to refining the interview protocol and conversational styles during the early portion of the interview process, I was constantly learning about the sex work industry while conducting the interviews. Participants in the interviews worked in a broad range of sex-work roles. For example, Kathrin and I spoke with erotic massage parlor workers, female dominants (referred to as “femdoms”), bondage specialists, and performance artists who were also kink-positive sex workers (typically meaning that the worker and their clients do not have binary gender identities). To explain their experiences, participants sometimes shared assets such as links to their performance videos or carefully described how their work was conducted. This type of interaction outside the interview conversation, through what I call additional research assets, was not something I anticipated. However, when participants chose to share these assets, I found that this improved my understanding of their experiences, which were very different from my own.

Staying in Sync: Multilingual and Multi-interviewer Considerations

Throughout the interviewing process, Kathrin and I needed to ensure that we stayed in sync despite the fact that we were conducting interviews in different languages. To do so, we touched base after every five or so interviews that we conducted to check interview length, briefly recap our findings, and talk through any issues that may have occurred. After shortening the interview protocol and discussing interview formality, we found it quite easy to stay in sync while interviewing, in contrast to other multilingual, multi-interviewer studies where I used a more complex process of interviewer training and syncing19 to ensure data from each interviewer were comparable.

Bearing Witness: Interview Intensity

The greatest surprise for me while conducting the interviews was the emotional intensity of the experience. In each conversation, I was bearing witness to someone’s experiences of being sexually assaulted; of coming to terms with their own sexual preferences and cultural disapproval of those preferences; of finding joy in a community in which they were fully accepted, admired and appreciated; and more. While the vast majority (all but two) of the sex workers we interviewed primarily had positive experiences with sex work, there were still many intense experiences that shaped the participants’ paths to their work, their sense of safety, and their everyday lived experience of being sex workers. As a researcher asking questions about these experiences, especially to some participants who were not open or “out” about their work in the rest of their lives, I was bearing witness to deeply personal experiences while seeking answers to my less personal research questions. This required deep emotional work on my part—and on Kathrin’s part—in order to respond to the sharing of experiences empathetically and in a way that made participants feel heard but without biasing research data with too much interviewer commentary. While I am not sure there is research training—aside from social work or therapy training—that would have prepared me for this portion of interviewing, I discuss it here so that future researchers who plan to conduct similar studies can prepare themselves. In addition to thinking about how they might respond to the sharing of intense experiences, interviewers should consider how interview scheduling may affect their own well-being and their ability to bear witness to participants’ experiences properly. In prior, less personal interview studies, I have conducted five to eight interviews in a single day. In this study, I quickly discovered that I could do


at most four interviews with one-hour breaks in between. During these breaks, it was important for me to focus on self-care, which in my case included doing something completely mindless like watching TV or cooking to recover my stamina.20

NEXT STEPS AND LESSONS LEARNED

After finishing twenty-five interviews, Kathrin and I met to determine if we had reached data saturation. We summarized our high-level findings (which we had been doing regularly throughout the study) and found that we had reached saturation. As we had two more interviews scheduled, we finished those interviews and then closed recruitment.

Best Laid Plans

That I needed to revise the recruitment materials significantly after the first round of street recruiting, stay up very late to schedule and reschedule participants, and shorten the interview protocol considerably after the first few interviews is evidence that you can only prepare so well. Despite aiming to send out recruitment emails in the morning and during a slow week so that I would have time to deal with adjustments and scheduling, with snowball samples you never know when you may end up having a flood of participants sign up for a study. Similarly, I had research materials reviewed by multiple collaborators and by a sex worker who was a paid consultant for the project, yet there is no substitute for real-world experience working to collect data. Thus, especially when working with understudied populations, it is best to prepare yourself for surprises and for timelines that shift.

Offline Networks and Place Matter, Even for Digital Research

In addition to flexibility, support systems are very important, especially for work with marginalized communities and in cultural contexts different from your own. While designing and conducting this project, I was a visiting researcher at the Max Planck Institute for Software Systems in Saarbrücken, Germany, and then, for the bulk of the project, I was a visiting researcher at the University of Zurich. Initially, when I realized that emails to sex work organizations had recruited the most participants, I questioned


whether I even needed to be physically present in Switzerland and Germany to do this work. However, upon reflection, I realized that sense of place comes with multiple important gains for the research. Colleagues and contacts who were from the places in which I wanted to recruit provided contact information for some sex work organizations that were hard to locate online, context for the best places to do street recruitment, and companionship and safety in numbers for doing street recruitment. Further, discussing my project with and asking for help from colleagues and friends in these places allowed me to find Kathrin, who conducted sixteen interviews in German and translated all the study materials. Later, as I moved forward with interview analysis, local colleagues helped me find someone who specializes in German-to-English interview transcription and translation.

Finally, talking about my project with colleagues, particularly those who were located in Germany and Switzerland, where I was doing the research, provided me with an important source of support. While I did not anticipate the intensity of this research ahead of time, as discussed in the interviewing section, bearing witness to very personal experiences, even positive ones, while rapidly immersing oneself in an unfamiliar subculture can be extremely draining and intense. Because my Swiss and German colleagues were more familiar with the basics of how the sex industry worked, due to living in countries where sex work is legal, I did not have to provide context first or justify why it was important to address the needs of sex workers. This shared sense of culture and place allowed me to debrief with a wide variety of people, which was incredibly helpful in maintaining my well-being as a researcher and in gaining the interesting insights and perspectives that I did.

NOTES

1. E. M. Redmiles, J. Bodford, and L. Blackwell, “ ‘I Just Want to Feel Safe’: A Diary Study of Safety Perceptions on Social Media,” in Proceedings of the Thirteenth International AAAI Conference on Web and Social Media (2019).
2. M. C. Harrell and M. A. Bradley, Data Collection Methods: Semi-Structured Interviews and Focus Groups (Santa Monica, CA: Rand National Defense Research Institute, 2009), 147.
3. E. M. Redmiles, A. R. Malone, and M. L. Mazurek, “I Think They’re Trying to Tell Me Something: Advice Sources and Selection for Digital Security,” in 2016 IEEE Symposium on Security and Privacy (San Jose, CA: IEEE, 2016), https://doi.org/10.1109/SP.2016.24.
4. E. M. Redmiles, M. L. Mazurek, and J. P. Dickerson, “Dancing Pigs or Externalities? Measuring the Rationality of Security Decisions,” in EC ’18: Proceedings of the 2018 ACM Conference on Economics and Computation (New York: Association for Computing Machinery, 2018), 215–232, https://doi.org/10.1145/3219166.3219185.
5. S. Cunningham and T. D. Kendall, Examining the Role of Client Reviews and Reputation Within Online Prostitution, vol. 1 (Oxford: Oxford University Press, 2016); S. Cunningham and M. Shah, “Decriminalizing Indoor Prostitution: Implications for Sexual Violence and Public Health,” Review of Economic Studies 85, no. 3 (July 2018): 1683–1715, https://doi.org/10.1093/restud/rdx065.
6. D. Freed et al., “Digital Technologies and Intimate Partner Violence: A Qualitative Analysis with Multiple Stakeholders,” Proceedings of the ACM on Human-Computer Interaction 1, no. CSCW, art. 46 (December 2017), https://doi.org/10.1145/3134681; T. Matthews et al., “Stories from Survivors: Privacy and Security Practices When Coping with Intimate Partner Abuse,” in CHI ’17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (New York: Association for Computing Machinery, 2017), 2189–2201, https://doi.org/10.1145/3025453.3025875.
7. E. M. Redmiles, “ ‘Should I Worry?’: A Cross-Cultural Examination of Account Security Incident Response,” in IEEE Security and Privacy (San Jose, CA: IEEE, 2019).
8. S. E. McGregor et al., “When the Weakest Link Is Strong: Secure Collaboration in the Case of the Panama Papers,” in Proceedings of the 26th USENIX Security Symposium (Vancouver, BC: USENIX, August 26, 2017), 19; S. E. McGregor et al., “Investigating the Computer Security Practices and Needs of Journalists,” 17.
9. T. Guberek et al., “Keeping a Low Profile? Technology, Risk and Privacy Among Undocumented Immigrants,” in CHI ’18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (New York: Association for Computing Machinery, 2018), https://doi.org/10.1145/3173574.3173688.
10. Redmiles, “ ‘Should I Worry?’ ”
11. ProCon, “How Many Prostitutes Are in the United States and the Rest of the World?,” accessed December 18, 2018, https://prostitution.procon.org/view.answers.php?questionID=000095.
12. A. Jones, “Sex Work in a Digital Era,” Sociology Compass 9, no. 7 (2015): 558–570, https://doi.org/10.1111/soc4.12282.
13. D. R. Thomas, “A General Inductive Approach for Analyzing Qualitative Evaluation Data,” American Journal of Evaluation 27, no. 2 (June 2006): 237–246, https://doi.org/10.1177/1098214005283748.
14. R. M. Groves, R. B. Cialdini, and M. P. Couper, “Understanding the Decision to Participate in a Survey,” Public Opinion Quarterly 56, no. 4 (1992): 475–495.
15. S. P. Kurtz et al., “Barriers to Health and Social Services for Street-Based Sex Workers,” Journal of Health Care for the Poor and Underserved 16, no. 2 (June 2005): 345–361, https://doi.org/10.1353/hpu.2005.0038.
16. G. Lidz, “Why Zurich Is Turning Its Red-Light District into a Drive-Through,” Newsweek, November 16, 2016.
17. M. Lecuyer et al., “Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization,” in 2017 IEEE Symposium on Security and Privacy (San Jose, CA: IEEE, 2017), 78–95, https://doi.org/10.1109/SP.2017.60.
18. A. N. Markham, Life Online: Researching Real Experience in Virtual Space (Lanham, MD: Rowman Altamira, 1998).
19. Redmiles, “ ‘Should I Worry?’ ”
20. In our discussions, Kathrin stated that she felt similarly about the intensity of the interview experience and the need for refreshing downtime between interviews.

REFERENCES

Cunningham, S., and T. D. Kendall. Examining the Role of Client Reviews and Reputation Within Online Prostitution. Vol. 1. Oxford: Oxford University Press, 2016.
Cunningham, S., and M. Shah. “Decriminalizing Indoor Prostitution: Implications for Sexual Violence and Public Health.” Review of Economic Studies 85, no. 3 (2018): 1683–1715. https://doi.org/10.1093/restud/rdx065.
Freed, D., J. Palmer, D. E. Minchala, K. Levy, T. Ristenpart, and N. Dell. “Digital Technologies and Intimate Partner Violence: A Qualitative Analysis with Multiple Stakeholders.” Proceedings of the ACM on Human-Computer Interaction 1, no. CSCW, art. 46 (2017). https://doi.org/10.1145/3134681.
Groves, R. M., R. B. Cialdini, and M. P. Couper. “Understanding the Decision to Participate in a Survey.” Public Opinion Quarterly 56, no. 4 (1992): 475–495.
Guberek, T., A. McDonald, S. Simioni, A. H. Mhaidli, K. Toyama, and F. Schaub. “Keeping a Low Profile? Technology, Risk and Privacy Among Undocumented Immigrants.” In CHI ’18: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. New York: Association for Computing Machinery, 2018. https://doi.org/10.1145/3173574.3173688.
Harrell, M. C., and M. A. Bradley. Data Collection Methods: Semi-Structured Interviews and Focus Groups. Santa Monica, CA: Rand National Defense Research Institute, 2009.
Jones, A. “Sex Work in a Digital Era.” Sociology Compass 9, no. 7 (2015): 558–570. https://doi.org/10.1111/soc4.12282.
Kurtz, S. P., H. L. Surratt, M. C. Kiley, and J. A. Inciardi. “Barriers to Health and Social Services for Street-Based Sex Workers.” Journal of Health Care for the Poor and Underserved 16, no. 2 (2005): 345–361. https://doi.org/10.1353/hpu.2005.0038.
Lecuyer, M., R. Spahn, R. Geambasu, T. Huang, and S. Sen. “Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization.” In 2017 IEEE Symposium on Security and Privacy, 78–95. San Jose, CA, 2017. https://doi.org/10.1109/SP.2017.60.
Lidz, G. “Why Zurich Is Turning Its Red-Light District into a Drive-Through.” Newsweek, November 16, 2016.
Markham, A. N. Life Online: Researching Real Experience in Virtual Space. Lanham, MD: Rowman Altamira, 1998.
Matthews, T., K. O’Leary, A. Turner, M. Sleeper, J. P. Woelfer, M. Shelton, C. Manthorne, E. F. Churchill, and S. Consolvo. “Stories from Survivors: Privacy and Security Practices When Coping with Intimate Partner Abuse.” In CHI ’17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 2189–2201. New York: Association for Computing Machinery, 2017. https://doi.org/10.1145/3025453.3025875.
McGregor, S. E., P. Charters, T. Holliday, and F. Roesner. “Investigating the Computer Security Practices and Needs of Journalists.” In Proceedings of the 24th USENIX Security Symposium.
McGregor, S. E., E. A. Watkins, M. N. Al-Ameen, K. Caine, and F. Roesner. “When the Weakest Link Is Strong: Secure Collaboration in the Case of the Panama Papers.” In Proceedings of the 26th USENIX Security Symposium.
ProCon. “How Many Prostitutes Are in the United States and the Rest of the World?” Accessed December 18, 2018. https://prostitution.procon.org/view.answers.php?questionID=000095.
Redmiles, E. M. “ ‘Should I Worry?’: A Cross-Cultural Examination of Account Security Incident Response.” In 2019 IEEE Symposium on Security and Privacy. IEEE, 2019.
Redmiles, E. M., J. Bodford, and L. Blackwell. “ ‘I Just Want to Feel Safe’: A Diary Study of Safety Perceptions on Social Media.” In Proceedings of the Thirteenth International AAAI Conference on Web and Social Media. 2019.
Redmiles, E. M., A. R. Malone, and M. L. Mazurek. “I Think They’re Trying to Tell Me Something: Advice Sources and Selection for Digital Security.” In 2016 IEEE Symposium on Security and Privacy, 272–288. San Jose, CA, 2016. https://doi.org/10.1109/SP.2016.24.
Redmiles, E. M., M. L. Mazurek, and J. P. Dickerson. “Dancing Pigs or Externalities? Measuring the Rationality of Security Decisions.” In EC ’18: Proceedings of the 2018 ACM Conference on Economics and Computation, 215–232. New York: Association for Computing Machinery, 2018. https://doi.org/10.1145/3219166.3219185.
Thomas, D. R. “A General Inductive Approach for Analyzing Qualitative Evaluation Data.” American Journal of Evaluation 27, no. 2 (2006): 237–246. https://doi.org/10.1177/1098214005283748.

Chapter Six

USING UNEXPECTED DATA TO STUDY UP
Washington Political Journalism (and the Case of the Missing Press Pass)

NIKKI USHER

For the past decade, I have been studying elite news media and elite actors associated with news innovation and technology, examining some of the most prestigious newsrooms in the English-speaking world—the New York Times, the Wall Street Journal, the Guardian, the BBC, and start-ups like Medium and Vox. My quest has been to understand the constraints and the routines that impact decision-making in journalism, which, in turn, are incredibly consequential for the news that we see. I have done this research primarily through ethnographic fieldwork and interviews, including an ethnography of the New York Times’ digital transition1 and a book exploring hacker culture and programming in journalism.2 However, there are limits to how far this approach can take you, particularly when you are interested in studying people and institutions that have more power than you do as an academic researcher, sometimes dubbed “studying up.”3 My ongoing research on Washington political journalists has presented difficulties that have prompted me to explore what might not be typically considered data by qualitative researchers—a methodological innovation I call unexpected data. I introduce some of these alternative sources of data here and show how they can be used by the researcher as a solution for situations where interview-based research and fieldwork are not possible. This chapter provides a way to think about moving beyond the site of the fieldwork itself and even beyond the traditional limits of


qualitative research to generate insights about the social construction of knowledge via elite actors.

Access to powerful institutions and powerful people is an old problem, but there are new challenges, too. Studying elite actors (not to mention doing fieldwork more generally) has gotten harder in an increasingly litigious society,4 as organizations are worried that an outsider might see behavior that would lead them to file a lawsuit or give readers of academic work reason to do so. Journalists exhibit a new sense of caution (even in the United States, being a journalist has gotten more dangerous5), and they worry that being observed will expose their vulnerabilities when the news industry faces extreme financial instability.

Other new problems emerge even if you are able to get access. Much of what I might observe within a news organization is now happening digitally, meaning that conversations that could once be heard and meetings that could once be observed are now happening over email and increasingly on digital collaboration platforms like Slack or on Google Docs.

While my proposal for unexpected data comes out of my experience as a scholar in journalism studies, how I dealt with the challenges I address is likely to be helpful to scholars who study areas where research participants hold considerably more institutional and social power than the researchers. First, I will explain why I became interested in this research and how it presented new problems. Then I will discuss strategies associated with studying up by suggesting different sources of data aside from fieldwork and interviews: objects, public and semipublic documents, and public and semipublic digital trace data. Through the steps of recovering a digital image of a press pass and through the example of a press pass more generally, I will show how material objects and public and semipublic digital trace data can supplement and bolster fieldwork.

THE CASE AND ITS CHALLENGES

Despite years of doing research on elite newsrooms while based in Washington, DC, I had generally avoided specifically studying Washington political journalists. Their focus on the day-to-day activities in the White House and Congress frustrated me, as did the tiresome source-journalist “tango”: political elites need to stay in the news to establish their political currency and remain in the public eye, and journalists need access to these elites for


their reporting, so access is often “traded” for favorable news coverage.6 I was similarly bothered by the cult of personality and the penchant of political journalists for “palace intrigue” (stories about White House personalities) as well as horse-race journalism, the winner/loser frame found so often in political journalism.7 However, I found myself reconsidering research on political journalism. The combination of journalism’s increasingly disconcerting financial sustainability and the 2016 election of Donald Trump made research into Washington journalism seem both more relevant and perhaps more dynamic than I had previously considered. President Trump had dredged up an old mantra employed by the Republican Party,8 attacking Washington media as “coastal elites” and, worse, “fake news.” Public opinion polls about trust in journalism had reached all-time lows, and so many journalists living and working in Washington seemed dreadfully out of touch with the American public, especially in presuming a clear win by Hillary Clinton. And while the financial outlook for local newspapers was growing worse, Washington journalism and national news outlets were enjoying a “Trump bump,” benefiting both financially and reputationally from the increased public interest in what was happening in Washington.9 Now I wanted answers to these questions: Were these journalists indeed “media elites”? Who gets to live and work in Washington as a journalist? Are their lives and experiences indeed shaped by the Washington “Beltway Bubble”? What kind of work practices, norms, and values inform the way U.S. political journalists do their work? 
Of course, these are not new questions, and they lend themselves to observational data gathering: fieldwork to help understand both the functional aspect of how your research participants do what they do and the more phenomenological considerations of how your research participants make sense of the world; how they establish values, norms, and routines; and what the implicit and explicit structures are that influence this process.10 For the host of reasons outlined earlier, from access to litigiousness, these journalists have generally not been studied in the contemporary media environment through qualitative approaches that rely on firsthand observations and interviews.11 Rather, scholars have generally considered questions of news frames and the agenda-setting power of the news media via surveys, experiments, and content analysis.12 This presented an opportunity, then, for a scholarly intervention. I wanted to understand political journalists as a professional subcategory;


however, I was looking at a case that was far different from my other research on elite news media. Earlier I had focused on some element of how digital changes were being adopted by newsrooms, including such innovations as the prioritization of digital news production over legacy media like newspaper or radio,13 the rise of web analytics,14 and the rebuilding of the physical architecture in newsrooms.15 I knew from previous research that newsroom observation would have limited utility; at best, I could observe a few news organizations and a few political journalists at a time, if the journalists were even in the newsroom at all. Political journalists, especially reporters, were more likely to be on location at the White House, in Congress, at the Supreme Court, or in other corridors of power where access is limited to those with appropriate credentials (e.g., a White House press pass). To study these journalists at scale and to understand their peer-to-peer dynamics, I would need to go beyond the newsroom, but beyond the newsroom was almost certainly off-limits for sustained observation.

After much negotiation and some luck, I was able to secure access to the Senate Press Gallery for a single day, an experience that highlighted the challenges of doing fieldwork on political journalism. Press credentials to the House and Senate are strictly controlled, and journalists are approved by vote of the Standing Committee of Correspondents, an elected committee of other Washington journalists. Access to the White House is controlled by the White House, a situation that stymies even professional journalists.16 I was not eligible for a press pass, so, at most, I could get what is called a “day pass” for a single day’s access to the Senate Press Gallery, thanks to the kindness of a research participant for a related study who was on the board of the Standing Committee of Correspondents and could approve the temporary pass.
This exception, made for me, to the standard operating procedure, which allows only full-time journalists access to the congressional galleries, highlights the challenges of doing fieldwork on Washington political journalism. Regular observation for the purpose of understanding daily routines was simply not going to be possible. Not only was access a challenge, but also the composition of journalists at each of these locations varies, from what news outlets are represented to the journalists on the specific story for the day. Even if I could be present, systematic observation of the same people in the same places every day for a sustained period, an expectation in much of the writing about ethnographic practice, would be impossible.17


But as Markham18 succinctly posits, “Fieldwork, as a specific intervention practice, is work done in the field, close to people.” If you are not going to be able to do what is conventionally understood as fieldwork, iteration is required. Thinking that a robust self-made methodology might be required, for renewed inspiration I turned to the now classic work “Up the Anthropologist,” Nader’s19 call to action to study the powerful.

UP THE ANTHROPOLOGIST FOR COMMUNICATION RESEARCH IN THE DIGITAL ERA

Much of the ensuing literature in dialogue with Nader’s20 call has been about the position of the researcher relative to those studied or, as Hannerz21 rounds up, “studying down, up, sideways, through, backwards, forwards, away and at home.” But there is another important aspect of Nader’s argument that has been missed by many scholars: for those studying up, the traditional anthropological standard of immersive, years-long observation is simply not tenable. New ways of thinking about both fieldwork and data are thus required, and Nader suggested some possible sources of data that could support qualitative research on the powerful; the relevance of these data in a digital era is worth revisiting—and thus my focus in this piece. Nader provided two important rejoinders: first, that the length of time in the field does not correlate with the quality of research and, second, that other sources of empirical, qualitative data might support—or even supplant—fieldwork. Nader pointed out that to study people in powerful places for the length of time typically expected by anthropology is just unreasonable. In our contemporary era, the expectation that a PhD student might well spend a full calendar year, if not more, in the field is standard, and short-term ethnography is dismissed as “quick and dirty” and diminished as “blitzkrieg ethnography.”22 The debate is shifting, though, and sociologists have argued for “focused ethnography” or “rapid ethnography” and have suggested that indeed, insofar as conducting theoretically informed ethnographic research, it is not the time but the data and the intensity of the ethnographic experience that matter.23 In previous projects, I had drawn from Nader to think about how to answer this concern about time spent in the field. In Interactive Journalism,24 I proposed my own approach, hybrid ethnography, as a way to think about short-term ethnography across multiple sites, combining rapid


successions of interviews with strategies about observation, along with documents, photos, emails, and other data—even just ephemera like pencils or T-shirts. Then I would often meet my research participants in other professional settings and start following them on Twitter, and the relationship would endure, digitally and in person, far beyond our initial period of contact.25 At this point, I had not yet determined how to deal with this other layer of data systematically, but the seeds had been planted for proposing a methodological approach using unexpected data. Nader26 also suggested ways to augment qualitative data about powerful people that can be extended and enhanced in the digital era. She proceeded to suggest a series of data collection strategies that anticipates online ethnography: personal memoirs and documents, public relations documents that present the “preferred self-image” of an organization, and internal strategy documents. Predicting the justification often used for social network analysis today,27 she argued that work history, biographical information, and social and professional networks are especially important. Giving the example of a Washington law firm, she reasoned that even after a lawyer leaves the firm, the networks of association made there remain trenchant for analysis, drawing a parallel to the kinship networks studied by other sociologists.28 Anticipating critiques that such methods would be derided as “journalistic,” she noted that these strategies are indeed not only critical but also methodologically appropriate for overcoming the obstacles faced when studying those in power. In short, she makes the case for drawing on what I call unexpected data—data that you might not otherwise think would be part of ethnographic data gathering and subsequent analysis. Of course, scholars have started to think through these issues and, in some cases, have quite successfully adopted new methodological approaches. 
To deal with observing those working in a digital environment, Leonardi29 pioneered a careful, deliberate process for capturing screenshots while shadowing in a workplace and then embedding them in field notes so as to provide full context. He also suggested ways to track digital actions taken by the research participant and then used each action as a unit he could translate into statistical techniques, as per Becker.30 Similarly, in order to create more longitudinal opportunities for observation of flash mobs, which in physical form are intentionally fleeting and ephemeral, Molnár and Hsiao31 suggested combining digital trace data by building a database of Google videos of flash mobs. This enabled the authors to broaden their


sample and chart patterns over time. These are examples of unexpected data: for Leonardi, screenshots, and for Molnár and Hsiao, the metadata of online videos.

Thus, unexpected data might include objects; public and semipublic documents; semipublic observational opportunities; biographical data; public digital trace behavior such as social media, blog posts, and product reviews; and inductive computational approaches like looking at a known set of Twitter accounts to observe online behavior at scale. These data are unexpected in part because they are not traditionally thought of, particularly collectively, as potentially contributing to an ethnographic project and in part because they do yield insights but often after leaving the field. Through my own case of Washington political journalism, I aim to build on existing work that has attempted to solve some of the problems I faced by offering the unifying concept of unexpected data for qualitative research.

Material Objects: The Press Pass as an Example

Studying people in places also means studying the people and the things in those places or, as Marcus32 explains, “to examine the circulation of cultural meanings, objects, and identities in diffuse time-space.” Perhaps more dramatically, as Molotch33 argues, “Objects are storehouses of the tacit ‘documents of life,’ ” and even more so, “talking to people in the presence of their stuff and watching them interact with artifacts  .  .  . provides a route into the understanding and orderings of mechanisms of life.” Yet this approach has been vastly underused in ethnographic work. Even Goffman,34 who used objects as a way to understand status and interpersonal interaction, nonetheless failed to interrogate the objects on their own terms, whereas Molotch35 emotes, “What kind of quote? What kind of chair?” Thus, while scholars have been advocating for the inclusion of objects in fieldwork, it has yet to become common practice to a meaningful extent, even though it could be particularly useful for places where access is restricted or for research sites that themselves vary from day to day; as I show shortly, a press pass had tremendous value for my analysis. In historical research, the material turn has been well developed: what were once unexpected objects for analysis—locks of hair, keepsakes, baskets, purses, and beyond—are now seen as leading to critical insights, particularly for those who have left few traditional historical records in their


own voices.36 In anthropology, Laviolette37 explains that objects hold much insight for scholars interested in questions of affect and culture and stresses that there is an element of serendipity in both what those objects are and what they tell us:

Things make people as much as people make things. Serendipitously, things often tell more about people than people themselves can actually tell us about those things. The relationships we develop and share with a tangible arena of artworks, buildings, infrastructures, monuments, relics and everyday trinkets varies from the remote to the intimate, from the fleeting to the durable, from immediate to mediated, from the passive to the passionate, from the philosophized to the commonsensical.

Certainly, in science, technology, and society studies—particularly in actor-network theory approaches—the objects themselves become a central point of inquiry—though these objects tend to be deliberately chosen at the outset of the investigation.38 This is even the case with Turkle’s39 “evocative objects,” where the everyday and mundane are intentionally the starting point for analysis. Rather, I suggest the objects of unexpected data are not the focus of the research question but are the ancillary background items, the ephemera, of everyday life.40 While there has been a turn in communication toward “ordinary studies,”41 journalism studies has only started to see this potential. In particular, my subfield has defined what objects interest us at the outset of a research question and thus far has tended to focus far more on questions of negotiating tensions of digital innovation.42 Typical examples include news apps, content management systems, social media adaptation, and virtual reality viewers. However, the serendipity of what might be called backstage objects and what they can tell us, rather than the frontstage objects of news innovation, has received less attention. When journalism studies has considered objects in this light, it has been primarily through historically driven analysis rather than in situ observation. For example, scholars have looked at photographs that provide key insights into the routines and processes of news work in order to understand work routines.43 In the context of studying contemporary political journalists, these unexpected objects can be observed in situ and might include press passes, campaign ephemera, desk toys, buttons, pins, postcards, lanyards,

131 U S I N G U N E X P E C T E D D ATA T O S T U D Y U P

notebooks, clippings (e.g., cutout articles, photos), clothing, menus, wall hangings, and beyond. Digital examples might include tweets (and not just tweets but also photos and selfies posted), Spotify playlists and public Goodreads lists of favorite books, Facebook posts, restaurant reviews, and electronic party invitations. In my case, I did not expect a press pass to yield the insights that it did. It was “unexpected”—I did not think about how it could help me answer a critical question raised after I had already left the field. Typically, it is the credentialing process that receives analytical attention from journalism scholars, as it encompasses the defined rules that set professional boundaries for what type of journalist with what qualifications gets access to restricted places.44 I was not deliberately thinking about how the material object of the press pass—the way it looks, its size, etc.—could provide insights, though now I argue that such ephemera should not provide accidental insights but instead should be folded deliberately into data collection. How the materiality of the press pass as an object came to matter is an instructive story—and unexpected. The first paper I prepared with my team on Beltway journalism looked at the peer-to-peer gender dynamics of political journalists on Twitter.45 Initially, I used the press pass in a fairly routine way as a justification for my sample. The political journalists we were studying on Twitter comprised a cohesive, purposive sample because they all had been credentialed by their peers on the Standing Committee of Correspondents and were listed by name in the 116th Congressional Directory. They also had to, by definition, live in the Washington area defined by the Beltway. Certainly, the debate over who gets a press pass is important for media law scholars,46 but the object of the press pass itself—the hard card worn around journalists’ necks—was not yet part of my analysis. 
In the paper’s peer review process, a reviewer raised the concern that while all these journalists were certainly elite Washington journalists, there were differences among them in terms of their status (some worked for more elite outlets), the type of media outlet for which they worked, their job title, and so forth. More specifically, could we really say that a New York Times journalist and a journalist working for a specialty medical device trade press outlet were comparable? I worried that indeed all the work we had done had perhaps been for naught. I returned to my field notes from my day of observing in the Senate Press Gallery to see if there was a way to respond to this critique. I recalled that when I was observing and interviewing, I had to keep asking journalists where they worked. Even from as close as two feet away, I could not see their names or news outlets on their passes. I could not immediately tell a Times reporter or a Politico journalist from a member of the trade press there to report on nuanced issues like medical devices. In the halls of Congress, all journalists are of roughly the same stature—indeed, daily congressional reporting was highly opportunistic; in the crowded scrums, gaggles, and stakeouts (terminology for ad hoc reporting attempts), a journalist was lucky if they got to ask a question. It would be fair to group them together in any sample where I was trying to generalize about the press corps, but I did not have in my field notes any specific details about what the press pass looked like. The humble press pass was important for justifying my sample—not because of who got one and who did not but because it was brown and hard to read. However, there was just a jotting in my field notes about press passes, later put into sentence form as “press passes were hard to read. Brown.” To figure out what the press pass itself looked like, I would need to turn elsewhere, and this is where public digital data and semipublic digital data generated by the research participants themselves can be helpful.

Digital Trace Crumbs and Digital Publicity

Sometimes when I am writing and analyzing my data, I realize that I have missed a critical detail like the above description of a press pass. To avoid what is called “missing qualitative data,”47 data that did not exist to be gathered or data that a researcher had forgotten to collect, scholars have turned to a number of tactics that take advantage of new technologies. They now use digital tools to record data from the field site and consider the public and semipublic digital data generated by the research participants themselves, often on social media. Tools like screenshots and photos, as Leonardi48 explained, can provide essential support for data analysis and lead to unexpected discoveries as well as fill in functional gaps in sometimes unexpected ways. Even though I did not have a description in my notes of what the press pass looked like, I hoped that I had perhaps taken a photo of one at the field site. Thus, I will explain how my need to describe the press pass drew on this kind of data, and as a result, I have now tried to systemize this approach as one element of the unexpected data approach:
while you may not know exactly which data you will need in the process of analysis, you can plan for the data that you might want—sometimes by taking into account what you have used in the past or what strikes you as simply interesting in the moment, without any real academic rationale presenting itself just yet. I had begun the practice of taking photos of my field sites after writing my first book, Making News at the New York Times.49 During the writing process, I realized that my field notes lacked the kind of detail that made a book interesting to read—particularly descriptions of surroundings. After the approximately ninetieth email to a trusted confidant (this email was to double-check the color of the second-floor carpet), I realized this could not happen the next time I wanted to write a book. Thankfully, just as I was about to enter the field for new research, Pinterest had become popular. I began using it to capture visual data,50 and in most cases, after a day in the field I now dutifully upload photos and append captions.51 This becomes part of translating jottings into field notes, though in a photographic form. Like Leonardi,52 I am not sure how I would explain this to the Institutional Review Board, but I avoid focusing on faces and get general shots of surroundings—the details I may later need either for analysis or for writing. While doing my fieldwork at the Senate, I took over two hundred photos, about sixty of which I tagged and placed on Pinterest, with captions containing the details that reminded me of why I took each photo. Unfortunately, none of these included a press pass. Fortunately, there were other possible ways to get a photo of a press pass. Nader53 suggested using insider accounts, industry documents, and personal memoirs that have various degrees of publicity and perceived public audiences to augment fieldwork. 
In the case of Washington journalists, some forty years later, digital equivalents include email newsletters; some are free, others have a subscription cost, and still others require approval by an administrator to subscribe. The digital trace data of the research participants themselves include social media posts and biographical self-presentations found on personal or professional websites or professional networking sites. These types of data offer rich, underexplored opportunities for inquiry and ethnographic triangulation—and because they can be collected and drawn upon later, the data serve not just to fill in gaps but also to aid in the interpretive work. Because I was now aware that I was going to use the object of the press pass rather than the credentialing process to justify my argument about
my sample, I realized I was in new methodological territory. I decided to experiment deliberately with using these forms of digital data to see if I could find a photo of a press pass. So I thought about how these different types of data, when applied to the case of Washington journalism, might provide the kinds of clues Nader suggested are waiting to be discovered. Each morning insider DC outlets put out email newsletters. These include notes about media gossip in addition to the rundown of big stories from the last twenty-four hours and predictions about what might lead the news for the morning. The “Morning Media” report from Politico contains helpful insights into new jobs and new appointments. If I could find someone who had recently moved to Washington, DC, to work as a political reporter, I could get their name and then turn to more private but still semipublic data they had generated themselves to see if their Facebook public posts or Twitter might have a photo of their new pass. A semiprivate email list could also help with a name of a new arrival to Washington. I was part of a “secret” DC list called DC-lady-journos, a moderator-approved Google group for female journalists in Washington, technically off-the-record but generally used for job announcements (a semipublic listserv). Hopefully, I would see someone new to the area, and thus newly credentialed, requesting to join the group (while moderators could be emailed privately, often current members emailed the list vouching for potential new additions), someone who had perhaps taken a picture of their new press pass and posted it on social media. This approach worked, as I’ll explain next. The digital trace data of Washington journalists, such as their social media posts and biographical self-presentations found on personal or professional websites, seemed particularly promising. 
Over time, a portrait of a journalist going far beyond their work unfolds, adding depth for researchers seeking to make claims about how the habitus of living and working in Washington might well impact the contours of political elitism. This may be particularly true for political journalists, who might be considered “micro celebrities,”54 balancing presentations of their authenticity, an imagined audience of known and unknown followers, and the self-presentation of being “on brand.” Professional semipublic networking sites such as LinkedIn seemed promising. LinkedIn provides the twenty-first-century shortcut to understanding the professional networks of a would-be research participant—a shortcut for Nader’s call to think about the networks of lawyers who leave their firm but retain social and professional
links. Similarly, Twitter is public for those who keep public accounts, but the context collapse of Twitter and other forms of social media55 means that journalists also use Twitter for a branding strategy: forms of self-disclosure and distinct social networks now are visible and overlap in ways they did not before, so what you post in your downtime on your public Twitter timeline is seen not just by your friends but also by your professional networks. Part of this strategy is showing “more” than just their journalism as a way to show they are authentic people rather than just worker bees. Quite literally, a picture a journalist posts of her avocado toast on Twitter is indeed a less invasive version of “digital bread crumbs”56 left for the researcher. Given how essential Twitter is to political journalism,57 I figured that Twitter would likely be an ideal first stop in looking for a picture of the press pass. With an advanced search, I located Michelle Price, who had indeed posted a picture of her newly minted press pass for her followers.58 As I remembered, the badge was just a dull brown, laminated pass that read “United [hole punch] Senate and House of Representatives News Galleries” in Times New Roman italics; the easiest bit to read was the large expiration date of the pass. The vertical fine print against a white background detailed the journalist’s news organization, and the journalist’s name was visible in a close-up photo, but as I noted earlier, it was unreadable from even a short distance. I could now provide a more detailed answer to the reviewer who had suggested that status in Washington reporting demolished my claims about my sample. I had found a way to justify my sample by explaining not only how the credentialing process designated a single elite level of professional journalist but also how the design of the press pass itself eliminated significant status differentials among the members of the press corps. 
This process of finding the press pass, then, shows the ways in which different kinds of public and semipublic digital data can be used to fill in the gaps in data from field sites and can in fact on their own terms provide invaluable insights for a researcher. Not every subfield or occupation or area of inquiry will have the density of this kind of data, but chances are the ones involving the elite study-up types do. Objects and the digital trace data I used in this case are a starting point; the unexpected data approach can be even further augmented by computational analysis of social media data. When they cannot observe behavior firsthand, ethnographers can use computational data to answer inductive questions that would typically require in-person or firsthand observational data. Later I would create a purposive sample of Twitter accounts of credentialed congressional journalists, and using this known sample and relying on existing observational and theoretical knowledge about political journalists, a colleague and I would “interview the data.” This research is ongoing; the first installment can be found in the International Journal of Press/Politics,59 along with a much-abbreviated reference to how the press pass justified the sample, though it says nothing about our hunt for an image of one online. The process for working with computational social scientists more generally is a methods article in its own right, to be continued.

UNEXPECTED DATA AND SERENDIPITY

Methodological innovation comes from solving problems, either ones presented at the outset or ones found during the analysis. Ongoing research on Washington journalism presented me with new dilemmas to solve— the problem being that to do fieldwork, as it is traditionally understood, requires being in the field near people but that access in this particular case is extremely limited. Even if you can gain access, this can be of limited utility. Observational work is particularly complicated when you cannot see or hear interactions because they are taking place digitally. Thus, to build on the limitations of data, I have suggested opportunities for thinking about different kinds of data and being open to the serendipity that can come from realizing that data can come from unexpected origins. More access and more mechanisms for observing interactions in the field without being in the field are available than perhaps have previously been considered. Overall, I am arguing for rethinking qualitative methods in an adaptive, iterative way. My need to innovate came from my interest in studying up, but unexpected data can be employed in many domains. Material objects, especially those that are not at the center of a research inquiry, can provide critical clues for scholars. Digital trace data from public and semipublic sources can provide additional information on anything from daily routines to institutional insights. However, the formulation of unexpected data that brings together these approaches is new—if not in triangulation, then in spirit. These data can be helpful in unexpected ways at unexpected times. The key to making these innovations useful is to move from the unanticipated discovery of possible sources of data to the planned serendipity of surprises that can be found through the deliberate collection of unexpected data.

NOTES

1. N. Usher, “Reshaping the Public Radio Newsroom for the Digital Future,” Radio Journal: International Studies in Broadcast and Audio Media 10, no. 1 (2012): 65–79.
2. N. Usher, Making News at the New York Times (Ann Arbor: University of Michigan Press, 2014).
3. V. Molnár and A. Hsiao, “Flash Mobs and the Social Life of Public Spaces: Analyzing Online Visual Data to Study New Forms of Sociability,” in Digital Research Confidential: The Secrets of Studying Behavior Online, ed. E. Hargittai and C. Sandvig (Cambridge, MA: MIT Press, 2015), 55–78.
4. F. Le Cam, “Photographs of Newsrooms: From the Printing House to Open Space Offices. Analyzing the Transformation of Workspaces and Information Production,” Journalism 16, no. 1 (2015): 134–152.
5. See the U.S. Press Freedom Tracker, https://pressfreedomtracker.us/.
6. R. M. Entman, Projections of Power: Framing News, Public Opinion, and U.S. Foreign Policy (Chicago: University of Chicago Press, 2004).
7. W. L. Bennett, R. G. Lawrence, and S. Livingston, When the Press Fails: Political Power and the News Media from Iraq to Katrina (Chicago: University of Chicago Press, 2008).
8. N. Hemmer, Messengers of the Right: Conservative Media and the Transformation of American Politics (Philadelphia: University of Pennsylvania Press, 2016).
9. J. Lepore, “Does Journalism Have a Future?,” New Yorker, January 29, 2019, https://www.newyorker.com/magazine/2019/01/28/does-journalism-have-a-future.
10. C. Geertz, The Interpretation of Cultures: Selected Essays (New York: Basic Books, 1973).
11. Though see A. Davis, “Journalist-Source Relations, Mediated Reflexivity and the Politics of Politics,” Journalism Studies 10, no. 2 (2009): 204–219; R. G. Lawrence, “Campaign News in the Time of Twitter,” in Controlling the Message: New Media in American Political Campaigns, ed. V. A. Farrar-Myers and J. S. Vaughn (New York: New York University Press, 2015), 93–112.
12. Entman, Projections of Power; S. Iyengar and D. R. Kinder, News That Matters: Television and American Opinion (Chicago: University of Chicago Press, 2010).
13. Usher, “Reshaping the Public Radio Newsroom”; Usher, Making News; N. Usher, Interactive Journalism: Hackers, Data, and Code (Urbana: University of Illinois Press, 2016).
14. N. Usher, “Al Jazeera English Online: Understanding Web Metrics and News Production When a Quantified Audience Is Not a Commodified Audience,” Digital Journalism 1, no. 3 (2013): 335–351.
15. N. Usher, “Newsroom Moves and the Newspaper Crisis Evaluated: Space, Place, and Cultural Meaning,” Media, Culture and Society 37, no. 7 (2015): 1005–1021.
16. K. Kiely and M. McCurry, “The White House Shouldn’t Be in Charge of Granting Press Credentials,” Washington Post, November 18, 2018, https://www.washingtonpost.com/opinions/the-white-house-shouldnt-be-in-charge-of-granting-press-credentials/2018/11/21/e5e5bc76-ed10-11e8-8679-934a2b33be52_story.html.
17. S. Pink and J. Morgan, “Short-Term Ethnography: Intense Routes to Knowing,” Symbolic Interaction 36, no. 3 (2013): 351–361.
18. A. Markham, “Fieldwork in Social Media: What Would Malinowski Do?,” Qualitative Communication Research 2, no. 4 (2013): 437.
19. L. Nader, “Up the Anthropologist—Perspectives Gained from Studying Up,” in Reinventing Anthropology, ed. D. Hymes (New York: Vintage Books, 1974), 284–311.
20. Nader, “Up the Anthropologist.”
21. U. Hannerz, “Studying Down, Up, Sideways, Through, Backwards, Forwards, Away and at Home: Reflections on the Field Worries of an Expansive Discipline,” in Locating the Field: Space, Place, and Context in Anthropology, ed. S. Coleman and P. Collins (New York: Berg, 2006), 23.
22. Pink and Morgan, “Short-Term Ethnography,” 352; J. Hughes et al., “The Role of Ethnography in Interactive Systems Design,” Interactions, April 1995, 57–65; R. C. Rist, “Blitzkrieg Ethnography: On the Transformation of a Method Into a Movement,” Educational Researcher 9, no. 2 (1980): 8–10.
23. H. Knoblauch, “Focused Ethnography,” in Forum qualitative sozialforschung [Forum: Qualitative social research] 6, no. 3, art. 44 (2005); D. R. Millen, “Rapid Ethnography: Time Deepening Strategies for HCI Field Research,” in DIS ’00: Proceedings of the 3rd Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques (New York: Association for Computing Machinery, 2000), 280–286.
24. Usher, Interactive Journalism.
25. E.g., P. Galison, Image and Logic: A Material Culture of Microphysics (Chicago: University of Chicago Press, 1997).
26. Nader, “Up the Anthropologist.”
27. A. Marin and B. Wellman, “Social Network Analysis: An Introduction,” in The SAGE Handbook of Social Network Analysis, ed. John Scott and Peter J. Carrington (London: SAGE, 2011), 11–25.
28. J. Scott, “Social Network Analysis,” Sociology 22, no. 1 (1988): 109–127.
29. P. M. Leonardi, “The Ethnographic Study of Visual Culture in the Age of Digitization,” in Digital Research Confidential: The Secrets of Studying Behavior Online, ed. E. Hargittai and C. Sandvig (Cambridge, MA: MIT Press, 2015), 103–138.
30. H. S. Becker, “The Epistemology of Qualitative Research,” in Essays on Ethnography and Human Development, ed. R. Jessor, A. Colby, and R. Schweder (Chicago: University of Chicago Press, 1996), 53–71.
31. Molnár and Hsiao, “Flash Mobs.”
32. G. E. Marcus, “Ethnography in/of the World System: The Emergence of Multi-sited Ethnography,” Annual Review of Anthropology 24, no. 1 (1995): 98.
33. H. Molotch, “Objects in Sociology,” in Design Anthropology: Object Cultures in Transition, ed. A. Clark (New York: Bloomsbury, 2018), 19, citing K. Plummer, Documents of Life: An Introduction to the Problems and Literature of a Humanistic Method (London: SAGE, 2001 [1983]).
34. E. Goffman, Relations in Public (New York: Harper Collins, 1971).
35. Molotch, “Objects in Sociology,” 20.
36. L. T. Ulrich, The Age of Homespun: Objects and Stories in the Creation of an American Myth (New York: Vintage, 2009).
37. P. Laviolette, “Introduction: Storing and Storying the Serendipity of Objects,” in Things in Culture, Culture in Things: Approaches to Culture Theory, ed. A. Kannike and P. Laviolette (Tartu, Estonia: University of Tartu Press, 2013), 13.
38. B. Latour, Reassembling the Social: An Introduction to Actor-Network-Theory (Oxford: Oxford University Press, 2007).
39. S. Turkle, ed., introduction to Evocative Objects: Things We Think With (Cambridge, MA: MIT Press, 2007), 5.
40. C. Hine, “Multi-sited Ethnography as a Middle Range Methodology for Contemporary STS,” Science, Technology, and Human Values 32, no. 6 (2007): 652–671.
41. L. Humphreys, The Qualified Self: Social Media and the Accounting of Everyday Life (Cambridge, MA: MIT Press, 2018).
42. E.g., P. J. Boczkowski, “The Material Turn in the Study of Journalism: Some Hopeful and Cautionary Remarks from an Early Explorer,” Journalism 16, no. 1 (2015): 65–68.
43. S. Keith, “Horseshoes, Stylebooks, Wheels, Poles, and Dummies: Objects of Editing Power in 20th-Century Newsrooms,” Journalism 16, no. 1 (2015): 44–60; Le Cam, “Photographs of Newsrooms.”
44. Y. Benkler, “A Free Irresponsible Press: Wikileaks and the Battle Over the Soul of the Networked Fourth Estate,” Harvard Civil Rights-Civil Liberties Law Review 46 (2011): 311–382.
45. N. Usher, J. Holcomb, and J. Littman, “Twitter Makes It Worse: Political Journalists, Gendered Echo Chambers, and the Amplification of Gender Bias,” International Journal of Press/Politics 23, no. 3 (2018): 324–344.
46. Benkler, “A Free Irresponsible Press.”
47. J. R. Cutcliffe and H. G. Harder, “Methodological Precision in Qualitative Research: Slavish Adherence or ‘Following the Yellow Brick Road?,’ ” Qualitative Report 17, no. 41 (2012): 1–19.
48. Leonardi, “The Ethnographic Study of Visual Culture.”
49. Usher, Making News.
50. P. Howard and N. Usher, “Why We Like Pinterest for Fieldwork,” Tow Center Blog/Social Media Collective, July 14, 2014, https://socialmediacollective.org/2014/07/14/why-we-like-pinterest-for-fieldwork/.
51. I try to respect confidentiality, but I also let my gatekeepers to field access know that I will be taking photographs and may post them on my Pinterest page. I provide the link to the page to these gatekeepers and have in the past taken down photos from public view upon request (one example was a detail shot of digital performance analytics at a newspaper with the exact numbers visible).
52. Leonardi, “The Ethnographic Study of Visual Culture.”
53. Nader, “Up the Anthropologist.”
54. A. E. Marwick and d. boyd, “I Tweet Honestly, I Tweet Passionately: Twitter Users, Context Collapse, and the Imagined Audience,” New Media and Society 13, no. 1 (2011): 114–133.
55. Marwick and boyd, “I Tweet Honestly.”
56. T. J. Miller, “Surveillance: The ‘Digital Trail of Breadcrumbs,’ ” Culture Unbound: Journal of Current Cultural Research 2, no. 1 (2010): 9–14.
57. P. Hamby, “Did Twitter Kill the Boys on the Bus? Searching for a Better Way to Cover a Campaign,” Discussion Paper D-80, Harvard University, Joan Shorenstein Center, Cambridge, MA, 2013, https://shorensteincenter.org/wp-content/uploads/2013/08/d80_hamby.pdf.
58. Michelle Price, “I got my congressional press pass today . . . ,” Twitter, September 19, 2017, 6:31 PM, https://twitter.com/michelleprice36/status/910270201140215810.
59. Usher, Holcomb, and Littman, “Twitter Makes It Worse.”

REFERENCES

Becker, H. S. “The Epistemology of Qualitative Research.” In Essays on Ethnography and Human Development, ed. R. Jessor, A. Colby, and R. Schweder, 53–71. Chicago: University of Chicago Press, 1996.
Benkler, Y. “A Free Irresponsible Press: Wikileaks and the Battle Over the Soul of the Networked Fourth Estate.” Harvard Civil Rights-Civil Liberties Law Review 46 (2011): 311–382.
Bennett, W. L., R. G. Lawrence, and S. Livingston. When the Press Fails: Political Power and the News Media from Iraq to Katrina. Chicago: University of Chicago Press, 2008.
Boczkowski, P. J. “The Material Turn in the Study of Journalism: Some Hopeful and Cautionary Remarks from an Early Explorer.” Journalism 16, no. 1 (2015): 65–68.
Cutcliffe, J. R., and H. G. Harder. “Methodological Precision in Qualitative Research: Slavish Adherence or ‘Following the Yellow Brick Road?’ ” Qualitative Report 17, no. 41 (2012): 1–19.
Davis, A. “Journalist-Source Relations, Mediated Reflexivity and the Politics of Politics.” Journalism Studies 10, no. 2 (2009): 204–219.
Entman, R. M. Projections of Power: Framing News, Public Opinion, and U.S. Foreign Policy. Chicago: University of Chicago Press, 2004.
Galison, P. Image and Logic: A Material Culture of Microphysics. Chicago: University of Chicago Press, 1997.
Geertz, C. The Interpretation of Cultures: Selected Essays. New York: Basic, 1973.
Goffman, E. Relations in Public. New York: Harper Collins, 1971.
Hamby, P. “Did Twitter Kill the Boys on the Bus? Searching for a Better Way to Cover a Campaign.” Discussion Paper D-80. Harvard University, Joan Shorenstein Center, Cambridge, MA, 2013. https://shorensteincenter.org/wp-content/uploads/2013/08/d80_hamby.pdf.
Hannerz, U. “Studying Down, Up, Sideways, Through, Backwards, Forwards, Away and at Home: Reflections on the Field Worries of an Expansive Discipline.” In Locating the Field: Space, Place and Context in Anthropology, ed. S. Coleman and P. Collins, 23–42. New York: Berg, 2006.
Hemmer, N. Messengers of the Right: Conservative Media and the Transformation of American Politics. Philadelphia: University of Pennsylvania Press, 2016.
Hine, C. “Multi-sited Ethnography as a Middle Range Methodology for Contemporary STS.” Science, Technology, and Human Values 32, no. 6 (2007): 652–671.
Howard, P., and N. Usher. “Why We Like Pinterest for Fieldwork.” Tow Center Blog/Social Media Collective, July 14, 2014. https://socialmediacollective.org/2014/07/14/why-we-like-pinterest-for-fieldwork/.
Hughes, J., V. King, T. Rodden, and H. Anderson. “The Role of Ethnography in Interactive Systems Design.” Interactions, April 1995, 57–65.
Humphreys, L. The Qualified Self: Social Media and the Accounting of Everyday Life. Cambridge, MA: MIT Press, 2018.
Iyengar, S., and D. R. Kinder. News That Matters: Television and American Opinion. Chicago: University of Chicago Press, 2010.
Keith, S. “Horseshoes, Stylebooks, Wheels, Poles, and Dummies: Objects of Editing Power in 20th-Century Newsrooms.” Journalism 16, no. 1 (2015): 44–60.
Kiely, K., and M. McCurry. “The White House Shouldn’t Be in Charge of Granting Press Credentials.” Washington Post, November 18, 2018. https://www.washingtonpost.com/opinions/the-white-house-shouldnt-be-in-charge-of-granting-press-credentials/2018/11/21/e5e5bc76-ed10-11e8-8679-934a2b33be52_story.html.
Knoblauch, H. “Focused Ethnography.” Forum Qualitative Sozialforschung [Forum: Qualitative social research] 6, no. 3, art. 44 (2005). https://doi.org/10.17169/fqs-6.3.20.
Latour, B. Reassembling the Social: An Introduction to Actor-Network-Theory. Oxford: Oxford University Press, 2007.
Laviolette, P. “Introduction: Storing and Storying the Serendipity of Objects.” In Things in Culture, Culture in Things: Approaches to Culture Theory, ed. A. Kannike and P. Laviolette, 13–33. Tartu, Estonia: University of Tartu Press, 2013.
Lawrence, R. G. “Campaign News in the Time of Twitter.” In Controlling the Message: New Media in American Political Campaigns, ed. V. A. Farrar-Myers and J. S. Vaughn, 93–112. New York: New York University Press, 2015.
Le Cam, F. “Photographs of Newsrooms: From the Printing House to Open Space Offices. Analyzing the Transformation of Workspaces and Information Production.” Journalism 16, no. 1 (2015): 134–152.
Leonardi, P. M. “The Ethnographic Study of Visual Culture in the Age of Digitization.” In Digital Research Confidential: The Secrets of Studying Behavior Online, ed. E. Hargittai and C. Sandvig, 103–138. Cambridge, MA: MIT Press, 2015.
Lepore, J. “Does Journalism Have a Future?” New Yorker, January 29, 2019. https://www.newyorker.com/magazine/2019/01/28/does-journalism-have-a-future.
Marcus, G. E. “Ethnography in/of the World System: The Emergence of Multi-sited Ethnography.” Annual Review of Anthropology 24, no. 1 (1995): 95–117.
Marin, A., and B. Wellman. “Social Network Analysis: An Introduction.” In The SAGE Handbook of Social Network Analysis, ed. John Scott and Peter J. Carrington, 11–25. London: SAGE, 2011.
Markham, A. “Fieldwork in Social Media: What Would Malinowski Do?” Qualitative Communication Research 2, no. 4 (2013): 434–446.
Marwick, A. E., and d. boyd. “I Tweet Honestly, I Tweet Passionately: Twitter Users, Context Collapse, and the Imagined Audience.” New Media and Society 13, no. 1 (2011): 114–133.
Millen, D. R. “Rapid Ethnography: Time Deepening Strategies for HCI Field Research.” In DIS ’00: Proceedings of the 3rd Conference on Designing Interactive Systems: Processes, Practices, Methods, and Techniques, 280–286. New York: Association for Computing Machinery, 2000.
Miller, T. J. “Surveillance: The ‘Digital Trail of Breadcrumbs.’ ” Culture Unbound: Journal of Current Cultural Research 2, no. 1 (2010): 9–14.
Molnár, V., and A. Hsiao. “Flash Mobs and the Social Life of Public Spaces: Analyzing Online Visual Data to Study New Forms of Sociability.” In Digital Research Confidential: The Secrets of Studying Behavior Online, ed. E. Hargittai and C. Sandvig, 55–78. Cambridge, MA: MIT Press, 2015.
Molotch, H. “Objects in Sociology.” In Design Anthropology: Object Cultures in Transition, ed. A. Clark, 19–36. New York: Bloomsbury, 2018.
Nader, L. “Up the Anthropologist—Perspectives Gained from Studying Up.” In Reinventing Anthropology, ed. D. Hymes, 284–311. New York: Vintage Books, 1974.
Pink, S., and J. Morgan. “Short-Term Ethnography: Intense Routes to Knowing.” Symbolic Interaction 36, no. 3 (2013): 351–361.
Rist, R. C. “Blitzkrieg Ethnography: On the Transformation of a Method Into a Movement.” Educational Researcher 9, no. 2 (1980): 8–10.
Scott, J. “Social Network Analysis.” Sociology 22, no. 1 (1988): 109–127.
Turkle, S., ed. Introduction to Evocative Objects: Things We Think With. Cambridge, MA: MIT Press, 2007.
Ulrich, L. T. The Age of Homespun: Objects and Stories in the Creation of an American Myth. New York: Vintage, 2009.
Usher, N. “Al Jazeera English Online: Understanding Web Metrics and News Production When a Quantified Audience Is Not a Commodified Audience.” Digital Journalism 1, no. 3 (2013): 335–351.
Usher, N. Interactive Journalism: Hackers, Data, and Code. Urbana: University of Illinois Press, 2016.
Usher, N. Making News at the New York Times. Ann Arbor: University of Michigan Press, 2014.
Usher, N. “Newsroom Moves and the Newspaper Crisis Evaluated: Space, Place, and Cultural Meaning.” Media, Culture and Society 37, no. 7 (2015): 1005–1021.
Usher, N. “Reshaping the Public Radio Newsroom for the Digital Future.” Radio Journal: International Studies in Broadcast and Audio Media 10, no. 1 (2012): 65–79.
Usher, N., J. Holcomb, and J. Littman. “Twitter Makes It Worse: Political Journalists, Gendered Echo Chambers, and the Amplification of Gender Bias.” International Journal of Press/Politics 23, no. 3 (2018): 324–344.

Chapter Seven

SOCIAL MEDIA AND ETHNOGRAPHIC RELATIONSHIPS

JEFFREY LANE

Many academics are excited about the breadth of social and behavioral data available online. Using data-scraping and machine-learning processes, scholars can examine millions of social media posts to explain patterns in how we live today. This powerful research offers all sorts of aggregate-level insights. But what any one post actually means to its author is usually beside the point in such aggregated analyses. Most big-data studies take social media at face value without asking questions of the people behind the content or observing the wider social context. As an urban ethnographer, I believe that the personal and situational details matter, and I look for online meaning in the everyday lives I study. Such access to the lived meanings of online content—the most exciting parts of social media in my mind—requires building ethnographic relationships.1 Studying closely the same people in person and online is key to understanding why people aspire to be who they say they are online or what their posts on social media really mean. Social media data, if you are thinking like an ethnographer, cannot be skimmed off the top of complex social identities or collected apart from the people creating and engaging with this content. An ethnographic approach to social media recognizes that the content people share online is carefully curated,2 tailored to particular audiences,3 and tied to the most immediate and local circumstances of their lives.4 To gather this richness,

an ethnographer must come to know their subjects in person over time in relationships that carry over to social media. By participating in the lives of their subjects and observing the things they do—including on social media—ethnographers can relate to them and appreciate their lives, even given sometimes significant social differences between ethnographers and their subjects. In this chapter, I show how to use ethnography to examine social media content in the context of a person’s life. I focus on the Facebook feed of one of the young men I observed during my five-year ethnographic study of street life in Harlem.5 I focus specifically on one post that addresses his feelings about his education. You read it right, just one post. I want to take this deep dive on data this small because uncovering the embedded context of social media use is one of the most exciting parts of doing ethnographic research at this moment. Before I walk through my process of linking this young man’s Facebook post to his routines, life history, and personality, I start by describing how our relationship developed within my larger ethnographic study.

THE ETHNOGRAPHIC STUDY

In October 2011, I met Christian (a pseudonym), an eighteen-year-old Harlem resident who identified as black and Jamaican. We met when I was approximately two years into my doctoral research on how teenagers in Harlem involved in adolescent street life used social media to manage their exposure to neighborhood violence and to access needed social support and resources. I became especially interested in the diverse and even contradictory social identities they appeared to live out online, including relationships and reputations based on the street and in school. I came to this topic after I had moved to Harlem with my girlfriend (now wife) to be near her master’s degree program and by chance met a local black pastor. I began to volunteer as an outreach worker in this pastor’s antiviolence ministry, which reached out to youth on the street and through social media. This became my first of several fieldwork roles based on what local youth organizations needed and what was emerging as a research interest around teenage street life, intervention, and social media. I started a computer lab for teens and senior citizens, volunteered at a major city-funded summer employment program for three consecutive summers, coordinated and

chaperoned college trips, and got involved in other ways. For me, as a white man in my thirties, these forms of engagement were the starting points of ethnographic relationships with black teenagers and their families. My project began very conventionally, as other urban ethnographers have taken similar steps to study the multifaceted lives of street-involved youth and the adults concerned about them.6 But to do this neighborhood fieldwork in the digital age, I had to exchange mobile phone numbers and social media handles with the same young people I met in the neighborhood. I discussed with neighborhood teenagers and the parents and family members of those who were minors that I had a research agenda on top of the community service I was doing and that I wanted to compare what I was seeing in the neighborhood to life online. Along with these in-person conversations, I messaged youth immediately after I sent a social media request to confirm that this was a connection I wanted to write about and that they had every right to delete me at any moment without explanation. Based on these consented ethnographic relationships, I composed field notes about what I observed of the same people in person and online. Usually, I jotted on my phone key things I saw or quotes I heard in the moment and later completed full, copious notes at the end of the day. I also took screenshots that I showed to the authors of the content to ask why they posted what they did and to compare their self-reports to what I knew from our relationships. I included these annotated screenshots in my field notes. Finally, I conducted recorded interviews to dive more deeply into the experiences of my primary subjects. 
My analytical goal was to compare what my subjects were saying and doing in person and on social media, using these two layers of data to complement and confirm one another.7 One of the first teens (along with his mother) to agree to my research was JayVon (also a pseudonym), who at the time was sleeping on an upstairs couch at Christian’s family brownstone. JayVon introduced me to Christian, who told me the first time we met that he was going to school to get his General Equivalency Diploma (GED) in lieu of a high school diploma. He said he was nearing the number of practice hours to take the Official Practice Test (OPT), and if he met the cutoff score on that test, he would be eligible to take the real GED exam. This route to high school completion was a major resource for the youth I studied, so I wanted to understand this process better. I had also assumed that people pursued a GED because they were not in school, so I wondered what a GED school would look like.

As Christian and I started spending time together around the neighborhood, his GED remained a prominent topic of conversation, and when I eventually expressed interest in observing at his school, Christian suggested I meet with his principal. Christian’s principal agreed to allow me inside the school as a graduate-student observer, and after a formalized permission process, she introduced me to several of her staff, and we designated a set of observation days. I was now in a position to connect my observations of Christian in the neighborhood to his routines in school and to see how his Facebook content related to both settings. By the time I had begun to shadow Christian in school, he had passed the OPT and taken his GED, and he was continuing classes while awaiting his scores. In March 2012, he learned from his math teacher that he had indeed passed his GED exam. That same afternoon he posted the following message on his Facebook account:

Blasting Juicy—Biggie Smalls This is for all the Teachers that Told me i’d never amount to Nothing! SUCK MY G.E.D.

It would be difficult for a machine to handle the complexity of this post, but a critical cultural scholar8 familiar with rap music might understand aspects of its meaning without any personal knowledge of Christian. This best-case analyst would recognize the reference to a rap song called “Juicy” by Biggie Smalls (also known as Notorious B.I.G.), which begins with this dedication: “To all the teachers that told me I’d never amount to nothing.” This opening sets up the song’s narrative about Biggie’s rise from poverty to financial and critical success in the music industry, which proves these teachers wrong and challenges “stereotypes of a black male misunderstood.” Christian’s addition of the words “SUCK MY G.E.D.” might suggest a personal critique of scoffing teachers and institutional racism in education, but the language leaves ambiguous whether Christian himself has received his GED and whether his view of the diploma is positive or negative. Such speculation would be about as far as this content analysis could get. As an ethnographer, I’m interested in these racial and structural aspects of social media. But by getting to know Christian personally and online, I was able to get at the context of his messages. I came to see this single Facebook post as an embodiment of the way Christian moved through his

school-day routines, with its roots in Christian’s past experiences in public school before entering the GED school. I also understood this post as part of an ongoing performance of Christian’s personality, for which he used social media as a staging area. I am now going to go through my study of Christian’s routines, life history, and personality to bring to bear the deeper, embedded meanings of the Biggie Smalls Facebook post. I start by placing this post in the context of Christian’s school-day routines at his GED school. To do this, I need to backtrack to my observations in the months leading up to the news of his passing score in March.

STUDYING ROUTINES

When Christian wrote “SUCK MY G.E.D.,” he had become only the second student to pass the exam at a GED school in its first year of operation, making Christian a success story. As part of the city’s alternative schools and programs, Christian’s GED school served students whom the Department of Education considered “disconnected.” I started shadowing Christian’s school days about two months before he learned of his GED results and continued to shadow him as the focus shifted to applying to and getting ready for college. The way he moved through his day—like his Facebook post—made it clear that he was doing this program on his terms. Christian wanted a GED without becoming institutionalized in the process. Allow me to explain by way of illustration. On a warm day for January, Christian asked me to meet him at his house at 9 am, having planned ahead to miss at least some of first period, which started at 8:50 am. On subsequent visits to the school, I learned that because students had until 10 am to enter the building, many treated 8:50 am to 10 am as an arrival window. I arrived at Christian’s house at 8:58 am after first calling him at his request around 8:30 am, when he had just gotten up. I called his BlackBerry again as I reached the brownstone where he lived with his aunt (his guardian), his grandmother, two adoptive male cousins (ages twenty and twenty-one) and their mother and her sister, and, temporarily, JayVon. Christian needed “five minutes,” so I waited outside. I scrolled through Christian’s Facebook and took a screenshot to include in my field notes. He had just greeted his 1,690 Facebook friends, or the city, broadly: “Good Morning New York its Friday!” Next, he wrote:

“I Need a Girlfriend who will Faithfully wake me up in the Morning so i could be on Time =/.” The post garnered “likes” from a number of girls and a punchy exchange with one who suggested that what he really needed was an alarm clock. “Lmao [Laughing my ass off] Alarm Clocks wont love you back,” Christian responded. Christian exited under the stairs to the parlor floor at about 9:20 am, wearing Air Jordan 5 reissued sneakers; camouflage cargo pants; a faded black T-shirt that read “Beach Bum”; a blue, red, and white Red Tag jacket; and a maroon knit Polo hat. He carried a Nike backpack. He had a thin beard and two studs in his ears, and he smelled like Axe fragrance. We walked to school, which took roughly fifteen minutes. He looked repeatedly over his left shoulder. I asked if he was looking for the young woman he had jokingly said was “stalking” him after she had liked his post about needing a girlfriend. Christian replied that he was watching for “the Ds [undercover police] in they black Crown Victorias. They will stop you if something happen or you match the description.” He added that “they might mess up your whole day.” Christian led us through the school entrance to a small desk where a security guard oversaw students as they signed a cellular phone contract. This document explained the chancellor’s ban on cell phones and indicated that any device that was visible would be confiscated and held by the school for one week. One of Christian’s classmates, an African American mother pushing her child in a stroller, waited behind us. Christian signed the contract and entered with his phone. Christian led us to an elevator, ostensibly to call it for his classmate with the stroller. The dean derailed Christian’s plan and walked us to the stairwell. The dean, a tall African American man with salt-and-pepper hair, admonished Christian for attempting to break the rules, adding that he needed to pull up his pants. 
When Christian’s ringtone went off, the dean told Christian to check in his phone; after Christian returned, the dean explained that doing the right thing was part of being a man. He explained to both of us that Christian was doing his GED because he had deviated from the rules and was now back on track. The dean brought us to the third floor, where Christian’s second-period science class was already in progress. The science teacher, a tall white man with graying hair who misplaced his glasses more than once, stood at the board. A middle-aged white woman,

the college counselor and social worker, was also present, and she reminded Christian about their upcoming meeting. The science teacher told us to get two extra desks from next door. We entered Christian’s first-period math classroom, where a casually dressed white teacher in his twenties asked Christian where he had been that morning. “I had to meet up with him,” nodding to me. I had already been introduced to the math teacher, and we had chatted earlier about the needs of his students. Now I was embarrassed being led around by Christian and blamed for his being late. The math teacher, half-smiling, told Christian that he always had an excuse and then gave us permission to take the desks. We added the desks to the island in the science classroom where eight students were already seated; some were taking assessment tests on Assessing the Adequacy of Visual Information and Work and Energy while others were simply sitting and seemed disinterested. The assessments were in ragged and dated workbooks; the students were not permitted to write in them, and they were given out only in the classroom. Before settling down, Christian chatted briefly with a young woman, flirting with her and complaining about the dean: “You gonna have my man [me] take the stairs?” Christian seemed to be enjoying the attention of my shadowing. Then Christian got to work and said almost nothing for the rest of the period, looking up only when his math teacher came in to hand Christian the worksheets he had missed. Christian appeared to be managing the material without difficulty. Some students seemed to become frustrated with the science teacher. When one student’s phone rang and the teacher told her to put it away, she complained that the students were not learning anything. Other students said that the questions were difficult and then consulted each other or the answer section when the teacher told them he could not help because they were taking an assessment. 
With about ten minutes remaining in the period, Christian said he had finished, which seemed to cue the teacher to call an end to the allotted time. The teacher read the answers aloud. Some students asked for explanations, but the teacher did not pause and indicated that the assessments would tell him where to focus. One student pushed back, maintaining that he was supposed to be a teacher. Third-period language arts proved more interactive and, at first, chaotic. The teacher, a middle-aged Puerto Rican woman, wore dark sunglasses, apparently out of form for her, since as students entered, some called out

ideas about why she was wearing them and what had happened to her. She informed the class that nobody had punched her. The banter eventually died down, and students settled into a discussion of a Paul Tillich quote about cruelty toward others being also cruel toward oneself. After this opening discussion, the teacher segued to the next activity by distributing a handout on characterization in novels. Three separate student volunteers read consecutive parts of the handout aloud, including Christian, who appeared to be the only comfortable reader. During this particular morning and other school days, I had an opportunity to see for myself how Christian experienced the same school system Biggie Smalls had critiqued and how Christian had done what he needed to do to get his GED. I saw that within a school environment with varied teaching and student competence, inadequate learning materials, chronically low attendance, and other problems, Christian had seemed to identify the staff who would be most helpful to him, especially his math teacher, who he said was “the best teacher at the school.” The math teacher also felt that Christian was an exceptional student. After we had spoken in his classroom one day, we met outside the school for a more candid conversation. The math teacher had little sense of his students outside of class and as per Department of Education policy was not permitted to follow students on social media. He was curious about my observations in the neighborhood and online. The math teacher said that after Christian had come very close to a passing score on his initial OPT, he had persevered and done the work required to pass the next predictor test. According to the math teacher, this was unusual. Most students mistakenly believed that after they came close like that, they did not need to do the work. 
After hearing what his math teacher had to say about Christian and learning that his college counselor was also a major booster, I began to consider the control and ease with which Christian seemed to move through his day—and this slice of the school system more generally. He was working the system, I thought. To reflect further on my fieldwork, I composed a memo in response to the Biggie Smalls post that I titled “Christian gets a G.E.D. his way.” Christian’s post signaled his version of resistance and a sense of self-confidence. In this memo, I analyzed the morning just described, and I realized that Christian had flowed9 through his schedule. He got what he wanted out of the first hour of his day without succumbing

to the temporal expectations of his school (or this researcher). He received instant female attention and validation for his wish for a girlfriend instead of an alarm clock to wake him. His lateness was met with no penalty from school or from me. By taking his time that morning, Christian secured “temporal autonomy” over the organizational demands10 of his school. When he finally arrived, he moved just as he pleased, although he received mild chastisement from his dean and later his math teacher, who smiled in a half-conspiring manner. Their concern, however, was consistent with the goal Christian himself shared of passing the GED. Even when his phone rang, the device was not taken away for a week as per city rules. When he entered his science class, he took his time to chat with a young woman before getting to work. His math teacher brought him the worksheets he had missed from first period—not the other way around. Christian worked efficiently in science class and ignored the frustration and struggles of his classmates, who could not seem to get started or needed to be taught the material. Christian completed his assessment with ten minutes to spare. He then read aloud proficiently in his third-period class, which validated his competence in contrast to that of the other two struggling readers. Because Christian was in control, his morning appeared satisfying, with the right amount of time for play and work. The Biggie Smalls post reflected Christian’s attitude about the best way to get an education given the system he was in and the options available to him. I heard this same sentiment in Christian’s account of an incident of pushing that involved students from a second school, which was cohoused in the building:

The kids from [name of the other school] always try to front on us [GED students]. . . . Mind you, you’re in an alternative school. Where are you going? You not going to Harvard. You’re going to the same [City University] school as me. I’m just gonna get there quicker.

But what about the question of disbelieving teachers? That did not seem to apply here. He appeared to have advocates and maybe even enablers. Even the dean was supportive by being stern with Christian. This, of course, raised the question of why Christian, who seemed more capable than his classmates, was getting a GED in the first place. Had he deviated from the rules as his dean had claimed? What happened at his previous schools?

STUDYING LIFE HISTORY

After I had observed Christian’s routines at his GED school, I wanted to understand what had taken him off the path of regular schooling and how his previous schools had failed him, if indeed they had. For Christian, growing up in Harlem but also on social media platforms, the answers to these questions were interwoven with the identities that he projected online. Young people especially use their social media feeds to curate identities that are in flux and under construction11 and that change situationally.12 Feeds indicate who people want to be at particular moments and for certain reasons, identities about which ethnographers can ask questions and investigate. I often sat with Christian and other young people in my research to review their social media profiles. Sometimes I even put together a packet or PowerPoint presentation on what appeared to me to be various, sometimes clashing identities so we could discuss these different versions of self and try to understand why and for whom they were created. Looking through Christian’s profile pictures on his Facebook account, I saw photograph after photograph of Christian smiling or preening for the camera. His photos seemed designed to conjure his attractiveness to the opposite sex. In nine of the forty-four photos, he posed with a girl, often embracing or touching her in some way. When I asked how these images compared to his earlier profiles on other sites, Christian mentioned his MySpace profile, which he said he tried to find for me but was unable to remember his password. I asked what he remembered and why he wanted me to see it. Christian recalled his MySpace profile picture: he said he had two blunts in his mouth and was holding two guns—the sort of stereotypical image of black masculinity that was detailed in Biggie’s lyrics. Christian called the image “embarrassing.” Our discussion of MySpace provided a point of entry into Christian’s earlier school days. 
Beginning with kindergarten, the GED school was the ninth school he had attended, and several had been deemed failing and either had closed or were in the process of closing during my fieldwork period. Christian’s education coincided with a local and national shift in public education toward “small schools.” New York’s large public schools, including some that Christian had attended, were being replaced by or divided into smaller, autonomous, and sometimes specialized schools and

programs, like Christian’s GED start-up. These small learning communities (SLCs) were considered safer and more effective and accountable learning environments for high-needs students, teachers, and administrators alike.13 Christian said his school troubles started at a large (now closed) middle school in West Harlem that was chaotic and unsafe. He told me he had joined his first gang in seventh grade, recalling its name and three-letter acronym, and said he had acted out and cut class often. As a ninth grader at his first high school, a midsized public school in West Harlem, Christian said he had fought in front of the building against older boys from a gang across the street. This had resulted in a “safety transfer” to a large public high school in the Murray Hill section of Manhattan before it was phased into SLCs. He spoke of difficulty learning there because of routine fights and pranks in school and students coming in and out of class. In the next breath, he added, “I ain’t gonna lie to you, I was running with the wrong crowd.” Christian said he received a superintendent’s suspension in his tenth-grade year for his part in a fight he characterized as a “riot” between black and Latino students. After finishing the school year at a suspension site, Christian returned to the school in Murray Hill before drifting and later enrolling at two alternative schools, including a GED program in the Bronx, prior to his current GED school. Christian spoke of his own agency in the disruption of his education, but his experience was of a child unprotected from gangs and violence inside and outside of school. Research tells us that black public-school students who are seen as fighters or gang members find that teachers and other school or neighborhood adults perpetuate this label. 
This further alienates them from institutions, and their tough reputation may become their only remaining protection.14 Most significant though was that Christian’s school troubles were family troubles. His middle-school issues coincided with the tragic loss of his mother when he was eleven. Christian said his father had left the family by the time he was two and added that his father had never paid child support. Although his mother’s sister, Christian’s aunt, became his legal guardian and took over his care with other female family members in his household, Christian said he felt largely unsupervised. He brought up his early access to guns as an indication. The basic support structures of childhood—school and family—were absent for Christian.

This exploration into Christian’s life history put his Facebook post in yet another light. Christian’s traumatic childhood and the neighborhood and school violence he had experienced revealed just what a monumental achievement getting his GED really was. As a child and adolescent, Christian had clearly not been protected by many of the adults around him, but at his ninth school, he had found the traction he needed to complete his high school education and potentially move on to college. Christian was proud of himself, and his interpolation of Biggie Smalls with the message to “SUCK MY G.E.D.” was not a flat rejection of the “education gospel” but a way to punctuate his own success in the face of tremendous adversity. Christian valued getting his GED, which he made clear in another statement on Facebook: “I FINALLY FINISHED SCHOOL IM TOO HYPE RIGHT NOW!!!!”

STUDYING PERSONALITY

A basic feat of ethnography is showing the people in a study. It is being able to capture and convey how people look and sound and what dispositions they enact. Social media provide ethnographers with access to the self-expressions that their subjects author. In the Biggie Smalls post, Christian linked both Biggie’s own story of beating the odds and his critique of public education to the personal facts of his life—and even to a general attitude and modus operandi. But there’s still more to parse. By writing “SUCK MY G.E.D.,” Christian changed a demand for oral sex into his way of showing the teachers who doubted or failed to protect him that he could graduate after all. None of his past or present teachers, however, were part of his Facebook network, which included mostly peers known to him personally or through the site. By sexualizing the statement he made about his school success, he adopted a style of communication he used to play and push boundaries with his friends and peers. I recognized Christian’s brand of off-color humor in the Biggie Smalls post because it sounded like other posts he wrote, such as “WHY YOU PREGNANT POSTING PICTURES IN YA PANTIES TALKING BOUT CALL YOU . . . IM SORRY TO SAY SHORTY THEM DAYS ARE OVER.” I heard Christian say these kinds of things to his friends in person, but being connected to his feed allowed me to see a performance he often repeated for his peers. His message to “SUCK MY G.E.D.” was consistent

with edgework that was highly gendered and that at least once crossed over into physical conflict with girls. Allow me to explain. From Christian and school staff, I learned of another altercation in addition to the shoving incident with students from the other school. The second incident had started in the lunchroom when a female classmate had deliberately stepped on one of Christian’s new Timberland construction boots, scuffing the toe. Christian then retrieved a broom used to sweep the floor and put it in the young woman’s hair. She punched him several times. Christian grabbed her wrists, pulled down her arms, and slapped her across the face twice. The two were separated and taken to different floors of the building. After the fight, the female classmate called two female cousins who were nearby. They entered the school building and pursued Christian with a pointed object, most likely a key. They tussled briefly before the fight was broken up. Christian was scratched along his forearms and hands. Reports were taken. While still inside school at 12:52 PM, Christian posted the following on his Facebook wall: “[Name of public housing units] Bitches Tried to Jump me in School Today LMAO Held it down like a nigga from [the avenue Christian lives off of] suppose to.” Christian later learned he had been suspended and left school. “Outside Feeling Great & Looking Better SUSPENDED for the rest of the week so i got another Vacation ahhhh Life is Great lml [laughing mad loud],” he posted at 1:17 PM. About an hour later, at 2:14 PM, he posted that he was “OMW [On my way]” to where the female classmate lived. But no confrontation ensued, and when I asked about his intentions, he downplayed the threat and said that he had run into someone else with whom he decided to hang out. I took this explanation to mean that his posts had been mostly posturing for his growing audience of commentators online. 
These three posts and a fourth post two days later, which had insulted the female classmate without naming her, garnered forty-one comments from a range of Christian’s Facebook friends. Some were classmates who had seen the incident; others were classmates who had heard about it, neighborhood friends, and Facebook friends Christian had never met personally. Eventually, the two antagonists made up, and each wrote nice things about the other on Facebook. Without minimizing the violence, the altercation was also ridiculous for how it started and evolved into Christian’s tongue-in-cheek Facebook narrative about his attack, suspension-vacation, and pursuit of the young women.


By writing “SUCK MY G.E.D.,” Christian used for this important life achievement the same language of boy-girl drama he used to validate more mundane events. This was Christian’s way of getting attention, especially online with the expansive audiences of social media. But in our private conversations, Christian addressed insecurities and feelings of vulnerability with young women that were contrary to the cavalier attitude he seemed to relish publicly, on his feed or in person. Even in his announcement that he would be graduating, Christian adopted the framework of sexual drama that he used to generate traffic on and interest in his feed. By getting to know Christian both personally and through his social media, I was in a position to report on the public parts of his personality in addition to the more candid and backstage or offline aspects of these performances. This is what ethnography can do with social media: show how people make an impression, what they do to get a rise out of others, and what is behind the front. In the final section, I elaborate on my relationship to Christian.

PUTTING IT TOGETHER

I wanted to take you through my fieldwork with Christian to show how ethnography can be used to contextualize social media content, and vice versa. Rather than treating content on its own terms, I suggest grounding this material in the perspectives of its authors. Ethnographic relationships allow researchers to study the lived context in which social media use is embedded. By studying Christian offline and online, in public and private, and across settings (neighborhood, school, etc.), I was positioned to link his Facebook activity to his daily routines, life history, and public personality. I was able to evaluate Christian’s online material in terms of observed behaviors and disclosures offline and, in turn, his offline life in terms of how he presented online. I used Christian as an extreme example in this chapter to show just how deeply an ethnographer can go to understand the meaning of even a single Facebook post in the context of a multifaceted relationship. Bringing social media into traditional ethnographic relationships introduces a new suite of data (written posts, profile photos, etc.) with which to build a sense of others.15 In this chapter, I presented three ways to go about studying social media ethnographically. First, I showed that by following a subject’s feed while shadowing their routines, the ethnographer can see exactly how social media fold into the rhythms of their day and how they


move through the world. Second, I explained that I used subjects’ profile photographs and other content as elicitation tools to ask about life history based on earlier presentations of self. Third, I discussed social media as a staging area for the public parts of a subject’s personality that, when tracked over time, reveal patterned forms of attention seeking distinct from the private self. These ways of studying social media are all made possible by getting to know a subject in the context of an ongoing relationship. What then to make of the ethnographic relationship? “There are no hard and fast rules” in these relationships between human beings.16 Relationships between ethnographers and their subjects are truly their own things, although they may resemble friendships, apprenticeships, and other familiar relational forms.17 These relationships take on unique dynamics. With Christian, there was a contagious aspect to our relationship. The same flow with which he appeared to move through his day characterized our connection, and I hitched my study to his way of being, which embarrassed me when I felt I was almost helping him break the rules at his GED school. But his ease and charm facilitated my fieldwork. Some of the richest ethnographic material comes from the most charismatic subjects. These natural performers seem to realize that ethnographers are observing and asking questions not only for themselves but also for a greater audience. Ethnographers become an outlet for such people, who thrive by telling their story of who they think they are or want to be. It is important to capture these public selves and the interactions around them, but the challenge is not to become enamored and miss the complexity and less flattering parts of their lives. For extroverts and flowing personalities like Christian, how does one get past the public self and performance? As extroverted as Christian was, I found that he was also quite introspective in private. 
It was important for us to carve space for one-on-one discussions along with the fieldwork in public. I used my car frequently in my fieldwork. Christian and I often took drives around the neighborhood, improvising destinations or running each other’s errands. The car gave us the chance to talk in a bubble. Over time, I became more instrumental about these study opportunities and asked Christian if we might record some of our car chats. Part of what makes ethnographic relationships both different from normal relationships and uniquely complicated is that the ethnographer turns around and reveals the intimacy granted as data. What’s more, the


ethnographer must remain skeptical and critical of the very people who have generously provided access to their lives in order to pursue more than their version of events. Ethnographers observe people to see if their behavior matches what they say they do and seek corroboration and third-party perspectives. Ethnographers must make peace internally with this data fixation, but they can also be more nearly transparent by sharing their writing and talking with subjects about their research. To this end, I find the inclusion of social media in ethnographic relationships useful because it establishes a dynamic in which the ethnographer and subject talk about public selves that are distinct from the whole person and that are specially crafted for reasons both the ethnographer and the subject are likely to find interesting. This dynamic allows for a level of study transparency because the ethnographer is constantly asking about these outward expressions and subjects come to expect this attention. In this sense, social media are tools for the very relationship between ethnographer and subject.

NOTES

1. M. Duneier, P. Kasinitz, and A. K. Murphy, eds., The Urban Ethnography Reader (New York: Oxford University Press, 2014).
2. K. Quinn and Z. Papacharissi, “The Place Where Our Social Networks Reside: Social Media and Sociality,” in Media and Social Life, ed. M. B. Oliver and A. A. Raney (New York: Routledge, 2014), 159–207.
3. N. K. Baym, Playing to the Crowd: Musicians, Audiences, and the Intimate Work of Connection (New York: New York University Press, 2018).
4. J. Lane, The Digital Street (New York: Oxford University Press, 2018); D. Miller, Social Media in an English Village: Or How to Keep People at Just the Right Distance (London: University College London Press, 2016).
5. Lane, The Digital Street.
6. Duneier, Kasinitz, and Murphy, The Urban Ethnography Reader.
7. M. L. Small, “How to Conduct a Mixed Methods Study: Recent Trends in a Rapidly Growing Literature,” Annual Review of Sociology 37, no. 1 (2011): 57–86.
8. Cf. A. Brock, “Critical Technocultural Discourse Analysis,” New Media and Society 20, no. 3 (2018): 1012–1030; R. Maragh, “Authenticity on ‘Black Twitter’: Reading Racial Performance and Social Networking,” Television and New Media 19, no. 7 (2018): 591–609.
9. G. A. Fine, “Organizational Time: Temporal Demands and the Experience of Work in Restaurant Kitchens,” Social Forces 69, no. 1 (1990): 95–114.
10. Fine, “Organizational Time.”
11. d. boyd, It’s Complicated: The Social Lives of Networked Teens (New Haven, CT: Yale University Press, 2014).
12. Lane, The Digital Street.


13. Cf. J. Hochschild and N. Scovronick, The American Dream and the Public Schools (New York: Oxford University Press, 2003).
14. L. J. Dance, Tough Fronts: The Impact of Street Culture on Schooling (New York: Routledge, 2002); N. Jones, Between Good and Ghetto (Piscataway, NJ: Rutgers University Press, 2010).
15. On the value of collecting and working with multiple types of data, see M. L. Small, “How to Conduct a Mixed Methods Study: Recent Trends in a Rapidly Growing Literature,” Annual Review of Sociology 37, no. 1 (2011): 57–86.
16. Duneier, Kasinitz, and Murphy, The Urban Ethnography Reader, 768.
17. Duneier, Kasinitz, and Murphy, The Urban Ethnography Reader, 768–769.

REFERENCES

Baym, N. K. Playing to the Crowd: Musicians, Audiences, and the Intimate Work of Connection. New York: New York University Press, 2018.
boyd, d. It’s Complicated: The Social Lives of Networked Teens. New Haven, CT: Yale University Press, 2014.
Brock, A. “Critical Technocultural Discourse Analysis.” New Media and Society 20, no. 3 (2018): 1012–1030.
Dance, L. J. Tough Fronts: The Impact of Street Culture on Schooling. New York: Routledge, 2002.
Duneier, M., P. Kasinitz, and A. K. Murphy, eds. The Urban Ethnography Reader. New York: Oxford University Press, 2014.
Fine, G. A. “Organizational Time: Temporal Demands and the Experience of Work in Restaurant Kitchens.” Social Forces 69, no. 1 (1990): 95–114.
Hochschild, J., and N. Scovronick. The American Dream and the Public Schools. New York: Oxford University Press, 2003.
Jones, N. Between Good and Ghetto. Piscataway, NJ: Rutgers University Press, 2010.
Lane, J. The Digital Street. New York: Oxford University Press, 2018.
Maragh, R. “Authenticity on ‘Black Twitter’: Reading Racial Performance and Social Networking.” Television and New Media 19, no. 7 (2018): 591–609.
Miller, D. Social Media in an English Village: Or How to Keep People at Just the Right Distance. London: University College London Press, 2016.
Quinn, K., and Z. Papacharissi. “The Place Where Our Social Networks Reside: Social Media and Sociality.” In Media and Social Life, ed. M. B. Oliver and A. A. Raney, 159–207. New York: Routledge, 2014.
Small, M. L. “How to Conduct a Mixed Methods Study: Recent Trends in a Rapidly Growing Literature.” Annual Review of Sociology 37, no. 1 (2011): 57–86.

Chapter Eight

ETHNOGRAPHIC RESEARCH WITH PEOPLE EXPERIENCING HOMELESSNESS IN THE DIGITAL AGE

WILL MARLER

Conducting ethnographic research with people experiencing homelessness is no easy task. Ethnography asks its practitioners like myself to work their way into a social setting and, to a degree and for some time, experience life alongside the people within it.1 Doing so requires that at least some of the people within the setting welcome us and trust us enough to let us in on their thoughts. It requires us to gain some comfort with the site in order to be able to come back to hang out again and again. To study people living on and off the street, we must wade into social environments where trust may not easily be granted. Field sites can be highly public and subject to policing and inclement weather. Research relationships can also be less reliable than those in studies with housed people. The precarity of people’s lives when they are homeless is likely to interrupt data collection and halt budding relationships in their tracks. The challenges are enough to overwhelm on their own. They may be all the more daunting for graduate students with all their ethnographic experience ahead of them. In this chapter, I offer some advice for ethnographers of hard-to-reach populations, drawn from my experience studying homelessness in Chicago as a graduate student. How do researchers meet people living on and off the street? How do they gain people’s trust and broach the subject of research? How do they collect observational data over the long term with a population likely to drop in and out of their physical reach?


And how might these challenges be approached differently given the rise of social media and smartphones as tools and virtual research sites unavailable to ethnographers in earlier decades? The original challenges—getting into a field site, gaining trust, collecting data—are the subject of numerous writings to which the reader should also turn.2 There are discussions of method in influential ethnographies on homelessness.3 What I propose to add is twofold. First, in the spirit of the Research Confidential volumes, I highlight my naïveté as well as what I see as my accomplishments as a novice ethnographer of homelessness. Other beginner ethnographers should feel encouraged by my mistakes and be prepared to learn from their own. Second, I hope this chapter can elucidate the digital potential for ethnography with hard-to-reach populations, including people living on and off the street. Many are surprised to learn that people experiencing homelessness also have smartphones and social media accounts.4 What might this mean for our ethnographies? For example, could we reach out to this hard-to-reach population first online? Could we maintain virtual contact when out of physical reach with our participants? What would it mean for our relationships and our data and for the stories we tell about homelessness and the people who experience it? The organization of this chapter follows the development of my fieldwork over three years of graduate study. I learned early on that approaching people first on street corners was an uphill battle and that I would benefit from a more structured environment in which to get to know people experiencing homelessness and develop my approach to building relationships with people in their situation. I describe the lessons of pacing my approach and developing trust with respected members of the scene, lessons I learned over the year I spent visiting the nonprofit I call People First, a social services agency on Chicago’s North Side. 
These lessons followed me as I expanded my field site beyond the agency and into the neighborhood where the agency is located, which I call Waterside. Around the neighborhood, I learned to be consistent in my presence in the field and to mix up my observations to include a variety of public spaces, from parks and cafés to libraries and areas beneath viaducts. Throughout, I was testing out and learning lessons on the role of smartphones and social media for collecting data and keeping up with my participants, which is the subject of my final section in this chapter.


Though this chapter is a reflection on the method more than the substance of my scholarship, a brief note is in order on the research questions that animate my research and the way in which they developed. As is common to ethnographic projects, I entered the field with broad interests in my keywords: the internet, smartphones, and urban poverty and inequality. As I observed people’s daily lives with these keywords in mind, I began to consider what themes were developing in my field notes. I focused on two areas of inquiry. First, I wanted to know the role of smartphones in digital inequality: that is, whether smartphones are alleviating or reproducing the disadvantages that low-income and minority communities experience in their attempts to access and take advantage of the internet.5 Second, I wondered how social network sites such as Facebook influence social support provision among members of disadvantaged urban communities, particularly during periods of homelessness. I adjusted my subsequent observations and interviews to explore the related dynamics better. This kind of iterative research approach—adjusting observation to theory, and vice versa—is called grounded theory and is common to qualitative research.6 Now on to the task at hand. If I do my job in this chapter, aspiring ethnographers will lose some of the hesitations they had when they started reading. They will come away with an appreciation for the productive challenges of ethnographic research on homelessness in the digital age. And they will be ready to enter the field, make their own mistakes, and pass on what they have learned to the crop of aspiring ethnographers that follows.

ON THE CORNER

Not every first idea is the right one. By sharing how my ethnography got off to a false start, I illustrate that there are advantages to finding a field site through different tacks. Particularly for research with disadvantaged populations, there may be lessons in failed attempts to enter the field. It was my first year in the PhD program, and I was enrolled in a seminar on field methods. Our assignment was to find a field site where we could observe and conduct interviews. I wanted to learn how people living on the streets in Chicago appealed to one another and the public for aid in physical spaces and online. It made sense to me to start with people who were asking for help in the most public of urban spaces. As a young white man of middle-class background, I expected that trust would develop slowly


between myself and those among the urban poor I could meet on street corners, particularly African Americans. Yet I knew of researchers from privileged backgrounds who had been successful in developing trusting relationships in such a way.7 With this in mind, I approached two people who were asking for change on different street corners in the city. The inner-city street corner is generally a male-dominated space.8 As such, I ended up interacting with men in these exploratory interviews. On both occasions, I dropped a dollar into the man’s cup and asked him if he would answer a few of my questions. One of these encounters took place close to a university campus. The man was black and thickset and sat on a stack of milk crates. He responded to my request with a shake of his head. “Another one of these? . . . Alright, let’s do it.” He agreed to let me record the interview on my phone, and neither of us mentioned payment. The man told me he often had students approach him for interviews. Though we talked for half an hour, the conversation felt scripted. I approached another man asking for change on a street corner, this one downtown. White, younger, and unshaven, the man held out a cup from behind a cardboard sign. He responded briefly to a few of my questions about the foot traffic on his corner but quickly closed up when my inquiries turned personal. The street corner was busy, and I was standing while the man sat. The scenario felt awkward and overly public for a personal interview. Not wanting to draw additional attention, I kept my notebook in my bag, remembering what I could to paraphrase later.9 I thanked the man and left shortly after. Neither interview left me feeling confident that I would learn much from approaching people I did not know in the highly public setting of the street corner. 
I did not have the fortuitous prop of one prominent sociologist of the street corner, Mitch Duneier, who introduced himself to a book peddler by pointing out his own publication among those being sold.10 Neither did I have someone to vouch for me who knew the man on the corner, as Duneier had. This is not to say that a research relationship could not have developed in other ways, had I continued to visit and find ways to make the situations less awkward. I could have offered to buy the men a coffee somewhere in the neighborhood. I could have returned day after day, showing myself to be dedicated. In the end, though, I turned my efforts to a site that could provide more structure for my efforts to get to know people struggling to keep shelter over their heads.


AT THE AGENCY

Shortly after my attempts to interview men on street corners, I contacted the agency I call People First to ask permission to hang out at their office. In this section, I describe how I took advantage of the relatively structured environment of this nonprofit social services agency to learn how to engage with people who are unstably housed. Over a year of visiting the agency once or twice a week, I learned how to pace my approach and minimize the degree to which I stuck out. I found that developing rapport with respected members of the unstably housed community could lead others to trust me as well. These lessons would follow me as I expanded my field site to include a public park in the neighborhood where unhoused people gathered. Further, learning to gain trust through in-person interactions provided the basis for reflecting on how to incorporate digital methods into my research. When I first came into contact with People First—and even more when I visited—the agency stuck out to me as a promising field site. I had learned of the agency through a program connecting unemployed clients to community members to develop job skills. I learned the agency had a waiting room resembling a lounge where clients and visitors could enjoy a free meal and spend some time off the street. In this sense, the agency was—and remains—a site not only where community members could seek social services and job programs but also where people experiencing homelessness could hang out and socialize. Most of the clients and visitors were African American and in their fifties or sixties, though in the lounge there was often a mix of black and white visitors, with fewer people of Hispanic and Asian descent present. A quarter of clients were unhoused, and a majority of those remaining were in subsidized housing. 
They were generally unemployed or underemployed and were supported by government assistance for one or more of these areas: housing, health care, food, transportation, and cell phone service. The agency made sense as a site at which to study homelessness in the digital age. People coming in off the street could use one of the six computers the agency made available in the lounge. The electrical outlets available in the lounge were reason enough for many people to visit the agency: as I would learn, people without stable housing are constantly in search of places to charge their phones. Before visiting the agency, I sent an email to the executive director expressing my interest in conducting interviews and observations. She


responded favorably and invited me to attend a meeting of staff and clients. After receiving approval from my university’s Institutional Review Board to do research involving human subjects, I attended the agency meeting, introducing myself to the agency staff and around twenty of the agency’s clients. I described in broad strokes my interest in poverty and communication and my intention to gather stories and perspectives from people willing to offer them. The audience offered me polite applause, and the meeting concluded. I milled about the room and introduced myself to clients of the agency as they prepared to leave, gathering up their grocery bags and stacks of winter clothing. Some politely declined to talk. Others were receptive and shared bits of their stories. I took out my notebook and wrote down in shorthand some of what I was hearing and observing (called jottings) to flesh out later.11 I noticed I was getting more responses from white clients than black ones, in line with the experiences of other white researchers who have worked to gain trust with low-income African Americans in the inner city.12 I also found women to be generally more receptive than men on this first occasion. I suspect that the street codes of masculinity made men more hesitant than women to speak to a male stranger (and a nosy one at that).13 The room cleared out, and I sat in a quiet corner to flesh out my notes from the first day in the field. The feeling was exhilarating: I was in. I began visiting People First a few times a week. Over the first month, I approached people as I had the day I announced myself at the meeting. I started by introducing myself to people I found hanging around the waiting room and mentioning my research. Would they answer some questions of mine about their experiences with being homeless? The approach was fruitful—on occasion. Again, I found that white clients were most willing to talk. 
I sensed that approaching clients coming in off the street, particularly black men, put them off. Something felt too direct. The responses I did get felt stiff. Across the diversity of people with whom I spoke, I began to feel I was an additional burden on their already tiresome day. Most of these interviews ended shortly after they started, with little interesting written in my notebook upon which to reflect. To illustrate what I mean, consider an exchange I had early on in my research with someone who later became a more active participant in my study. Rodney was a middle-aged black man who slept at a neighborhood shelter and visited the agency to eat lunch. Rodney and I were still strangers when I struck up conversation with him one day. He had taken a seat across


from me at the lunch table. I said hello and introduced myself as a graduate student doing research on poverty. He nodded politely and fielded some of my questions, muttering a few words in reply. His body language said as much as his words, but I was not reading his signals. He looked down at his hands, a smartphone in each. I thought it was interesting that he kept two phones. He glanced up only briefly to address me. I had asked him whether he used Facebook on either of his phones. Rodney furrowed his brow and shot back his response: “Yeah, so what of it?” I apologized for being nosy and looked down at my notebook. Rodney gathered his things and moved to a seat across the room. I decided that I needed to switch up the approach I had taken with Rodney and others in those first weeks of research. With advice from an academic mentor, I returned to the field with the mission of adapting my style of interaction to what I noticed was common among the people who frequented the agency. To start, I noticed that people were addressed with suspicion when, despite being strangers, they were eager to talk and ask personal questions of others. “You stay on your side of the fence, I’ll stay on mine” is how a regular client of the agency described a common rule of interaction. Among black clients I talked to, “dippin’ ” referred to asking personal questions of a stranger. I realized that I had been “dippin’ ” in my forward approach of attempting to learn about the lives of people I had not gotten to know. With these lessons in mind, I began to spend less time asking for interviews up front. I spent more time minding my own business, sitting at the lunch table with a magazine or browsing the internet on an available computer. I joined in on conversations when it felt appropriate, contributing to everyday talk about sports and the weather. 
When talk turned to topics in which I was interested—such as what it took to survive day to day on the street and what applications people used on their phones—I asked people to elaborate. If I felt there was something I wanted to write down, I pulled out my notebook and used it as a prop to introduce myself as a student working on a research project. I informed people I would not attribute their comments to them by name. I honored people’s requests not to be included in the research and, on those occasions, put away my notebook to continue the conversation undirected by my research questions. When my conversant seemed particularly forthcoming or had subsequent conversations with me, I asked if I could record our conversation. I received permission less often to record conversations than to continue writing.


This changed with some of my participants as I got to know them. Others agreed to be included in my study and continued to share intimate details of their lives, though they preferred not to be recorded. At the same time that I switched up my interactional approach, I took steps to minimize other markers of difference. I noticed how certain behaviors in the space of the office distinguished clients and more privileged members of the scene, including staff, interns, and volunteers. I started signing in at the front desk when I arrived, which was required of clients and walk-ins but not interns and volunteers. I partook in small portions of the donated food when it was ample, and I sat at the lunch table where clients ate. I used the bathroom reserved for clients and walk-ins rather than the staff bathroom. I also dressed down. I began to wear older, looser clothes and to keep my jacket on, even inside on warmer days. As a younger white man with horn-rimmed glasses hanging out among a largely older, primarily black clientele, I had few illusions that I was passing as a low-income member of the Waterside community. The intent was not to pass as a client of the agency to do covert research but to help along the process of building trust by minimizing markers of difference.14 Over a few months of visiting the agency once or twice a week, I found it less awkward to start or join conversations with clients I did not already know. As I got comfortable, I found myself developing rapport with more of the people who spent time at the agency. My rapport with people who had the respect of others would end up being a tremendous advantage to my reputation with people I had not met. One man who came to enjoy talking to me was Jessie, a black man in his sixties who was a former client of the agency. Jessie returned to the agency most days to eat his lunch in the lounge.
He enjoyed telling anyone at the agency who would listen of the transformation he underwent from an addict to a sober man with steady work. I believe Jessie developed a fondness for me because I was someone who wrote as he spoke, someone he perceived to be recording the life lessons Jessie had gained from experience. There was little that I pulled from Jessie’s oratories that directly aided my research on homelessness and technology. Yet I continued to spend time listening to him talk. One day Jessie told me to take down his number. He told me to call him and we would have a beer sometime. This was the first phone number I had received in the field, and it made me feel as though I had developed a field relationship beyond mere acquaintanceship.


The many afternoons I spent talking with Jessie turned out to be fortuitous for my research. When I next spoke to Rodney, the man who had changed seats to avoid talking to me in the early days of my research, it was a warm exchange. Jessie and Rodney walked into the agency together that day. As it turns out, Jessie was Rodney’s uncle. Jessie saw me at the lunch table and introduced me to his nephew, not knowing of our previous less-than-warm exchange. Rodney and I got to talking, and I learned that he kept two phones because, in the midst of homelessness, one was always getting lost or broken or stolen. The conversation helped frame my research on the precarity of mobile phone access for people in poverty and the way people make do with alternate configurations of phone possession, sharing, and use.15 To arrive at the conversation with Rodney, I had to overcome the reasonable hesitation many members of low-income communities have toward others,16 researchers included.17 I did my best to adapt to the social conventions of interaction I observed at the agency and minimize the markers of my difference. I also gained from earning the trust of a respected member of the scene. The agency offered a relatively structured, indoor environment in which to develop my ethnographic sensitivities over the first year of my fieldwork with people experiencing homelessness. I would put these sensitivities to use as I expanded my fieldwork out into the public spaces of the neighborhood.

AROUND THE NEIGHBORHOOD

A year at People First allowed me to learn from experience what was required to develop trust with low-income adults in the setting of a nonprofit agency. The agency brought people with diverse experiences of poverty from around the city into a relatively predictable, indoor environment. After the first year, I wanted to expand my field site to include places around the neighborhood relevant to the experience of people without stable shelter. I had heard about a struggle between homeless residents of the neighborhood and the city the previous winter over the right of unhoused people to set up tents in a park near the lake. I wanted to see how these people were organizing their efforts and how (or whether) they used social media and smartphones to keep connected to each other and to the public.

I was wary of approaching people staying in the park without an introduction. I had in mind my attempts early in the study to strike up conversations with men asking for change on street corners. I began asking people around the agency whether they knew anyone who slept in a tent in the park. One day the opportunity came. I had been spending more of my time at the agency sitting beside people at the computers who were open to letting me watch as they browsed Facebook. During one such occasion, a client I had gotten to know named Vicki paused on one of her Facebook friends. She told me I should meet this man, who was active in advocating for the rights of unhoused people in the neighborhood like himself. I sent the man, named Eric, a friend request and asked him for an interview. We met at a coffee shop in the neighborhood the next week. A thin white man in his early sixties with a stern face but an easy smile, Eric quickly became one of the most forthcoming of the people I interviewed and spent time with during my research. Depending on the night, he stayed either in a tent in the park or in a neighborhood single-room occupancy (SRO). I learned that Eric was a kind of spokesperson, among others, for the unhoused people in Waterside. He had as Facebook friends and phone contacts many journalists, activists, lawyers, and philanthropists around Chicago. He kept track of the people who were homeless in the neighborhood and helped coordinate outreach and charity efforts. One outreach effort was a pop-up church service and free lunch held weekly in the park under a sunshade, sponsored by a Korean congregation in the suburbs. Another was a food drop made weekly by a black philanthropist from a South Side suburb. Eric invited me to join him for these occasions and said I could reach him by message on Facebook. I showed up at the outdoor church service the very next Sunday. 
There was a diversity of people who gathered at the lakeside park for church services and food drops and who slept there in tents. I learned that, in addition to a dozen or so who were sleeping outside, others who came to eat and worship in the park slept in shelters, affordable housing units, or nursing homes in the neighborhood. The scene was different for research than what I had experienced at People First. Many more people appeared to be under the influence of alcohol or drugs and to be suffering from mental health issues. Among those who stayed in the park, I watched people urinate under trees or take care of their business in buckets set up for this purpose, as there were no public toilets in the vicinity. While the park police

largely tolerated the presence of the gatherings and overnight tents, people made sure to look for and track police vehicles when they passed by. Though I never stayed overnight, I visited the park once or twice a week over several months to get to know more people who were hanging out there for lack of anywhere else to sleep or spend their days. When food was delivered by churches or community organizations, I played a dual role of volunteer and recipient. I helped unload donations and took the opportunity to talk to the people who had brought the food. As at the agency, when it was clear that there would be leftovers, I sat in the grass to eat my portion alongside other recipients. I dressed down like I did at the agency. I listened more than I spoke. I never hid my identity in conversation, but neither did I preface every conversation with a description of my research. Perhaps I sensed that, even more than at the agency, this would have been an awkward approach. When I heard something interesting that I wanted to follow up on for my research, I used my notebook as a prop to introduce myself, as I had learned to do at the agency. Over the months, I got to know a good portion of the people I saw regularly in the park. Some were forthcoming in talking to me, and others maintained their distance. What seemed to make people warm up to me was seeing me time and time again and seeing others among them warm up to me. As the anthropologist Clifford Geertz18 observed, it can be particularly important how an ethnographer reacts to tense and emotional experiences shared with members of the field site. While I never ran with my participants from the police, I stood with them beside their tents when police questioned them, waited out in freezing temperatures for food to arrive, helped defuse verbal altercations, sat with a woman who had been attacked by a stranger, and occasionally shared in a beer or sip of liquor when it was offered.
When members of a church visited the Waterside park to deliver food and clothes one day, I let them pray over me as they did the others. “Things are going to turn around for you,” I remember the congregant telling me. Thus, there was the rare occasion when it seemed I passed as homeless in the park. This was typically the case with nonregulars in the park, such as the visiting congregation or unhoused people from other parts of the city who passed through. To those who regularly stayed or gathered in the park, it became common knowledge that I was a student or “professor.” I continued to communicate my student status and ask for permission when

it came time to write down comments and stories that would end up in my research. The notebook, again, aided in marking my difference and broaching the subject of my research. As at the agency, my notebook worked well as a prop to inform people—and remind them when they forgot19—that I was different, that I was a researcher. And I worked to keep my differences from the people I was studying present in my own mind—not the least of which was that as I was returning each night to my one-bedroom apartment a few neighborhoods away, my participants were still in the park, preparing their tents for the night or returning to SROs in Waterside. If I slipped and started to think I was experiencing life in the park like my participants were, I recalled what one unhoused man replied when I first introduced myself to him as a researcher: “Oh! So, you’re not one of us. You’re observing us.” Indeed. Sharing experiences through consistent presence over time helped me improve my rapport with the people who gathered and stayed in the park. At the same time, I was benefiting from the relationships I was developing with people experiencing homelessness in Waterside who had the respect of others in a similar situation. Eric, whom I have described as a kind of spokesperson for the Waterside homeless community, was an anchor for me when I felt out of place in the scene and knew no one else around at the time. People often approached Eric for advice and help, and it helped to be standing next to him when this happened. At the same time, I also found it useful not to rely on Eric too often once I came to see that not everyone appreciated the unelected leadership he assumed over the small community staying or gathering in the park. I describe in the section that follows the relationship I developed with a black couple (the Freemans) who arrived at the park a few months into my research there. 
They tended to keep their space from Eric and to have the sympathy of more of the black members of the scene than did Eric. I could alternate, spending time alongside Eric and then with the Freemans, when I wanted to get in on different kinds of conversations happening in the park. I established the park as my second field site in Waterside after a year becoming comfortable with getting to know people experiencing homelessness at the agency. As I spent time at the park in the second year of fieldwork, I began to take note of the additional neighborhood sites that served as anchors for daily life. In lieu of having nine-to-five employment and a place of shelter where they felt comfortable and safe, people spent

their days at public and semipublic locations—libraries, nonprofits, cafés, department stores—where they could find resources or simply sit and rest. In addition to the People First agency and the park, I began to spend time at two public libraries in the neighborhood, where I would run into people I knew with some regularity. I sat at neighborhood cafés, fast-food restaurants, and the cafeteria of a chain department store. I would run into my participants at Starbucks and the cafeteria, as well as at fast-food restaurants, though not at one of the more upscale cafés in the neighborhood. Making the rounds to these different sites helped me understand the role of indoor public spaces as anchors for the daily routines of unstably housed people. Spending time in these locations gave me opportunities to strike up conversations with people during different points in their day and in settings that allowed for different kinds of conversations. For example, I found that people felt comfortable talking in some places and not others and talking with me alone rather than in a group. Mixing up the locations was a benefit to data collection in that regard. As I elaborate on later, fortuitous encounters during my rounds were particularly important for keeping up with those with whom I lacked a reliable connection through a phone number, as I had with Jessie, or a messaging application, as I had with Eric.

ON THE PLATFORM(S)

So far, I have described the process whereby I established myself in a field site through face-to-face interaction. In this section, I dig into some of the issues I encountered in attempting to incorporate digital technologies into my efforts to observe people’s lives and keep in touch with them. I consider whether and how smartphones and social media can serve ethnographers as tools to recruit and keep up with people experiencing homelessness and as field sites in their own right. The matters of getting in and gaining trust are mainstays of reflections on ethnographic methodology.20 What is less often explored is the role of information and communication technologies in the practice of urban ethnography. This is a missed opportunity. Social media platforms and the smartphones that grant us (near) continuous access to them are novel means for ethnographers to observe and keep in touch with the people they study, whose lives are in flux. At the same time, while internet scholars have advanced the methods of digital ethnography,21 they have

done so largely without taking a firm stance as to whether offline ethnography provides the context necessary for our conclusions about what people do online.22 Digital platforms may be underutilized by ethnographers carrying out their studies through conventional means—that is, through face-to-face interaction. Internet researchers, meanwhile, miss much of the embodied context for what is shared and communicated online. There are important exceptions to this trend. Jenna Burrell’s study of internet cafés in Ghana and Jeffrey Lane’s offline/online research with youth in Harlem are texts that guide my own approach, with their attendant methodological reflections.23 What I hope to contribute to these and other reflections on digital-age ethnography with marginalized communities is an account of the balance that must be struck in taking advantage of smartphones and social media without relying on digital tools to replace face-to-face interaction. Additionally, I emphasize (and problematize) the more practical matters of recruiting and keeping in touch with our participants. First, I relate my attempts to recruit unhoused people through “cold calls” over Facebook. Then I give the example of my relationship with an unhoused family to show how building rapport face-to-face can facilitate a fruitful online connection. Keeping up with our participants on social media can be productive for both data collection and the practical matter of staying in touch with people without stable housing. I also reflect on the limitations that come from relying on digital channels without attention to the face-to-face maintenance of field relationships. I begin with the implications of social media platforms for getting into the field. Approaching people whose lives are very different from our own—in my case, people a generation or two above me struggling to secure long-term shelter—and asking them to be a part of our research can feel both awkward and intimidating. 
The social media environment changes the terms by allowing us to send out messages to our hopeful participants from the comfort of our home, campus, or favorite café. The social distance does not change, but the approach is potentially less stressful for both the researcher and the potential participant. Is there promise to approaching people experiencing homelessness online in order to kick-start an ethnography? From the start, we should be aware of the extent to which the people we are interested in learning from are represented on social media. In the case

of my own research, I knew that the proportion of low-income, middle-aged and older adults in the United States who were active on the internet and social media was much smaller than that of their wealthier or younger counterparts.24 I kept this in mind even as, in the course of my study, I came across several posts in homeless-related Facebook groups by people describing their situation and asking for housing referrals or a place to stay. Sensing an opportunity for recruitment, I sent private messages to four people who made these posts. In the messages, I identified myself and requested an interview in as considerate a way as I could devise, including in my messages a link to city services for people without shelter. Not wanting to compel people to accept an interview out of financial desperation, I offered no compensation. My four messages received no replies from the strangers in need. One opened the message, according to the indicator in the Facebook messaging application. The three others never did. There are myriad reasons why people experiencing a crisis like eviction might not reply to a message online from a researcher. An unhoused person may be hesitant, like most people, to engage with strangers online.25 They may be uninterested in or suspicious of academic research, in particular. They may have lost touch with their online accounts in the process of losing their housing. Or they may simply have not seen the message, as Facebook does not make messages from people not in one’s network obvious to see. My limited number of social media cold calls was an ineffective means of recruiting people experiencing homelessness into my study. Of course, as with approaching men asking for change on the street corner, other researchers might benefit from approaches I did not pursue. Certainly, offering reimbursement could help, keeping the ethics of such reimbursement in mind.
Sending more messages to specific and active online communities of people experiencing homelessness would increase one’s chances of getting responses. The matter of informed consent is critical in this context, and friend or follow requests should be sent with messages indicating who you are and why you are reaching out as well as clarifying that the person you are contacting may remove you from their contacts at any time.26 Noting the uncertain potential of social media cold calls, I want to broaden my point. What would it mean to rely on a sample of people active on social media to reach conclusions about how the internet impacts the lives of people experiencing homelessness? Suppose I had been successful

in recruiting a number of unhoused people through their postings on Facebook. I could learn about how the ability to reach out over Facebook shifts the terms of support seeking for people going through crises. What I would not learn about are the perspectives of those who sense the importance of social media for connection and support but who, for various reasons, remain offline or highly passive in their digital participation, much less the perspectives of those who have never considered how social media could be helpful to them. As I observe in my current phase of research in Waterside, people living on and off the streets were often motivated to be active on Facebook and other social media platforms. Yet for important reasons— such as having a limited understanding of how to use social media platforms and specific concerns over their online exposure—many chose to stay off the site. Others I have met in person had a Facebook account but were strictly “lurkers,” contributing no posts, photos, comments, or “likes” for digital ethnographers to record. The lesson here is that the internet— and social media, in particular—can have important meaning even for people who use them seldom, if ever. Researchers who focus on active users will miss out on the experiences of these people and as a result may easily draw the wrong conclusions. Thus, I argue that due diligence for digital-age ethnographers involves engaging in person as well as online with the people from whom we hope to learn. Important questions follow from such an argument. How do ethnographers add a digital connection to an in-person relationship established at a field site? How do they balance the in-person and digital aspects of an offline/online ethnography as time goes on in the research? In what follows, I show how my spending time face-to-face with a family experiencing homelessness grew into a relationship that involved sustained engagement over Facebook as well as in person. 
I explore how to balance the opportunity offered by digital channels of communication with the need to maintain face-to-face interaction in the study of the lives of people who are unstably housed. This conversation leads to a conclusion in which I synthesize the in-person and digital discussions to provide takeaways for digital-age ethnography with people experiencing homelessness. Briana and Donnie Freeman are a married black couple raising two sons while experiencing homelessness on Chicago’s North Side. I got to know them before they had acquired the family-sized tent in which they sleep. When I met them, they were still sleeping every night in the open air on

their makeshift beds of blankets and tarps, laid on the grass in the lakefront park. Park security largely tolerated their presence for the first six months I knew them. Our meetings and conversations began in person and, as trust developed, shifted to an online platform—namely, Facebook. As it turned out, this digital connection would sustain our relationship as our physical co-presence waxed and waned. Before we connected on Facebook, I relied on guessing where the Freeman family would be at particular times of the day and week based on certain reoccurring events, such as the church service every Sunday and the food drops every other day. I could generally rely on seeing Briana or Donnie, or both of them, at these events, with their sons alongside. I found other opportunities to cross paths with the Freemans as they shared more with me about their daily routines. I began to spend time in the cafeteria of a department store that offered Wi-Fi, as I could count on running into the Freemans there once every few visits. There was a unique pleasure in these fortuitous meetings that was lacking when we scheduled a time to meet. The reliability of these face-to-face encounters with the Freeman family came and went. A few weeks passed when I did not see them at all. The family had stopped attending the church service because of their distaste for the worship style; meanwhile, the philanthropist became less reliable with his food drops. Not seeing the Freemans troubled me. It meant losing my observational perspective on how the family managed their lives day to day. More simply, I wondered if the Freemans were doing okay. I turned to circling the park on my bike hoping to run into the family and spending more time at the department store cafeteria. I asked around among church attendees and others in the community of unstably housed people who gathered at the park. I got different answers and guesses as to where the Freemans had gone—and nothing conclusive. 
Then, after a few weeks had passed, I arrived at the park to see Donnie chatting with the South Side philanthropist, who had arrived with large platters of chicken and rice. Briana and the kids were nearby. After exchanging greetings, I learned from Donnie that the Freemans had been taken in by a charitable stranger who played host to them for a few weeks at her suburban home. After interviewing him about this experience, I decided to broach the subject of exchanging phone numbers. Briana’s reply was to ask if I was on Facebook. She explained that their Facebook accounts were more reliable as a means to keep in touch than their phone numbers.

Briana said she expected her phone number to change if the family decided to switch carriers to take advantage of a sign-on deal, such as a free phone they could give to the older of their two sons. They might also drop their phone numbers if they hit a wall in their ability to afford their current monthly service plan, which included an allotment of talk, text, and data for the family to share. They would save money by signing up for a cheaper, pay-as-you-go service plan, getting a new phone number in the process. I paused to jot down notes as Briana spoke. I broached the subject of observing the couple’s activity on the site for research, beyond just exchanging messages there. I told them it was part of my study to understand how people experiencing homelessness “use technology to connect with others and find opportunities.” I told them I would follow their posts and observe their interactions and connections and then ask them about these in interviews. Donnie nudged my shoulder playfully: “Alright, professor, I hear ya! Let’s do it.” He and Briana pulled out their smartphones and searched my name. We became Facebook friends. My Facebook connection with Donnie and Briana provided two advantages in regard to how well I could continue to learn from them about the experience of being homeless in the digital age. The first was that I had a new site for data collection. Observations extended now from the park and neighborhood cafés to what Donnie and Briana shared on Facebook (and later Instagram). My data also included conversations we had in our private message threads. I noted how Briana projected a sense of normalcy in her social media feeds—posts about gourmet food, pop culture, and photos from her childhood—that was lacking in her daily life on the streets. While Donnie posted little to nothing publicly, he included me in a steady stream of private messages of the “share this with ten of your friends” variety. 
Donnie seemed to reserve social media for browsing and forwarding content on private messages, while Briana was more forthcoming, posting to her timeline and engaging in in-depth conversations with me about their family’s condition over private messaging. My online observations allowed me to wonder, like other researchers, about the role of gender in shaping social media communication and what the outcomes of different styles of communication could be for accessing social support over social media platforms.27 The second advantage of our Facebook connection was practical. Briana, in particular, was timely in checking and responding to my messages. I no

longer had to rely on attending charity events or making the rounds in the park or at public Wi-Fi hotspots to see her and her family. I could now reach out over instant message to see about joining them to catch up. Without the Facebook connection, I might have lost touch with the Freemans like I had once before. Half a year after we became online friends, winter was approaching, and the Freemans were making plans to find indoor shelter. They chose a shelter that would house families in a South Side Chicago neighborhood, more than forty-five minutes by train from Waterside, where my research was centered. Due to a rapport built in person, I was able to carry on a conversation with Briana over instant message and keep up with her posts over Facebook despite the family’s move across town. We could reconnect in person, having the anchor of social media to maintain the relationship. This was a tether for my relationship with the Freemans that a phone number could not provide because they were active on social media but often went without phone service. There are limitations to the practical advantages of social media for ethnography with people who are unstably housed. First, a Facebook account is not impervious to the precarity of life without stable shelter or income and the risks of going online with limited digital literacy. Phones are lost, broken, waterlogged, and stolen.28 Public computer access does not always fill in the gaps because people may lose access to their social media accounts. Several of my participants went through two or three Facebook accounts over the period of the research project, having been locked out due to forgotten passwords (including those of connected email addresses) or breaches of their account security. Second, different styles and motivations will make online communication fruitful with some participants in a study but not others. 
As the Freemans’ physical presence in my study waned, messages with Briana continued to offer me insight into her and Donnie’s struggles to secure long-term shelter for their family, while Donnie was less a resource through online exchanges. He passed along mass messages but did not engage me in substantive back-and-forth. I was starting to get Briana’s perspective without Donnie’s. Finally, the richness of conversations over instant messaging is likely to fade as time passes without sharing experiences in person. Indeed, as the Freemans continued to search out housing elsewhere in the city, I felt online conversations hollowing out with Briana. I arranged for us to meet at a

restaurant when the family was back in the neighborhood. It helped to rekindle our relationship to share a meal and recall our experiences in the park.

CONCLUDING THOUGHTS

There is no ready blueprint for studying homelessness through ethnography. The digital age adds new uncertainty as technologies reshape how we communicate and relate to one another. Similarly, there is no one experience of homelessness for the people who go through it. Researchers setting out to learn from people experiencing homelessness in the digital age must be prepared to adapt in order to get into a field site, gain trust once there, and keep up with research participants as their lives move about the neighborhood and beyond. I set out in this chapter to recognize my own mistakes and adaptations as a novice ethnographer of homelessness and communication technology. I hope that readers take away a few key lessons. To start, getting access to people who can teach you about homelessness from their experience of it may require time and more than one approach. It may be tempting to start collecting data right off the bat with the people who appear to be the most accessible. That approach may fail outright, as with my Facebook cold calls, or it may return shallow data, as with my impromptu interviews with people asking for change on street corners. I found the latter to be the case as well in my first weeks at the agency, when I was overeager to get interviews from people I had just met. Indeed, what makes ethnographic data unique and valuable is that they emerge from trust relationships built on shared experiences over time. Still, it can feel like a vague and intimidating prospect to set out from campus to find a place where you can start spending time with people experiencing a situation often distant from your own. The advantage of getting permission to spend time at the agency was that I found a setting with some structure—an indoor office space with social workers and seating areas for conversation—that was lacking on the street corner. 
I learned to pace my approach as well as to build relationships with respected members of the scene who could vouch for me with others. As I expanded my field site outside the agency, I noted how the dynamics inside the agency shaped the data I collected and how people shared different perspectives through conversations in the park, library, or cafés. Heading into less (and differently)

structured environments around the neighborhood, I took with me the lessons of pacing my approach, building on shared experiences, and being intentional in relationships with respected members of the scene. In this chapter, I have highlighted the lessons that apply particularly to ethnographic research that takes the internet seriously as a tool for research and a force shaping contemporary life. Incorporating smartphones, social media, and the internet at large, both in our data and in our means of collecting data, is what I mean by doing ethnographic research “in the digital age.” I argue that it requires spending time offline with people experiencing homelessness to get a broad understanding of the role social media play in their lives, whether or not they are active on social media platforms. By establishing digital ties with the people from whom we hope to learn in our ethnographies, we expand the realm of data collection and gain a tool for keeping in contact. The latter is a particular advantage when our research concerns populations whose whereabouts and routines are likely to change due to the lack of stable housing. My conclusion thus departs from the advice offered in a recent and influential ethnography of homelessness. When considering in a footnote how a researcher might develop a disposition conducive to ethnography before ever stepping into the field, Matthew Desmond suggests: “It also helps to get rid of your smartphone.”29 Yet digital spaces and the smartphones that grant us access to them are increasingly a part of everyday social life, including for those experiencing homelessness. 
As such, ethnographers should embrace the smartphone and learn to be duly attentive to what our participants say and do both online and offline.30 As my experience with a community of unstably housed adults in Chicago suggests, ethnography of homelessness in the digital age will be most successful when there is a balance struck between our offline and online means of learning about the lives of others.

NOTES

1. M. Hammersley and P. Atkinson, Ethnography: Principles in Practice (Abingdon, UK: Routledge, 2007); J. Lofland et al., Analyzing Social Settings: A Guide to Qualitative Observation and Analysis, 4th ed. (Belmont, CA: Wadsworth, 2006).
2. Hammersley and Atkinson, Ethnography; Lofland et al., Analyzing Social Settings.
3. D. A. Snow and L. Anderson, Down on Their Luck: A Study of Homeless Street People (Berkeley: University of California Press, 1993); M. Desmond, Evicted: Poverty and Profit in the American City (New York: Broadway Books, 2016).
4. R. E. Guadagno, N. L. Muscanell, and D. E. Pollio, “The Homeless Use Facebook?! Similarities of Social Network Use Between College Students and Homeless Young Adults,” Computers in Human Behavior 29, no. 1 (2013): 86–89, https://doi.org/10.1016/j.chb.2012.07.019.
5. W. Marler, “Accumulating Phones: Aid and Adaptation in Phone Access for the Urban Poor,” Mobile Media and Communication 7, no. 2 (2018): 155–174, https://doi.org/10.1177/2050157918800350; W. Marler, “Mobile Phones and Inequality: Findings, Trends, and Future Directions,” New Media and Society 20, no. 9 (2018), https://doi.org/10.1177/1461444818765154.
6. B. G. Glaser and A. L. Strauss, The Discovery of Grounded Theory: Strategies for Qualitative Research (Chicago: Aldine, 1967); K. Charmaz, Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis (London: SAGE, 2006).
7. Desmond, Evicted; E. Liebow, Tally’s Corner: A Study of Negro Streetcorner Men (Lanham, MD: Rowman & Littlefield, 1967); M. Duneier, Sidewalk (New York: Farrar, Straus and Giroux, 1999).
8. E. Anderson, Code of the Street: Decency, Violence, and the Moral Life of the Inner City (New York: Norton, 2000).
9. R. M. Emerson, R. I. Fretz, and L. L. Shaw, Writing Ethnographic Fieldnotes (Chicago: University of Chicago Press, 2011).
10. Duneier, Sidewalk.
11. Emerson, Fretz, and Shaw, Writing Ethnographic Fieldnotes.
12. C. B. Stack, All Our Kin: Strategies for Survival in a Black Community (New York: Basic Books, 1975).
13. Anderson, Code of the Street.
14. Snow and Anderson, Down on Their Luck.
15. Marler, “Accumulating Phones.”
16. S. S. Smith, Lone Pursuit: Distrust and Defensive Individualism Among the Black Poor (New York: Russell Sage Foundation, 2007).
17. Liebow, Tally’s Corner.
18. C. Geertz, “Deep Play: Notes on the Balinese Cockfight,” in The Interpretation of Cultures (New York: Basic Books, 1973).
19. B. Thorne, “ ‘You Still Takin’ Notes?’ Fieldwork and Problems of Informed Consent,” Social Problems 27, no. 3 (1980): 284–297.
20. Hammersley and Atkinson, Ethnography; Lofland et al., Analyzing Social Settings.
21. C. Hine, Ethnography for the Internet: Embedded, Embodied, and Everyday (London: Bloomsbury, 2015); T. Boellstorff, Ethnography and Virtual Worlds: A Handbook of Method (Princeton, NJ: Princeton University Press, 2012).
22. J. Lane, review of Ethnography for the Internet: Embedded, Embodied, and Everyday by Christine Hine, Contemporary Sociology 45, no. 5 (2016): 610–612, https://doi.org/10.1177/0094306116664524v.
23. J. Burrell, Invisible Users: Youth in the Internet Cafés of Urban Ghana (Cambridge, MA: MIT Press, 2012); J. Lane, The Digital Street (Oxford: Oxford University Press, 2018); J. Burrell, “Material Ecosystems: Theorizing (Digital) Technologies in Socioeconomic Development,” Information Technologies and International Development 12, no. 1 (2016): 1–13.
24. X. Li, W. Chen, and J. D. Straubhaar, “Concerns, Skills, and Activities: Multilayered Privacy Issues in Disadvantaged Urban Communities,” International Journal of Communication 12 (2018): 1269–1290; E. Hargittai, “Potential Biases in Big Data: Omitted Voices on Social Media,” Social Science Computer Review 38, no. 1 (2020): 10–24, https://doi.org/10.1177/0894439318788322.
25. J. Vitak et al., “ ‘I Knew It Was Too Good to Be True’: The Challenges Economically Disadvantaged Users Face in Assessing Trustworthiness, Avoiding Scams, and Developing Self-Efficacy Online,” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW, art. 176 (2018), https://doi.org/10.1145/3274445.
26. Lane, The Digital Street, 177.
27. M. Burke, R. Kraut, and C. Marlow, “Social Capital on Facebook: Differentiating Uses and Users,” CHI 2011, May 7–12, 2011, Vancouver, BC, Canada, 571–580, https://doi.org/10.1145/1978942.1979023; S. Tifferet and I. Vilnai-Yavetz, “Gender Differences in Facebook Self-Presentation: An International Randomized Study,” Computers in Human Behavior 35 (2014): 388–399, https://doi.org/10.1016/j.chb.2014.03.016.
28. Marler, “Accumulating Phones.”
29. Desmond, Evicted, 404.
30. Lane, The Digital Street.

REFERENCES

Anderson, E. Code of the Street: Decency, Violence, and the Moral Life of the Inner City. New York: Norton, 2000.
Boellstorff, T. Ethnography and Virtual Worlds: A Handbook of Method. Princeton, NJ: Princeton University Press, 2012.
Burke, M., R. Kraut, and C. Marlow. “Social Capital on Facebook: Differentiating Uses and Users.” CHI 2011, May 7–12, 2011, Vancouver, BC, Canada, 571–580. https://doi.org/10.1145/1978942.1979023.
Burrell, J. Invisible Users: Youth in the Internet Cafés of Urban Ghana. Cambridge, MA: MIT Press, 2012.
Burrell, J. “Material Ecosystems: Theorizing (Digital) Technologies in Socioeconomic Development.” Information Technologies and International Development 12, no. 1 (2016): 1–13.
Charmaz, K. Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. London: SAGE, 2006.
Desmond, M. Evicted: Poverty and Profit in the American City. New York: Broadway Books, 2016.
Duneier, M. Sidewalk. New York: Farrar, Straus and Giroux, 1999.
Emerson, R. M., R. I. Fretz, and L. L. Shaw. Writing Ethnographic Fieldnotes. Chicago: University of Chicago Press, 2011.
Geertz, C. “Deep Play: Notes on the Balinese Cockfight.” In The Interpretation of Cultures. New York: Basic Books, 1973.
Glaser, B. G., and A. L. Strauss. The Discovery of Grounded Theory: Strategies for Qualitative Research. Chicago: Aldine, 1967.
Guadagno, R. E., N. L. Muscanell, and D. E. Pollio. “The Homeless Use Facebook?! Similarities of Social Network Use Between College Students and Homeless Young Adults.” Computers in Human Behavior 29, no. 1 (2013): 86–89. https://doi.org/10.1016/j.chb.2012.07.019.
Hammersley, M., and P. Atkinson. Ethnography: Principles in Practice. Abingdon, UK: Routledge, 2007.
Hargittai, E. “Potential Biases in Big Data: Omitted Voices on Social Media.” Social Science Computer Review 38, no. 1 (2020): 10–24. https://doi.org/10.1177/0894439318788322.
Hine, C. Ethnography for the Internet: Embedded, Embodied, and Everyday. London: Bloomsbury, 2015.
Lane, J. Review of Ethnography for the Internet: Embedded, Embodied, and Everyday by Christine Hine. Contemporary Sociology 45, no. 5 (2016): 610–612. https://doi.org/10.1177/0094306116664524v.
Lane, J. The Digital Street. Oxford: Oxford University Press, 2018.
Li, X., W. Chen, and J. D. Straubhaar. “Concerns, Skills, and Activities: Multilayered Privacy Issues in Disadvantaged Urban Communities.” International Journal of Communication 12 (2018): 1269–1290.
Liebow, E. Tally’s Corner: A Study of Negro Streetcorner Men. Lanham, MD: Rowman & Littlefield, 1967.
Lofland, J., D. A. Snow, L. Anderson, and L. H. Lofland. Analyzing Social Settings: A Guide to Qualitative Observation and Analysis. 4th ed. Belmont, CA: Wadsworth, 2006.
Marler, W. “Accumulating Phones: Aid and Adaptation in Phone Access for the Urban Poor.” Mobile Media and Communication 7, no. 2 (2018): 155–174. https://doi.org/10.1177/2050157918800350.
Marler, W. “Mobile Phones and Inequality: Findings, Trends, and Future Directions.” New Media and Society 20, no. 9 (2018). https://doi.org/10.1177/1461444818765154.
Smith, S. S. Lone Pursuit: Distrust and Defensive Individualism Among the Black Poor. New York: Russell Sage Foundation, 2007.
Snow, D. A., and L. Anderson. Down on Their Luck: A Study of Homeless Street People. Berkeley: University of California Press, 1993.
Stack, C. B. All Our Kin: Strategies for Survival in a Black Community. New York: Basic Books, 1975.
Thorne, B. “ ‘You Still Takin’ Notes?’: Fieldwork and Problems of Informed Consent.” Social Problems 27, no. 3 (1980): 284–297.
Tifferet, S., and I. Vilnai-Yavetz. “Gender Differences in Facebook Self-Presentation: An International Randomized Study.” Computers in Human Behavior 35 (2014): 388–399. https://doi.org/10.1016/j.chb.2014.03.016.
Vitak, J., Y. Liao, M. Subramaniam, and P. Kumar. “ ‘I Knew It Was Too Good to Be True’: The Challenges Economically Disadvantaged Users Face in Assessing Trustworthiness, Avoiding Scams, and Developing Self-Efficacy Online.” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW, art. 176 (2018). https://doi.org/10.1145/3274445.

Chapter Nine

GOING RURAL
Personal Notes from a Mixed-Methods Project on Digital Media in Remote Communities
TERESA CORREA AND ISABEL PAVEZ

It was 2011. I1 was finishing my dissertation at the University of Texas at Austin and already thinking about my next steps when I met a visitor from Chile, my home country, at a social gathering. She asked me: “So what do you do?” We researchers often struggle to describe what we do to someone outside the academy. I was trying my best to explain in lay terms that I do research on inequalities and digital media when she interrupted me and told me with enthusiasm that she worked at a telecommunications company. Then she told me about a new initiative in which the government and telecommunications companies were providing internet connections to isolated rural villages in Chile. I tried to get more details, but she did not know much more, as she worked in another area of the company. Most likely she was just trying to engage with my research interests and have a conversation. For me, however, that brief encounter triggered what became my first large research grant, a multimethod project that turned into a research program lasting several years. This project examined the factors that explained digital media access and use among people who live in isolated rural communities in Chile. It was a three-phase mixed-methods endeavor that included in-depth interviews in ten villages, a face-to-face representative survey in twenty-two villages, and six focus groups in three villages. In this chapter, I will describe in detail how and why we designed this multimethod project in rural communities.
I will also discuss how research projects sometimes have to take detours or get interrupted and are then resumed, the challenges involved in the process, and the decisions we had to make to handle unexpected events.

FROM THE LIGHTNING BOLT TO THE RESEARCH DESIGN

Right after that fortuitous social encounter, I looked for more information about this internet access policy. It was a private-public initiative funded by the Chilean government’s Telecommunications Development Fund, which subsidized the technology infrastructure required to connect geographically isolated areas that were not financially attractive for internet providers. When the program was first launched, one telecommunications company won the government tender. As a result, in 2010 and 2011 the program, called Todo Chile Comunicado (All Chile Connected), installed antennas with 3G wireless connections in 1,474 rural communities. That way the residents of these communities could access the internet through mobile phones and 3G modems. As part of the public-private agreement, the companies could not charge residents more than they do in the closest urban area. I knew this public-private policy initiative represented a good case study in which to explore the factors that explain digital media adoption and use (or lack thereof) after access infrastructure is provided. At that time, Chile had one of the highest internet penetration rates in Latin America,2 thanks to a long-established public policy on information and communication technology (ICT) that promoted infrastructure for internet access.3 However, this country of seventeen million people has rugged geography: deserts in the north; rushing rivers, islands, and fjords in the south; the Andes mountains, which extend all along the country; and relatively long distances between rural towns. All these characteristics pose relevant challenges for getting people connected. In fact, despite the higher levels of internet penetration led by highly populated urban areas, the rural-urban gap was large. 
At the time, over a third of rural households (40.5 percent) did not have internet connections.4 In my subfield of communication focused on ICT, much research is devoted to studying inequalities in digital media uses, skills, and outcomes, having moved on from studying basic access divides. My own work has focused on that perspective as well. Nonetheless, I thought that
access gaps were relevant, particularly in a period when policy makers were focusing on mobile access to close the gaps. That is, in an age when the internet had reached the majority of the population and seemed ingrained in our everyday lives, there were still digitally and socially excluded communities that faced very particular challenges. These require ad hoc strategies from both the academic research and the policy-making perspectives. I suspected that current digital gaps could not be tackled with the same strategies employed for the population that had already been connected. Therefore, some of the research questions were these: How do the people of isolated communities define and understand ICTs? What are the degrees of adoption and digital inclusion of the inhabitants of recently connected isolated communities? What are the factors related to ICT adoption and use in rural communities that recently received mobile infrastructure access?

Once I graduated from the University of Texas and settled in Chile after landing a job as a new assistant professor, I turned to addressing these research questions. It was early 2013. I thought they could become a good project for my first large research grant. After my dissertation, I knew that I did not want to work alone anymore, as it had been too isolating and I value collaborations. I invited Isabel Pavez, then a PhD candidate at the London School of Economics, to join me as coinvestigator. Good research partners help each other by exchanging viewpoints on how to design the research project and how to handle unexpected events during fieldwork. They also bring different methodologies, knowledge, and training, and, most importantly, they provide new perspectives on what we observe and how we interpret our observations and results. In sum, while the research process sometimes becomes challenging and strenuous, they make it more joyful and productive.
As I do in most of my research projects, I decided to take a mixed-methods approach. The idea of combining qualitative and quantitative methodologies in social sciences was initially promoted in the 1970s, particularly by scholars of the policy evaluation field. They needed to use quantitative methods to assess the effect sizes of program interventions and qualitative methods to provide meaningful interpretations of the program results for policy makers.5 In addition, Denzin6 developed the concept of triangulation to combine data, investigators, theories, and methods. The use of the mixed-methods approach became more widespread in the early 2000s,7
which was when I was being trained as a scholar. Since then, I have become a firm promoter of mixed-methods designs. The combination of quantitative and qualitative methodologies is challenging because it requires not only understanding and managing a wider repertoire of methodological techniques but also navigating and negotiating the sometimes conflicting epistemological perspectives and languages. How do we deal with these challenges? First, having a good research team with complementary research backgrounds and areas of knowledge is essential. Second, maintaining a “pragmatic” attitude when navigating what sometimes seem like diverging research approaches is necessary. Mixed-methods researchers have rooted this multimethod approach in pragmatism as a philosophical grounding.8 Johnson and Onwuegbuzie have argued that pragmatic scholars do not intend to solve the ontological or normative differences between more purist methodological positions.9 Rather, they focus on the practical consequences of the ideas and research questions, and their efforts are more encompassing and oriented toward what works.10 As a result, our approach has been that in some cases we have combined deductive and inductive approaches. In other cases, we have emphasized one worldview over the other depending on the research question or technique. Despite these challenges, the mixed-methods approach has been particularly valuable because it combines the strengths of both types of methodologies. Based on Creswell and Plano Clark’s11 mixed-methods typology, in this study about digital media access and use in rural communities, we used sequential research designs, particularly exploratory and explanatory approaches. The exploratory research design starts with a qualitative data collection, followed by a quantitative data gathering. This design is used when the researcher needs an initial exploration to identify variables or develop an instrument for a follow-up quantitative investigation. 
The explanatory design normally uses qualitative results to explain and interpret the findings of a quantitative study.12 Although including persons with different areas of research expertise and diverse experiences is very enriching when designing a project, it can also become difficult when deciding if a quantitative or a qualitative approach should lead the design of the fieldwork. In this case, after several discussions, Isabel and I designed a three-phase investigation that started with in-person qualitative interviews. These would give us a sense of the
particularities of the rural communities selected and their social and cultural contexts before embarking on a large-scale survey, followed by focus groups to give more depth to the survey findings.

The first qualitative exploration consisted of in-depth interviews conducted with rural community members. These in-depth interviews allowed us to meet the subjects in person, hear their voices and discourses, and understand their context as well as their experiences, feelings, and beliefs, including those impacting more private issues13 such as family interactions regarding digital technologies. They also revealed aspects that were not foreseen by the theory and the literature, such as how people’s isolated geographic contexts shape their discourses, attitudes, perceptions, and behaviors regarding everyday needs. For example, access to transportation, to clean water, to education, and in some cases even to electricity was not a given. From a media scholar’s perspective, these contexts situate digital technologies in a remarkable setting, as technology artifacts such as a smartphone and a refrigerator may compete for relevance in participants’ everyday lives. Furthermore, understanding these specificities helped us create a more valid and reliable survey instrument because it provided us insights regarding people’s everyday circumstances and attitudes toward digital technologies as well as hints about the language they used to talk about these topics.

The second phase was a face-to-face representative survey, which provided greater generalization of the results to the rest of the population who lived in these rural communities under similar circumstances. It also allowed us to examine in a systematic and replicable form the factors associated with ICT adoption in these communities, estimate the strength of the relationships, and control for different factors that play a role in digital media adoption and use.
Finally, the results of the survey gave us information that helped us identify three types of communities based on their low, middle, and high levels of ICT adoption and use, where we then conducted two types of focus groups: ones with users and others with nonusers. The main strength of these focus groups is that they provided an environment where the participants could share their experiences with and ideas about digital technologies.14 The narratives of the focus groups helped us gather a more “fine-textured understanding”15 of our survey results and of what being users and nonusers means in a rural community.

FIRST STEPS: SELECTING COMMUNITIES

When we were developing our first research design, we contacted people from the government and the telecommunications company that had developed Todo Chile Comunicado. Specifically, in the government we contacted someone who worked in the Undersecretariat for Telecommunications, the office that was in charge of the Telecommunications Development Fund. In the telecommunications company, we first contacted the communication liaison, who set up a meeting with people in the area who installed and monitored this program.

The public-private initiative advertised that it had benefited 1,474 isolated localities. Thanks to these meetings, we realized that the claim of benefiting these localities was true to some extent but not entirely so. Strictly speaking, not all of them were completely isolated. The program had identified 1,474 geographic zones (also called polygons) with lower internet connectivity, but they were not necessarily isolated. Nearby antennas provided some level of connection. Therefore, we decided to select those areas (or villages) that were completely isolated, defined as not having internet connection in the nearby areas before the program was implemented. With the help of the telecommunications company, we identified a sample of 106 localities across Chile. They were small: 85 percent of the selected villages had fewer than one thousand inhabitants.

Thanks to our conversations with the government officials and company representatives involved with the program, we also found out that, surprisingly, two years after installing the antennas, the data traffic remained quite low. It was lower than what they expected and lower than voice traffic. That was the first yellow flag: residents of these communities were not using the internet despite having access to it. We were onto something in wanting to research the internet adoption and use of these communities.
We knew it was going to be excessively expensive and strenuous to reach 106 isolated villages, especially given the geographical particularities of the country noted earlier. Fortunately, it was also unnecessary. Relying on stratified sampling would address our needs. Keeping in mind the second phase of the project, which involved a face-to-face representative survey, we decided to select a subsample of twenty-four isolated communities stratified according to geographic region and average data traffic per inhabitant. We selected eight communities each from the northern, central,
and southern regions of Chile. To identify communities with different levels of internet adoption, we used the data traffic logs of the previous year (from early 2012 to early 2013). Data traffic was calculated as the number of connections per capita, that is, the number of connections registered by the telecommunications company in a year divided by the population of each community. For the 106 villages, the average number of connections per capita was 0.61, with a standard deviation of 0.15. Therefore, we selected twelve villages with a number of connections below that average and twelve villages with a number of connections above that average. As a result, in our initial design we had four communities in each combination of region and traffic level. For instance, in the northern region, we selected four communities with fewer internet connections per capita and four with more internet connections per capita. The twenty-four communities all had fewer than one thousand inhabitants. Our initial intent was to survey about 50 people in each community to have a final sample size of 1,200. However, the final sample size was going to depend on the amount of money approved by the grant agency.

As happens with many research projects, the stratified design that we neatly organized in our office, along with the exact number of communities, changed in the second phase of the research project. In the middle of the survey fieldwork, unexpected events occurred: torrential rains and alluvions in the northern desert region flooded entire villages. Fortunately, we had the details for all 106 communities in case we had to change some of our survey sites, which is what ended up happening (more on this later in the chapter).

Once we secured a grant from Chile’s National Commission for Scientific and Technological Research, or CONICYT, we were ready to start organizing our first phase: visits to and in-depth interviews in rural communities.

VISITING THE ISOLATED COMMUNITIES: IN-DEPTH INTERVIEWS WITH ETHNOGRAPHIC ELEMENTS
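To make the selection logic from the preceding section concrete, the following is a minimal Python sketch, not the procedure or code actually used in the project. It computes connections per capita, splits villages at the overall average, and draws four villages for each combination of region and traffic level. The village records are synthetic, the function and variable names are hypothetical, and the random draw within each cell is an assumption, since the chapter does not say how villages were chosen within a cell. Treating a village exactly at the average as "above" is likewise an arbitrary choice.

```python
import random
from collections import Counter
from statistics import mean

REGIONS = ("north", "center", "south")


def select_communities(villages, per_cell=4, seed=2013):
    """Pick `per_cell` villages for every region x traffic-level cell."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    # Connections per capita: yearly connections divided by population.
    rate = {v["name"]: v["connections"] / v["population"] for v in villages}
    overall_mean = mean(rate.values())

    selected = []
    for region in REGIONS:
        for level in ("below", "above"):
            # Villages in this region on the requested side of the average.
            cell = [
                v for v in villages
                if v["region"] == region
                and (rate[v["name"]] < overall_mean) == (level == "below")
            ]
            if len(cell) < per_cell:
                raise ValueError(f"not enough villages in {region}/{level}")
            # Assumption: simple random draw without replacement in each cell.
            for v in rng.sample(cell, per_cell):
                selected.append({**v, "level": level})
    return selected


# Synthetic stand-in for the 106 localities (invented numbers, not project data).
villages = [
    {
        "name": f"village_{i:03d}",
        "region": REGIONS[i % 3],
        "population": 300 + 100 * (i % 5),
        "connections": (40 + 5 * (i % 9)) * (300 + 100 * (i % 5)) // 100,
    }
    for i in range(106)
]

sample = select_communities(villages)
cells = Counter((v["region"], v["level"]) for v in sample)
planned_respondents = len(sample) * 50  # about 50 interviews per community
print(len(sample), planned_respondents)  # prints: 24 1200
```

With three regions, two traffic levels, and four villages per cell, the sketch returns twenty-four communities, and at about 50 respondents each the planned sample comes to 1,200, matching the initial design described above.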

For this project, it was key to collect contextual information about communities and their inhabitants in order to understand these heretofore unresearched isolated areas. Due to budget constraints, we could not visit all twenty-four villages that were part of the second-phase survey study; rather, we strategically picked a subsample. We had funds for three trips, so out of our sample of communities, we chose ones that were relatively close
to each other (about a four-hour drive by car). To make the most out of our trips, we also included isolated communities that were in the area and had received internet connection for the first time as part of the program—so they were on the list of 106 villages but were not on our survey-targeted short list of 24. They all had fewer than one thousand inhabitants, maintaining our focus on small communities. We conducted the fieldwork between June and August 2014. We ended up researching ten communities in the three different regions—four in the north, two in the center, and four in the south. They differed in their economic activities, which included mining, agriculture, fishing, and tourism. We investigated more than one community in each region to help identify and compare data in order to achieve a contextual understanding. As Bryman16 asserted, “Conducting qualitative research in more than one setting can be helpful in identifying the significance of context and the ways in which it influences behavior and ways of thinking.” As outsiders to these tight-knit communities, we were aware of the challenges posed by this type of qualitative approach, particularly given the nature of the relationship between researchers and participants.17 We were strangers—in their eyes, privileged professionals from Santiago, the capital city of a highly centralized country. Many of them had never visited Santiago. Reflexivity, the continued awareness of our position as outsiders and researchers, helped us navigate some of these issues.18 Before our visit, we wanted to have a contact person, ideally a community leader, in each of the communities to serve as an entry point and to provide us with a sense of the community’s organization. To achieve this, we contacted the municipality to which each community belonged. After several attempts, we were able to obtain a cell phone number of a contact person for each community. 
Some of them were the representative—called consul—of the municipality in the community, some were the president of the neighborhood association, and some were the community’s schoolteacher. Many times teachers were key. Most of these rural villages have primary-only schools led by one to three teachers at most. Therefore, students from different grades share a classroom and have the same teacher. Another relevant leader in some of these communities was the president of the senior citizens’ association because in these places the population is aging due to the lower birth rate and the out-migration of young people. We prearranged phone interviews with these contact people prior to our arrival.

Many of these communities were isolated in several ways. They could be in the middle of the desert, like the mining village Inca de Oro, or in an Andean mountain pass on the border between Chile and Argentina, like Puerto Fuy. Many barely had access to public transportation. Malihue, for example, a farming village in the south, had a public bus only twice a week. One of the most precarious cases was Carrizal Bajo, a seaside locality in the northern desert region, with about one hundred inhabitants dedicated to seaweed harvesting. The village was not connected to a power grid. Households relied on petrol generators for short periods of time during the day to produce electricity and keep their food refrigerated. The other communities had been able to access electricity and rural pay phones only since the 1990s. Then, in the mid-2000s, they obtained access to mobile phones because some private companies provided the services in nearby localities. However, many times the geographical conditions interfered with the signal.

For the data collection, we deliberately took an unstructured approach, closely related to ethnography, to focus on informants’ perspectives. The main method used was in-depth interviews, which were recorded and transcribed. We also had informal conversations, on which we took field notes. Although these did not constitute ethnographic interviews because participants were interviewed only once, we took elements from this approach. In this sense, the everyday context for these exchanges was extremely helpful, as we talked to people while they worked, interacted with other community members, and/or accessed the internet. This ambience provided a sort of “intuitive understanding”19 that gave an added perspective to the data.20 Most of the time we started with our prearranged interviews. These interviews were conducted in the participants’ houses, workplaces (such as a wood artisan’s workshop), primary schools, or community centers.
The interviews lasted about one hour. Then these first participants walked us around the community, and as a snowball sampling method, they introduced us to other members and friends such as teachers, shop owners, and their adolescent or grown children. We took the opportunity to interview these people as well. Besides these more formal interviews, we walked around the village and informally talked to other people in the community to understand their context: day-to-day life in the area, ideas, attitudes, and discourses. In total, we formally interviewed forty-eight people, ranging in
age from fourteen to their seventies, but we talked to many more. Before being interviewed, they all had to sign an informed consent form. In the case of the adolescents, their parents or guardian signed an informed consent form, and the minors signed a child assent form. The conversations were open and unstructured; that is, we could start from different points of entry depending on the participants’ life interests. These exchanges usually took place in participants’ own natural settings,21 which gave us an opportunity to start with topics related to their everyday lives, their jobs, and their interests. In addition to these initial flexible topics,22 we had an interview guide that covered our two main focus areas: characteristics of the community and ICTs. For instance, we talked about economic activities, work opportunities, the level of connectivity to the urban areas, the evolution and current demographic characteristics of the community, characteristics of the school, community organizations, and family dynamics. We also explored the historical and current technology access in the community—including public pay phones, mobile phones, and the internet—and their own technology adoption or lack thereof, including their motivations, needs, expectations, networks of support, and types of use. When we noticed a new or unexpected topic emerge in an interview, we included and further explored that theme in the following interviews. For instance, when we were starting our fieldwork in La Laguna, a farming community in the central region, we could see that the antenna was very visible and salient in the landscape. When we interviewed Ana, president of the seniors’ association, she said: “I’ve never known what that antenna is for. When they put it up, nobody told the community they were going to put up an antenna.” She did not seem very happy about it. 
That was the first clue that suggested to us that people who had been raised in a familiar and tight-knit environment were seeing this antenna as a foreign element. We also realized that the antennas had been installed, but nobody from the program or the local government had approached the inhabitants to explain the purposes of these devices and the meanings of internet connectivity. In the following visits and interviews, we realized this was a major theme in every community. The new device was a source of negative discourse and fears about damaging health consequences. In communities like Los Maquis, people asked the local government to move the antenna farther away because they complained about headaches and seizures. In Inca de

Oro, people demanded that they be allowed to cover it with wooden panels because it was installed next to a preschool facility. Others were relieved when it was placed on the mountain and not in the middle of the village. This became a major finding that we did not foresee.23 It reinforced the importance of the first qualitative explorations in the communities. This first immersion helped us realize relevant consequences of the outmigration of children and young people from the community. We were able to observe—and parents would also tell us—that children were using their laptops, for example, at the kitchen table, as the kitchen is usually a central part of the home. There, parents and seniors would begin to engage in very simple tasks, such as looking for information that could be later used for their work or venture. However, children at the age of ten to twelve would have to migrate to a nearby larger town to continue secondary school. They could not commute daily due to the lack of transportation. This situation hampered this brokering of technologies, but on the other hand, it also became an opportunity because it forced some parents to engage with digital technologies to communicate with their children.24 All these became topics to cover in our survey. After visiting these communities, we felt better prepared to devise a face-to-face survey instrument, design a sampling method appropriate for these subjects, and foresee some challenges that interviewers could face during their survey fieldwork.

CONDUCTING A SURVEY IN RURAL AND ISOLATED COMMUNITIES

After we had designed a survey instrument, we contacted Feedback, a polling firm with which we had worked several times when conducting national surveys for other projects. Our face-to-face representative survey was to be conducted in two dozen remote villages, and we needed to outsource the fieldwork because of the logistics involved in deploying interviewers to distant villages across Chile. At the same time, we needed to monitor the process very closely. We knew this was not the “standard” survey in terms of sampling method and subjects. Therefore, it was very important to work with a company that we trusted and that would allow us to be directly involved in the sampling process, the training of interviewers, and the monitoring of the entire fieldwork process. As researchers, we

were able to work with the person at the polling firm who was directly in charge of fieldwork. We started with our list of twenty-four localities strategically selected in the beginning. We also constructed a backup list of about twenty other communities that met the geographic and data traffic criteria in case we encountered any problems reaching the initial twenty-four. Having this backup list ready turned out to be very important during fieldwork due to some unforeseen circumstances that followed, which we detail next. We needed to sample households within the communities but did not have accurate sampling frames because the census data were old.25 Additionally, in our initial qualitative fieldwork, we realized that the distribution of households differed greatly from village to village. In some villages, the households were organized along a main street. But in others, they were geographically very dispersed. Therefore, we obtained maps from the Military Geographic Institute—the Chilean institution in charge of the development of cartography of the country—to study the scope and characteristics of the communities. Given these characteristics and the lack of a clear sampling frame, we used the random route as a sampling method.26 In each community, we selected one geographic starting point with a random point generator. Then the interviewers followed the route that we previously specified and sampled the households systematically using a prespecified interval.27 Specifically, we gave them the following instructions: In a one-street community, the interviewers had to take opposite directions and sample systematically using a prespecified interval. If the community had multiple streets, they also had to take opposite directions (north and south), follow a zigzag route, and sample in a systematic way using a prespecified interval. 
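The one-street instruction can be sketched in code. What follows is a minimal illustration, not the procedure the fieldwork team actually used: the dwelling list, the interval of 4, the seed, and the function name `random_route_sample` are all hypothetical, and the sketch assumes that dwellings can be listed in walking order along the street.

```python
import random


def random_route_sample(dwellings, interval, rng):
    """Select dwellings along one street from a random starting point.

    Two interviewers leave the starting point in opposite directions, and
    each selects every `interval`-th dwelling they pass.
    """
    if interval < 1:
        raise ValueError("interval must be a positive integer")
    # Random starting point somewhere along the street.
    start = rng.randrange(len(dwellings))
    # Interviewer A walks toward the end of the list, starting at `start`.
    forward = dwellings[start::interval]
    # Interviewer B walks toward the beginning; the guard avoids a negative
    # slice index, which would wrap around to the far end of the street.
    if start >= interval:
        backward = dwellings[start - interval::-interval]
    else:
        backward = []
    return start, forward, backward


# Hypothetical village: 40 dwellings listed in walking order.
dwellings = [f"dwelling_{i:02d}" for i in range(40)]
interval = 4  # assumed value; the chapter does not report the interval used
start, forward, backward = random_route_sample(
    dwellings, interval, random.Random(2015)
)
selected = forward + backward
```

Fixing the interval in advance and drawing the starting point by machine leaves the interviewer no discretion over which doors to knock on, which is the risk the strict monitoring was meant to contain.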
In each household, one fourteen- to seventy-five-year-old person was randomly chosen.28 With this random-route technique, there is a greater risk that the interviewers will influence the selection and the method will no longer be “random.” Therefore, to minimize the risk, we exerted strict control. Using Skype, we directly trained the teams of interviewers spread across Chile. This way we had the chance to explain the purpose of the study and motivate them with the policy-making relevance of this investigation, and because we had been in the communities, we could foresee and warn about some challenges that the interviewers were likely to encounter during fieldwork, such as the dispersion of households. Fortunately, this

polling company had vast experience in doing surveys, so its staff were able to offer advice for the training and fieldwork process. Then, together, we closely monitored the interviewers during the fieldwork by calling, checking their progress, motivating them about the importance of the study, and solving day-to-day challenges. Given that many of these villages had very small populations (three hundred to five hundred people), surveying the required forty to fifty households was almost like doing a census of them. Despite these challenges, one of the rewarding parts of doing research in rural communities is that people are generally welcoming and take their time to answer the survey with patience. Often they invited interviewers for tea, lunch, or even dinner. We did not offer incentives for completing the survey. Most of the interviews were conducted on several weekends because it was easier to find people at home. In the end, we obtained an 82 percent response rate. Because of financial constraints—the funding agency cut our requested survey budget by about 20 percent—we had to reduce the number of villages we were able to visit and the number of interviews we were able to conduct. We had hoped for 1,200 surveys in twenty-four localities but ended up with funds for 1,000 surveys in eighteen localities. Nevertheless, we maintained sample diversity by region and level of data traffic. The survey fieldwork started in March 2015. It went more or less smoothly in the southern and central regions, but we experienced unexpected circumstances in the north, where unseasonal rains caused flash floods in several areas.29 Over 30 people died and 4,500 were forced to leave their households. Understandably, under such precarious conditions, we had to stop fieldwork. Some of the communities where we had started data collection were so damaged that we could not return to them. To address this unforeseen situation, we had to include new communities. 
Fortunately, we had the backup list of other possible villages. However, many of these new possibilities were not in good shape either. We had to be flexible. We kept trying until we were able to find and visit new places that were isolated and had been connected by the program. By the end of April, we had completed one thousand surveys in twenty-two communities, although in three northern villages we could complete very few surveys (from five to sixteen), limiting our ability to do analyses at the community level. The results of the survey revealed that the level of internet adoption was low, with 39 percent of households connected to the internet and 37 percent

of people having used it.30 This was much lower than the rural penetration rate of 56 percent, according to government figures collected during the same year (2015).31 Inspired by these findings, for the third phase of the project, we wanted to explore further the perspectives of both users and nonusers.

FOCUS GROUPS: GATHERING USERS AND NONUSERS

This was our last phase of data collection. Based on the results of the survey, we selected three communities with different levels of internet adoption: Puerto Fuy had the highest level of household internet penetration (58 percent). It was located on the shores of Lake Pirihueico in an Andean mountain pass on the border between Chile and Argentina. Approximately five hundred people lived in this village, which had about six streets with contiguous houses. Historically, the main economic activity in Puerto Fuy was forestry, but this was changing to tourism, fueled by tourists crossing between the countries and hotels opening in nearby national parks. Los Maquis had an average level of household internet adoption (42 percent). Located in the Andean foothills in the central region, this village was surrounded by vineyards and olive trees, which were the main source of work for its inhabitants. It had approximately 350 people, whose houses were mostly distributed along the main—and only—street, although there were distant houses up the hill. Finally, La Población had the lowest level of internet penetration (22 percent). This village, also located in the central region, had about four hundred people dispersed over a wide geographic area. Subsistence and industrial types of farming were the main economic activities. Although the houses close to the main road had clean running water and electricity, those up in the hills relied on water extracted from a creek. Interestingly, we had visited two of these three communities during our initial exploration.

We wanted to conduct focus groups to witness interactions at the community level. Focus groups allow observation of the group dynamic and also tend to mimic everyday interactions and how participants influence each other.32 Therefore, they provide a setting that promotes contact among participants and enables examination of their hierarchies of topics, discourses, and perceptions. 
They also allow people to explore each other’s arguments.33 We wanted to avoid power hierarchies based on technology

literacy so we organized two types of focus groups in each community: one for users and another for nonusers. With the help of two research assistants, we began the process by reconnecting with our contact people. In the newly added village of La Población, we followed the same strategy as before: we contacted the municipality to which the village belonged and obtained the phone number of a leader of the neighborhood association. We stayed in nearby larger towns as we conducted our fieldwork in November and December 2015. When we arrived, we explained the purpose of this phase of the study to our contact person and asked to use a community space. The headquarters of the neighborhood association was the most convenient place because our contact person always had the keys. Schools were hard to use because of the regular school schedule. We also coordinated the focus group times—preferably in the evening so people could attend after work. Then we started doing door-to-door visits to apply the screening questionnaire and sort out the possible participants into our two types of focus groups: internet users and nonusers. Because the communities were small, we were able to apply screening questionnaires in most of the households of a village in one to two days. Then we took note of people’s contact information and asked about their availability. With all that information in hand, we selected the participants and called them to confirm their attendance. The time between our door-to-door visits and our confirmation calls was two to three days. Although this sounds like a well-oiled process that went smoothly, it was not. In the first village that we visited—Puerto Fuy—we had brought small travel bags as a gift. But we did not highlight this in the recruitment process. Given that people had been welcoming in the previous phases, we thought they were going to participate out of sheer interest or curiosity. 
For the first focus group—internet nonusers—four of the ten people we had invited showed up. For the second—internet users—only three of the ten invited appeared. We still conducted the focus groups and gave them their gift. However, we did not use those data. We left Puerto Fuy and came back three weeks later. This time we decided to offer money—CL$10,000 (approximately US$15) in an envelope—as an incentive and were clear about it up front. After that, we never again had a recruitment problem. We repeated the process and the money incentive in the other communities, and we always had a diverse group of eight to eleven people in the focus groups. In total, we interviewed sixty-five people in the six focus groups.34

We were aware of the complexities of conducting focus groups in tight-knit, small communities where people know each other. By then, we had learned that the urban-rural gap and the power relationship that it inevitably brings were relevant for the inhabitants’ past relationships with outsiders, like tourists or politicians campaigning for office. We researchers were among those outsiders. Qualitative research literature has discussed this issue at length,35 stressing that the presence of an “outsider moderator” affects focus group participants who may overperform by, for instance, highlighting either the positive or the negative aspects of living in such localities. There might also be power hierarchies among participants of which the moderator is not aware, which may affect how people interact with each other. Therefore, this final stage was still challenging, but it was also a great opportunity to reflect on the entire process. Since we had already visited two of these communities, we had some background information about them. Additionally, we spent some time in them before conducting the focus groups. This gave us further information about the main characteristics of the communities and the participants in the focus groups. These insights helped us warm up the conversation in the sessions and immediately identify the dominant voices. To counteract these power dynamics as much as we could—because it was not possible to eradicate the dominant voices—we purposefully encouraged the participation of all members by sometimes asking targeted questions and making sure everyone could express their opinions. From this experience, we learned that conducting focus groups in these kinds of settings required much more than a good topic guide. We needed previous knowledge about the community. 
At that point, numerous trips, walking tours, conversations, interviews, and surveys had given us some understanding of these communities, their people, and the role digital technologies played in people’s lives.

CONCLUDING REMARKS

This project showed that as social science researchers, we have to be aware of our surroundings. Even a serendipitous encounter at a social gathering may lead to the development of a multiyear research program. It also demonstrated the importance of talking to and developing relationships with people who work in our area of interest but outside our community of researchers. Although we have to be aware of maintaining our

independence, in this case the insights received from those in industry and government were key in understanding this initiative and designing the project. Many of our results led to questions about the way ICT policy was implemented with respect to isolated rural communities. Fortunately, the stakeholders listened to our input. We hope this feedback has resulted in better informed policy-making decisions that incorporate people’s contexts and particularities. As suspected in the beginning, the communities and people who remain disconnected in an era when the great majority are already online need ad hoc strategies. They cannot be approached with the same policies as were used earlier. Providing mobile infrastructure access might be a first step, but it is certainly not enough to get the majority of rural community inhabitants connected. This window of research opportunity, the Todo Chile Comunicado program, was intended to be a public policy that would somehow improve the lives of people in isolated rural communities. Yet the diverse and uncharted contexts and the social and cultural changes could not be addressed only from one angle or with one main research method. To be able to reflect on the role of technology and how these communities responded to these technology changes, we had to use a more complex and holistic approach. In this sense, a mixed-method design offered a tool kit with which to tackle the main research questions and gave us knowledge of how to address both the expected and the unexpected. In this rural quest, we learned how to make geography, as such, a participant in the research. We had to incorporate it to plan logistics, set time frames, access the communities physically, and deal with all the unpredictable weather events that hampered data gathering. 
Also, because we were working in rural communities with different experiences and understandings of development and technology, it was important to get to know local specifics before designing the data collection strategy. First, it was important both to gather information from the government, the telecommunications industry, and the municipalities that govern these communities to prepare for our visits and to have backup plans in case things did not work as expected. Second, survey development benefited from our being in the communities in person and talking to people, enabling us to avoid creating unrelatable questions or using language that would prove difficult to understand. The quantitative approach was key to going beyond people’s personal descriptions to identify trends. In sum, a solid body of

quantitative data, when combined with the qualitative data, provided us with the evidence needed to claim that for many people to go online in these rural communities, it was necessary to go beyond mobile infrastructure access. Finally, after confirming that the level of connection was low, it was truly a privilege to hear from nonusers how they made sense of this connected world and the level of pressure they felt to learn. Furthermore, we were able to provide them a space in which to share their failed digital experiences, the storytelling from internet users that haunted them, and their thoughts of feeling left behind. A mixed-methods approach gave us the freedom to ask questions constantly, including ones that we did not necessarily envision from the start. This process enriched our understanding and opened a complex but fascinating rural perspective often lacking in the research literature we address. It also let us grow as researchers, itself a worthwhile experience.

NOTES

We would like to thank several research assistants who worked on different phases of this project: Javier Contreras, Nicolás Contreras, Cristina Barría, and Miguel Ángel Flores. We also want to express thanks for research grants Fondecyt 1140061 and Fondecyt 1170324, which supported this research.

1. The first-person pronoun I refers to the first author, Teresa Correa. The first-person plural pronoun we is used when the second author, Isabel Pavez, joined the project.
2. International Telecommunication Union (ITU), World Telecommunication/ICT Indicators Database (Geneva: ITU, 2012).
3. D. Kleine, Technologies of Choice? ICTs, Development, and the Capabilities Approach (Cambridge, MA: MIT Press, 2013).
4. Subtel, Estudio Quinta Encuesta sobre Acceso, Usos, Usuarios y Disposición de Pago por Internet en Zonas Urbanas y Rurales de Chile [Study fifth survey on access, uses, users and internet payment arrangements in urban and rural areas of Chile] (Santiago: Intelis, Universidad de Chile, and Subsecretaría de Telecomunicaciones, 2014).
5. D. Royse et al., Program Evaluation: An Introduction, 4th ed. (Belmont, CA: Brooks-Cole, 2006).
6. N. K. Denzin, “Triangulation: A Case for Methodological Evaluation and Combination,” in Sociological Methods: A Sourcebook, ed. N. K. Denzin (Chicago: Aldine, 1978), 339–357.
7. J. Creswell and V. Plano Clark, Designing and Conducting Mixed Methods Research (Thousand Oaks, CA: SAGE, 2007); R. B. Johnson and A. J. Onwuegbuzie, “Mixed Methods Research: A Research Paradigm Whose Time Has Come,” Educational Researcher 33, no. 7 (2004): 14–26; A. Tashakkori and C. Teddlie, Handbook of Mixed Methods in Social and Behavioral Research (Thousand Oaks, CA: SAGE, 2003).
8. Creswell and Plano Clark, Designing and Conducting Mixed Methods Research.

9. Johnson and Onwuegbuzie, “Mixed Methods Research.”
10. Creswell and Plano Clark, Designing and Conducting Mixed Methods Research.
11. Creswell and Plano Clark, Designing and Conducting Mixed Methods Research.
12. Creswell and Plano Clark, Designing and Conducting Mixed Methods Research.
13. M. W. Bauer and G. Gaskell, eds., Qualitative Researching with Text, Image, and Sound: A Practical Handbook (London: SAGE, 2000); K. Rosenblum, “The In-Depth Interview: Between Science and Sociability,” Sociological Forum 2, no. 2 (1987): 388–400.
14. R. A. Krueger and M. A. Casey, Focus Groups: A Practical Guide for Applied Research (Thousand Oaks, CA: SAGE, 2014).
15. G. Gaskell, “Individual and Group Interviewing,” in Qualitative Researching with Text, Image and Sound: A Practical Handbook, ed. M. Bauer and G. Gaskell (London: SAGE, 2000), 39.
16. A. Bryman, Social Research Methods, 4th ed. (Oxford: Oxford University Press, 2012), 402.
17. K. Esterberg, Qualitative Methods in Social Research (Boston: McGraw-Hill, 2002); A. Lewins, “Computer Assisted Qualitative Data Analysis Software (CAQDAS),” in Researching Social Life, 3rd ed., ed. N. Gilbert (London: SAGE, 2008), 394–419.
18. Bryman, Social Research Methods.
19. H. R. Bernard, Research Methods in Anthropology: Qualitative and Quantitative Approaches, 4th ed. (Lanham, MD: AltaMira Press, 2006), 355.
20. E. S. Simmons and N. R. Smith, “Comparison with an Ethnographic Sensibility,” PS: Political Science and Politics 50, no. 1 (2017): 126–130.
21. M. Hammersley and P. Atkinson, Ethnography: Principles in Practice, 3rd ed. (London: Routledge, 2007).
22. B. Sherman, “Ethnographic Interviewing,” in Handbook of Ethnography, ed. P. Atkinson et al. (London: SAGE, 2001), 369–383.
23. For more details on these results, see T. Correa and I. Pavez, “Digital Inclusion in Rural Areas: A Qualitative Exploration of Challenges Faced by People from Isolated Communities,” Journal of Computer-Mediated Communication 21, no. 3 (2016): 247–263.
24. T. Correa, I. Pavez, and J. Contreras, “The Complexities of the Role of Children in the Process of Technology Transmission Among Disadvantaged Families: A Mixed-Methods Approach,” International Journal of Communication 13 (2019): 1099–1119.
25. The census in Chile is conducted every ten years. At that time, the last reliable census had been conducted in 2002. In 2012, due to methodological problems after changes in the strategies of data collection, census data were not reliable, and the government had to redo the census. Our fieldwork occurred in the middle of that problem, so we did not have new census data on which to rely.
26. S. Hader and S. Gabler, “Sampling and Estimation,” in Cross-Cultural Survey Methods, ed. J. Harkness, F. van de Vijver, and P. Mohler (New York: Wiley, 2003), 117–134.
27. P. Lynn et al., “Methods for Achieving Equivalence of Samples in Cross-National Surveys: The European Social Survey Experience,” Journal of Official Statistics 23, no. 1 (2007): 107–124.
28. To choose the participant randomly within the household, we used a Kish grid. See L. Kish, “A Procedure for Objective Respondent Selection Within the Household,” Journal of the American Statistical Association 42, no. 247 (1949).

29. BBC News, “Chile Floods Death Toll Rises to 17 as Clean-Up Begins,” March 30, 2015, https://www.bbc.com/news/world-latin-america-32114822.
30. For more survey results, see T. Correa, I. Pavez, and J. Contreras, “Beyond Access: A Relational and Resource-Based Model of Household Internet Adoption in Isolated Communities,” Telecommunications Policy 41, no. 9 (2017): 757–768.
31. Subtel, Séptima Encuesta de Accesos, Usos y Usuarios de Internet [Seventh internet access, uses and users survey] (Santiago: IPSOS and Subsecretaría de Telecomunicaciones, Gobierno de Chile, 2016).
32. Krueger and Casey, Focus Groups.
33. Bryman, Social Research Methods.
34. For more details on the focus groups, see I. Pavez, T. Correa, and J. Contreras, “Meanings of (Dis)connection: Exploring Non-users in Isolated Rural Communities with Internet Access Infrastructure,” Poetics 63 (2017): 11–21.
35. Bauer and Gaskell, Qualitative Researching; Bernard, Research Methods in Anthropology; Krueger and Casey, Focus Groups.

REFERENCES

Bauer, M. W., and G. Gaskell, eds. Qualitative Researching with Text, Image, and Sound: A Practical Handbook. London: SAGE, 2000.
BBC News. “Chile Floods Death Toll Rises to 17 As Clean-Up Begins.” March 30, 2015. https://www.bbc.com/news/world-latin-america-32114822.
Bernard, H. R. Research Methods in Anthropology: Qualitative and Quantitative Approaches. 4th ed. Lanham, MD: AltaMira Press, 2006.
Bryman, A. Social Research Methods. 4th ed. Oxford: Oxford University Press, 2012.
Correa, T., and I. Pavez. “Digital Inclusion in Rural Areas: A Qualitative Exploration of Challenges Faced by People from Isolated Communities.” Journal of Computer-Mediated Communication 21, no. 3 (2016): 247–263.
Correa, T., I. Pavez, and J. Contreras. “Beyond Access: A Relational and Resource-Based Model of Household Internet Adoption in Isolated Communities.” Telecommunications Policy 41, no. 9 (2017): 757–768.
Correa, T., I. Pavez, and J. Contreras. “The Complexities of the Role of Children in the Process of Technology Transmission Among Disadvantaged Families: A Mixed-Methods Approach.” International Journal of Communication 13 (2019): 1099–1119.
Creswell, J., and V. Plano Clark. Designing and Conducting Mixed Methods Research. Thousand Oaks, CA: SAGE, 2007.
Denzin, N. K. “Triangulation: A Case for Methodological Evaluation and Combination.” In Sociological Methods: A Sourcebook, ed. N. K. Denzin, 339–357. Chicago: Aldine, 1978.
Esterberg, K. Qualitative Methods in Social Research. Boston: McGraw-Hill, 2002.
Gaskell, G. “Individual and Group Interviewing.” In Qualitative Researching with Text, Image and Sound: A Practical Handbook, ed. M. Bauer and G. Gaskell, 38–56. London: SAGE, 2000.
Hammersley, M., and P. Atkinson. Ethnography: Principles in Practice. 3rd ed. London: Routledge, 2007.

Hader, S., and S. Gabler. “Sampling and Estimation.” In Cross-Cultural Survey Methods, ed. J. Harkness, F. van de Vijver, and P. Mohler, 117–134. New York: Wiley, 2003.
International Telecommunication Union (ITU). World Telecommunication/ICT Indicators Database. Geneva: ITU. Accessed July 10, 2019, https://www.itu.int/en/ITU-D/Statistics/Pages/publications/wtid.aspx.
Johnson, R. B., and A. J. Onwuegbuzie. “Mixed Methods Research: A Research Paradigm Whose Time Has Come.” Educational Researcher 33, no. 7 (2004): 14–26.
Kish, L. “A Procedure for Objective Respondent Selection Within the Household.” Journal of the American Statistical Association 42, no. 247 (1949): 380–387.
Kleine, D. Technologies of Choice? ICTs, Development, and the Capabilities Approach. Cambridge, MA: MIT Press, 2013.
Krueger, R. A., and M. A. Casey. Focus Groups: A Practical Guide for Applied Research. Thousand Oaks, CA: SAGE, 2014.
Lewins, A. “Computer Assisted Qualitative Data Analysis Software (CAQDAS).” In Researching Social Life, ed. N. Gilbert, 394–419. 3rd ed. London: SAGE, 2008.
Lynn, P., S. Hader, S. Gabler, and S. Laaksonen. “Methods for Achieving Equivalence of Samples in Cross-National Surveys: The European Social Survey Experience.” Journal of Official Statistics 23, no. 1 (2007): 107–124.
Pavez, I., T. Correa, and J. Contreras. “Meanings of (Dis)connection: Exploring Nonusers in Isolated Rural Communities with Internet Access Infrastructure.” Poetics 63 (2017): 11–21.
Rosenblum, K. “The In-Depth Interview: Between Science and Sociability.” Sociological Forum 2, no. 2 (1987): 388–400.
Royse, D., B. A. Thyer, D. K. Padgett, and T. K. Logan. Program Evaluation: An Introduction. 4th ed. Belmont, CA: Brooks-Cole, 2006.
Sherman, B. “Ethnographic Interviewing.” In Handbook of Ethnography, ed. P. Atkinson, A. Coffey, S. Delamont, J. Lofland, and L. Lofland, 369–383. London: SAGE, 2001.
Simmons, E. S., and N. R. Smith. “Comparison with an Ethnographic Sensibility.” PS: Political Science and Politics 50, no. 1 (2017): 126–130.
Subtel. Estudio Quinta Encuesta sobre Acceso, Usos, Usuarios y Disposición de Pago por Internet en Zonas Urbanas y Rurales de Chile [Study fifth survey on access, uses, users and internet payment arrangements in urban and rural areas of Chile]. Santiago: Intelis, Universidad de Chile, and Subsecretaría de Telecomunicaciones, 2014.
Subtel. Séptima Encuesta de Accesos, Usos y Usuarios de Internet [Seventh internet access, uses, and users survey]. Santiago: IPSOS and Subsecretaría de Telecomunicaciones, Gobierno de Chile, 2016.
Tashakkori, A., and C. Teddlie. Handbook of Mixed Methods in Social and Behavioral Research. Thousand Oaks, CA: SAGE, 2003.

Chapter Ten

STITCHING DATA
A Multimodal Approach to Learning About Independent Artists’ Social Media Use
ERIN FLYNN KLAWITTER

Envious. Flummoxed. Flabbergasted. Such were the reactions of my family members and close friends when I told them that I would be spending the summer of my dissertation research attending art and craft fairs for data collection. I know they were picturing me strolling a grassy aisle between booths on a lazy Saturday afternoon, sipping a cold lemonade as I browsed the custom-cast jewelry, copper lawn ornaments, and watercolor prints of Lake Michigan. OK, truth be told, I did do some of that on my ventures to ten different fairs in the Midwest, but my forays were actually executed with a specific social scientific purpose—to generate the most unbiased sample of independent artists that I could. The question that fundamentally drove my dissertation was one that had intrigued me during my prior professional career as a digital content strategist. I wanted to understand whether using social media to build relationships with sought-after audience members had any tangible effect. In short, I wanted to know this: Are differences in social media use related to differences in material consequences? Of course, one can imagine taking various paths to find the answer to that question, but I chose to design a study that focused on independent artists for reasons far more sophisticated than my desire to spend my first summer as an ABD (all but dissertation) traversing the Great Lakes region in search of fairs. That was indeed a happy result— but not as happy a result as being able to defend my eventual findings based

on the strength of my sample and the richness of my data collection. By inviting artists who sold their work at fairs as well as online to participate in my study, I avoided generating findings that would have related to some shared characteristic of artists who sold their work only online. My travels from Evanston, where I was enrolled at Northwestern University, to Milwaukee, Ann Arbor, and South Bend embedded my thesis in the intrinsically embodied, sweaty, and somewhat unpredictable world of artisanal craft fairs. Indeed, while I enjoyed my time at the fairs, the effort and expense required to attend them comprised only the first leg of a long journey of important methodological choices that informed a three-phase study—consisting of surveys, social media data and e-commerce sales collection, and interviews—and ensured the quality of my final results.

THE CASE FOR INDEPENDENT ARTISTS AS A FOCAL POPULATION

So how did I find myself strolling the lush aisles of the Krasl Art Fair on the Bluff in St. Joseph, Michigan, recruiting artists, of all possible groups of people, for my dissertation research? As I prepared my dissertation proposal, I knew that I wanted to understand whether any of the social media activity I casually observed among artists helped create or sustain their businesses. In the years leading up to my proposal, I followed a number of artists online, women who were not only painting, throwing, and spinning but also generating content for self-branded lifestyle blogs and a variety of social media platforms. They were building online personas by telling stories and sharing imagery from their lives in an effort, it seemed to me, to attract an audience and to increase the sale of their wares. I was drawn to studying artists because I had a personal interest in art but also because their online efforts intrigued me. Still, my own fascination with artists’ work and communication styles does not a research rationale make. I needed to consider the theoretical and practical implications of my choice. From a theoretical standpoint, I noticed that social media, e-commerce websites, and mobile applications were mobilizing what I now conceptualize as the rhetoric of the “sharing economy,” a phrase that colloquially refers to the emergent phenomenon of entrepreneurs generating an income by sharing resources they own with others, who borrow or purchase limited use of them. This process is often facilitated by mobile applications and

websites that match a person who owns the resource with a person seeking to partake in it. The most obvious cases of this phenomenon are the ridesharing platform Uber and the home-sharing platform Airbnb. In the case of the former, drivers who own vehicles use a mobile application to find people who need, and are willing to pay for, rides. While Uber and other systems like it function much like a taxi service, they differ in their use of independent contractors, their charging systems, and, in general, their lack of government regulation.1 Similarly, Airbnb allows people to share the housing they own with others seeking a short- or longer-term place to stay. These services encourage the sharing of resources, but they do so at a cost. It is, of course, uncommon for independent artists and crafters to share the unique goods they make with purchasers. Rather, they sell them to interested buyers. How correct could my intuition be that independent artists are participants in the sharing economy? As I began to explore the topic of independent artists as social media users further, I found that e-commerce websites such as Etsy, arguably the most popular website for artists to list their goods and conduct transactions, encouraged artists to use social media to share information about themselves and their creative process to attract an audience and buyers for their work. For example, at the time of my research, the Etsy Seller Handbook provided this advice: But figuring out the best way to use social media to promote your Etsy shop can be tricky. . . . While the nuances are open to interpretation, it’s important to remember that social media is defined by the people who use it, and that includes you. At its core, social media gives us spaces to share our humanity, in all its complexity, beauty, and creativity. 
Unlike traditional forms of media, such as print ads and television commercials, social media allows you to put your passions front and center, while at the same time offering a space for you to connect with the passions of others. It is an ongoing exchange, and one that is constantly evolving. When approaching social media from a business standpoint, it is important to make the human element a central focus. If your audience feels personally and emotionally connected to you, their loyalty will extend beyond a single purchase or click. Regardless of the platform you prefer . . . develop a social media presence that is not only effective, but true to you and your business.2

Although various scholars had critiqued such discourse for valorizing unpaid or low-paid entrepreneurial labor in an economy characterized by

precarity,3 no one had yet measured the material economic consequences of such labor. Much of the work concerning the sharing and gig economy was solely qualitative in nature. Although such work provided necessary theoretical scaffolding that connected the emergence of sharing websites with the decline in secure long-term employment, none of it addressed what I believed to be a crucial question: How do people find success when they participate in the sharing economy? Lack of these data weakened necessary critique. As I continued my research, I realized that because many artists sell their work online, on platforms such as Etsy, which publicizes the number of sales completed by individual shops, I would be able to “observe” at least one material result of artists’ efforts to sell their goods. Rather than relying on self-reported sales or income data, I could monitor artists’ shops to determine whether or not they were having success selling their goods. My case for studying independent artists was growing stronger by the month: I was fascinated by the population; I had identified key theoretical issues my study could address; and if I could solve the problem of how to monitor sales, I could precisely measure an interesting material outcome. However, I still needed to confront two key limitations of a population that had so much going for it: first, the gendered nature of the art world and, second, the variety of work made by independent artists. I resolved these limitations by focusing my study on independent artists who primarily make functional art: jewelry, pottery, and textiles. In doing so, I also implicitly limited my study to women, who dominate the online markets for selling crafts. Literature in sociology and art history documents the fault line that has historically separated art and craft.4 The sociological perspective argues that the variations and implied hierarchy between the two may be explained by power dynamics. 
Art connotes fine pieces valued for their intrinsic aesthetic worth and created most often by male (so-called) geniuses, while craft refers to functional goods primarily valued for their domestic utility and produced by women. The markets that interested me were those oriented more toward the selling of crafts; the artists I observed using social media to promote their work were primarily crafters and, not surprisingly, primarily women. While some might argue that circumscribing the population of interest to women who make functional art limits the value of my research, I found

that doing so actually strengthened its arguments. First, studies focusing on women in the sharing economy are few and far between. By focusing on women, my dissertation filled a gap in the literature that tended to focus on platforms such as Wikipedia and Uber, which are dominated by men.5 Second, one of the variables implied in my research question (Are differences in social media use related to differences in material consequences?) is that of internet skills, or the know-how required to use the internet effectively.6 By asking how differences in social media use relate to differences in material consequences, I imply that better and worse ways of using social media may exist; ways that increase material benefits would be considered more skilled, while ways that decrease such benefits would be considered less skilled. When it comes to the relationship among gender, internet skills, and ways of participating online, the literature is not clear, although some have found that differences in online participation are related to gender.7 By limiting my study to women, I controlled for gender and thus could eventually argue that differences I found in material consequences were related to the women’s differences in use and not necessarily to gender differences among participants.

After nearly a year of weighing the pros and cons of studying the social media use of independent artists, I finally stood on the bluff in St. Joseph, Michigan, overlooking the lake on a clear day. Surrounded by tents full of artists and their wares, I was confident that the flyers I handed out that day would reach a population that would allow me to make the claims I hoped to make. Independent artists were a theoretically, methodologically, and practically sound, as well as a personally interesting, choice.

RECRUITMENT IN THE ABSENCE OF A CENSUS

Once I made the choice to examine the social media and online sales activity of independent artists, it was crucial that I generate an unbiased sample of them. Recruiting artists from a single online platform, even one as popular as Etsy, might result in participants who shared another characteristic related to their choice of Etsy as a selling platform, which might also relate to the material outcomes they experienced from using social media to sell their art.8 To identify a sample from which to recruit, I searched for comprehensive directories of independent artists from which I could create a random sample. Such directories do not exist. Similarly, none of the online

platforms I studied listed all participating artists’ names and contact information, so I could not simply draw a random sample from such a comprehensive listing. Because no census of independent artists existed, I thought carefully about how to create a sample from which to recruit participants.9 Based on research, I identified the different types of markets that might attract women who made functional art and determined that if I recruited artists from both juried and nonjuried art and craft fairs in the Midwest—a popular destination for crafters from across the United States during the summer months—I would reach a large number of artists who sell their work in multiple venues. Additionally, I created recruitment lists by sampling artists from four different e-commerce platforms focused on selling crafts: Aftcra, Artfire, TheCraftStar, and Etsy (see table 10.1). On each website, I applied the filters specific to that site to identify sellers meeting the basic recruitment criteria (such as being a U.S.-based seller who makes jewelry, pottery, or textiles). Then, rather than contacting sellers who appeared on the first page of the search results, I sampled sellers at specific intervals

TABLE 10.1 Construction of Recruitment Sample

Venue           Category    Round 1           Round 2           Round 3
                            (October 2014)    (January 2015)    (March 2015)
Art fairs       Ceramics    28
                Jewelry     59
                Textiles    32
Aftcra          Ceramics    3
                Jewelry     35
                Textiles    12
Artfire         Ceramics    25
                Jewelry     32
                Textiles    49
TheCraftStar    Ceramics    3
                Jewelry     43
                Textiles    43
Etsy            Ceramics    125               250
                Jewelry     125               250
                Textiles    125               250
Total recruited             739               1,184             158

Round 2 and Round 3 values whose venue cannot be determined from the extracted layout: 47; 7, 7, 7; 6, 85, 20; 80, 170, 163

Source: E. Klawitter, “Crafting a Market: How Independent Artists Participate in the Peer Economy for Handmade Goods” (PhD diss., Northwestern University, 2017), 50.

from across all pages of search results. While I compiled the recruitment lists, I also kept careful logs of how I was creating them. Maintaining a record of such details allowed me to report the process accurately in my final dissertation document. For example, I described this procedure for each round of recruitment in a fashion similar to this passage from my dissertation:

In the first round of recruitment from Artfire, I used both the website taxonomy and the website’s search engine to generate a list of potential participants. I followed the link for “Fine Arts” from the Artfire homepage to “Ceramics & Pottery.” This generated 25 pages of items, from which I sampled the fifth unique shop from every page. This produced a list of 25 shops. I generated my list of shops that sell jewelry on Artfire by searching for the keyword “Jewelry.” This search resulted in 4718 pages of item-based search results. I proceeded through all of the results, sampling the fifth shop from every 135th page, which returned a list of 35 shops. Finally, to find shops that sold textiles, I followed the Artfire taxonomy for “Handmade” goods to the link for “Crochet/ . . . /Fiber Arts/Quilts,” from which I sampled 49 unique shops.10
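The interval procedure in the quoted passage can be sketched in a few lines of Python. This is only an illustration of the idea, since the sampling described here was done by hand: the function names, the choice to begin at page 1, and the toy shop names are all assumptions rather than details taken from the dissertation.

```python
from typing import Iterable, List, Optional


def pages_to_sample(total_pages: int, step: int, start: int = 1) -> List[int]:
    """Return the page numbers start, start + step, ... up to total_pages.

    Starting at page 1 is an assumption; the passage does not say which
    page the interval count began on.
    """
    if total_pages < 1 or step < 1 or start < 1:
        raise ValueError("total_pages, step, and start must all be positive")
    return list(range(start, total_pages + 1, step))


def nth_unique_shop(shops_on_page: Iterable[str], n: int = 5) -> Optional[str]:
    """Return the n-th distinct shop in listing order, or None if there are fewer.

    Search results are item-based, so one shop can appear several times on
    a page; repeats are skipped when counting.
    """
    seen: List[str] = []
    for shop in shops_on_page:
        if shop not in seen:
            seen.append(shop)
            if len(seen) == n:
                return shop
    return None


if __name__ == "__main__":
    # Hypothetical page of item listings, identified by shop name only.
    page = ["a", "a", "b", "c", "b", "d", "e", "f"]
    print(nth_unique_shop(page))
    print(len(pages_to_sample(total_pages=4718, step=135)))
```

Under the assumed start at page 1, stepping through 4,718 pages at an interval of 135 visits 35 pages, which is consistent with the 35 jewelry shops reported in the passage. Sampling at fixed intervals rather than from the first page is what keeps the list from being dominated by whichever shops the site's ranking favors.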

Although this procedure was painstaking and time consuming, it prevented me from biasing the sample based on characteristics that might have influenced the search engine algorithm. Avoiding such bias was crucial for obtaining meaningful variations in material outcomes among sellers. For example, based on what I learned from my interview data, it is likely that the most successful sellers achieve higher search engine rankings. If I had sampled participants only from the first page of search results, I would have recruited only the most successful participants. This would not have allowed me to see the considerable variation in sales success that I found during my analysis of the data. Once I identified possible participants, I then determined how to contact them. In addition to handing out flyers at craft fairs and talking to potential participants in person—a futile effort, since most of them were focused on selling their products while at the fair—I also obtained sellers’ business cards and, in some cases, found their email addresses in the listings of participating artists on individual fairs’ websites. I then used email to send possible participants a recruitment note, a note that had been approved by my university’s Institutional Review Board (IRB), which protects human

subjects involved in research. Additionally, each e-commerce website provided a slightly different method for contacting sellers directly. Notably, Artfire, TheCraftStar, and Etsy did not provide sellers’ email addresses, so I relied on the contact forms provided on each website to recruit participants. In some cases, this simply involved copying and pasting the message from the approved email template into a contact form. However, TheCraftStar required me to fill out unique Captcha fields to send each contact form. This increased the work (and subsequent bleary-eyedness!) required to reach each contact.

As I recruited participants, I processed their responses to the required screening survey so that I could determine whether they met the study’s eligibility criteria, including age, gender, and selling practices. Although the survey was short, it did collect information that participants might wish to keep confidential, so Northwestern’s IRB required that they give their informed consent prior to completing the screener. The informed consent gave considerable detail about the study design, including the requirement that participants consent to the collection of all the public social media content they posted for five months. I believe it was an ethical best practice to inform participants of this facet of their participation well in advance, and, in fact, a few participants did let me know via email that they were uncomfortable participating in a project that required them to allow a researcher to follow their public online activities. One said, “I’m happy to answer any written questions you might care to send my way, if that would be of help. I didn’t feel comfortable with the tracking of my other information for the study.”

THE INITIAL PHASE: SURVEY QUESTIONNAIRE

Despite spending a number of months in the recruiting phase of the study, I did not meet my recruiting goal of seventy-two participants distributed equally across the three functional arts of interest (see table 10.2). Rather, after four months of contacting sellers electronically, I focused in the final month on filling gaps in the distribution of the types of art that participants made. At last, a group of forty-three participants, from the fifty-two I had determined to be eligible, completed the first phase of the study: a two-part, fifty-one-question survey. Part one of the questionnaire consisted of forty-four questions and included the measures in which I was

TABLE 10.2 Participant Yield

                                        N        Percentage
Contacted                               2,084
Interested                              135      6.5
Completed screening survey              65       48.1
Invited to study                        52       80.0
Phase I: Completed survey               43       79.6
Phase II: Completed social media app    39       90.7
Phase III: Completed interview          25       64.1

Source: E. Klawitter, “Crafting a Market: How Independent Artists Participate in the Peer Economy for Handmade Goods” (PhD diss., Northwestern University, 2017), 51.

most interested, such as the length of time participants had sold their goods online, the number of years they had promoted their creative businesses using social media, and the amount of income their craft business generated; in addition, a variety of questions sought information regarding their internet use experiences. Finally, I included a question to assess participants’ attentiveness to filling out the survey. The second part of the survey consisted of seven questions. These required participants to share identifying information—specifically, links to their e-commerce shops and social media profiles and contact information for compensation and the follow-up interview. Because the second part of the survey contained identifying information and I wanted to ensure that such information would remain separate from sensitive data, such as income, which participants shared on the first part of the survey, I used Qualtrics to build two separate surveys. Each required a personal identification code that I sent to each participant and that she then entered when accessing each survey. By assigning participants these codes, which I held in a secure, password-protected file, I could match their survey data with data I gathered later in the study. While this practice required me to complete additional steps in building the survey and in collecting and analyzing the data, the result was that I was later able to triangulate findings. I could connect participants’ confidential survey responses with their public social media posts and sales data. For example, I was able to assess whether or not participants who reported the most income from their craft businesses also generated the most sales in their online shops.
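The linkage step described above, in which two separately stored surveys are joined through a personal identification code, can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow actually used with Qualtrics and Stata: the field names (`code`, `income`, `shop_url`) and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def link_records(
    confidential: List[Record],
    identifying: List[Record],
    key: str = "code",
) -> Tuple[List[Record], List[object]]:
    """Join two lists of survey records on a shared identification code.

    Returns the linked records plus the codes from the confidential survey
    that found no partner, so that gaps can be checked by hand.
    """
    by_code = {row[key]: row for row in identifying}
    linked: List[Record] = []
    unmatched: List[object] = []
    for row in confidential:
        match = by_code.get(row[key])
        if match is None:
            unmatched.append(row[key])
        else:
            # Merge the two halves only at analysis time; until then the
            # identifying answers stay apart from the sensitive ones.
            linked.append({**match, **row})
    return linked, unmatched


if __name__ == "__main__":
    # Hypothetical responses; none of these values come from the study.
    part_one = [{"code": 101, "income": 12000}, {"code": 102, "income": 800}]
    part_two = [{"code": 101, "shop_url": "https://example.com/shop-a"}]
    linked, unmatched = link_records(part_one, part_two)
    print(linked)
    print(unmatched)
```

Keeping the code-to-person key in its own protected file, as described above, means that neither survey export is identifying and sensitive at the same time.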

THE SECOND PHASE: USING A CUSTOM WEB-BASED APPLICATION FOR DATA COLLECTION

While I was recruiting participants for the study, processing their screening surveys, and inviting them to complete the survey phase of the study (phase 1), I was simultaneously collaborating with a professional web developer, Chris Karr of Audacious Software, to build a web-based application that would allow me to collect participants’ public social media posts and sales data. Chris had completed a master’s degree in my program at Northwestern and had worked with my advisor on other innovative data collection projects,11 so I was fortunate that he was available and willing to build the application I needed. I was able to compensate him for his work through internal grants from Northwestern. During a series of weekly Skype meetings, Chris evaluated my needs for the project and prototyped an application, which we named the Creative Work Study (CWS) app, that allowed me to assign participants unique codes that would then be associated with all their social media posts. Using an email generator built into the application, I assigned each participant her code, and then the email generator invited her to visit the application website, where she entered the URLs of her various social media feeds and online shops into a form. This was the extent of the artists’ active participation in the second phase of the study.

Automated and Manual Data Collection

For five months, the app relied on RSS feeds, two scripts, and one devoted researcher to collect automatically and manually the participants’ public social media posts and sales data from their shops. One of the primary ways the application gathered data was through subscriptions to RSS feeds offered on social media. These subscriptions allowed the content posted on such social media to be “pushed” into the structured database Chris created. The second method of data gathering used social-media-specific application programming interfaces (APIs) to collect the date, time, permanent URL, and content of participants’ posts as well as the count of parasocial interactions (e.g., “Likes,” “Retweets”) the various posts received. This first script could automatically collect data only from participants’ Instagram and Twitter feeds. I had hoped to use an API to collect participants’

Facebook posts as well, but the organization would not approve the program, a fact that now has particular historical resonance given researchers’ misuse of data collected from Facebook around the same time.12 Despite our efforts to have our use of the Facebook API approved, which included contacting relatively senior employees at the company, we were unsuccessful. Finally, a second script scraped participants’ sales data from their Etsy shop homepage at 6 PM every day. I asked Chris to build a work-around for sites such as Facebook that did not provide a means for automatically collecting data or would not allow us to do so. He created a password-protected form in the administrative section of the app, which I then used to enter data manually from participants’ social media feeds. The most popular among these were Facebook and Pinterest. Because these data would not be automatically entered using an API, I decided to make the process more efficient by collecting all the participants’ social media data at the same time every week. Choosing to do so allowed me to ensure uniformity around the timing of data collection. Manually collecting data from participants’ Facebook, Pinterest, Google+, and/or YouTube pages at the same time every week meant that I spent every Friday from the beginning of March until the end of July copying and pasting every social media post generated by every participant from every one of their social media feeds that I knew about into the form. Thirty-nine women participated in this phase of the study. Following weeks when they were especially prolific, this process took up to twelve hours. Finally, the CWS app included an export function that allowed me to retrieve social media content and Etsy sales data. At the conclusion of the second phase of the study, I exported information from these two databases as text-delimited files. 
I then imported them into Excel for review and cleaning before importing them into Stata (for quantitative analysis) and Dedoose (for qualitative analysis). Once I removed duplicate entries from the database of social media posts (a risk inherent in copying and pasting social media data manually for hours and hours), I was left with 16,442 unique social media posts, collected over a period of five months.

Analytical Challenges

To analyze the content of each social media post, I used Dedoose, an inexpensive web-based software program with desktop functionality that also

permits collaboration with other researchers. Following the conclusion of the second phase of the study, I prepared the data from the CWS app’s social media post database to meet the requirements of a Dedoose “project,” which is any group of related qualitative content. Because the data set was so large, however, I found that I needed to divide the data into five subprojects in order for Dedoose to accept the file size. While this task might seem mundane, it actually required a good deal of thought. I needed to make sure that each participant’s social media posts remained in the same subproject so that when I exported the data, I could accurately analyze codes associated with each participant without having to move data between files and thereby risk copy-paste errors that might improperly associate a particular post with the wrong participant. I met this challenge by exporting all social media posts from the CWS database to a text-delimited file, which I then imported into Excel. Using Excel, I sorted the data by each participant’s unique numerical identifier. To preserve the temporality of the posts, I added a column to indicate the week during which each post was collected, which I determined by referring to the date of the original post in the database. In response to Dedoose’s file size upload limit, I divided the posts into five roughly equal sections of approximately three thousand posts each, making sure to keep each participant’s set of posts within the same file. After the Excel files were prepared, I uploaded the files to their respective Dedoose projects. Dedoose, like Excel, presents data in columns, which in this case included all the data collected by the CWS app. I then proceeded to analyze the social media posts qualitatively. Because the posts were so numerous and contained so much rich data, I used three methods of coding. 
The first, a structural coding scheme,13 was adapted from John’s14 social logics of sharing framework, which identified three methods of sharing present in the networked communication environment: distributive, dividing goods or sharing information; communicative, sharing about one’s emotional state; and Web 2.0, using social media to spread discourse through technical means, such as retweeting. Because I was interested in the means of communication that artists use to share information about their work, I added a fourth category to the structural scheme: promotional. In addition to the broad structural scheme, I used descriptive codes to identify the content of the posts. Because participants used social media

to communicate personal information about themselves (sometimes in an effort to connect their personality to their products and other times seemingly simply to have a voice online), these included codes such as “art fairs,” “holidays,” “news/politics,” and “pets.” Finally, magnitudinal codes helped measure the degree to which a participant disclosed information about herself. I adapted Wheeless and Grotz’s15 scale for self-disclosure to ascertain the level of self-disclosure in each post. An example of a magnitudinal code is the rating of valence on a scale from −1 to 1, where −1 indicates a negative tone in the post, 0 indicates a neutral tone, and 1 indicates a positive tone. We used similar measures for the other dimensions of self-disclosure that Wheeless and Grotz identified: amount, depth, intent, and honesty. The simplicity of the Dedoose codebook allowed me to quickly train the undergraduate research assistants working with me on the project. Because the software’s functionality includes the linkage of child and parent codes, we were able to apply multiple codes at once. That is, if the option is turned “on,” the parent codes for a child code are automatically assigned to the data, permitting relatively rapid coding and categorization. Additionally, when viewing and coding an excerpt in Dedoose, we could easily access, copy, and paste the permalink for the complete post into a browser in order to learn more about the context of that post. This functionality proved especially helpful when we were coding content from image-dominated platforms such as Instagram and Pinterest, where seeing the context of a post was especially useful. Finally, Dedoose allowed me to keep notes regarding my analysis via “memos” attached to individual excerpts, which I could then group accordingly. I used the “memo” function to make theoretical notes, to record questions, and also to note greater context for a particular post.
After coding all 16,442 posts, I exported the data from Dedoose to Excel and calculated the percentage of each type of sharing the participants engaged in—distributive, communicative, Web 2.0, and promotional— during the course of the study. Using Stata, the software program I use for statistical analysis, I added these percentages as continuous variables in cells in the participants’ survey data file. This permitted me to analyze all of participants’ data—a combination of the survey responses, the sales data, and the social media posts—simultaneously.
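The per-participant percentages described above can be illustrated with a short Python sketch. It is not the Excel and Stata procedure itself, and it rests on a simplifying assumption the chapter does not state: that each post carries exactly one structural code. The function name, the code labels, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple

SHARING_TYPES = ("distributive", "communicative", "web2.0", "promotional")


def sharing_percentages(
    coded_posts: Iterable[Tuple[Hashable, str]],
) -> Dict[Hashable, Dict[str, float]]:
    """Compute, per participant, the share of posts in each sharing type.

    coded_posts yields (participant_id, sharing_type) pairs, one per post.
    """
    counts: Dict[Hashable, Counter] = defaultdict(Counter)
    for participant_id, sharing_type in coded_posts:
        if sharing_type not in SHARING_TYPES:
            raise ValueError(f"unknown sharing type: {sharing_type!r}")
        counts[participant_id][sharing_type] += 1

    percentages: Dict[Hashable, Dict[str, float]] = {}
    for participant_id, counter in counts.items():
        total = sum(counter.values())
        # A Counter returns 0 for absent keys, so unused types come out as 0.0.
        percentages[participant_id] = {
            t: 100.0 * counter[t] / total for t in SHARING_TYPES
        }
    return percentages


if __name__ == "__main__":
    # Hypothetical coded posts for two participants.
    posts = [
        (1, "promotional"), (1, "promotional"), (1, "promotional"),
        (1, "communicative"), (2, "distributive"),
    ]
    for pid, shares in sharing_percentages(posts).items():
        print(pid, shares)
```

Each participant ends up with one row of four continuous values, which is the shape needed to merge the coded posts into a survey data file keyed by participant.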

Sales Data

Similarly, at the conclusion of the five-month period of online data collection, I exported sales records from each Saturday during the period of data collection as a text-delimited file. Because it turned out that most of my participants made relatively few sales (four participants sold nothing during the course of the study), I decided that weekly increases in sales would likely be the most granular amount of data necessary for future analysis. I then imported the file into Excel and prepared it to be imported into Stata. This preparation included replacing each participant’s unique identifier for the CWS app with her unique identifier for the survey, renaming variables, and identifying and appropriately replacing missing values, such as for the participant who closed her shop during the sixteenth week of the study. I then imported the data into Stata, where I generated variables to indicate each participant’s weekly and overall growth in sales.

THE THIRD PHASE: INTERVIEWS

In the month following the conclusion of the second phase of the study, I conducted a series of semistructured interviews with twenty-five of the thirty-nine artists who had participated in the second phase. Doing so at this point of the study meant that participants’ social media posts and sales techniques would not be affected by the questions I asked, and, thus, we could have a frank conversation about their practices without my influencing their behavior during the rest of the study. This phase of the study proceeded relatively smoothly. I invited participants to an interview in an email. Once they agreed, I proposed several times that we could speak on the telephone or via Skype or Google Hangouts. We did not meet in person because participants were scattered broadly throughout the United States. Most participants elected to speak with me on the phone. The evening prior to the interview I emailed participants to remind them about the interview and verify their preferred method of contact, including their phone number and/or username. I also included an IRB-approved consent form for them to review prior to speaking with me. In preparation for the interview, I reviewed participants’ social media posts as well as any notes or memos I had made about patterns I noticed while collecting the data. Unfortunately, given that

I conducted the interviews in the month immediately following the conclusion of the second phase of the study, I could not construct any questions based on my complete analysis of the data. However, one benefit of manually collecting so many social media posts was that I had a deep familiarity with each participant and her work, so I was able to construct both broad questions and questions that were specific to the participant. At the scheduled time, I contacted the participant, and we spoke for a few minutes without the recording device activated. During this brief conversation, I described the interview procedure, told the participant her unique ID number, and gained her consent. After I turned on the recording device, I asked that the participant state her unique ID number and also repeat her verbal consent to participate in the interview as well as her preference to have quotes attributed to her or to have her identity remain confidential. I then followed a semistructured interview protocol that I modified according to what I observed about each participant’s creative work and use of e-commerce websites and social media platforms. I asked a variety of open-ended questions to elicit information and stories about each participant’s experiences and motivations. I often asked follow-up questions to draw out more detail and to gain clarity.16 I used two audio recorders for each interview and recharged their batteries following every use to make sure that neither lost power and that if one stopped recording, I would have a backup copy of the interview. During and immediately following each interview, I took notes about interesting insights I gained from the participant. I also immediately uploaded a master mp3 file to my desktop computer and saved a copy of that file to an external hard drive. In addition, I identified the file with the participant’s ID number and uploaded it to VerbalInk, a company that provides professional transcription services. 
VerbalInk transcribed the file and returned it to me via email within one week.

Analysis

Together with my research assistants, I used Dedoose software to code the interview data structurally. The interview transcript files were much smaller than those storing the social media posts, so I avoided the challenge of breaking the data up into smaller pieces, only to have to put them back together again. During the coding process, we identified excerpts of interviews that spoke to the challenges and successes participants found selling their work online as well as the motivations, education, skills, and routines that enabled them to do so. Following an initial coding pass by a research assistant, I read each interview three times and coded it for stories or remarks about the variety of experiences independent artists have as they operate a creative business and use social media to promote it. As the expert coder in this phase of the project, I reviewed all of my research assistants’ work and adjusted codes as necessary. Because I did not conduct quantitative analysis of themes that appeared in the interviews, no intercoder reliability needed to be established. Rather, my work with the interviews was interpretive: I used the participants’ words to help me understand their context and to make arguments regarding why some of my more surprising findings might have occurred.

STITCHING DATA: PUTTING IT ALL TOGETHER

I hope the above description of my project makes clear that planning, collecting, organizing, and analyzing the enormous amount of data for my dissertation research was a gargantuan undertaking. The process was filled with seemingly minor decisions whose import I did not appreciate until the most difficult part of the process, finding a way to answer my driving question: Are differences in social media use related to differences in material consequences? Although I have constructed this chapter to emphasize how I gathered the pieces of the quilt that I eventually put together, I must also acknowledge that I embarked on several false starts and shed a few analytical strategies in order to make sense of it all. In the end, using a relatively simple bivariate analysis, I found that the top sellers were those who shared more than the group’s median amount of promotional content and less than the group’s median amount of communicative content.17 Interviews with participants revealed that the promotional strategy may be more effective because such content invites “clicks,” which, in turn, strengthen an item’s position in search results.18 This finding runs counter to rhetoric that argues for the importance of disclosing personal information in the sharing economy and suggests that good, old-fashioned advertising relates more closely to increased sales. Yet despite the relatively small sample of participants in the study, my dissertation’s somewhat surprising findings are bolstered by the care I took at each step in the process to mitigate bias and error. I recruited broadly and deeply. I took stock of possible limitations and determined arguments that supported a rationale for accepting and even celebrating them. I collaborated with an expert to find clever technical solutions to the problem of collecting data from participants. 
I gathered these data over a long period of time (some participants enrolled in the study in October 2014 and remained in it through August 2015), which, in turn, allowed me to triangulate observed and self-reported information. Through patience, persistence, and the steady application of effort, a project that started with a personal interest and a hunch developed into something that required far more effort than a stroll down a grassy aisle between craft booths on a breezy summer day. Although I didn’t know it while meandering amongst the booths or while mired in vast swaths of data, my efforts would eventually be recognized and rewarded by my most admired colleagues and mentors: two years after I began my project, I won the Herbert S. Dordick Dissertation Award from my professional association, the Communication and Technology Division of the International Communication Association, and the graduate dissertation award from the School of Communication at Northwestern.
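The bivariate comparison mentioned above (splitting participants at the group median on promotional and on communicative content, then comparing sales) can be illustrated with a short sketch. This is only a hedged illustration: the records, field names, and numbers below are hypothetical toy values, not data from the dissertation, and the function names are invented for this example.

```python
from statistics import median

# Hypothetical toy records, NOT data from the study.
participants = [
    {"id": "P01", "promotional": 40, "communicative": 5, "sales": 120},
    {"id": "P02", "promotional": 35, "communicative": 8, "sales": 100},
    {"id": "P03", "promotional": 10, "communicative": 30, "sales": 20},
    {"id": "P04", "promotional": 12, "communicative": 25, "sales": 30},
    {"id": "P05", "promotional": 30, "communicative": 28, "sales": 60},
    {"id": "P06", "promotional": 8, "communicative": 6, "sales": 25},
]


def median_split(records, key):
    """Map each participant id to True if they are above the group median on `key`."""
    cutoff = median(r[key] for r in records)
    return {r["id"]: r[key] > cutoff for r in records}


def mean_sales_by_cell(records):
    """Cross the two median splits and return mean sales per cell.

    Cells are keyed as (above median promotional, above median communicative).
    """
    high_promo = median_split(records, "promotional")
    high_comm = median_split(records, "communicative")
    cells = {}
    for r in records:
        cell = (high_promo[r["id"]], high_comm[r["id"]])
        cells.setdefault(cell, []).append(r["sales"])
    return {cell: sum(values) / len(values) for cell, values in cells.items()}


result = mean_sales_by_cell(participants)
for cell, avg in sorted(result.items()):
    print(cell, avg)
```

In this toy example the cell with above-median promotional and below-median communicative content has the highest mean sales, which mirrors the shape of the finding reported above but says nothing about its size or robustness in the real data.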

NOTES

1. B. Rogers, “The Social Costs of Uber,” University of Chicago Law Review Online 82, no. 1, art. 6 (2015).
2. D. Morgan, “Social Media Tips from an Etsy Expert,” Etsy Sellers Handbook (2014), https://www.etsy.com/seller-handbook/article/social-media-tips-from-an-etsyexpert/22423398853.
3. E.g., N. K. Baym, “Connect with Your Audience! The Relational Labor of Connection,” Communication Review 18, no. 1 (2015): 14–22, https://doi.org/10.1080/10714421.2015.996401; B. E. Duffy, “The Romance of Work: Gender and Aspirational Labour in the Digital Culture Industries,” International Journal of Cultural Studies 19, no. 4 (2016): 441–457, https://doi.org/10.1177/1367877915572186.
4. H. S. Becker, Art Worlds (Berkeley: University of California Press, 1984); V. L. Zolberg, Constructing a Sociology of the Arts (Cambridge: Cambridge University Press, 1990).
5. E.g., Y. Benkler, The Wealth of Networks: How Social Production Transforms Markets and Freedom (New Haven, CT: Yale University Press, 2006); N. A. John, “File Sharing and the History of Computing: Or, Why File Sharing Is Called ‘File Sharing,’ ” Critical Studies in Media Communication 31, no. 3 (2013): 198–211, https://doi.org/10.1080/15295036.2013.824597; A. Wittel, “Qualities of Sharing and Their Transformations in the Digital Age,” International Review of Information Ethics 15, no. 9 (2011): 3–8.
6. E.g., E. Hargittai, “Second-Level Digital Divide: Differences in People’s Online Skills,” First Monday 7, no. 4 (2002), https://doi.org/10.5210/fm.v7i4.942.
7. W. H. Dutton and G. Blank, Next Generation Users: The Internet in Britain, Oxford Internet Survey 2011 Report (Oxford: Oxford Internet Institute, University of Oxford, 2011), https://doi.org/10.2139/ssrn.1960655; E. Hargittai and Y.-l. P. Hsieh, “Predictors and Consequences of Differentiated Practices on Social Network Sites,” Information, Communication and Society 13, no. 4 (2010): 515–536, https://doi.org/10.1080/13691181003639866; E. Hargittai and A. Shaw, “Mind the Skills Gap: The Role of Internet Know-How and Gender in Differentiated Contributions to Wikipedia,” Information, Communication and Society 18, no. 4 (2015): 424–442, https://doi.org/10.1080/1369118X.2014.957711; E. J. Helsper, “Gendered Internet Use Across Generations and Life Stages,” Communication Research 37, no. 3 (2010): 352–374; A. Lenhart, J. Horrigan, and D. Fallows, “Content Creation Online,” Pew Internet and American Life Project, Pew Research Center, 2004, http://www.pewinternet.org/2004/02/29/content-creation-online/; M. Rosenberg, N. Confessore, and C. Cadwalladr, “How Trump Consultants Exploited the Facebook Data of Millions,” New York Times, March 17, 2018; J. Schradie, “The Digital Production Gap: The Digital Divide and Web 2.0 Collide,” Poetics 39, no. 2 (2011): 145–168, https://doi.org/10.1016/j.poetic.2011.02.003.
8. E. Hargittai, “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites,” Annals of the American Academy of Political and Social Science 659, no. 1 (2015): 63–76.
9. G. Walejko, “Online Survey: Instant Publication, Instant Mistake, All of the Above,” in Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai (Ann Arbor: University of Michigan Press, 2010), 101–121; D. Williams and L. Xiong, “Herding Cats Online,” in Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai (Ann Arbor: University of Michigan Press, 2010), 122–140.
10. E. Klawitter, “Crafting a Market: How Independent Artists Participate in the Peer Economy for Handmade Goods” (PhD diss., Northwestern University, 2017), 47–48.
11. E. Hargittai and C. Karr, “WAT R U DOIN? Studying the Thumb Generation Using Text Messaging,” in Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai (Ann Arbor: University of Michigan Press, 2010), 192–216.
12. M. Rosenberg, N. Confessore, and C. Cadwalladr, “How Trump Consultants Exploited the Facebook Data of Millions,” New York Times, March 17, 2018.
13. J. Saldaña, The Coding Manual for Qualitative Researchers (Los Angeles: SAGE, 2015).
14. N. A. John, “File Sharing and the History of Computing: Or, Why File Sharing Is Called ‘File Sharing,’ ” Critical Studies in Media Communication 31, no. 3 (2014): 198–211.
15. L. R. Wheeless and J. Grotz, “Conceptualization and Measurement of Reported Self-Disclosure,” Human Communication Research 2, no. 4 (1976): 338–346, https://doi.org/10.1111/j.1468-2958.1976.tb00494.x.
16. S. J. Tracy, Qualitative Research Methods: Collecting Evidence, Crafting Analysis, Communicating Impact (Malden: Wiley-Blackwell, 2012).
17. Klawitter, “Crafting a Market”; E. Klawitter and E. Hargittai, “ ‘It’s Like Learning a Whole Other Language’: The Role of Algorithmic Skills in the Curation of Creative Goods,” International Journal of Communication 12 (2018): 3490–3510, 1932–8036/20180005.
18. Klawitter and Hargittai, “ ‘It’s Like Learning a Whole Other Language.’ ”

REFERENCES

Baym, N. K. “Connect with Your Audience! The Relational Labor of Connection.” Communication Review 18, no. 1 (2015): 14–22. https://doi.org/10.1080/10714421.2015.996401.
Becker, H. S. Art Worlds. Berkeley: University of California Press, 1984.
Benkler, Y. The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven, CT: Yale University Press, 2006.
Duffy, B. E. “The Romance of Work: Gender and Aspirational Labour in the Digital Culture Industries.” International Journal of Cultural Studies 19, no. 4 (2016): 441–457. https://doi.org/10.1177/1367877915572186.
Dutton, W. H., and G. Blank. Next Generation Users: The Internet in Britain. Oxford Internet Survey 2011 Report. Oxford: Oxford Internet Institute, University of Oxford, 2011. https://doi.org/10.2139/ssrn.1960655.
Hargittai, E. “Is Bigger Always Better? Potential Biases of Big Data Derived from Social Network Sites.” Annals of the American Academy of Political and Social Science 659, no. 1 (2015): 63–76.
Hargittai, E. “Second-Level Digital Divide: Differences in People’s Online Skills.” First Monday 7, no. 4 (2002). https://doi.org/10.5210/fm.v7i4.942.
Hargittai, E., and Y.-l. P. Hsieh. “Predictors and Consequences of Differentiated Practices on Social Network Sites.” Information, Communication and Society 13, no. 4 (2010): 515–536. https://doi.org/10.1080/13691181003639866.
Hargittai, E., and C. Karr. “WAT R U DOIN? Studying the Thumb Generation Using Text Messaging.” In Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai, 192–216. Ann Arbor: University of Michigan Press, 2010.
Hargittai, E., and A. Shaw. “Mind the Skills Gap: The Role of Internet Know-How and Gender in Differentiated Contributions to Wikipedia.” Information, Communication and Society 18, no. 4 (2015): 424–442. https://doi.org/10.1080/1369118X.2014.957711.
Helsper, E. J. “Gendered Internet Use Across Generations and Life Stages.” Communication Research 37, no. 3 (2010): 352–374.
John, N. A. “File Sharing and the History of Computing: Or, Why File Sharing Is Called ‘File Sharing.’ ” Critical Studies in Media Communication 31, no. 3 (2013): 198–211. https://doi.org/10.1080/15295036.2013.824597.
Klawitter, E. “Crafting a Market: How Independent Artists Participate in the Peer Economy for Handmade Goods.” PhD diss., Northwestern University, 2017: 47–48.
Klawitter, E., and E. Hargittai. “ ‘It’s Like Learning a Whole Other Language’: The Role of Algorithmic Skills in the Curation of Creative Goods.” International Journal of Communication 12 (2018): 3490–3510.
Lenhart, A., J. Horrigan, and D. Fallows. “Content Creation Online.” Pew Internet and American Life Project. Pew Research Center, 2004. http://www.pewinternet.org/2004/02/29/content-creation-online/.
Morgan, D. “Social Media Tips from an Etsy Expert.” Etsy Sellers Handbook (2014). https://www.etsy.com/seller-handbook/article/social-media-tips-from-an-etsyexpert/22423398853.
Rogers, B. “The Social Costs of Uber.” University of Chicago Law Review Online 82, no. 1, art. 6 (2015). https://chicagounbound.uchicago.edu/uclrev_online/vol82/iss1/6.
Rosenberg, M., N. Confessore, and C. Cadwalladr. “How Trump Consultants Exploited the Facebook Data of Millions.” New York Times, March 17, 2018.
Saldaña, J. The Coding Manual for Qualitative Researchers. Los Angeles: SAGE, 2015.
Schradie, J. “The Digital Production Gap: The Digital Divide and Web 2.0 Collide.” Poetics 39, no. 2 (2011): 145–168. https://doi.org/10.1016/j.poetic.2011.02.003.
Tracy, S. J. Qualitative Research Methods: Collecting Evidence, Crafting Analysis, Communicating Impact. Malden: Wiley-Blackwell, 2012.
Walejko, G. “Online Survey: Instant Publication, Instant Mistake, All of the Above.” In Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai, 101–121. Ann Arbor: University of Michigan Press, 2010.
Wheeless, L. R., and J. Grotz. “Conceptualization and Measurement of Reported Self-Disclosure.” Human Communication Research 2, no. 4 (1976): 338–346. https://doi.org/10.1111/j.1468-2958.1976.tb00494.x.
Williams, D., and L. Xiong. “Herding Cats Online.” In Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai, 122–140. Ann Arbor: University of Michigan Press, 2010.
Wittel, A. “Qualities of Sharing and Their Transformations in the Digital Age.” International Review of Information Ethics 15, no. 9 (2011): 3–8.
Zolberg, V. L. Constructing a Sociology of the Arts. Cambridge: Cambridge University Press, 1990.

Chapter Eleven

A MEASUREMENT BURST STUDY OF MEDIA USE AND WELL-BEING AMONG OLDER ADULTS
Logistically Challenging at Best

MATTHIAS HOFER

People over the age of sixty are the fastest-growing demographic segment in Europe and North America. According to the United Nations’ 2017 World Population Prospects report, more than 30 percent of the population in Europe and North America will be over sixty years old by 2050. Currently, more than 20 percent of people in these regions are in that age group.1 Because this group represents such a significant portion of the population, it has been and will be the subject of much research. Social behavior is increasingly intertwined with digital media, so using methods that take advantage of and account for such behavior becomes ever more important. This chapter draws on a study that uses an innovative design built on so-called bursts to gather data on older adults’ media use and well-being.

As people grow older, they have to deal with emotional, social, and cognitive losses. Their friends might die, their memory may not work as well as it used to, and all this may lead to negative emotions. From that perspective, aging seems to be something very unpleasant. However, research shows that when it comes to well-being, older people seem to be relatively well-off.2 That is, despite the aforementioned losses, well-being appears to be U-shaped over a person’s life span: as people grow older, they become happier. An individual’s well-being and mental health are key factors in the so-called process of healthy aging.3


The use of media (including television, radio, newspapers, and computers) is likely to play a major role in this process, as older adults as a demographic group spend a large amount of time using media.4 Thus, there might be a connection between media use and well-being among older adults simply because they spend so much time consuming media. Media use can help older adults structure their day, provide them with useful information about the environment in which they live, or even satisfy crucial intrinsic needs such as competence, autonomy, or relatedness.5 Thus, media use can have a profound impact on older adults’ well-being, making it important to study the connection between their media use and well-being to understand better the role media play in enhancing or diminishing their mental health. Indeed, much research has examined this relationship.6 There is a plethora of both experimental and survey studies examining the interplay between media use and well-being. Some studies look at the effect of well-being on media use,7 while others have examined the effect of media use on well-being.8 That is, some studies look at media choice depending on a person’s well-being, whereas other studies examine the effects of using media on an individual’s happiness. Despite considerable existing scholarship, my study has some unique features that are hard to find in previous research on media use and well-being, and, thus, it contributes a unique angle. These features make the study more exciting but also more challenging and certainly more expensive. I will come back to that last point later. First, let’s have a look at the unique features. The first feature of the study pertains to the population. In my project, my team of two part-time research assistants and I studied older adults’ media use and well-being. 
In communication research, this demographic group is still underresearched.9 It is certainly considerably easier to conduct online experiments with students who have to participate in order to get necessary course credits. Older adults require more effort on the part of the researcher. I will discuss this crucial point in more detail shortly. The study’s second unique feature is that we used ecological momentary assessment (EMA).10 Most (if not all) studies in the realm of media use and well-being use either cross-sectional survey designs or experimental approaches to study these phenomena. A few studies use longitudinal survey designs.11 However, most previous research cannot say anything about temporal developments of media use and well-being in the short term.


EMA is a method of assessing behaviors and experiences (such as well-being or media use) in the contexts in which they occur (i.e., in people’s daily lives). Thus, results from such studies have a higher ecological validity than, say, experimental studies.12 More precisely, we used two different varieties of EMA: continuous and passive registration of television and radio use and a fixed time-based approach to assess different forms of well-being and some contextual information throughout the day. In addition to the continuous passive registration of radio and television use, we asked participants about their media use at fixed times. And while this particular project focuses on older adults, such methods can be applied to other population groups as well.13 Usually, EMAs are conducted using cell phones. Survey software is installed on these phones, and participants answer the survey questions on a daily basis or even several times a day as they go about their everyday lives in their natural environment. Researchers could make use of the participants’ own cell phones. However, for researchers to maintain a certain amount of control, it can be helpful for participants to use cell phones that are given to them by the project. This, of course, comes with both financial and logistical costs, which I will address later. The design of this study is a hybrid between such an EMA study and a traditional longitudinal study with measurement over relatively long periods (years or months): a measurement burst study (MBS). Bursts are daily measurements (or even multiple daily measurements) over the course of several days. They are repeated after longer periods (months or even years).14 Figure 11.1 summarizes the design of my study. Before each burst, we conducted an in-person paper-and-pencil baseline survey in which we asked participants about their personality, their living situation, and their general media use. 
For instance, their media genre preferences and favorite television and radio stations were assessed in this baseline survey.
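The burst structure described above can be made concrete with a small schedule generator. The sketch below is only an illustration under stated assumptions: the start date is hypothetical, and the spacing between bursts is approximated as 26 weeks between burst start dates, which is this example's simplification of "several months" rather than the study's actual calendar.

```python
from datetime import date, timedelta


def burst_schedule(first_day, n_bursts=3, days_per_burst=5, gap_weeks=26):
    """Return one list of consecutive assessment dates per burst.

    gap_weeks is the assumed distance between the start dates of two
    consecutive bursts (a stand-in for the months-long pause between bursts).
    """
    schedule = []
    for burst_index in range(n_bursts):
        start = first_day + timedelta(weeks=gap_weeks * burst_index)
        days = [start + timedelta(days=offset) for offset in range(days_per_burst)]
        schedule.append(days)
    return schedule


# Hypothetical start date, chosen only for illustration.
schedule = burst_schedule(date(2018, 3, 5))
for number, days in enumerate(schedule, start=1):
    print(f"Burst {number}: {days[0].isoformat()} to {days[-1].isoformat()}")
```

Keeping the number of bursts, days per burst, and gap as parameters makes it easy to see how quickly the number of measurement occasions per participant grows when any one of them is increased.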

[Figure 11.1: timeline of Burst 1, Burst 2, and Burst 3, each comprising daily assessments 1 through 5, with 5–6 months between consecutive bursts.]

FIGURE 11.1 Study design of the measurement burst study.


Measurement burst designs are used to study both fine-grained short-term variability within bursts and long-term developments across bursts. Such long-term developments are not only developments in levels but also developments of the variability among persons. In other words, using a measurement burst design, we can answer questions like the following:

• Is higher daily media use associated with greater or lesser well-being?
• Do people who live alone use more media in general than those who cohabit, and how is their daily well-being connected to their daily media use?
• What is the role of personality in the interplay of media use and well-being among older adults?
• Can people’s variability in well-being and media use in the course of one week explain long-term changes in well-being or media use (i.e., over the course of several months)?
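To make the distinction between short-term variability and long-term change tangible, the following sketch computes, for each person and burst, a mean level and a within-burst standard deviation, and then the change in level between two bursts. It is a descriptive illustration only: the ratings are hypothetical toy values, the function names are invented here, and actual analyses of burst data typically rely on multilevel models rather than these simple summaries.

```python
from statistics import mean, stdev

# Hypothetical daily well-being ratings: person -> burst -> five daily scores.
ratings = {
    "A": {1: [3, 4, 3, 5, 4], 2: [4, 4, 5, 5, 4]},
    "B": {1: [2, 2, 3, 2, 2], 2: [2, 3, 2, 2, 3]},
}


def burst_summary(data):
    """Per person and burst: mean level and within-burst SD (short-term variability)."""
    return {
        person: {
            burst: {"level": mean(days), "variability": stdev(days)}
            for burst, days in bursts.items()
        }
        for person, bursts in data.items()
    }


def level_change(summary, person, first, last):
    """Long-term change in mean level between two bursts for one person."""
    return summary[person][last]["level"] - summary[person][first]["level"]


summary = burst_summary(ratings)
for person in ratings:
    change = level_change(summary, person, 1, 2)
    sd_first = summary[person][1]["variability"]
    print(person, round(sd_first, 2), round(change, 2))
```

With summaries like these, the last question in the list amounts to asking whether the within-burst variability in an early burst is related to the change in level across bursts.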

This all sounds well and good. However, my aim in this chapter is not to reproduce what is already discussed in theoretical, empirical, and methodological papers.15 Rather, I go into the nitty-gritty of such an intense study with the hope of encouraging others to consider using such methods. I describe the little details and decisions researchers very rarely (if ever) talk about, let alone write down. Fortunately, some people before me have already shared some particulars: Conner and Lehman have authored a very useful chapter on how to get started with an EMA study like ours.16 It appears in the “bible” for researchers conducting EMAs or studies with a similar design17 and takes the researcher through the entire process of setting up an EMA.18 What Conner and Lehman have written is also applicable to our MBS. In the remainder of this chapter, I first present those authors’ advice and guidelines and then explain how the research team translated their advice into our research project, which problems and pitfalls we had to deal with, and what other issues we came across during the research project.

RESEARCH QUESTION, TARGET VARIABLES, AND POPULATION

Research Question

The first and most important step in conducting a study with intense measurements (or any other empirical research project) is to “determine the research question, the target variables . . . , and the population of interest.”19 This is straightforward for our project: How are media use and well-being of older adults connected in both the long and the short term?

Target Variables

The target variables are obviously media use and well-being. However, at this point, things are already starting to get tricky, as both media use and well-being are very broad concepts. This might be apparent when it comes to media use: people read newspapers, surf the internet, watch television, and listen to the radio. Even if they have certain media turned on, they may not be actually listening or actively watching—maybe the radio or the television is just on in the background. When it comes to media use (and effects), the question of involvement or attention is crucial but hard to tackle. Measuring media use (or media exposure) is even trickier than defining it as a concept.20 The first approach is, of course, to ask people about their media use, but such self-reports are often subject to recall problems and bias, such as over- or underestimation of media use.21 In other words, people don’t remember how long they watched television, read the newspaper, or browsed the internet. They also tend to downplay media use that they consider to be less socially acceptable (e.g., “No, I did not binge an entire season of The Apprentice last night!”). Thus, measuring media use by asking people about it comes with a general lack of validity and reliability.22 To address this shortcoming, we chose to assess people’s media use with a passive and continuous measurement approach. More precisely, participants were equipped with the Mediawatch, made by the Swiss company Mediapulse AG. Every twenty seconds this watch registered environment noise for four seconds. These audio data—which are converted into a nonaudible format because of privacy concerns—were then compared against a reference database of the radio and TV programs available in Switzerland. For instance, if someone was listening to a Swiss radio station between 10:15 AM and 11:30 PM, this would show in my data. We would know the station, and we would know how long this person had listened to that station. 
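The arithmetic behind such passive registration can be sketched as follows. This is a simplified illustration, not the vendor's algorithm: it assumes that each audio sample has already been matched to a station (or to nothing) and that every matched sample stands for one full twenty-second interval. The station name and the sample sequence are hypothetical.

```python
from collections import Counter

SAMPLE_INTERVAL_S = 20  # one sample every twenty seconds, as described in the text


def exposure_minutes(matches):
    """Estimate minutes of exposure per station from a sequence of sample matches.

    `matches` holds one entry per sample: a station name, or None when the
    sample matched nothing in the reference database. Each matched sample is
    assumed to represent one full sampling interval.
    """
    counts = Counter(match for match in matches if match is not None)
    return {station: n * SAMPLE_INTERVAL_S / 60 for station, n in counts.items()}


# Hypothetical sequence: 90 samples matching one station, then 15 unmatched samples.
samples = ["Station X"] * 90 + [None] * 15
print(exposure_minutes(samples))
```

Because the estimate rests only on what the device could hear, it measures exposure to a broadcast and cannot tell attentive listening from a radio playing in the background.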
However, here is a problem: How can we tell whether this person had really listened to the program on the Swiss radio station or whether they had just turned the radio on without paying much attention to the program? The sad truth is that we can’t. To address this, in addition to measuring media use passively, we asked participants about their media use. More precisely, we asked them three times a day whether they had watched television or listened to the radio and, if they had, for how long they had done so. This gave us a rough form of cross-validation. For instance, if someone’s watch recorded four hours of radio use and the person said that they had listened to the radio for only two hours, we had some indication that the watch had overestimated the subjective media experience of that person.

When it comes to measuring well-being, things get even more complicated. If well-being is not regarded as being objectively healthy or wealthy, it is conceptualized as a subjective experience. Hedonic well-being is defined as the presence of positive affect and life satisfaction and the absence of negative affect.23 However, there are other forms of well-being, namely psychological (or eudaimonic)24 and social well-being.25 According to Ryff, psychological well-being is a multidimensional concept consisting of six dimensions: (1) personal growth, (2) positive relations with others, (3) competence, (4) self-acceptance, (5) purpose in life, and (6) environmental mastery. Similarly, Ryan and Deci26 conceive psychological well-being as a result of the satisfaction of three intrinsic needs: (1) competence, (2) autonomy, and (3) relatedness. Keyes27 introduced the concept of social well-being, which is the well-being of a person within a community or a society. 
It is also conceived of as a multidimensional concept consisting of five dimensions: (1) social integration (the quality of a person’s relationship to their community and society), (2) social acceptance (a generally positive view of other individuals or groups), (3) social contribution (the feeling of being valued by and contributing to others’ lives), (4) social actualization (the belief that one’s group or society as a whole can progress in a positive way), and (5) social coherence (the recognition of the complexity of the social world).28 In the project, we had to decide whether to take into account all of these forms of well-being or to focus on one specific form. We went for the former. After all, using media can, theoretically, affect all of the referenced forms of well-being. For instance, if a person reads the newspaper every day, their sense of personal growth or autonomy may improve, but also their life satisfaction or, depending on the content, their sense of social acceptance could increase or decrease. Therefore, we decided to assess all three forms of well-being: hedonic, psychological, and social. Researchers have developed scales to measure all these varieties of well-being. However, these scales might not be useful for daily assessments, as they provide general evaluations of a person’s life. Accordingly, we had to transform the scales in a way that made them suitable for daily measurements. For example, an item that is used to assess autonomy as part of psychological well-being reads as follows: “My decisions are not usually influenced by what everyone else is doing.”29 Since we wanted to ask participants each day about their feeling of autonomy, we transformed this item into a form that made it suitable for daily use: “Today, my decisions were influenced by what everyone else was doing.” Of course, the proper way of doing such transformations is first to formulate the scale and then to validate it with multiple large samples. 
However, both time and money are limited goods in a research project. We therefore decided to use these items without a proper psychometric evaluation. We did, however, discuss the scales with experienced colleagues to get other perspectives, and they supported our use of the modified measures.

Population

Let’s turn to the target population. Conner and Lehman30 consider the selection of a target population a crucial step in an intensive longitudinal research project. This selection has far-reaching consequences for recruitment, measurement, costs, and implementation of the study. Accordingly, Conner and Lehman31 write: “Important sample considerations include whether or not all potential participants have consistent Internet access, sufficient comfort with technology, and strong verbal skills, as well as whether they can be trusted to follow protocols and care for equipment.” In our project, we studied older adults aged sixty and over. As mentioned earlier, conducting research with students who have to participate in studies to get their course credits is relatively easy. You just have to set up the study, announce it, and then wait for the students to participate in a lab, an online experiment, or an online survey. Even for an intensive study in which daily measurements are taken (the aforementioned EMA), younger adults or adolescents are relatively easy to deal with because most, if not all, younger adults or adolescents are familiar with using smartphones, whereas older adults often are not that familiar with using new technologies.32 Accordingly, relying on technology for data gathering can be harder with older adults.

Recruitment

The first thing that is both more time consuming and more expensive than when working with a student sample is the recruitment of older participants. More precisely, we had to ask ourselves this: How could we make our research project visible to possible older participants, and how could we get them to consider participating in our study? We could not simply post an ad on a ready-made platform like we would do when looking for student participants. I was lucky enough to have been granted three part-time research assistants to support me in my research. I asked one of them to compile a list of all senior clubs in the larger area of Zurich, Switzerland, the location of my data collection. She sent me a list with about forty senior clubs and organizations. We started writing to these organizations and asking them if we could explain my project to their members. Only three out of the forty responded to our inquiries. Unfortunately, even those three did not work out, since after hearing more details about the project, they told us that they did not think that this was of interest to them. This was a rather discouraging development, as a study cannot exist without participants. We had to come up with a new plan, and we had to do it quickly because our data collection logistics were ready to go and my research time line was progressing. Fortunately, I thought about recruitment while I was writing the research proposal for my project, so I had a backup approach ready to go. I had requested money for both recruitment of and incentives for participants. Instead of relying on intermediaries to find participants, we decided to try to reach them directly by advertising our project in one of the publications with the highest circulation in Switzerland: Migros Magazin (migrosmagazin.ch), delivered for free to a substantial portion of the population. This method had worked well in another one of our studies targeting older adults. 
The ad was quite expensive, but it was worth it thanks to the responses we received. Additionally, we contacted local newspapers in the larger area of Zurich. For one of them, we could even contribute a short article about the project, which was then printed along with my ad.

And then the potential participants sent us emails and called. As suggested by Conner and Lehman,33 I set up a phone number specifically for this study. One of my research assistants was entrusted with answering that phone. After another two months, we had a sample of 120 older adults who registered interest in participating in my study. These potential participants needed more information.

Communicating with Participants

To inform participants about the aims of the study, the schedule, the incentives, and their rights and duties, we put together a folder with extensive information about the project. In this folder, we also had to include an informed consent form that participants had to sign. This form, along with all measures, had to be submitted to the Institutional Review Board of my university (the University of Zurich). Because about 10 percent of the potential participants did not have an email address, we had to send them the folder by traditional mail, which is, of course, more costly and requires more time and paper. Once we got the signed informed consent forms back from around 87 percent of those to whom we reached out, we had to start planning the schedule of the study. The MBS consisted of five days of intensive measurement. We scheduled three five-day sessions, each separated by a six-month break. However, it would have been logistically impossible to conduct the first burst with all participants at once given that we needed to hold informational sessions to explain the technology used and the procedures, which we were doing in person in groups of ten to twenty participants. Therefore, we had to conduct the first burst (including the introduction sessions) over the course of not one but three weeks. Once we had set up the schedule and sent out the information (some again by traditional mail), we held the introduction sessions at the university. The first thing we did in these sessions was explain the aim and the purpose of the research project. That is, we explained the theoretical background and the study idea in layman’s terms. After that, we introduced the participants to the technology they were about to use (i.e., the cell phone and the Mediawatch). At the end of the session, we administered the baseline survey. We decided to use paper-and-pencil questionnaires because some people might not be comfortable using a computer to fill
out a survey.34 This decision, of course, meant that we had to code the questionnaires by hand, which is error prone,35 but since we were doing it in-house, we did have control over its quality. We also screened participants for dementia. This was a bit of a challenge because the questionnaire we administered for that purpose (the Mini Mental State Examination)36 contains questions that might seem a little offensive to participants (e.g., “What town, county, and state are we in?”). Thus, my research assistants and I had to be very careful not to offend any of the participants. We had to explain that the test might be too easy but that we had to conduct it in order to be able to publish the work in scientific journals. This then raised the question of what a scientific journal is and so on. In the end, none of the participants expressed having been offended in any way, suggesting that we had been successful in explaining the purpose of the test. After this dementia test, we explained the measurement equipment (i.e., the cell phone and the Mediawatch) and detailed every single questionnaire item (measuring all different forms of well-being and media use) that we would measure on a daily basis in order to make sure that everyone had the same understanding of each item. We were confronted with a wide range of digital skills. That is, whereas some participants were used to using new technologies, others seemed to be afraid of doing something wrong because they were not at all familiar with using digital technologies. Finally, participants were given the incentives (150 Swiss francs) and all transportation costs. We also gave participants a padded envelope with a postage stamp so that they could send back the cell phone and the Mediawatch. We received every single Mediawatch and cell phone back.

Sampling Strategy

The next point to consider is the sampling strategy concerning the measurements within the bursts. The sampling strategy is the frequency and the timing of observations or assessments of the variables in question.37 As Seifert, Hofer, and Allemand38 write, “Real-time also means right on time; in other words, researchers have to carefully determine whether they are collecting data about the most relevant variables at the most appropriate moments and at ideal time intervals.” What is the best time of day to measure well-being? When should I ask participants how long they have watched television since the last time I asked them? And so on.

When it comes to television and radio use, these questions are easily answered: as mentioned previously, television and radio use was to be recorded continuously throughout the day using the Mediawatch. Regarding well-being, we decided to use a fixed time-based approach.39 Participants were given a cell phone equipped with movisens (see the next section), the software that runs on this phone and prompts participants to fill out a questionnaire three times a day (9 AM, 2:30 PM, and 9 PM) over the course of five days (Monday to Friday). One might ask why we chose these time points and not others. We discussed this question at length in team meetings. There is no accepted standard on this as of yet, although others have also had to make similar choices.40 Here is our reasoning. By 9 AM, everyone is likely to be awake, and some people might already have used or be using media. Others might be about to use media because they just got up. Thus, 9 AM seemed to be a good time to ask about both media use and well-being (remember that the Mediawatch registers television and radio use continuously). We picked 2:30 PM thinking that this is in the middle of the day without coinciding with lunch. It would also be after the noon news is over. Thus, variables that are measured at this time are likely to be affected by the noon news. We scheduled the last prompt for 9 PM because most people are still awake then, the evening news shows are over, and the prime-time movie is still on. The latter could be a problem because respondents would be asked to answer questions in the middle of a movie. But waiting until later could be an issue as well, since by then people may have gone to bed. As my mom, who is similar in age to my respondents, noted to me: “[9 PM is fine,] but don’t do it later, I wouldn’t want to get disturbed by that phone while falling asleep!” In the end, these times worked out well, as respondents tended to fill out the surveys.

Technology Platform
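
Before turning to the platform itself, the fixed time-based schedule just described can be written down as data. What follows is a minimal Python sketch under stated assumptions, not the movisens configuration (which this chapter does not show): the start date is hypothetical, `burst_schedule` and the constants are illustrative names, and the six-month break is approximated as 26 weeks between burst start dates so that every burst begins on a Monday.

```python
from datetime import date, datetime, time, timedelta

# Prompt times taken from the text: 9 AM, 2:30 PM, and 9 PM.
PROMPT_TIMES = (time(9, 0), time(14, 30), time(21, 0))
DAYS_PER_BURST = 5   # Monday to Friday
NUM_BURSTS = 3       # three bursts in total
# Assumption: the "six-month break" is approximated as 26 weeks between
# burst start dates, which keeps every burst starting on a Monday.
WEEKS_BETWEEN_BURSTS = 26


def burst_schedule(first_monday: date) -> list[list[datetime]]:
    """Return one list of prompt datetimes per measurement burst."""
    if first_monday.weekday() != 0:
        raise ValueError("each burst is assumed to start on a Monday")
    bursts = []
    for b in range(NUM_BURSTS):
        start = first_monday + timedelta(weeks=b * WEEKS_BETWEEN_BURSTS)
        prompts = [
            datetime.combine(start + timedelta(days=d), t)
            for d in range(DAYS_PER_BURST)
            for t in PROMPT_TIMES
        ]
        bursts.append(prompts)
    return bursts


# Hypothetical start date (a Monday), chosen only for illustration.
schedule = burst_schedule(date(2024, 1, 8))
for i, prompts in enumerate(schedule, start=1):
    print(f"Burst {i}: {len(prompts)} prompts, "
          f"{prompts[0]:%a %Y-%m-%d %H:%M} to {prompts[-1]:%a %Y-%m-%d %H:%M}")
```

Under these assumptions each burst yields fifteen prompts per participant (three a day on five days), or forty-five across the three bursts. Because the first burst was staggered over three weeks, a real schedule would need one start date per participant group rather than a single one.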

Another important thing to consider is the technology platform. “The choice of technology platform reflects a tradeoff between cost, complexity, and control.”41 I chose the platform movisens (https://www.movisens.com) for two reasons. First, the platform is already in use and, most importantly, paid for by the university research program with which I am affiliated, so I incurred no additional costs from using it. Second, because the platform is
already used at the university, we could always ask someone for assistance if we encountered any issues. When it comes to control, the platform has some very useful features but also some very annoying ones. Movisens is very useful in that it allows the whole questionnaire for multiple days to be programmed and loaded onto a cell phone with very little effort and very few technical skills. The phone alerts participants to complete the questionnaire at predefined times (the fixed time-based approach discussed earlier), and their answers are then saved on the cell phone and submitted to the server as soon as the phone has an internet connection. However, the software also has some not-so-great features: First, the researcher cannot randomize the questions. This is a serious drawback, since order effects can bias the results of a survey. Second, the graphical user interface of the platform where researchers create their questionnaires is not particularly user friendly and is rather nonintuitive—the complexity of its use is therefore rather high. All that said, I will not complain too much, since using it allowed me to save around 10,000 Swiss francs (about US$10,000). It is also worth noting that movisens’ support center is exceptional. As the preceding discussion of the research question, target variables, population, sampling strategy, and technology platform shows, a study of this sort requires complex planning and implementation. There was one other aspect of my research project that I considered important, although it was not necessary and not often implemented: involving participants in the research process. We wanted to learn how respondents experienced the study and whether there were any issues regarding the burst sampling strategy or the technology platform that would need to be addressed. Research is an ongoing process that constantly needs improvement. 
I believe that working together with participants can be very helpful in this regard, especially when it comes to EMA. Involving participants in the research project is called participatory research, which I discuss next.

PARTICIPATORY RESEARCH

Jagosh and colleagues42 define participatory research as “the coconstruction of research through partnerships between researchers and people affected by and/or responsible for action on the issues under study.” I did not involve participants in the construction of the measurement
instruments or the design of the study. This would have inflated the budget of our study considerably, and I am not sure what benefit would have come out of involving participants at such an early stage of a research project, since most would not have the requisite academic background to address specifics. We did, however, hold voluntary feedback sessions after the first burst. In these sessions, we wanted to find out what problems participants encountered, whether the daily questions made sense to them, and whether the times of the survey response requests were appropriate. About half (52 percent) of all participants took part in these feedback sessions. In these face-to-face discussions, the majority of the participants told us that they were fine with these assessment times. We also wanted to know whether participants thought we should have asked additional questions in the daily survey—that is, whether they thought we had overlooked any important aspects of either media use or well-being. One of the participants noted that she reads a lot of books. Another participant asked why we did not consider asking people about movie theater visits. After all, going to the movies is media use, too. We had completely left out these questions and decided to include them in the next burst. We retained all the existing questions and did not change anything in the wording. Altogether, these sessions were very useful because they gave us helpful hints not only about what to change but also about what to leave as it was. Finally, such participatory sessions (along with the incentives) are a good means of keeping participants engaged in the study. At the same time, such participatory sessions increase the cost and the logistical effort of a study. People have to be sent invitations, their travel expenses have to be covered, rooms have to be booked for holding the sessions, staff have to be on hand to host them, and so on. However, I believe such sessions are worth the effort. 
It is rare that researchers have the opportunity to be in such close contact with their study participants, so why not take advantage?

OTHER CHALLENGES

To end this chapter, I would like to turn to a serious issue I came across during my study: measurement reactivity, defined as “the systematically biasing effects of instrumentation and procedure on the validity of one’s data.”43 In other words, people behave, feel, or think a certain way because
of the questions they are being asked in a study. For instance, Barta and colleagues44 mention the so-called guinea pig effect to “refer specifically to the increased self-consciousness and evaluation anxiety that may arise in research contexts.” Simply put, the problem of measurement reactivity is that participants of a study change their thoughts, their feelings, or even their behavior because their thoughts, their feelings, and their behavior are being measured. Translated into my research project, people’s well-being and media use may change because of the ongoing measurement of their well-being and media use. For instance, a female participant told me during one of the feedback sessions that she got very nervous each time the alarm of the cell phone went off to remind her she had to fill out the questionnaire. Unfortunately, the first question was about her momentary affect. This measure also contained a question about how nervous (versus calm) the person feels at that specific moment.45 In this instance, this posed a problem because the person felt nervous specifically due to the measurement—i.e., as a result of the notification about data collection she received immediately before the data collection. Another participant shared that since participating in the study, he had reduced his time spent watching television once he realized how much time he was wasting (his words) staring at the screen. These cases were paragons of measurement reactivity that needed to be addressed, and we tackled the problem in a few ways. First, we measured more days than we needed. That is, we mentioned earlier that each participant’s well-being and media use were assessed over the course of one week (i.e., Monday to Friday). However, we made sure that each participant got the measurement equipment and the introduction at least two days before the actual measurement started. 
This, we thought, would limit measurement reactivity to a certain degree because participants could become familiar with the fact that they were being asked questions about their well-being. Second, we explicitly told participants that they should not change any of their behavior despite the unusual situation. We know that this is a rather weak means of preventing people from reacting to the measurement. However, at least we made it explicit. An additional approach that we had not yet considered at the time of the first burst was to start each questionnaire with one or two filler questions that are not relevant to the research topic. The purpose of these filler questions, implemented in subsequent
measurement bursts, was to calm down the subjects who had any immediate reactions to receiving the questionnaire prompt and to give them a few moments to get into an “answering mood.” Despite these attempts at addressing measurement reactivity, it remains an issue that is hard (if not impossible) to tackle. Fortunately or unfortunately, psychologists are not alone in facing this problem. If you are familiar with Schrödinger’s cat, you know that quantum physicists also have to deal with measurement reactivity. While foolproof solutions may not be available to address such issues, being aware of and mitigating them is nonetheless important.

CONCLUDING REMARKS

My hope is that this chapter encourages other researchers to incorporate measurement burst studies into their projects so that they can formulate theories based on multiple data sets. I hope that the obstacles I described have not discouraged fellow researchers but, rather, have encouraged them to dive into this exciting new way of collecting data by recognizing up front both the benefits and the challenges of such an approach.

NOTES

1. United Nations, Department of Economic and Social Affairs, World Population Prospects: The 2017 Revision, Key Findings and Advance Tables (New York: United Nations, 2017).
2. D. G. Blanchflower and A. J. Oswald, “Well-Being Over Time in Britain and the USA,” Journal of Public Economics 88, no. 7 (2004): 1359–1386, https://doi.org/10.1016/S0047-2727(02)00168-8; D. G. Blanchflower and A. J. Oswald, “Is Well-Being U-shaped Over the Life Cycle?,” Social Science and Medicine 66, no. 8 (2008): 1733–1749, https://doi.org/10.1016/j.socscimed.2008.01.030; S. T. Charles and L. L. Carstensen, “Social and Emotional Aging,” Annual Review of Psychology 61 (2010): 383–409, https://doi.org/10.1146/annurev.psych.093008.100448.
3. J. W. Rowe and R. L. Kahn, “Successful Aging,” The Gerontologist 37, no. 4 (1997): 433–440, https://doi.org/10.1093/geront/37.4.433.
4. M.-L. Mares, M. B. Oliver, and J. Cantor, “Age Differences in Adults’ Emotional Motivations for Exposure to Films,” Media Psychology 11, no. 4 (2008): 488–511, https://doi.org/10.1080/15213260802492026; M.-L. Mares and Y. Sun, “The Multiple Meanings of Age for Television Content Preferences,” Human Communication Research 36 (2010): 372–396, https://doi.org/10.1111/j.1468-2958.2010.01380.x; M. van der Goot, J. W. Beentjes, and M. van Selm, “Older Adults’ Television Viewing from a Life-Span Perspective: Past Research and Future Challenges,” in Communication Yearbook 30, ed. S. C. Beck (Mahwah, NJ: Erlbaum, 2006), 431–469.
5. D. Gergle and E. Hargittai, “A Methodological Pilot for Gathering Data Through Text-Messaging to Study Question-Asking in Everyday Life,” Mobile Media and Communication 6, no. 2 (2018): 197–214, https://doi.org/10.1177/2050157917741333.
6. For an overview, see L. Reinecke and M. B. Oliver, eds., The Routledge Handbook of Media Use and Well-Being (New York: Routledge, 2016).
7. D. Zillmann, “Mood Management Through Communication Choices,” American Behavioral Scientist 31, no. 3 (1988): 327–340, https://doi.org/10.1177/000276488031003005.
8. L. Reinecke, “Games and Recovery: The Use of Video and Computer Games to Recuperate from Stress and Strain,” Journal of Media Psychology: Theories, Methods, and Applications 21, no. 3 (2009): 126–142, https://doi.org/10.1027/1864-1105.21.3.126; L. Reinecke, J. Klatt, and N. C. Krämer, “Entertaining Media Use and the Satisfaction of Recovery Needs: Recovery Outcomes Associated with the Use of Interactive and Noninteractive Entertaining Media,” Media Psychology 14, no. 2 (2011): 192–215, https://doi.org/10.1080/15213269.2011.573466.
9. Zillmann, “Mood Management”; S. R. Cotten et al., “Internet Use and Depression Among Retired Older Adults in the United States: A Longitudinal Analysis,” Journals of Gerontology: Series B 69, no. 5 (2014): 763–771, https://doi.org/10.1093/geronb/gbu018; S. R. Cotten, W. A. Anderson, and B. M. McCullough, “Impact of Internet Use on Loneliness and Contact with Others Among Older Adults,” Journal of Medical Internet Research 15, no. 2 (2013): e39, https://doi.org/10.2196/jmir.2306; M. J. Sliwinski, “Measurement-Burst Designs for Social Health Research,” Social and Personality Psychology Compass 2, no. 1 (2008): 245–261, https://doi.org/10.1111/j.1751-9004.2007.00043.x.
10. T. S. Conner and B. S. Lehman, “Getting Started: Launching a Study in Daily Life,” in Handbook of Research Methods for Studying Daily Life, ed. M. R. Mehl and T. S. Conner, 89–107 (New York: Guilford Press, 2014); D. S. Courvoisier et al., “Psychometric Properties of a Computerized Mobile Phone Method for Assessing Mood in Daily Life,” Emotion 10, no. 1 (2010): 115–124, https://doi.org/10.1037/a0017813.
11. Cotten et al., “Internet Use and Depression”; Cotten, Anderson, and McCullough, “Impact of Internet Use.”
12. Sliwinski, “Measurement-Burst Designs.”
13. E. Hargittai and C. Karr, “WAT R U DOIN? Studying the Thumb Generation Using Text Messaging,” in Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai (Ann Arbor: University of Michigan Press, 2009), 192–216.
14. Sliwinski, “Measurement-Burst Designs”; M. J. Sliwinski et al., “Intraindividual Change and Variability in Daily Stress Processes: Findings from Two Measurement-Burst Diary Studies,” Psychology and Aging 24, no. 4 (2009): 828–840, https://doi.org/10.1037/a0017925.
15. Conner and Lehman, “Getting Started.”
16. Conner and Lehman, “Getting Started.”
17. M. R. Mehl and T. S. Conner, eds., Handbook of Research Methods for Studying Daily Life (New York: Guilford Press, 2014).
18. Cotten et al., “Internet Use and Depression.”
19. Cotten et al., “Internet Use and Depression.”
20. C. de Vreese and P. Neijens, “Measuring Media Exposure in a Changing Communications Environment,” Communication Methods and Measures 10, no. 2–3 (2016): 69–80, https://doi.org/10.1080/19312458.2016.1150441; J. Ohme, E. Albaek, and C. H. de Vreese, “Exposure Research Going Mobile,” Communication Methods and Measures 10, no. 2–3 (2016): 135–148, https://doi.org/10.1080/19312458.2016.1150972.
21. de Vreese and Neijens, “Measuring Media Exposure”; Ohme, Albaek, and de Vreese, “Exposure Research.”
22. de Vreese and Neijens, “Measuring Media Exposure.”
23. E. Diener, S. Oishi, and L. Tay, “Advances in Subjective Well-Being Research,” Nature Human Behavior 2, no. 4 (2018): 253–260, https://doi.org/10.1038/s41562-018-0307-6.
24. R. M. Ryan, V. Huta, and E. L. Deci, “Living Well: A Self-Determination Theory Perspective on Eudaimonia,” Journal of Happiness Studies 9, no. 1 (2008): 139–170, https://doi.org/10.1007/s10902-006-9023-4; C. D. Ryff, “Happiness Is Everything, or Is It?,” Journal of Personality and Social Psychology 57, no. 6 (1989): 1069–1081, https://doi.org/10.1037/0022-3514.57.6.1069.
25. C. L. M. Keyes, “Social Well-Being,” Social Psychology Quarterly 61, no. 2 (1998): 121–140.
26. R. M. Ryan and E. L. Deci, “On Happiness and Human Potentials: A Review of Research on Hedonic and Eudaimonic Well-Being,” Annual Review of Psychology 52 (2001): 141–160, https://doi.org/10.1146/annurev.psych.52.1.141.
27. Keyes, “Social Well-Being.”
28. Keyes, “Social Well-Being.”
29. Ryff, “Happiness Is Everything.”
30. Conner and Lehman, “Getting Started.”
31. Conner and Lehman, “Getting Started.”
32. A. Seifert, M. Hofer, and M. Allemand, “Mobile Data Collection: Smart, but Not (Yet) Smart Enough,” Frontiers in Neuroscience 12 (2018): 971, https://doi.org/10.3389/fnins.2018.00971.
33. Conner and Lehman, “Getting Started.”
34. A. Weigold et al., “Equivalence of Paper-and-Pencil and Computerized Self-Report Surveys in Older Adults,” Computers in Human Behavior 54 (2016): 407–413, https://doi.org/10.1016/j.chb.2015.08.033.
35. Conner and Lehman, “Getting Started.”
36. S. T. Creavin et al., “Mini-Mental State Examination (MMSE) for the Detection of Dementia in Clinically Unevaluated People Aged 65 and Over in Community and Primary Care Populations,” Cochrane Database of Systematic Reviews, no. 1 (January 2016), https://doi.org/10.1002/14651858.CD011145.pub2.
37. Conner and Lehman, “Getting Started.”
38. Seifert, Hofer, and Allemand, “Mobile Data Collection,” 2–3.
39. Conner and Lehman, “Getting Started.”
40. Gergle and Hargittai, “A Methodological Pilot.”
41. Conner and Lehman, “Getting Started,” 98.
42. J. Jagosh et al., “Uncovering the Benefits of Participatory Research: Implications of a Realist Review for Health Research and Practice,” Milbank Quarterly 90, no. 2 (2012): 311, https://doi.org/10.1111/j.1468-0009.2012.00665.x.
43. W. D. Barta, H. Tennen, and M. D. Litt, “Measurement Reactivity in Diary Research,” in Handbook of Research Methods for Studying Daily Life, ed. M. R. Mehl and T. S. Conner (New York: Guilford Press, 2014).
44. Barta, Tennen, and Litt, “Measurement Reactivity,” 109.
45. P. Wilhelm and D. Schoebi, “Assessing Mood in Daily Life,” European Journal of Psychological Assessment 23, no. 4 (2007): 258–267, https://doi.org/10.1027/1015-5759.23.4.258.

REFERENCES

Barta, W. D., H. Tennen, and M. D. Litt. “Measurement Reactivity in Diary Research.” In Handbook of Research Methods for Studying Daily Life, ed. M. R. Mehl and T. S. Conner, 108–123. New York: Guilford Press, 2014.
Blanchflower, D. G., and A. J. Oswald. “Is Well-Being U-shaped Over the Life Cycle?” Social Science and Medicine 66, no. 8 (2008): 1733–1749. https://doi.org/10.1016/j.socscimed.2008.01.030.
Blanchflower, D. G., and A. J. Oswald. “Well-Being Over Time in Britain and the USA.” Journal of Public Economics 88, no. 7 (2004): 1359–1386. https://doi.org/10.1016/S0047-2727(02)00168-8.
Charles, S. T., and L. L. Carstensen. “Social and Emotional Aging.” Annual Review of Psychology 61 (2010): 383–409. https://doi.org/10.1146/annurev.psych.093008.100448.
Conner, T. S., and B. S. Lehman. “Getting Started: Launching a Study in Daily Life.” In Handbook of Research Methods for Studying Daily Life, ed. M. R. Mehl and T. S. Conner, 89–107. New York: Guilford Press, 2014.
Cotten, S. R., W. A. Anderson, and B. M. McCullough. “Impact of Internet Use on Loneliness and Contact with Others Among Older Adults.” Journal of Medical Internet Research 15, no. 2 (2013): e39. https://doi.org/10.2196/jmir.2306.
Cotten, S. R., G. Ford, S. Ford, and T. M. Hale. “Internet Use and Depression Among Retired Older Adults in the United States: A Longitudinal Analysis.” Journals of Gerontology: Series B 69, no. 5 (2014): 763–771. https://doi.org/10.1093/geronb/gbu018.
Courvoisier, D. S., M. Eid, T. Lischetzke, and W. H. Schreiber. “Psychometric Properties of a Computerized Mobile Phone Method for Assessing Mood in Daily Life.” Emotion 10, no. 1 (2010): 115–124. https://doi.org/10.1037/a0017813.
Creavin, S. T., S. Wisniewski, A. H. Noel-Storr, C. M. Trevelyan, T. Hampton, D. Rayment, V. M. Thom et al. “Mini-Mental State Examination (MMSE) for the Detection of Dementia in Clinically Unevaluated People Aged 65 and Over in Community and Primary Care Populations.” Cochrane Database of Systematic Reviews, no. 1 (January 2016). https://doi.org/10.1002/14651858.CD011145.pub2.
de Vreese, C. H., and P. Neijens. “Measuring Media Exposure in a Changing Communications Environment.” Communication Methods and Measures 10, no. 2–3 (2016): 69–80. https://doi.org/10.1080/19312458.2016.1150441.
Diener, E., S. Oishi, and L. Tay. “Advances in Subjective Well-Being Research.” Nature Human Behavior 2, no. 4 (2018): 253–260. https://doi.org/10.1038/s41562-018-0307-6.
Gergle, D., and E. Hargittai. “A Methodological Pilot for Gathering Data Through Text-Messaging to Study Question-Asking in Everyday Life.” Mobile Media and Communication 6, no. 2 (2018): 197–214. https://doi.org/10.1177/2050157917741333.
Hargittai, E., and C. Karr. “WAT R U DOIN? Studying the Thumb Generation Using Text Messaging.” In Research Confidential: Solutions to Problems Most Social Scientists Pretend They Never Have, ed. E. Hargittai, 192–216. Ann Arbor: University of Michigan Press, 2009.
Jagosh, J., A. C. Macaulay, P. Pluye, J. Salsberg, P. L. Bush, J. Henderson, E. Sirett et al. “Uncovering the Benefits of Participatory Research: Implications of a Realist Review for Health Research and Practice.” Milbank Quarterly 90, no. 2 (2012): 311–346. https://doi.org/10.1111/j.1468-0009.2012.00665.x.
Keyes, C. L. M. “Social Well-Being.” Social Psychology Quarterly 61, no. 2 (1998): 121–140.
Mares, M.-L., M. B. Oliver, and J. Cantor. “Age Differences in Adults’ Emotional Motivations for Exposure to Films.” Media Psychology 11, no. 4 (2008): 488–511. https://doi.org/10.1080/15213260802492026.
Mares, M.-L., and Y. Sun. “The Multiple Meanings of Age for Television Content Preferences.” Human Communication Research 36 (2010): 372–396. https://doi.org/10.1111/j.1468-2958.2010.01380.x.
Mehl, M. R., and T. S. Conner, eds. Handbook of Research Methods for Studying Daily Life. New York: Guilford Press, 2014.
Ohme, J., E. Albaek, and C. H. de Vreese. “Exposure Research Going Mobile.” Communication Methods and Measures 10, no. 2–3 (2016): 135–148. https://doi.org/10.1080/19312458.2016.1150972.
Reinecke, L. “Games and Recovery: The Use of Video and Computer Games to Recuperate from Stress and Strain.” Journal of Media Psychology: Theories, Methods, and Applications 21, no. 3 (2009): 126–142. https://doi.org/10.1027/1864-1105.21.3.126.
Reinecke, L., J. Klatt, and N. C. Krämer. “Entertaining Media Use and the Satisfaction of Recovery Needs: Recovery Outcomes Associated with the Use of Interactive and Noninteractive Entertaining Media.” Media Psychology 14, no. 2 (2011): 192–215. https://doi.org/10.1080/15213269.2011.573466.
Reinecke, L., and M. B. Oliver, eds. The Routledge Handbook of Media Use and Well-Being. New York: Routledge, 2016.
Rowe, J. W., and R. L. Kahn. “Successful Aging.” The Gerontologist 37, no. 4 (1997): 433–440. https://doi.org/10.1093/geront/37.4.433.
Ryan, R. M., and E. L. Deci. “On Happiness and Human Potentials: A Review of Research on Hedonic and Eudaimonic Well-Being.” Annual Review of Psychology 52 (2001): 141–160. https://doi.org/10.1146/annurev.psych.52.1.141.
Ryan, R. M., V. Huta, and E. L. Deci. “Living Well: A Self-Determination Theory Perspective on Eudaimonia.” Journal of Happiness Studies 9, no. 1 (2008): 139–170. https://doi.org/10.1007/s10902-006-9023-4.
Ryff, C. D. “Happiness Is Everything, or Is It?” Journal of Personality and Social Psychology 57, no. 6 (1989): 1069–1081. https://doi.org/10.1037/0022-3514.57.6.1069.
Seifert, A., M. Hofer, and M. Allemand. “Mobile Data Collection: Smart, but Not (Yet) Smart Enough.” Frontiers in Neuroscience 12 (2018): 971. https://doi.org/10.3389/fnins.2018.00971.
Sliwinski, M. J. “Measurement-Burst Designs for Social Health Research.” Social and Personality Psychology Compass 2, no. 1 (2008): 245–261. https://doi.org/10.1111/j.1751-9004.2007.00043.x.
Sliwinski, M. J., D. M. Almeida, J. Smyth, and R. S. Stawski. “Intraindividual Change and Variability in Daily Stress Processes: Findings from Two Measurement-Burst Diary Studies.” Psychology and Aging 24, no. 4 (2009): 828–840. https://doi.org/10.1037/a0017925.
United Nations, Department of Economic and Social Affairs. World Population Prospects: The 2017 Revision, Key Findings and Advance Tables. New York: United Nations, 2017.
van der Goot, M., J. W. Beentjes, and M. van Selm. “Older Adults’ Television Viewing from a Life-Span Perspective: Past Research and Future Challenges.” In Communication Yearbook 30, ed. S. C. Beck, 431–469. Mahwah, NJ: Erlbaum, 2006.
Weigold, A., I. K. Weigold, N. M. Drakeford, S. A. Dykema, and C. A. Smith. “Equivalence of Paper-and-Pencil and Computerized Self-Report Surveys in Older Adults.” Computers in Human Behavior 54 (2016): 407–413. https://doi.org/10.1016/j.chb.2015.08.033.
Wilhelm, P., and D. Schoebi. “Assessing Mood in Daily Life.” European Journal of Psychological Assessment 23, no. 4 (2007): 258–267. https://doi.org/10.1027/1015-5759.23.4.258.
Zillmann, D. “Mood Management Through Communication Choices.” American Behavioral Scientist 31, no. 3 (1988): 327–340. https://doi.org/10.1177/000276488031003005.

Chapter Twelve

COMMUNITY-BASED INTERVENTION RESEARCH STRATEGIES
Digital Inclusion for Marginalized Populations

HYUNJIN SEO

On a sunny Monday morning in December 2018, Linda, a seventy-five-year-old African American woman, entered a small basement computer lab at a senior community center located in one of the poorest neighborhoods in Kansas City, Missouri. She passed the first two empty chairs and sat on the next one. There she opened a black laptop fixed to a long, gray table and started checking her Gmail account and Google News. Linda was a participant in our community-based engagement program held at the senior center. Throughout her engagement in our year-long program, she had always preferred to sit at that particular machine as she developed a sense of comfort with the public-use laptop. She learned how to create an account on Gmail, set privacy settings on Facebook, and evaluate online information. That Monday Linda, who just a year earlier had been afraid to use a computer for “worries about breaking it,” earned her second certificate in computer skills from workshops offered by our research team from the University of Kansas. “I feel like I know more and I can do more. I’m not afraid of it [the computer],” she said.

Linda is one of the twenty-nine people who received at least one certificate from our program in 2018. Most of the two hundred members of the center are African American. Fears and frustrations shared by some center members about being online before taking this computer class represent a prevalent issue facing older, low-income African Americans in the United States. Research shows that members of this group are less likely to have digital competency and access than are those from similar age cohorts in other racial and income groups.1 They tend to lag in their use of digital technologies due to inadequate access to digital devices or lack of relevant skills. As more and more activities take place online, it is important that these older adults be equipped with relevant technology tools and skills rather than being left behind.

As a communication scholar interested in digital inequality, I am motivated to understand what researchers can do to help narrow this digital divide. I wanted to see whether community engagement could be incorporated into empirical research projects so that beyond documenting inequalities, we do something about alleviating them. This was the motivation for embarking on an evidence-based digital education program for older, low-income African Americans in Kansas City in 2017, which serves as the basis for this chapter.

The goal is to share how community-based research involving marginalized populations can be designed and implemented through describing our own project with this group. I begin with relevant contexts for our community engagement project and then explain how we combined different research methodologies along with the educational intervention program. I conclude with lessons learned and suggestions for future programs as well as challenges and opportunities in working with underserved, marginalized populations in the area of digital technologies.

CONTEXT AND PURPOSE

As more and more everyday transactions and activities take place online, digital access and skills are becoming essential for citizens to engage more fully in political, civic, social, and cultural activities. Older, low-income African Americans are arguably one of the most disadvantaged groups in this regard. Despite rapid progress in providing access to digital technologies, levels of internet adoption among older African Americans remain relatively low.2 Moreover, this group is less likely to be digitally competent compared to older members of other racial and income groups.3 Research shows that the most “confident” internet users are “disproportionately white, quite well educated and fairly comfortable economically.”4 The overwhelming majority of internet users are also young, whereas those age sixty-five or older tend to be “wary.”5 While older adults in the United States are increasingly online, this trend has been driven primarily by affluent older adults.6 According to a 2017 report, about 67 percent of all adults older than sixty-five used the internet, whereas only about 46 percent of adults older than sixty-five with an annual household income below $30,000 were internet users.7 Similarly, more than half of affluent older adults use social media platforms compared to only one-fifth of low-income older adults.8 Affluent older adults have higher levels of general internet usage skills and are more social media savvy than low-income older adults.9

Despite the significant implications of this digital divide for the healthy growth of society, there is insufficient research on how older, low-income African Americans acquire and use relevant digital literacy and skills. Most of all, it is important to develop evidence-based digital and information literacy programs to support this population.

Against this backdrop, our research team developed a digital literacy program for older, low-income African Americans in Kansas City. In particular, we wanted to learn about the digital skills this group perceived as most important to acquire, their primary concerns about taking a computer class and learning digital technology skills, and any changes over the course of the program in their perspectives about maintaining privacy and security online and about verifying online information.

MULTISTAGE, MIXED-METHODS APPROACH

Our project team adopted a multistage, mixed-methods approach for this program. The research methods we used included focus groups, participant observation, interviews, and document analysis. Figure 12.1 shows an overview of our community engagement and empirical research steps for two years. In this section, I provide a brief overview of the research methods we used at each stage and our recruitment strategy.

FIGURE 12.1 Empirical research and education program work flow.
Formative research and program development: interviews with community center director and staff (Nov.–Dec. 2016); 5 focus groups with community center members, n = 33 (Feb.–Mar. 2017).
Intervention: computer class at the senior community center, 42 sessions, n = 29 (Sept. 2017–Dec. 2018).
Process evaluation (Sept. 2017–Dec. 2018): participant observations; computer class instructor’s journal entries; 4 focus groups with community center members, n = 33; interviews with computer class instructor and community center staff.
Outcome evaluation (Dec. 2018): assessment of digital competency; 2 focus groups with community center members, n = 17; interviews with community center director and staff.

Project Initiation

Our team was composed of four professors, two graduate students, and one undergraduate student, all of whom were affiliated with the Center for Digital Inclusion at the University of Kansas. I am the founding director of the center, which is aimed at facilitating scholarship, education, and collaborative partnerships to help enhance citizens’ digital access and information literacy, especially among underserved populations. Since its establishment in 2017, the center has worked with refugees, low-income minority seniors, and formerly incarcerated women seeking to reenter
the workforce. This project was a part of the center’s initiatives in digital inclusion. In identifying a community organization to work with for this particular project, we considered needs, capacity, and commitment. First of all, we tried to gauge which community center serving older adults in Kansas City needed the most help in the area of digital technologies. As mentioned earlier, the senior community center we chose to work with was located in the poorest neighborhood in Kansas City. In addition, this center served more than two hundred African American seniors, a group lacking digital access and skills compared to similar age groups among other racial groups. Second, we needed to consider a center’s capacity to host computer classes— for example, whether it was equipped with a computer lab. The selected center had a computer lab that was supported by local charities, whereas smaller senior community centers often did not have adequate space and resources for hosting a computer class. Finally, we wanted to make sure that the center’s leadership was both committed and prepared to work with an academic institution. The senior community center chosen for this project had worked with members of our university on health-related projects earlier, so there was precedent for such collaboration. Once we identified the center based on these factors, we contacted the center to discuss the feasibility of collaborating on offering a computer class for its members. As project leader, I emailed the center’s executive director to schedule a meeting between our research team and key members of the center. The executive director welcomed an opportunity to partner with an academic institution to provide more consistent and tailored technology education for the center’s members. 
This timely and positive response may have to do with the center members’ dissatisfaction with computer sessions offered by a volunteer associated with a technology-related nonprofit organization in the area. In addition, the fact that one of our team members had already worked with the center on a separate project was helpful in building trust between the center’s leadership and our research team.

Our first meeting took place at the center two weeks after my initial email to the executive director. There our team explained what we hoped to achieve through a computer/digital literacy class at the center and what our tentative plan was. The center’s leadership discussed prior computer classes held at the center, including what worked and did not work and what they wanted to see in a future computer class. During the subsequent two months, the center’s executive director and I held several follow-up meetings to initiate the program and discuss participant recruitment strategies.

Project Stages

As shown in figure 12.1, we conducted formative research from November 2016 to March 2017. Based on our findings from this stage, we offered computer sessions from September 2017 to December 2018. We conducted evaluations throughout the program, with process evaluations taking place between September 2017 and December 2018 and outcome evaluations being completed in December 2018. Following is a description of each project stage. Formative research. Our formative research included interviews and focus groups. These helped us design our education program to be more relevant to older, low-income African Americans. Specifically, we first conducted interviews with the senior community center’s executive director and five staff members to understand their perspectives on challenges and opportunities regarding digital education for their members. In February and March 2017, we completed five focus groups with center members who were potential participants in our computer education program. (We discuss details about our approach to recruitment later in the chapter.) A total of thirty-three members of the center participated in one of these focus groups, in which they discussed their experiences, familiarities, and needs in terms of digital skills and literacy. All focus groups were held in a conference room at the center and lasted between sixty and ninety minutes. These focus groups and interviews helped us gain a picture of specific challenges facing the senior community center as well as potential program participants’ range of knowledge, experiences, and perspectives related to digital technologies. For example, the executive director indicated that previous computer classes lacked focus and relevance. Some focus group participants who had attended a previous class said the instructor lacked empathy and paid little attention to software aspects or other topics of particular interest to center members. 
One focus group participant said, “I went to one session and I thought it was over my head. I didn’t go back,” adding that the instructor assumed everyone had some basic understanding. The participant said, “I came into the class and they were far more advanced, they all knew how to turn on the computer. . . . I said enough

of this, I am going out, back out the door.” This frustration was expressed by other participants as well. Another participant said, “I had somebody who was helping me and she had my blood pressure up so high I promised myself I would never, ever ask her.” Our formative research also highlighted the importance of nurturing an open, judgment-free environment in which participants could work at their own pace and feel comfortable asking questions. Program development. We used an iterative process of incorporating our empirical research findings into program development and reflecting lessons learned from our program implementation in our subsequent research activities. Therefore, we conducted participant observations and evaluation focus groups throughout our education program, as described later in the chapter. Based on our formative research findings, we designed the first three months (mid-September to mid-December 2017) of the computer education program with the understanding that we would be adjusting the content as needed based on participant feedback and process evaluations. We held twelve sessions during this three-month period. Because we had adopted an interest-driven learning design, center members’ feedback during the focus group sessions conducted before our education program began provided significant guidance in designing our curriculum. An interest-based framework develops learning activities by drawing on learners’ specific interests, as interest is an effective motivator for learning.10 In addition to these focus groups, our team held meetings for several months to develop the curriculum. Undergraduate and graduate research assistants on the team reviewed free materials already available for digital education. In addition, my experience offering hands-on digital media courses for undergraduate students and technology workshops for various nonacademic groups was helpful in our curriculum development. Intervention. 
We conducted a total of forty-two computer sessions at the senior community center from September 2017 to December 2018, thirty of them after the preliminary three-month period. Twenty-nine individuals attended the computer class at least once. On average, fifteen people attended each weekly session. Table 12.1 shows sample weekly course topics and numbers of participants. The main topics covered in the class included managing privacy and security online, assessing information online, and using email and social media sites such as Facebook. Each session was held on a Monday morning for ninety minutes at the senior community center’s computer lab, which had eighteen laptop computers. While twenty-nine members attended at least one session, no more than eighteen people showed up at any one of the sessions, and, thus, the number of computers was sufficient for our class. The hardware and software used in the computer lab had been provided with the support of local charities before our team’s intervention program began. Given the center’s limited resources for updating hardware or software programs, our team adjusted our instruction to work with what was available in the computer lab. The computers were all Microsoft Windows based, and our project team also used a Windows machine in providing instructions. To our relief, the computers were equipped with licensed copies of the software most relevant for our class including the Microsoft Office suite of programs. In addition, we showed participants how to download and install several free programs such as Google Chrome.

TABLE 12.1 Sample Computer Class Topics and Number of Participants per Session

Session: Information Verification and Searching (n = 10)
Topics: Determining if an online story is real or false; Using information verification sites
Participant Activities: Identifying elements of a news story to determine validity; Using Snopes.com to verify an article

Session: Computer Basics (n = 14)
Topics: Parts of a computer and operating systems; File explorer; Computer settings
Participant Activities: Identifying parts of a laptop and desktop computer; Organizing files within file explorer; Adjusting computer settings such as brightness, desktop backgrounds, and zoom

Session: Searching Online Part 1 (n = 16)
Topics: Explanation of the internet and its purpose; Types and uses of web browsers; Web browser navigation buttons; Search engines
Participant Activities: Opening a web browser and searching for a specific URL; Using the navigation buttons back, forward, and refresh; Bookmarking a website; Conducting a keyword search on a search engine

Session: Searching Online Part 2 (n = 17)
Topics: Anatomy of a webpage; Reading a webpage; Website security indicators; Privacy settings in Google Chrome
Participant Activities: Using the find toolbar to quickly find information on a website page; Modifying Google Chrome privacy settings; Deleting browsing history; Opening and using a browser in incognito mode

Session: Cyber Security Part 1 (n = 17)
Topics: Creating strong passwords; Understanding browser tracking; How cookies work
Participant Activities: Practice distinguishing between a strong and a weak password; Opting out of cookie tracking in a browser

Session: Establishing and Using a Gmail Account (n = 13)
Topics: The Gmail interface; Mail settings and adding contacts; Sending an email
Participant Activities: Registering for a Gmail account and adding contacts; Sending an email to other class participants; Signing out safely

Session: Gmail Part 2 and Facebook (n = 15)
Topics: Inbox organization and settings; Replying to emails; Dealing with and reporting spam; The purpose of Facebook
Participant Activities: Replying to emails; Deleting emails; Setting up Gmail on smartphones; Creating a Facebook account

Session: Facebook (n = 15)
Topics: Common terms used on Facebook; Tutorial on how to use the homepage and timeline; Privacy settings
Participant Activities: Exploring the homepage; Exploring the timeline; Adjusting individual Facebook account privacy, timeline, and tagging settings; Blocking specific Facebook users

Session: Cyber Security Part 2 (n = 13)
Topics: Using and being safe on public internet; How to spot phishing scams; Two-step account verifications
Participant Activities: Connecting a personal device to public Wi-Fi; Establishing two-step verification on Google accounts; Spotting phishing scams from email examples

Session: Smartphone Basics (n = 14)
Topics: Differences between iPhones and Androids; Operating systems and applications; Security settings and tips
Participant Activities: Using basic smartphone apps; Connecting a smartphone to Wi-Fi; Creating secure screen locks and passcodes

Session: Tablet Basics (n = 13)
Topics: Internet usage on a tablet; Security and privacy settings; Advertising tracking
Participant Activities: Third-party application privacy settings (Facebook); Adjusting location services settings on smartphones and tablets; Deleting the cache on a smartphone and tablet

Session: Microsoft Word (n = 13)
Topics: Everyday uses for Microsoft Word; Exploring the Word interface
Participant Activities: Creating and saving a new document with Word; Practice typing and formatting text; Using spell-check and word count

Session: Microsoft Excel (n = 14)
Topics: Everyday uses for Microsoft Excel; Exploring the Excel interface; Navigating a spreadsheet and basic formulas
Participant Activities: Creating and saving a new workbook with Excel; Creating a monthly budget; Using basic formulas for budgets

The sessions were led by a master’s student who was majoring in digital media at our school and had considerable experience offering digital education for underserved populations. Before the start of this particular project, she worked as a research assistant for our Center for Digital Inclusion. Through her research assistantship, she became involved in digital-media-related programs and offered a six-month computer course for women who were trying to enter the workforce after having been recently released from the prison system. She also participated in the curriculum development of our program at the senior community center. While the center is about a forty-minute drive from our university’s main campus in Lawrence, the graduate instructor resided in Kansas City, which made it easier for her to travel to the center. As a young, white woman, she differed from class participants in terms of visible identity (race/ethnicity and age). However, she quickly gained trust and respect from program participants by showing empathy, respect, and patience,11 as I discuss in the “Lessons Learned” section. Since there were great variations in the participants’ experiences and comfort with computers, completing some tasks took a long time. Therefore, the graduate instructor stayed at the senior community center for an hour after every session to ensure everyone could complete the week’s activities. Even though we anticipated varying skills and experiences, we did not expect that an additional office hour would be necessary. This need highlights the importance of staff flexibility with such programs.

Evaluation research (process and outcome evaluation). We conducted the evaluation portion of our study throughout the training program. We used focus groups, participant observations, interviews, and document analysis in our evaluation.
We used multiple methods to obtain a more holistic understanding of this underserved group’s experiences with digital technologies and of the changes in their perspectives and skills over time. In addition to the five focus groups conducted before our education program began, we held six focus groups for our evaluation research. Since our research and education programs were based on an iterative process in which each informed the other, evaluation research helped us both to assess project outcomes and to refine our subsequent curriculum. Four of these focus groups served as midprogram evaluations: two focus groups with sixteen participants were held in December 2017, and two focus groups with

seventeen participants took place in May 2018. Our goal with these four midprogram evaluation focus groups was to assess how computer class participants’ attitudes toward technology had changed and also to identify areas of improvement for future computer sessions. The last two focus groups, conducted with seventeen participants in December 2018, were our final evaluations, at which we determined how the computer class influenced participants’ experiences and comfort with technologies and their perceptions of related issues. During these evaluation focus groups, participants who attended our computer class demonstrated enhanced knowledge and comfort with digital technologies.

We also conducted participant observations of the computer class throughout the education program. Participant observation is widely used in qualitative research to obtain a systematic description of behaviors or events in a social setting.12 We used this method for a sustained period of time to develop regular insight into the ways older, low-income African Americans learned digital technologies. Three faculty members alternated observing the weekly ninety-minute computer sessions at the center and generated observers’ notes that we analyzed as part of the study. The graduate instructor introduced the observers to the participants and explained that they were people associated with the project. Then an observer sat in the back of the computer lab during each computer session and took notes on class participants’ questions to the graduate instructor as well as class participants’ interactions with each other and with the graduate instructor. The faculty members agreed on the items of particular interest before the observation research started. In addition, the graduate instructor wrote a journal entry every week about each class session.
In her entries, she focused particularly on lesson topics, participant feedback, and reflections on leading a session and interacting with class participants.13 The observers and the graduate instructor typed up their notes and sent them to a research assistant who then reorganized them into five sections: (1) environment notes, (2) lesson notes, (3) participant notes, (4) feedback from class observers, and (5) reflections. These section categories were based on best practices on writing qualitative field notes.14 Environment notes focused on the setting and atmosphere of each class session, whereas lesson notes addressed the topics and instructional methods used in each session. Participant notes primarily concerned participants’ questions, reactions, and responses to instruction. Feedback from

class observers was based on notes from the three faculty observers, and reflections were done by the graduate instructor. Finally, we supplemented data from the focus groups and observations with interviews with the graduate instructor, the senior community center director, and center staff members responsible for developing programs for the center. We held an interview session with the graduate instructor every month during the education period to understand interactions during the class. All these interviews were conducted at the senior community center. While the graduate instructor’s journal entries were comprehensive, interview sessions with her provided an opportunity to ask follow-up questions. We conducted the interviews with the senior community center’s director and staff at the beginning, midpoint, and conclusion of the program. Additionally, we analyzed class review exams taken by the participants during the education period and other documents related to class activities, including lesson plans. Recruitment and participant characteristics. We recruited participants through the senior community center, and all participants in our program were members of the center. More than 90 percent of the roughly two hundred members of the center were African American, and all participants in our project identified themselves as African American. To recruit participants for our initial focus groups, we placed flyers at the center’s check-in desk. In addition, based on agreements between our research team and the center’s leadership, center staff announced the focus groups at several community meetings held at the center. For these announcements, our team provided center staff with written recruitment materials that had been approved by our university’s Institutional Review Board (IRB). Each member who participated in our focus group session received a $5 lunch coupon for the center’s cafeteria. 
This incentive strategy was determined during our meeting with the center’s leadership after they indicated that the $5 lunch was popular among center members. As development of our research-based education program progressed, we encouraged participants to ask their peers to attend our focus group sessions. This referral strategy turned out to be quite effective, as members of the center maintain frequent contact among themselves through various programs within the center. For example, one participant in our research program was able to convince several of her friends attending a weekly line dancing program at the center to participate in our research sessions.

These focus group sessions served as important opportunities for us to identify and recruit participants for our computer class. In addition, we placed flyers promoting our computer class at the center’s check-in desk, and the executive director shared the information at several community center meetings involving its members. As I describe later, earning trust from our program participants was essential in retaining participants and recruiting additional people for various research, class, and engagement activities throughout the two-year program. Given the relevance of our education program for underserved seniors, the center’s executive director invited us to give two workshops—one at the center and the other at a large convention center in Kansas City. Through these activities, our program engaged a total of 105 older, low-income African Americans during the project period. Data analysis. Since we used multiple methods, there was a relatively large amount of data to be analyzed. Three faculty members and two graduate research assistants were involved in the data analysis. We used a third-party transcription service to transcribe our focus group at a cost of $100 for each hour-long focus group session. These fees were covered by a grant our research team received for the project. After receiving all focus group transcripts, one faculty member and one graduate student collaborated to analyze the data using ATLAS.ti, which provides tools to organize and interpret qualitative data sets. Participants’ answers were analyzed inductively, using a combination of open coding, identifying relevant themes line by line, and focused coding, searching for specific themes to group them into categories.15 A graduate research assistant transcribed interview sessions, which tended to be shorter than focus group sessions. 
Additionally, a faculty member and a graduate student worked together to analyze the instructor’s journal entries, participant observation notes, lecture notes, and exams. Once each subteam completed its portion of the data analysis, the entire team gathered to discuss key findings from each method and to identify emerging themes across the different methods. Per the university’s IRB protocol, we anonymized and deidentified our data as appropriate. For example, we removed participants’ names and instead used study code numbers. A complete database that related study code numbers to consent forms and identifying information was encrypted and stored separately on a password-protected computer in a secure, locked office at the university.
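
For readers who want a concrete picture of replacing names with study code numbers, the following Python sketch shows one way such a substitution could work. It is an illustration only and not a record of the procedure used in this project: the function names, the code format (a letter followed by four digits), and the sample records are all hypothetical.

```python
import secrets


def assign_study_codes(names, prefix="P", digits=4):
    """Give each distinct name a random study code (hypothetical format, e.g. 'P' + 4 digits)."""
    codes = {}
    used = set()
    for name in dict.fromkeys(names):  # drop duplicate names, keep first-seen order
        while True:
            # Draw a random zero-padded number; retry if the code is already taken.
            code = f"{prefix}{secrets.randbelow(10 ** digits):0{digits}d}"
            if code not in used:
                break
        used.add(code)
        codes[name] = code
    return codes


def deidentify(records, key):
    """Return copies of the records with the 'name' field replaced by 'study_code'."""
    cleaned = []
    for record in records:
        clean = {field: value for field, value in record.items() if field != "name"}
        clean["study_code"] = key[record["name"]]
        cleaned.append(clean)
    return cleaned


if __name__ == "__main__":
    # Placeholder records, not data from the study.
    records = [
        {"name": "Participant A", "session": "focus group 1", "text": "..."},
        {"name": "Participant B", "session": "focus group 1", "text": "..."},
    ]
    key = assign_study_codes(r["name"] for r in records)
    print(deidentify(records, key))
```

In a sketch like this, the `key` dictionary plays the role of the linking database: it is the only object that connects codes back to people, so it would be the part to encrypt and store apart from the analysis files. Removing names does not by itself deidentify free text, since quotes can still contain identifying details.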

LESSONS LEARNED

In this section, I discuss lessons learned during our two-year, community-based research project with a marginalized group as well as recommendations for future efforts concerning community engagement.

Show Respect for Community Partners and Program Participants

I cannot emphasize enough how important it is to show respect for your community partners and participants, especially when you engage with marginalized groups. Unfortunately, community organizations often report having negative experiences working with researchers, whom they perceive as being interested only in completing their research project. When you conduct community-based research, you need to demonstrate your sincere passion and interest in understanding the challenges of the community and potentially making contributions that address those challenges through research or engagement projects. Showing respect for community partners and program participants can happen in multiple settings and stages. In our case, the most important venue was the weekly education session. At the start of our project, our team had multiple discussions about how to pay the utmost attention to being respectful to the participants. Since we were working with older adults who were not familiar with digital technologies, showing empathy for their challenges was important. This aspect was also confirmed during our formative research. One of the major complaints that emerged from that research was that previous instructors often lacked empathy and patience. Referring to an early experience trying to learn how to use a computer, one participant said, “She [the instructor] was so bossy until I didn’t come back.” Another participant said, “Yeah, she knew everything and I didn’t know anything.” Therefore, I reminded our graduate instructor of the importance of being respectful and empathetic. In addition, given varying degrees of knowledge and experience among participants, the graduate instructor spoke slowly and checked with participants multiple times about whether the instruction was clear and understandable. 
She built a strong rapport with participants by validating the concerns they expressed about technology and respectfully walking them through any difficulty they encountered during the class session. Indeed, our evaluation focus group sessions showed that participants valued the

259 C O M M U N I T Y- B A S E D I N T E R V E N T I O N R E S E A R C H S T R AT E G I E S

graduate instructor’s “respect” for them and “patience” in working with them. They said it was a pleasant contrast to some of their previous instructors, who were “rude” or “dismissive.”

Respect can also be demonstrated through your commitment to engaging with the community partner frequently. At the initial stage of the program development, I opted to drive to the senior community center, which is about 45 miles away from my university office, every other week to meet with the executive director and keep her updated about our progress. These biweekly meetings took place on Tuesdays for three months. While email or phone conversation would have been easier on my part, actually visiting the center and having face-to-face conversations helped center staff understand my strong commitment to the project. Finally, our team members sometimes had lunch with the education program participants in the center’s cafeteria. As noted earlier, participants in our focus group sessions received meal coupons as incentives, offering an opportunity for us to dine with those who attended the focus group sessions. These lunches helped build rapport with program participants.

Conduct Thorough Research to Understand Participants’ Needs and Interests

Having a proper understanding of your participants’ interests and existing knowledge and skills in the area of concern is essential to developing a relevant intervention program. This may sound obvious, but based on my conversations with community organizations, intervention programs often fail due to the lack of thorough research aimed at understanding the target group’s skills and areas of interest. Do not rush to start a community-based project; take your time to understand situations surrounding the community and its members more fully. As discussed earlier, we conducted extensive formative research to develop a program tailored for our target group. Findings from our formative research enabled us to implement an interest-driven learning design framework,16 which, in turn, helped us recruit and retain our program participants. Through our formative and midevaluation focus groups, we were able to gather data about senior community center members’ interests in learning digital technologies as well as their perspectives on related issues. We then incorporated the insights from this research into our curriculum to support the achievement of positive


learning outcomes. Our participant observations and evaluation focus groups showed that the class participants appreciated that the curriculum was geared toward their expressed interests. Similarly, properly understanding participants’ prior experiences in related areas is important. In our preintervention focus groups, several participants mentioned that they had gone to a different computer class at the senior community center but had never returned because the instruction had been hard to follow and they had felt lost. Based on our formative focus group findings, we understood that it was important to offer a step-by-step manual on every major activity in class. Most participants reported that handouts for each class lesson were helpful, as these allowed them to review and practice on their own. In addition, the graduate instructor stayed in the center’s computer lab for some time after class to continue to interact with class participants as needed.

Establish Effective Communication Channels

Setting clear expectations and establishing effective communication channels are key to developing and sustaining a successful academic-community partnership. This is particularly the case with nonprofit organizations, as many lack resources and major unexpected situations can arise. In our case, the lack of solid internet connectivity at the senior community center sometimes affected our ability to implement the program efficiently. Since our program focused on digital technologies, solid internet connectivity was essential, but the center’s basement computer lab had relatively slow internet connections. We had to create our own hotspot for some computer sessions when class activities required downloading documents or programs. On a rainy summer day, the center’s basement flooded, and we had to keep a fan on throughout our computer class to keep the area dry. We were, however, able to address these obstacles smoothly thanks to frequent communication between the project team and the center leadership. Our regular research sessions provided opportunities for gathering feedback from program participants and center staff. In addition, I had regular meetings with the center’s executive director throughout the program. We also had one program participant who served as a liaison between the team and the center’s staff when our team was not on-site. Selecting the


liaison was easy, as she had long been in close communication with the center’s leadership as a prominent volunteer at the center and she offered to serve as a liaison for the program without any compensation. It was also clear from the start of the program that she was well respected by her peers. Considering these factors, we accepted her offer to work as a liaison, and she and the graduate instructor communicated via email when needed.

It Takes Time, But It Is Worth the Time

Empirical research based on community engagement or intervention tends to take a long time, as it requires building and maintaining relationships with a community partner and can involve complex logistical issues. In addition, there can be unexpected challenges stemming from situations surrounding the community partner over which you have no control. Therefore, it is essential that researchers properly assess the feasibility of a community-based project and allocate sufficient time for data collection and analysis. Despite these potential challenges, community-based projects are essential in advancing scholarship with real-world implications and evidence-based programs for marginalized groups. Most nonprofit organizations do not have resources to conduct systematic research. Academics tend to lack direct access to marginalized populations, and academic research is often not translated into actual programs or policies. A partnership between an academic institution and a community organization can contribute to developing and implementing an effective program based on rigorous research. In our case, research activities at every stage of the program helped improve class activities, which, in turn, kept participants interested and motivated. Our project team’s experience in various research projects helped us carry out this multistage, mixed-methods project with low-income minority seniors. I had previously worked on technology education with other underserved populations, including refugees and formerly incarcerated women, and some other members had worked on health or diversity topics with marginalized groups. As noted earlier, it was also important for us to properly identify a community partner based on needs, capacity, and commitment. Engaging in sincere and honest conversations about each side’s needs and resources is essential for successful implementation of a community-based research program. The academic-community partnership is important in making a difference beyond the ivory tower.


NOTES

1. P. T. Jaeger et al., “The Intersection of Public Policy and Public Access: Digital Divides, Digital Literacy, Digital Inclusion, and Public Libraries,” Public Library Quarterly 31, no. 1 (2012): 1–20; A. Perrin and E. Turner, “Smartphones Help Blacks, Hispanics Bridge Some—But Not All—Digital Gaps with Whites,” Pew Research Center, August 20, 2019, https://www.pewresearch.org/fact-tank/2019/08/20/smartphones-help-blacks-hispanics-bridge-some-but-not-all-digital-gaps-with-whites/; A. Smith, “African Americans and Technology Use,” Pew Research Center, January 6, 2014, https://www.pewresearch.org/internet/2014/01/06/african-americans-and-technology-use/.
2. Perrin and Turner, “Smartphones Help Blacks”; H. Seo et al., “Calling Doctor Google? Technology Adoption and Health Information Seeking Among Low-Income African-American Older Adults,” Journal of Public Interest Communications 1, no. 2 (2017): 153–173.
3. Jaeger et al., “The Intersection of Public Policy and Public Access”; Perrin and Turner, “Smartphones Help Blacks”; Smith, “African Americans and Technology Use”; Seo et al., “Calling Doctor Google?”; H. Seo et al., “Evidence-Based Digital Literacy Class for Low-Income African-American Older Adults,” Journal of Applied Communication Research 47, no. 2 (2019): 130–152.
4. J. B. Horrigan, “How People Approach Facts and Information,” Pew Research Center, September 11, 2017, https://www.pewresearch.org/internet/2017/09/11/how-people-approach-facts-and-information/.
5. Horrigan, “How People Approach Facts.”
6. M. Anderson and A. Perrin, “Tech Adoption Climbs Among Older Adults,” Pew Research Center, May 17, 2017, https://www.pewresearch.org/internet/2017/05/17/tech-adoption-climbs-among-older-adults/; E. Hargittai and K. Dobransky, “Old Dogs, New Clicks: Digital Inequality in Skills and Uses Among Older Adults,” Canadian Journal of Communication 42, no. 2 (2017): 195–212; A. Hunsaker and E. Hargittai, “A Review of Internet Use Among Older Adults,” New Media and Society 20, no. 10 (2018): 3937–3954.
7. Anderson and Perrin, “Tech Adoption Climbs.”
8. Anderson and Perrin, “Tech Adoption Climbs.”
9. E. Hargittai, A. M. Piper, and M. R. Morris, “From Internet Access to Internet Skills: Digital Inequality Among Older Adults,” Universal Access in the Information Society 18, no. 4 (2019): 881–890.
10. P. D. Dantas Scaico, R. J. de Queiroz, and J. J. Lima Dias, “Analyzing How Interest in Learning Programming Changes During a CS0 Course: A Qualitative Study with Brazilian Undergraduates,” in ITiCSE ’17: Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer Science Education (New York: Association for Computing Machinery, 2017), 16–21; L. Torrey, “Student Interest and Choice in Programming Assignments,” Journal of Computing Sciences in Colleges 26, no. 6 (2011): 110–116.
11. Seo et al., “Evidence-Based Digital Literacy Class.”
12. G. Guest, E. E. Namey, and M. L. Mitchell, Collecting Qualitative Data: A Field Manual for Applied Research (Thousand Oaks, CA: SAGE, 2013).


13. Seo et al., “Evidence-Based Digital Literacy Class.”
14. R. M. Emerson, R. I. Fretz, and L. L. Shaw, Writing Ethnographic Fieldnotes (Chicago: University of Chicago Press, 2011).
15. K. Charmaz, Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis (London: SAGE, 2006).
16. Dantas Scaico, de Queiroz, and Lima Dias, “Analyzing How Interest in Learning Programming Changes.”

REFERENCES

Anderson, M., and A. Perrin. “Tech Adoption Climbs Among Older Adults.” Pew Research Center, May 17, 2017. https://www.pewresearch.org/internet/2017/05/17/tech-adoption-climbs-among-older-adults/.

Charmaz, K. Constructing Grounded Theory: A Practical Guide Through Qualitative Analysis. London: SAGE, 2006.

Dantas Scaico, P. D., R. J. de Queiroz, and J. J. Lima Dias. “Analyzing How Interest in Learning Programming Changes During a CS0 Course: A Qualitative Study with Brazilian Undergraduates.” In ITiCSE ’17: Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer Science Education, 16–21. New York: Association for Computing Machinery, 2017.

Emerson, R. M., R. I. Fretz, and L. L. Shaw. Writing Ethnographic Fieldnotes. Chicago: University of Chicago Press, 2011.

Guest, G., E. E. Namey, and M. L. Mitchell. Collecting Qualitative Data: A Field Manual for Applied Research. Thousand Oaks, CA: SAGE, 2013.

Hargittai, E., and K. Dobransky. “Old Dogs, New Clicks: Digital Inequality in Skills and Uses Among Older Adults.” Canadian Journal of Communication 42, no. 2 (2017): 195–212.

Hargittai, E., A. M. Piper, and M. R. Morris. “From Internet Access to Internet Skills: Digital Inequality Among Older Adults.” Universal Access in the Information Society 18, no. 4 (2019): 881–890.

Horrigan, J. B. “How People Approach Facts and Information.” Pew Research Center, September 11, 2017. https://www.pewresearch.org/internet/2017/09/11/how-people-approach-facts-and-information/.

Hunsaker, A., and E. Hargittai. “A Review of Internet Use Among Older Adults.” New Media and Society 20, no. 10 (2018): 3937–3954.

Jaeger, P. T., J. C. Bertot, K. M. Thompson, S. M. Katz, and E. J. DeCoster. “The Intersection of Public Policy and Public Access: Digital Divides, Digital Literacy, Digital Inclusion, and Public Libraries.” Public Library Quarterly 31, no. 1 (2012): 1–20.

Perrin, A., and E. Turner. “Smartphones Help Blacks, Hispanics Bridge Some—But Not All—Digital Gaps with Whites.” Pew Research Center, August 20, 2019. https://www.pewresearch.org/fact-tank/2019/08/20/smartphones-help-blacks-hispanics-bridge-some-but-not-all-digital-gaps-with-whites/.

Seo, H., J. Erba, D. Altschwager, and M. Geana. “Evidence-Based Digital Literacy Class for Low-Income African-American Older Adults.” Journal of Applied Communication Research 47, no. 2 (2019): 130–152.


Seo, H., J. Erba, M. Geana, and C. Lumpkins. “Calling Doctor Google? Technology Adoption and Health Information Seeking Among Low-Income African-American Older Adults.” Journal of Public Interest Communications 1, no. 2 (2017): 153–173.

Smith, A. “African Americans and Technology Use.” Pew Research Center, January 6, 2014. https://www.pewresearch.org/internet/2014/01/06/african-americans-and-technology-use/.

Torrey, L. “Student Interest and Choice in Programming Assignments.” Journal of Computing Sciences in Colleges 26, no. 6 (2011): 110–116.

CONTRIBUTORS

Teresa Correa (PhD in journalism and media, University of Texas at Austin) is an associate professor in the School of Communication at Diego Portales University, Santiago, Chile. Her research focuses on inequalities in digital media access and uses and in health communication. Her research has been funded by the competitive National Science and Technology Development Fund, the International Development Research Centre from Canada, the National Fund for the Study of Pluralism in the Media System, and the American Association of Family and Consumer Science.

Erin Fordyce, MS, MEd, is a methodologist at NORC at the University of Chicago, where she has been involved with questionnaire design, cognitive interviewing, and study implementation for web, mail, telephone, and mixed-mode surveys. She has seven years of experience in the social science research field, where she has worked on experimental designs, particularly with hard-to-reach populations. Her interests are in the use of innovative survey recruitment approaches, including social media. She has presented on innovative social media recruitment efforts at several conferences, including that of the American Association for Public Opinion Research, and at workshops for the National Institutes of Health.

Deen Freelon (PhD, University of Washington), an associate professor in the Hussman School of Journalism and Media at the University of North Carolina at Chapel Hill, studies political uses of social media and other digital technologies. He is also a principal researcher for UNC’s interdisciplinary Center for Information, Technology, and Public Life. He has authored or coauthored more than forty journal articles, book chapters, and public reports in addition to coediting one scholarly book. An expert in multiple programming languages including R, Python, and PHP, he has written research-grade software applications for a range of computational research purposes. He previously taught at American University in Washington, DC.

Eszter Hargittai (PhD, Princeton University) is a professor of communication and media research and holds the Chair in Internet Use and Society at the University of Zurich. Much of her research addresses questions of digital inequality, with a particular focus on how people’s internet skills relate to what they do online. She is also concerned with how research design decisions can bias data samples. In 2019, she was elected Fellow of the International Communication Association, having received the association’s Young Scholar Award in 2009.

Melissa Heim Viox, MPH, is a senior research director at NORC at the University of Chicago, where she directs and manages health research projects for government and private agencies. She has extensive experience working with hard-to-reach populations using innovative research designs, including social media. Her research interests include maternal and child health, health equity, and immigrant/refugee well-being.

Matthias Hofer (PhD, University of Zurich) is a senior research and teaching associate in the University of Zurich’s Department of Communication and Media Research. Until 2020, he was an SNSF Ambizione Fellow. His main research areas include media effects through the life span, media audiences and effects, and entertainment media. He also examines the experience of presence in virtual environments.

Lee Humphreys (PhD, University of Pennsylvania) is a professor of communication at Cornell University. She studies the social uses and perceived effects of communication technology, specifically focusing on mobile and social media. She is the author of The Qualified Self: Social Media and the Accounting of Everyday Life (2018) and the coeditor with Paul Messaris of Digital Media: Transformations in Human Communication (2017).

Tobias R. Keller (PhD, University of Zurich) is a visiting postdoctoral fellow at Queensland University of Technology, Australia. He studies political communication on social media, with a particular focus on social bots in election campaigns. In 2018, he won the Best Paper Award in the Political Communication Division of the International Communication Association. He has published widely in journals such as Political Communication, Communication Research, and Social Media and Society.

Erin Flynn Klawitter received her PhD in media, technology, and society from the School of Communication at Northwestern University in 2017. Her dissertation, titled “Crafting a Market: How Independent Artists Participate in the Peer Economy for Handmade Goods,” was awarded the 2017 Herbert S. Dordick Dissertation Award by the Communication and Technology Division of the International Communication Association and the Graduate Dissertation Award from the School of Communication at Northwestern. Her research has been published in the International Journal of Communication, the Journal of Broadcasting and Electronic Media, and Emerald Media Studies. She holds a BA in liberal studies from the University of Notre Dame, where her senior essay was recognized with the Otto J. Bird Award, and an MA in communication from Saint Louis University.

Ulrike Klinger (PhD, Goethe-Universität Frankfurt am Main) is assistant professor for digital communication at Freie Universität Berlin and head of the research group on News, Campaigns, and the Rationality of Public Discourse at the Weizenbaum Institute for the Networked Society in Berlin. After receiving her PhD in political science (and the best dissertation award from the German Political Science Association in 2011), she worked as a postdoctoral researcher at the University of Zurich in Switzerland, was a visiting scholar at the Humboldt Institute for Internet and Society in Berlin and at the University of California at Santa Barbara, and was a visiting professor at Zeppelin University in Friedrichshafen, Germany. Her research focuses on digital public spheres, political communication, and digital technologies.

Jeffrey Lane (PhD, Princeton University) is an assistant professor of communication at Rutgers University–New Brunswick. He is a digital urban ethnographer who writes about the role of social media in urban life and criminal justice. He is the author of The Digital Street (2018), which won the Nancy Baym Book Award from the Association of Internet Researchers and the Best Book Award from the Communication, Information Technologies, and Media Sociology Section of the American Sociological Association, and Under the Boards: The Cultural Revolution in Basketball (2007). He is a faculty associate of the New Jersey Gun Violence Research Center at Rutgers University and a junior fellow of the Urban Ethnography Project at Yale University. His research has been published in peer-reviewed journals such as American Behavioral Scientist, New Media and Society, the Journal of Computer-Mediated Communication, and the Journal of Consumer Culture and has been written about in popular news outlets like The Atlantic and Vice.

Will Marler (PhD, Northwestern University) is a postdoctoral scholar in the Department of Communication and Media Research at the University of Zurich. He conducts research around issues of digital technology and socioeconomic marginalization, with a focus on urban homelessness and the use of smartphones and social media. As an ethnographer, he centers his investigation of technology issues around the practices of everyday life for people in poverty through research relationships built over time. His publications have appeared in the journals New Media and Society and Mobile Media and Communication, among other venues.

Isabel Pavez is an assistant professor at the School of Communication at Universidad de Los Andes in Santiago, Chile. She has a PhD in media and communications from the London School of Economics and Political Science and an MSc in anthropology from Universidad de Chile in Santiago. She has participated in numerous research projects regarding digital inclusion of vulnerable populations. Besides her academic work, she has served as a consultant for international organizations such as the United Nations Economic Commission for Latin America and the Caribbean and UNESCO.

Elissa Redmiles (PhD, University of Maryland) is a postdoctoral researcher at Microsoft Research. Her research interests are broadly in the areas of security and privacy. She uses computational, economic, and social science methods to understand users’ security and privacy decision-making processes, specifically investigating inequalities that arise in these processes and mitigating those inequalities through the design of systems that facilitate safety equitably across users. She received her BS, MS, and PhD degrees in computer science from the University of Maryland and is the recipient of an NSF Graduate Research Fellowship, a Facebook Fellowship, a USENIX Security Distinguished Paper Award, and the John Karat Usable Privacy and Security Research Award.

Hyunjin Seo (PhD, Syracuse University) is an associate professor of digital/emerging media and Docking Faculty Scholar at the University of Kansas and a faculty associate at the Berkman Klein Center for Internet and Society, Harvard University. As the director of the KU Center for Digital Inclusion, she has led community-based research projects offering digital skills and information literacy courses to underserved populations. She has published over forty journal articles and book chapters in the areas of digital media, collective action, and civic engagement.


Michael J. Stern (PhD, Washington State University) is a professor in and chair of the Department of Media and Information at Michigan State University as well as a senior fellow at NORC at the University of Chicago. His research focuses on ways to reduce measurement error through testing the effects of visual design on respondents’ answers in web and mail surveys and has included experimental work employing in-depth analysis of client-side paradata, diverse approaches to probability and nonprobability sampling for web-only surveys, and assessments of response patterns to examine spatial clustering and geographically based coverage error. He has published several books and monographs as well as dozens of peer-reviewed papers in journals such as Public Opinion Quarterly; Field Methods; Survey Research Methods; Survey Practice; Information, Communication and Society; Social Science Quarterly; New Media and Society; International Journal of Internet Science; Work and Occupations; Sociological Inquiry; and American Behavioral Scientist.

Nikki Usher (PhD, University of Southern California) is an associate professor at the University of Illinois at Urbana-Champaign in the College of Media’s Journalism Department (with an affiliate appointment in the Department of Communication). Her research focuses on news production in the changing digital environment, blending insights from media sociology and political communication, with particular attention to media elites. Her first book, Making News at the New York Times (2014), was the first book-length study of the United States’ foremost newspaper in the internet era and won the Tankard Award, a national book award from the Association for Education in Journalism and Mass Communication. Her second book, Interactive Journalism: Hackers, Data, and Code (2016), focused on the rise of programming and data journalism and was a finalist for the Tankard Award, making her the first solo author to be a two-time finalist. Prior to teaching at the University of Illinois, she was an associate professor at George Washington University.

INDEX

absent data, 6–8, 21–22, 24 access to a community or population, 80, 90, 95, 124, 126, 129, 131, 136, 143, 158, 172, 179–180, 200, 261. See also hard-to-reach population access to data. See data access adolescents, 50, 58, 144, 154, 192–193, 231. See also teens advertisements, 9, 23, 42, 44, 55, 58, 68, 189 African Americans, 4, 148, 163–165, 245–247, 249–250, 255–257 Aftcra, 210 Airbnb, 207 algorithms, 12, 30, 35–36, 40, 44, 61, 211 Amazon, 12, 57, 62, 113 AMSM (adolescent men who have sex with men), 50, 52, 54–56, 58–59, 61, 67 Apple, 12–13 application programming interface (API), 6, 15–16, 22, 32–33, 36, 214–215 app walkthrough method, 95 archive, 7, 12–14, 23 Argentina, 192, 197 Artfire, 210–212 Asian Americans, 11, 15, 164 Audacious Software, 214

audio, 57, 102, 115–116, 219, 229 augmented reality (AR), 87 automated, analysis, 36–38, 44; behavior, 31; bots, 63; bursts, 4; data collection, 4, 214; data scraping, 4; tasks, 31; techniques, 23; web crawlers, 12 automation, 30, 35, 44 Beltway, 121, 131. See also Washington, DC bias, 1, 4–5, 12, 15, 19–20, 66, 71, 88, 117, 211, 221, 229, 236–237. See also unbiased big data, 143 Birdsong Analytics, 32 Black Lives Matter, 9, 11 Bluetooth, 85 Botometer, 35–41, 43–44 Bumble, 87–88 Cambridge Analytica, 23, 32 cell phone, 85, 148, 164, 191, 227, 233–234, 236, 238. See also mobile phone; smartphone Centers for Disease Control and Prevention (CDC), 50, 69 Chicago, 3, 56, 160–162, 169, 175, 178, 180


Chile, 2, 184–186, 189–190, 192, 194–195, 197, 200 China, 23 Clemson University, 20; data set, 20, 22 clickstream, 93 codebook, 38, 105, 217 cognitive interviewing, 56–57 community-based research, 6, 245–246, 258–259, 261 computational analysis, 19, 135; approaches, 129; data, 135; social science, 44, 93, 136 computer, 31, 36–38, 63, 70, 94, 164, 166, 169, 219, 226, 233, 245, 250, 253–254, 257–258; algorithms, 30; classes, 245, 247–252, 255, 257, 260; lab, 144, 245, 249, 253, 255, 260; programs, 31; public access, 178; science, 44, 102, 114; scientists, 31, 35–38; security, 102; settings, 93, 252; skills, 245; training, 4 computer science, 44, 102, 114 Congress, U.S., 124, 126, 162, 136 consent, 34, 53, 61–62, 95, 110, 145, 174, 193, 212, 218–219, 233, 257 content analysis, 4, 38, 44, 125, 146 crime, 110. See also cybercrime CSS, 13 CSV, 19 cybercrime, 114 data access, 1, 7, 16, 21–24, 32, 34, 44 data cleaning, 32, 69 data collection, 6, 10, 18, 20–21, 32–33, 39, 50–51, 54, 57, 59, 61, 86, 89, 102, 128, 131, 172–173, 177, 180, 187, 192, 197, 200, 205–206, 214–215, 218, 232, 238, 261; absent, 7–8; active, 53; approval of, 61, 83; automated, 4, 214; bot detection, 63–66; ethics of, 34; interview-based, 112; in-person, 88; interruption of, 160, 196; limits of, 16; manual, 214; monitoring, 68–71; passive, 53; survey, 52; theoretical sampling, 82; time of, 11; Twitter, 16, 18 data quality, 1, 21, 66, 72 data science, 78, 93

data scraping, 2, 4, 13–14, 53. See also web crawler; web scraping data security, 62, 70 Dedoose, 215–217, 219 density plots, 40–42 digital literacy, 178, 247, 249. See also internet skills digital publicity, 132 digital trace data, 93, 124, 128–129, 132–135 disinformation, 6, 8–10, 15, 17, 35, 42–44 dissertation, 85, 184, 186, 205–206, 209, 211, 220–221 Dodgeball, 85–86 ecological momentary assessment (EMA), 2, 226–228, 231, 233, 236 election, 9–11, 17, 30–35, 39, 41–44, 68, 125 emoticon, 36 English, 9, 14, 104, 106–111, 114–115, 119, 123 errors, 17, 20, 22, 36–39, 71–72, 216, 221 ethics, 1, 34, 61, 82, 106, 113, 174 ethnography, 2–3, 5, 123, 126–129, 133, 135, 143–146, 152, 154, 156–158, 160–162, 168, 170, 172–173, 175, 178–180, 190, 192; hybrid, 127 Etsy, 207–210, 212, 215 Europe, 104, 225. See also European Union European Union (EU), 23–24. See also Europe Excel, 36, 100, 215–218, 253 Facebook, 7, 12, 23, 23, 31–32, 75, 162, 166; account, 11, 65, 146, 152, 175–176, 178, 252; adoption, 53, 55; ads, 55, 57–61, 64–65, 67–68, 75; API, 215; feed, 144; friends, 147, 155–156, 169, 177; posts, 3, 131, 134, 144, 146, 147, 154, 174, 215; privacy settings, 245, 252; recruitment, 52, 71, 87, 89, 173, 175, 179; sampling, 88; setup, 251–252 fake accounts, 43–44, 66 fake news, 31, 68, 125 field access, 139 flyers, 101, 106–111, 209, 211, 256–257 focus groups, 2, 4, 78, 87–88, 184, 188, 197–199, 247–248, 250–251, 254–260


followers, on social media, 14, 21, 32–33, 36–38, 41–43, 134–135 gatekeepers, 139 Gay and Lesbian Alliance Against Defamation (GLAAD), 50 GED, 145–154, 157 gender identity, 8, 51, 53, 55–56, 58, 109 General Data Protection Regulation (GDPR), 23–24 Germany, 30–33, 35, 38–39, 41–43, 101–102, 104–107, 110–111, 113–116, 118–119 gift, 198; code, 57, 62–64, 66–67, 70; card, 113 GitHub, 31, 38 Gmail, 245, 252 Google, 12–13, 53, 67, 71–72, 104, 128, 134, 253; AdWords, 56, 67; Chrome, 252–253; Docs, 124; Hangouts, 218; News, 245; Plus, 215; ReCAPTCHA, 63; Sheets, 106–107; Translate, 11 GPS, 86 Grindr, 84 grounded theory, 82–83, 86, 162 guinea pig effect, 238 hard-to-reach population, 50–51, 54, 61, 69, 71–72, 91, 160–161. See also access to a community or population hashtags, 11–12, 15, 32–33, 39 historiography, 78 Hitchhiker’s Guide to the Universe, The (Adams), 34 HIV/AIDS, 50–52, 55, 68, 72 homeless populations, 4, 69, 160–162, 164–165, 167–175, 177, 179–180 HTML, 13 incentives, 43, 56, 62, 66–67, 70, 111, 196, 198, 232–234, 237, 256, 259 Indonesia, 90 Instagram, 52–53, 55, 57–59, 61, 64–65, 67–68, 72, 75, 177, 214, 217 Institutional Review Board (IRB), 8, 61–62, 82–83, 165, 207, 211–212, 218, 233, 256–257

International Communication Association, 44, 221 Internet Archive, 12–14 Internet Research Agency (IRA), 8–24 internet skills, 31, 113, 185, 209, 220, 231, 234, 236, 245–247, 249–250, 254, 259. See also digital literacy intervention, 50, 52, 125, 127, 144, 186, 245, 248, 251, 260–261; program, 246, 253, 259 interview, 2–4, 56, 88, 91, 93–94, 101, 105, 110, 123–125, 128, 145, 176–177, 179, 193, 196, 198–199, 206, 211, 213, 218–220, 247–248, 250, 254, 256–257; chat-based, 116; cognitive, 56–57; consent, 193; in-depth, 78, 87, 184, 188, 190, 192; in different languages, 116; in-person, 71, 84, 187; intercept, 81, 92; participant privacy, 113–114; phone, 89, 191; protocol, 102, 106, 112, 114–116, 118, 193; recruitment, 90, 107, 111, 162–166, 169, 174; semi-structured, 102, 218; translation of, 119 interviewers, 52, 56, 195; absent, 66; training of, 194; well-being, 117–118 iPhone, 89, 253 Iran, 23 Javascript, 13 journalism, 123–125, 130–131, 134–136; political, 123, 125–126, 129, 135 journalists, 4, 103, 124–126, 131–136, 169; political, 123–126, 130–131, 134, 136 JSON code, 14 Kansas City, 245–247, 249, 254, 257 keyword, 6, 15, 32–33, 39 Knight Foundation, 10; data, 11–13 LGBTQ, 69 likes, 21, 30, 33, 93, 148, 175, 214 LinkedIn, 134 Linux, 13, 17; cron, 17–18 listserv, 16, 111–112, 134 log data, 93–94 longitudinal studies, 34, 128, 226–227, 231

low-income populations, 4, 162, 165, 167–168, 174, 245–248, 250, 255, 257 machine learning, 35, 40, 44, 143 mail, 52 marginalized populations, 108, 245–246, 261 materiality, 131 measurement burst study (MBS), 225, 227–228, 233 measurement reactivity, 237–239 media elite, 125 Mediapulse AG, 229 Mediawatch, 229, 233–235 metadata, 14, 16, 19–20, 22, 33–34, 39, 43, 129 Microsoft Windows, 253 Migros Magazin, 232 mixed-methods approaches, 2, 5, 184, 186–187, 200–201, 247; multistage, 247, 261 mobile media elicitation technique, 93 mobile phone, 52, 80–81, 85–86, 92–93, 145, 168, 185, 192–193. See also cell phone; smartphone movisens, 235–236 MSM (men who have sex with men). See AMSM MySpace, 152 NBC, 19–22 Netvizz, 32 network analysis, 78, 128 New York, 147, 152 newspapers, 125–126, 226, 229–230, 232 nonprobability, 52, 54, 57, 71, 82 Northwestern University, 206, 212, 214, 221 objects, 124, 129–130, 135–136 older adults, 174, 225–229, 231–233, 246–247, 249, 258 open-source, 13 Oxford Internet Institute, 35 PageRank algorithm, 12 partnership, 236, 248, 260, 261

payment, 105, 110, 113, 163 PayPal, 113 password-protected, 213, 215, 257 PDF, 9, 19 Pew Research Center, 52, 55 phone number, 62, 63, 66, 106, 145, 167, 176–178, 191, 198, 218, 233 photos, 51, 93–95, 128, 130–135, 152, 156–157, 175, 177 Pinterest, 133, 215, 217, 247 political communication, 30–31, 44 political journalism. See journalism: political political journalist. See journalist: political political parties, 33, 39, 42 politics, 7–9, 23, 42–44, 83, 124, 134, 217, 246 pretesting, 56 privacy, 5, 54, 61, 72, 93, 106, 113, 247, 251; concerns, 4, 8, 95, 229; practices, 3, 101–103, 112; policy, 114; protections, 17, 34; respondent’s, 61–62; risks, 103; settings, 39, 103, 245, 252–253; tools, 105 Python, 11, 13–14, 16–17, 36–38 qualitative research, 78–82, 84, 88, 91, 93, 95, 123–124, 127, 129, 162, 191, 199, 255. See also ethnography; focus groups; interview Qualtrics, 213 quantitative research, 79. See also data scraping; digital trace data; log data; survey radio, 126, 226–227, 229–230, 235; Swiss, 229 random-route technique, 195 recruitment, 92, 95, 106; challenges, 61, 82–83, 88, 118, 198, 209, 232; email, 106, 211; flyers, 106–110; opportunity, 174; safety, 111; sampling issue, 81; snowball sampling, 90; social media, 50, 50–54, 65, 68–69, 71–72; strategy, 57, 198, 210, 231, 247, 250, 256; street, 111–112, 119; survey, 2, 114; trial run, 105

retweet, 7, 13–15, 20–21, 30, 33, 35, 39, 43–44, 214, 216. See also tweet RSS, 214 Russia, 6, 8–11, 23 sampling, 1–2, 4, 12, 15, 19, 34, 51–54, 57, 78–93, 95–96, 189, 192, 194–195, 210–211, 234, 236 scripts, 13–14, 17–19, 36–39, 43, 107, 214–215 security, 113; concerns, 5; data, 70, 178; digital, 101; guards, 148, 176; measures, 57, 62–63, 67, 70; of sites, 68; practices, 3, 101–103, 112, 247, 251; risks, 103; settings, 253; systems, 57; tools, 105, 252 serendipity, 130, 136 sexual orientation, 8, 53, 58–59, 65 sex work, 2–4, 101–107, 110–114, 116–119 Singapore, 90 Skype, 195, 214, 218 Slack, 124 smartphone, 53, 94, 161–162, 166, 168, 172–173, 177, 180, 188, 231, 252–253. See also cell phone; mobile phone SMS.ac, 85–86 Snapchat, 52–53, 55, 57–59, 64, 67, 72 social bots, 30–35, 40–44 social movements, 9 software, 6, 63, 215, 217, 219, 227, 235–236, 250, 253; engineers, 36 spam, 42, 44, 252 Stack Overflow, 18 Stata, 215, 217–218 studying up, 123–124, 127, 136 surveys, 2–4, 50–59, 61–64, 66–71, 103, 107, 114, 125, 184, 188–191, 194, 196–197, 199–200, 206, 212–214, 217–218, 226–227, 231, 233–237; web, 52, 70 Switzerland, 102, 104, 106–107, 119, 229, 232 tablet, 253 teens, 4, 51, 53, 55, 58, 60, 144–145. See also adolescents television (TV), 59, 118, 207, 226–227, 229–230, 234–235, 238 terms of service (TOS), 7–8, 16–17, 21–22, 24, 34, 88

TheCraftStar, 210, 212 “think aloud” interviews, 56 Todo Chile Comunicado, 185, 189, 200 transcripts, 92, 119, 219, 257 transgender: adolescents, 50; definition of, 51; identity, 53; individuals, 65; survey, 56, 67; youth, 50, 52, 54–56, 58, 61 translation, 52, 114–115, 119, 128, 133 transparency, 23, 42–43, 78, 91, 158 triangulation, 91–93, 133, 136, 186 Turing, Alan, 31 Turkey, 23 tweet, 2, 7, 9–16, 18–23, 31, 33, 36–37, 39–40, 42–44, 313. See also retweet Tweetbotornot, 35, 44 Twint, 16–18 Twitter, 6–7, 9–16, 20–24, 30–33, 43–44, 52–53, 58, 67–68, 71, 85–88, 128, 131, 134–135; accounts, 9, 11, 32, 34–36, 43–44, 129, 136; API, 15–16; bots, 4; data, 2, 4, 10, 21–22, 44; data collection, 16, 18; feeds, 214; followers, 33; login, 36; TOS, 7–8, 16–17, 34; users, 36, 40 Uber, 207, 209 unbiased samples, 205, 209. See also bias unexpected data, 3, 19, 123–124, 128–130, 132, 135–136 United States, 9, 23, 42, 50, 53–54, 58–59, 72, 82, 89, 104, 114, 124, 174, 210, 218, 247 University of Kansas, 245, 247 University of Texas, 184–186 University of Zurich, 118, 233 unpredictable, 200, 206 “Up the Anthropologist,” 127 URL, 34–35, 67–38, 70, 106, 114, 214, 152 validity, 69, 80–81, 95, 227, 229, 237, 252 VerbalInk, 219 video, 57, 67, 74, 102, 114–116, 128–129 visual data, 38, 41, 88, 94, 111, 133 Washington, DC, 3, 123–125, 128–129, 131, 133–136. See also Beltway WayBackPack, 13

weather, 160, 166, 200 Web 2.0, 216–217 web analytics, 126 web crawler, 12. See also data scraping; web scraping web scraping, 14, 21. See also data scraping; web crawler well-being, 117, 119, 225–231, 234–235, 237–238

White House, 124–126 Wi-Fi, 176, 178, 253 Wikipedia, 209 youth, 2–4, 50–59, 61–62, 64, 69, 71–72, 144–145, 173 YouTube, 215 Zurich, 2, 101, 106–107, 110, 232