The CAPTCHA: Perspectives and Challenges: Perspectives and Challenges in Artificial Intelligence (ISBN 3030293459, 9783030293451)

This book discusses the CAPTCHA (completely automated public Turing test to tell computers and humans apart), an artific

English Pages 133 Year 2020

Table of contents :
Preface
Contents
List of Figures
List of Tables
1.1 Artificial Intelligence
1.2 Turing Test
References
2.1 The Concept of Human-Computer Interaction
2.2 Usability
References
3.1 Human Interactive Proof
References
4.1 Definition of CAPTCHA
References
5.1 CAPTCHA Elements
5.2 Advantages and Limitations of CAPTCHA
References
6.1 Categorization of CAPTCHA
References
7.1 Slider CAPTCHA
7.2 NoCAPTCHA reCAPTCHA
7.3 Invisible reCAPTCHA
7.4 Image-Based CAPTCHA
7.5 Social Recognition CAPTCHA
7.6 Game-Based CAPTCHA
References
8.1 Designing the Text-Based CAPTCHA
8.2 Designing the Image-Based CAPTCHA
8.3 Designing the Other CAPTCHA Types
8.4 How to Create a Simple CAPTCHA
References
9.1 Exploring the CAPTCHA Usability via Statistical Analysis
9.2.2 Data Collection
9.2.3 Dataset Creation
9.2.4 Association Rules Extraction
9.2.5 Results
References
10.1 The CAPTCHA Prediction Model
10.1.1 The CAPTCHA Tests
10.1.3 Experiment
10.1.4 Results
10.2 Exploring the Image-Based CAPTCHA Usability via Association Rule Mining
10.2.1 Experiment
10.2.2 Results and Discussion
References
Conclusions

Citation preview

Smart Innovation, Systems and Technologies 162

Darko Brodić Alessia Amelio

The CAPTCHA: Perspectives and Challenges Perspectives and Challenges in Artificial Intelligence

Smart Innovation, Systems and Technologies Volume 162

Series Editors
Robert J. Howlett, Bournemouth University and KES International, Shoreham-by-sea, UK
Lakhmi C. Jain, Faculty of Engineering and Information Technology, Centre for Artificial Intelligence, University of Technology Sydney, Broadway, NSW, Australia

The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to make available a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes in order to make the latest results available in a readily-accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought.

The series covers systems and paradigms that employ knowledge and intelligence in a broad sense. Its scope is systems having embedded knowledge and intelligence, which may be applied to the solution of world problems in industry, the environment and the community. It also focusses on the knowledge-transfer methodologies and innovation strategies employed to make this happen effectively. The combination of intelligent systems tools and a broad range of applications introduces a need for a synergy of disciplines from science, technology, business and the humanities.

The series will include conference proceedings, edited collections, monographs, handbooks, reference books, and other relevant types of book in areas of science and technology where smart systems and technologies can offer innovative solutions. High quality content is an essential feature for all book proposals accepted for the series. It is expected that editors of all accepted volumes will ensure that contributions are subjected to an appropriate level of reviewing process and adhere to KES quality principles.

Indexing: The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, Google Scholar and Springerlink.

More information about this series at http://www.springer.com/series/8767

Darko Brodić Technical Faculty in Bor University of Belgrade Bor, Serbia

Alessia Amelio Department of Computer Science Engineering, Modelling, Electronics and Systems (DIMES) University of Calabria Rende, Cosenza, Italy

Darko Brodić is Deceased

ISSN 2190-3018
ISSN 2190-3026 (electronic)
Smart Innovation, Systems and Technologies
ISBN 978-3-030-29344-4
ISBN 978-3-030-29345-1 (eBook)
https://doi.org/10.1007/978-3-030-29345-1

© Springer Nature Switzerland AG 2020

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

DOOR HANDLE: Oh useless. I forgot to tell you (laughs) I am locked.
ALICE: Oh no.
DOOR HANDLE: But yes. But on the other hand, you have the key, don't you?
ALICE: What key?
DOOR HANDLE: Don't tell me you left it up there?
ALICE: Oh, oh God. And now how do I do it?
DOOR HANDLE: Try the box, it’s obvious.
(Alice in Wonderland)

To whom it may concern, with the hope that the research described in this book may be only used in the future for peaceful purposes

Preface

The aim of this book is to provide an overview of the main concepts and trends underlying the CAPTCHA test, an artificial intelligence-based test that is widespread on today’s Web sites. We hope that this book will be helpful to a broad audience in the scientific communities of artificial intelligence, human–computer interaction, and pattern recognition. We also hope that it will be a valid support for undergraduate and Ph.D. students, shedding light on the current features and limitations of the described approaches in order to favor new CAPTCHA designs as well as new research directions on this topic.

The creation of this book required three years of work. We would like to thank the Editorial Office as well as the anonymous reviewers for their invaluable support in all steps of setting up and preparing the chapters. We are also indebted to our research institutions, in particular the Department of Computer Science Engineering, Modeling, Electronics, and Systems (DIMES) of the University of Calabria and the Technical Faculty in Bor of the University of Belgrade, for their understanding and sincere interest. Finally, we would like to thank all our collaborators and co-authors who shared this research direction with us over the years: Radmila, Ivo, Dejan, Nadeem, Syed Khurram, Zoran, Milena, Sanja, and others. May this book be not the end point but the starting point of future international cooperation, which we have always considered an essential part of research activity.

Darko Brodić, Bor, Serbia
Alessia Amelio, Rende, Italy

Contents

1 Artificial Intelligence and Turing Test
1.1 Artificial Intelligence
1.2 Turing Test
References

2 Human-Computer Interaction
2.1 The Concept of Human-Computer Interaction
2.2 Usability
References

3 Human Information Processing (HIP)
3.1 Human Interactive Proof
References

4 CAPTCHA Basics
4.1 Definition of CAPTCHA
4.2 Tasks of CAPTCHA
References

5 Characteristics of CAPTCHA
5.1 CAPTCHA Elements
5.2 Advantages and Limitations of CAPTCHA
References

6 Types of CAPTCHA
6.1 Categorization of CAPTCHA
References

7 Direction of CAPTCHA
7.1 Slider CAPTCHA
7.2 NoCAPTCHA reCAPTCHA
7.3 Invisible reCAPTCHA
7.4 Image-Based CAPTCHA
7.5 Social Recognition CAPTCHA
7.6 Game-Based CAPTCHA
References

8 CAPTCHA Programming
8.1 Designing the Text-Based CAPTCHA
8.2 Designing the Image-Based CAPTCHA
8.3 Designing the Other CAPTCHA Types
8.4 How to Create a Simple CAPTCHA
References

9 CAPTCHA and Symbiotic Interaction
9.1 Exploring the CAPTCHA Usability via Statistical Analysis
9.2 Exploring the Text-Based CAPTCHA Usability via Association Rule Mining
9.2.1 Experiment
9.2.2 Data Collection
9.2.3 Dataset Creation
9.2.4 Association Rules Extraction
9.2.5 Results
References

10 New Trends and Challenges in CAPTCHA Programming
10.1 The CAPTCHA Prediction Model
10.1.1 The CAPTCHA Tests
10.1.2 The Regression Tree Strategy
10.1.3 Experiment
10.1.4 Results
10.2 Exploring the Image-Based CAPTCHA Usability via Association Rule Mining
10.2.1 Experiment
10.2.2 Results and Discussion
References

Conclusions

List of Figures

Fig. 1.1 The elements of Turing test called the Original Imitation Game
Fig. 1.2 The elements of the Turing test called the Standard Turing Test
Fig. 1.3 The elements of the Turing test called the Extended Turing Test
Fig. 2.1 Human-computer interaction data processing flow
Fig. 2.2 Overlapping relation among the task, system and user
Fig. 4.1 An example of text-based CAPTCHA: a with only text, and b with only numbers
Fig. 6.1 An example of image-based CAPTCHA
Fig. 6.2 An example of FaceDCAPTCHA [2]
Fig. 6.3 An example of a video and b audio-based CAPTCHA
Fig. 6.4 The examples of other types of CAPTCHA: a QRBGS CAPTCHA [3], b Dice CAPTCHA [4]
Fig. 7.1 The slider CAPTCHA on the “They make Apps” website: a the CAPTCHA starting interface, and b the status of the test during the moving of the cursor
Fig. 7.2 The Adafruit blog’s slider CAPTCHA
Fig. 7.3 NoCAPTCHA reCAPTCHA samples: a with tick box only, b with tick box, image and text field, and c with tick box and image-based CAPTCHA with image list
Fig. 7.4 Two Invisible reCAPTCHA test samples
Fig. 7.5 A sample of the Animals in the wild CAPTCHA [7]
Fig. 7.6 A sample of the House number CAPTCHA [7]
Fig. 7.7 A sample of the Old woman CAPTCHA [7]
Fig. 7.8 Two samples of image-based CAPTCHA: a worried face CAPTCHA, and b surprised face CAPTCHA [7]
Fig. 7.9 A sample of the Animated character CAPTCHA [7]
Fig. 7.10 A sample of the picture of the CAPTCHA test [7]
Fig. 7.11 Overview of the CAPTCHA samples website: a home page, b text label visualized in the case of wrong answer to the test, and c text label with the time required by the user for correctly solving the CAPTCHA test and the number of attempts
Fig. 7.12 Microsoft Asirra CAPTCHA [9]
Fig. 7.13 The principle underlying the What’s Up CAPTCHA [10]. Images on the left are randomly rotated. Images on the right are the corresponding ones in their upright position
Fig. 7.14 A sample of the AgeCAPTCHA interface [11]
Fig. 7.15 A sample of social recognition CAPTCHA on Facebook
Fig. 7.16 A first type of game for FunCAPTCHA. The user is asked to rotate the image in the right way
Fig. 7.17 A second type of game for FunCAPTCHA. The user is asked to move the image of the woman into the middle
Fig. 7.18 A third type of game for FunCAPTCHA. The user is asked to complete three series of crosses in the tic tac toe
Fig. 7.19 A fourth type of game for FunCAPTCHA. The user is asked to match the tennis ball to the sports equipment
Fig. 7.20 A first sample of Sweet CAPTCHA. The user is asked to drag the sticks on the left to the drum on the right
Fig. 7.21 A second sample of Sweet CAPTCHA. The user is asked to drag the missing part on the left to its place on the right
Fig. 7.22 A sample of Homo-sapiens Dice CAPTCHA
Fig. 7.23 A sample of All-the-rest Dice CAPTCHA
Fig. 7.24 A sample of Motion CAPTCHA: a the initial panel, and b the solution to the test
Fig. 7.25 Two samples of the animated CAPTCHA
Fig. 7.26 A sample of the animated math CAPTCHA [17]
Fig. 7.27 A first sample of the CAPTCHA test with visual effects puzzles [17]
Fig. 7.28 Two samples of CAPTCHA with other visual effects puzzles [17]: a an advertisement is rendered in 3D inside the screen together with the moving text, and b, c the text positions on a rotating sphere
Fig. 7.29 A sample of Interactive game-based CAPTCHA [17]: a the flight simulator is positioned on the road depicting an advertisement, and a balloon showing the advertisement is depicted in the sky just above the road, b the first balloon is hit by the user, and c other two balloons are shown to be hit by the user
Fig. 8.1 Overview of the attacking process to a text CAPTCHA. From the left to the right: original text CAPTCHA image, pre-processing with color removal, segmentation and final recognition [1]
Fig. 8.2 Samples of alphanumeric text CAPTCHA where anti-segmentation techniques are applied: a complex background, b extra lines, c collapsing, and d random noise
Fig. 8.3 Samples of text CAPTCHA where distortion anti-recognition technique is applied: a with only letters, and b with only numbers
Fig. 8.4 Different types of anti-recognition techniques applied on the texts of the CAPTCHA
Fig. 8.5 The demo panel of the BotDetect CAPTCHA generator
Fig. 8.6 A sample of image-based CAPTCHA where rotation and controlled distortion are visible on the images [3]
Fig. 8.7 Another sample of image-based CAPTCHA where rotation is applied on some of the visualized images
Fig. 8.8 A sample of image-based CAPTCHA where the visualized images belong to different domains and are quite complex in their composition
Fig. 8.9 Different sample human faces with natural variations, controlled distortions, illumination variations, and complex background [4]
Fig. 8.10 Four types of atomic image distortions: a luminance, b color quantization, c line and curve noise, and d dithering [5]
Fig. 8.11 Two samples of composite distortion [5]
Fig. 8.12 Two samples of reCAPTCHA where a mixture of noise and artificial distortions are applied on the text
Fig. 8.13 A sample of interaction between the legitimate user and the web server [9]
Fig. 8.14 A sample of text CAPTCHA which is composed of four digits [13]
Fig. 8.15 A sample of image-based CAPTCHA with also text and numbers [14]
Fig. 9.1 A sample of text-based CAPTCHA with only text (a), and text-based CAPTCHA with only numbers (b) [1]
Fig. 9.2 Procedure for (dis)proving a hypothesis according to the Mann-Whitney U test [1]
Fig. 9.3 Procedure for (dis)proving a hypothesis according to the Pearson’s correlation coefficient test [1]
Fig. 9.4 Hypothesis 3: minimum response time (a) and mean response time (b) of the group of male users which solves the text and image-based CAPTCHAs [1]
Fig. 9.5 Results of the Shapiro-Wilk’s test on the collected data in terms of p-value [3]
Fig. 9.6 Results of the Kruskal-Wallis’ H test for the different hypotheses H1–H4 [3]
Fig. 9.7 FP-trees generated by the FP-Grow algorithm for: a text-based CAPTCHA with only text and b text-based CAPTCHA with only numbers
Fig. 10.1 The adopted House number CAPTCHA with reCAPTCHA [1]
Fig. 10.2 The FunCAPTCHA adopted in the experiment [1]
Fig. 10.3 The algorithm for constructing a regression tree [1]
Fig. 10.4 Feature values in the three datasets [1]
Fig. 10.5 Regression tree for House number CAPTCHA [1]. x1 indicates the education level, x2 indicates the age, and x3 indicates the Internet experience. The value in the leaf represents the average response time over the cell associated to the leaf
Fig. 10.6 Regression tree for the picture of the CAPTCHA [1]. x1 indicates the education level, x2 indicates the age, and x3 indicates the Internet experience. The value in the leaf represents the average response time over the cell associated to the leaf
Fig. 10.7 Regression tree for the FunCAPTCHA [1]. x1 indicates the education level, x2 indicates the age, and x3 indicates the Internet experience. The value in the leaf represents the average response time over the cell associated to the leaf
Fig. 10.8 Average cvloss and resuberror values together with the standard deviation (in brackets) for the three types of CAPTCHA [1]
Fig. 10.9 Average cvloss trend together with the standard deviation (size of the vertical bars) of the three types of CAPTCHA at different values of the K parameter of the fold cross validation between 2 and 10 [1]
Fig. 10.10 The image-based CAPTCHAs with facial expressions used in the analysis: (i) animated character, (ii) surprised face, (iii) old woman, and (iv) worried face [2]
Fig. 10.11 Values for each user’s feature and its discretization [2]
Fig. 10.12 Example of transaction [2]
Fig. 10.13 Extracted ARs for the Animated character CAPTCHA [2]
Fig. 10.14 Extracted ARs for the Old woman CAPTCHA [2]
Fig. 10.15 Extracted ARs for the Surprised face CAPTCHA [2]
Fig. 10.16 Extracted ARs for the worried face CAPTCHA [2]

List of Tables

Table 2.1 Comparison between visual and touch interaction devices
Table 9.1 Hypothesis 1: Mann-Whitney U test for text-based CAPTCHA [1]
Table 9.2 Hypothesis 2: Mann-Whitney U test for image-based CAPTCHA [1]
Table 9.3 Hypothesis 3: Descriptive statistics for text-based CAPTCHA [1]
Table 9.4 Hypothesis 3: Descriptive statistics for image-based CAPTCHA [1]
Table 9.5 Hypothesis 4: Mann-Whitney U test for text and image-based CAPTCHA [1]
Table 9.6 Hypothesis 5: Mann-Whitney U test for text and image-based CAPTCHA [1]
Table 9.7 Hypothesis 6: Pearson’s correlation coefficient test for text-based CAPTCHA [1]
Table 9.8 Hypothesis 6: Pearson’s correlation coefficient test for image-based CAPTCHA [1]
Table 9.9 Hypothesis 7: Mann-Whitney U test for text and image-based CAPTCHA [1]
Table 9.10 Kolmogorov-Smirnov test for laptop users [2]
Table 9.11 Kolmogorov-Smirnov test for tablet users [2]
Table 9.12 H2: Mann-Whitney U test for laptop and tablet users [2]
Table 9.13 H3: Mann-Whitney U test for laptop and tablet users [2]
Table 9.14 H4: Mann-Whitney U test for laptop and tablet users [2]
Table 9.15 Features of the dataset and their eventually discretized values
Table 9.16 Association rules extracted from the dataset whose consequent is the response time to solve the text CAPTCHA
Table 9.17 Association rules extracted from the dataset whose consequent is the response time to solve the number CAPTCHA

Chapter 1

Artificial Intelligence and Turing Test

Abstract This chapter introduces the concept of artificial intelligence, a subject that attracted the early interest of the researcher Alan Turing. He proposed three successive tests of artificial intelligence: (i) the Original Imitation Game, (ii) the Standard Turing Test, and (iii) the Extended Turing Test. In spite of some deficiencies, these tests are still used as a gold standard for evaluating artificial intelligence as it evolves.

© Springer Nature Switzerland AG 2020. D. Brodić and A. Amelio, The CAPTCHA: Perspectives and Challenges, Smart Innovation, Systems and Technologies 162, https://doi.org/10.1007/978-3-030-29345-1_1

1.1 Artificial Intelligence

Artificial intelligence (AI) denotes the possession of intelligence by machines, in the sense of an affirmative answer to the question: “Can a machine think?” [1]. This question has a scientific as well as a philosophical aspect. It is made more complex by the fact that the thinking process involves outward as well as inward states of mind. Computers, considered on their own, show only the outward appearance of performing intellectual tasks. Accordingly, it is reasonable to ask: “Are they really thinking?” If the answer is positive, then they have behavior patterns similar to those of humans, which is what is typically called the possession of artificial intelligence. In this way, artificial intelligence can be linked only to “thinking machines”, i.e. computers. Accordingly, artificial intelligence characterizes the various ways of computation that imitate human behavior which could be defined as intelligent [2].

Furthermore, artificial intelligence can be divided into weak and strong. Weak AI implies only that the machine acts intelligently, while strong AI implies actions that involve real intelligence. In this sense, strong AI can be regarded as a way of thinking. A further question arises: “What is the difference between thinking and thoughts?” Thinking includes correct reasoning, conceptualization and the right way of representation. The extension of thinking concerns the various thoughts that it can deploy. Hence, the primary task is to answer the following questions:
• Can machines think at all?
• Can machine intelligence approach or surpass the human level of thinking?

Modern computer science groups these questions under the notion of intelligent behavior, which means both possessing intelligence and behaving as an intelligent subject. Given all of the above, the biggest problem is to draw the reference line that separates non-intelligent behavior from intelligent behavior that imitates humans. It is therefore closely related to the question: “What would humans do in such circumstances?”

1.2 Turing Test

The Turing test was established to evaluate how close the machine level of intelligence is to the human one. To this end, Turing suggested a test based on conversational abilities. If the conversational abilities of humans and machines are so similar that we cannot tell them apart, then we can conclude that a machine behaves like a human, which implicitly means that the machine possesses artificial intelligence. In fact, he proposed several similar tests.

In the first test, called the Original Imitation Game [1], he introduces three players. Player A is a man, player B is a woman, and player C is the interrogator, who can be of either sex. Each player sits in a different room, so that they are completely separated from each other. The rooms are connected via computer screen and keyboard so that the players can communicate with each other. The communication consists of a series of five-minute keyboard conversations, which serve as the test criterion. Player C has a list of written questions, which are put to player A and/or player B. The main goal is to determine which of players A and B is the man and which is the woman. Player A tries to trick the interrogator C into drawing the wrong conclusion about his sex. Player B, on the contrary, tries to help the interrogator C reach the right conclusion about the sex of player A.

An interesting point of this test is the way the roles are divided between player A (man) and player B (woman). Player A (man) tricks the interrogator C into making a wrong decision about the sex of the players, while player B (woman) helps the interrogator C make the right choice. In some sense, a neutral observer can get the impression that the test is biased because of this specific division of roles by sex.

The elements of this test are illustrated in Fig. 1.1 [3]. As Fig. 1.1 shows, the interrogator C cannot see player A or player B because of the barrier, but he/she can ask questions without any interference. Hence, their communication is clear and smooth. The following questions then arise:
• “What will happen if a machine takes the part of player A in this game?”
• “Will the interrogator C decide wrongly as often if the game is played like this as he does when the game is played between a player A (man) and a player B (woman)?”
Both questions can be reduced to a single question: “Can machines think?”

Fig. 1.1 The elements of the Turing test called the Original Imitation Game (labels: Player A, Player B, Interrogator C)

In the second Turing test, commonly called the Standard Turing Test, Turing proposed a modification of the previous test. He changed the players in the “game” by replacing player A with a machine (computer), while player B is a human of either sex. This kind of test is more neutral than the previous one and hence has a higher scientific value. For this setting, he asked the following questions [1]:
• What will happen if a machine takes the part of player A in this game?
• Will the interrogator C decide wrongly as often when the game is played like this as he does when the game is played between a player A (man) and a player B (woman)?
Again, both questions can be replaced with a single question, i.e. premise: “Can machines think?” In this version, both players A and B try to trick the interrogator C into making the wrong decision about them. The elements of this test are illustrated in Fig. 1.2 [3]. Fig. 1.2 shows that the interrogator C cannot see player A or player B because of the barrier between them. Still, he/she can ask questions without any interference and communicate freely with them.

The third version of the Turing test was proposed in 1952. In this version, a jury puts questions to a computer, and the role of the computer is to persuade a significant portion of the jury that it is really a human. Clearly, Turing did not want to leave the evaluation of intelligence to a single judge (interrogator), but to a wider range of humans. A computer may trick a single interrogator, but it is more difficult to trick a wider group of humans such as a jury. In this way, Turing raised the level of the decision process. The elements of this test are illustrated in Fig. 1.3.

Fig. 1.2 The elements of the Turing test called the Standard Turing Test (labels: Player A, Player B, Interrogator C)

Fig. 1.3 The elements of the Turing test called the Extended Turing Test (labels: Player A, Player B, Jury C)

However, the main task of the Turing tests was not to create a universal test for determining artificial intelligence. Rather, it was to establish an interpretation of a machine's ability to mimic some of the human behavior that is linked with intelligence. Hence, it represents a pioneering method for determining the capability of a machine to behave like a human [1].

Still, the standard Turing test has some obvious limitations. They occur when the questionnaire is formulated in a Yes/No manner; in such a case, the Turing test loses any significance. When the answers to the questions allow many possibilities, the Turing test retains its validity. A problem also arises when the questions concern knowledge that can be retrieved from an information source (e.g. the Google search engine). In such circumstances, computers may outperform humans in the standard Turing test, which is mainly based on verbal communication, i.e. verbal intelligence. Hence, the standard Turing test does not actually test the computer's intelligence. Instead, it explores whether a computer behaves like a human in the verbal area, and so it determines a cross-section of human and intelligent behavior. Another limitation of the test is its proposed duration of only 5 min, which is questionable.

Finally, the Turing test can be set up in a so-called reverse order. In this type of test, commonly called the Reverse Turing test, the roles of computers and humans are reversed. In the standard Turing test, the judge is a human, whereas in the reverse Turing test the judge is a computer. The computer judge is regarded as more neutral than a human one, who can be biased. The basics of the reverse Turing test are used in the “Completely Automated Public Turing test to tell Computers and Humans Apart”, i.e. CAPTCHA.

Still, it is worth noting that the standard Turing test covers only verbal human behavior, whereas human intelligence involves verbal communication, image perception, or their combination [4]. Hence, the standard Turing test accounts for only one, albeit important, part of what is present in the CAPTCHA. The missing part is the image perception element of human behavior and intelligence, which was outside the scope of the standard or reverse Turing test.

References
1. Turing AM (1950) Computing machinery and intelligence. Mind 59:433–460
2. McCarthy J (1996) “The philosophy of artificial intelligence”, What has AI in common with philosophy?
3. Saygin AP, Roberts G, Beber G (2008) Comments on “computing machinery and intelligence” by Alan Turing. In: Epstein R, Roberts G, Poland G (eds) Parsing the Turing test: philosophical and methodological issues in the quest for the thinking computer
4. Belk M, Fidas C, Germanakos P, Samaras G (2015) Do human cognitive differences in information processing affect preference and performance of CAPTCHA? Int J Hum Comput Stud 84:118

Chapter 2

Human-Computer Interaction

Abstract Human-computer interaction is a key element in the communication between a human and a computer. Hence, advanced computer systems devote significant attention to improving this communication. This chapter analyzes the elements of human-computer interaction, paying special attention to the elements that make up usability. Particular consideration is given to visual and contact devices as well as to their comparison. Then, the factors of usability are defined as: (i) tasks, (ii) system, and (iii) users. Each of them is explained together with its constituent components. At the end, their mutual interaction is discussed.

2.1 The Concept of Human-Computer Interaction

The term Human-Computer Interaction was first used in the mid-1980s. It is defined as a discipline focused on the design, evaluation and implementation of human-computer interaction. Hence, it is deeply connected to the communication process between human and computer. Because it is linked with the process of designing interfaces or websites, it is sometimes called Human-Computer Interface. However, its core is the systematic development of software that can be used effectively by humans, i.e. computer users. Accordingly, it should make computer systems safe, efficient, easy to use and effective [1]. In this way, it is closely related to the elements of usability. It means that the software should be:
• Easy to use,
• Effective to use, and
• Something that people want to use.
Furthermore, it is of great importance to bring the usability concept into computing through methods and tools [2]. In the further discussion of human-computer interaction, it is necessary to define two different, but closely related, words:
• Interface, and
• Interaction.

© Springer Nature Switzerland AG 2020 D. Brodić and A. Amelio, The CAPTCHA: Perspectives and Challenges, Smart Innovation, Systems and Technologies 162, https://doi.org/10.1007/978-3-030-29345-1_2


The interface can be defined as the “visible piece of a system that a user sees or hears or touches” [3]. Interaction, in contrast, is a more complex term covering the user's activities in communicating with computers via keyboard, mouse, touch screen, joystick, computer gloves, microphone, camera, etc. The general concept of human-computer interaction is to simplify the use of computer technology. Basically, it is a translation process between a human's actions and the computer's response. We can picture the system as a communication between two subjects: one side is the human, while the other is the technology, i.e. the computer. They need to achieve an effective dialogue, which should include the following elements [4]:
• Human-computer interaction,
• Communication based on agreement on the terms used in the dialogue, and
• Communication based on agreement on the context of the communication.
Unless all three criteria are fulfilled, the communication cannot be effective [5]. A parallel can be drawn with communication between humans, which can take place with or without mutual understanding. For efficient and successful communication between subjects, the prerequisites mentioned above are needed. In this way, the interaction between humans and computers becomes more user-centric, bringing better effectiveness and usability. Hence, it includes:
• Interaction with and through technology, and
• Seeking to understand and support the humans (computer users).
That is why human-computer interaction should be considered both during development and during the use of the software. It also means that users should be included in the development and implementation of the computer system, and even more so in the evaluation of the software, which involves cognitive and behavioral factors.
Accordingly, the main goal is to create software that is user-centric. In simple terms, this requires a careful study of the communication between humans and computers, and better input/output techniques that improve the efficiency, effectiveness and seamlessness of their interaction. Hence, the main purpose of human-computer interaction is to implement a computer system that matches the users' requirements and needs. To conduct this process effectively, the following is needed [6]:
• Involvement of users in the process,
• Integration of different scientific disciplines like computer science, cognitive science, etc., and
• Iteration of the process so that it is optimized as much as possible.
Involving users in the process is invaluable. It is worth noting that users have different levels of education as well as different professional and scientific backgrounds, yet the computer system should be as universal as possible. To achieve this, different scientific areas such as computer science, cognitive science, etc. should be consulted. Finally, the process should be highly iterative in order to create a user-friendly environment. The iterations provide all the necessary feedback from the users, whether negative or positive. Consequently, they set up a step-by-step process that leads toward an optimized user-centric solution.

To accomplish these goals, the users should interact with the computer system. Two similar, but different, terms are in use:
• Interaction, and
• Interactivity.
Interaction means the communication between two subjects, i.e. the user and the computer system. It can be complex because they use a different “language” to communicate. The translating element is the interface, which should be user-centric in order to translate the communication between user and computer system effectively [7]. The interface can be established using various techniques. In the beginning, the interface was command-line oriented. The users entered commands, which were then interpreted so that the computer could complete some task. This kind of communication is called indirect because tasks are performed step by step. Advanced technology, however, enables real-time interaction between user and computer, turning their communication into a so-called direct interaction. In this way, the interaction between user and computer becomes seamless. Still, to establish a user-friendly system, many factors that influence human-computer interaction should be considered [5]:
• Organizational factors,
• Environmental factors,
• Health and safety factors,
• User,
• Comfort factors,
• User interface,
• Task factors,
• Constraints,
• System functionality, and
• Productivity factors.

The most important thing is to make careful trade-offs between these factors in order to optimize the human-computer interaction. The user is unavoidably a subject of this trade-off process. From the point of view of cognitive psychology, the human mind can be seen as a specific data processing unit with inputs and outputs. In order to process data, it needs to identify the types of data and to classify them. Furthermore, it should analyze the data in order to understand them and make decisions about them. The process of abstraction is invaluable here. Hence, the data are processed in both directions [8]:
• From the reality to the abstract models,
• From abstract models to the reality.


In human-computer interaction we have two subjects, the human and the computer. Each of them has its own input and output unit. The input of the computer is the output of the human, and the output of the computer is the input of the human [9]. This circumstance is illustrated in Fig. 2.1. Fig. 2.1 shows that human-computer interaction includes elements like:
• Presentation of data,
• Perception of the user,
• System control by the human, and
• User interface between human and computer.

Fig. 2.1 Human-computer interaction data processing flow (USER: input, data processing, output; COMPUTER: input, data processing, output)

Furthermore, the interaction between humans and computers is established to achieve a specific goal, and it can rely on different techniques. The first usable interaction technique was the graphics-oriented one, in which the mouse (or touchpad) and keyboard were the elements of manipulation. Still, some applications, such as 3D ones, needed more advanced interaction techniques. Hence, hand gestures were used to solve this problem, which added more advanced communication skills [10]. Interaction based on contact elements was then extended by introducing visual interaction elements. It is established using more advanced devices like the Cyber glove, the Soft Kinetic HD camera, etc. The comparison between visual devices and contact devices is given in Table 2.1 [11].

Table 2.1 shows that both technologies have their strengths and weaknesses. Contact devices require user cooperation. Hence, they can sometimes be uncomfortable to wear and to use for a long time. Still, they are very precise to work with. On the contrary, vision-based devices do not require user cooperation. However, they are complex to configure, and they suffer from occlusion problems. Finally, it is important to notice that using some contact devices can create health problems, such as an allergy caused by the material of a mechanical sensor, or even a cancer risk for magnetic devices because of the strong magnetic field emitted in their neighborhood [11].

Table 2.1 Comparison between visual and contact interaction devices

Criteria               Contact devices   Visual devices
User cooperation       Yes               No
User intrusive         Yes               No
Precise                Yes/no            No/yes
Flexible to configure  Yes               No
Flexibility to use     No                Yes
Occlusion problem      No (yes)          Yes
Health issues          Yes (no)          No

2.2 Usability

Usability represents one of the main goals of human-computer interaction. It is usually defined in terms of the quality of [12]:
• The user's interaction while performing tasks,
• The number of errors made, which should be as low as possible, and
• The time needed to become a competent user.
Hence, usability refers to how quickly and easily humans can find a solution through the interface of the computer software. Various cognitive and demographic factors can affect usability. However, usability is not only connected to successfully finding a solution with the software, but also to the satisfaction of the users while solving the task. To explore usability, it is therefore very important to observe the users' behavior and to follow the solving process. A set of questionnaires given to the users is also of great importance, because it captures their level of satisfaction regardless of how quickly and easily they use the software. In that way, we can say that usability is closely related to simplicity and user satisfaction [13]. The usability concept includes several constituent elements [4, 14]:
• Functionality,
• Efficiency,
• Effectiveness,
• Satisfaction.


Functionality is the set of software services that are available to the users. Its importance becomes visible when it is linked to how efficiently the user can employ these functions. Efficiency is measured by the task completion time and the learning time, two characteristics that are closely related to one another [15]. Effectiveness is the accuracy and completeness with which users can achieve given goals. Basically, effectiveness is accomplished when an appropriate equilibrium between the functionality and the usability of a system has been established [16]. Satisfaction represents the comfort or ease of use of the given software system [14]. To achieve all of these elements, the task, the system and the user work jointly, each through its own independent variables, to establish the system usability. Hence, the following holds:

$$\text{Task} \cup \text{System} \cup \text{User} \qquad (1)$$
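The efficiency, effectiveness and satisfaction elements described above can be estimated from observed user sessions. The following is only a minimal sketch in Python, not a procedure taken from the cited sources: the `Session` record, its field names and the 1 to 5 satisfaction scale are assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    """One observed attempt at a task (hypothetical record layout)."""
    completed: bool      # did the user reach the goal?
    seconds: float       # time spent on the task
    satisfaction: int    # questionnaire score, assumed 1 (low) to 5 (high)


def usability_summary(sessions):
    """Return simple effectiveness, efficiency and satisfaction figures."""
    if not sessions:
        raise ValueError("at least one session is required")
    done = [s for s in sessions if s.completed]
    return {
        # share of sessions in which the goal was achieved
        "effectiveness": len(done) / len(sessions),
        # mean completion time over successful sessions only
        "mean_time": mean(s.seconds for s in done) if done else None,
        # mean questionnaire score over all sessions
        "satisfaction": mean(s.satisfaction for s in sessions),
    }


sessions = [
    Session(True, 12.0, 4),
    Session(True, 18.0, 5),
    Session(False, 40.0, 2),
    Session(True, 15.0, 4),
]
summary = usability_summary(sessions)
print(summary)
```

Learning time, which the text also counts as part of efficiency, is left out here; it would need repeated sessions per user.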

The major independent variables of the task are [5]:
• Frequency,
• Openness.
Frequency is the number of times that a task has to be performed by the user [4]. If the task is carried out rarely, the users will need more assistance in order to finish it without errors. Conversely, if the task is performed frequently, the user will be familiar with it. Openness is the possibility of modifying a given task to some degree. Still, openness does not mean that the task is entirely variable; it remains largely fixed to make it easier for the user.

The major independent variables of the system are [5]:
• It should be easy to learn,
• It should be easy to use,
• It should match the task.
Ease of learning implies that a system seen for the first time is easy to understand and to operate, although this heavily depends on the user's previous knowledge. Ease of use refers to painless operation of the system once the user understands it. Task match indicates that the system provides the information and functions that its user needs.

The major independent variables of the user are [5]:
• Knowledge,
• Motivation,
• Discretization.
Knowledge refers to the user's knowledge and experience in working with computers. Motivation is a key element: a highly motivated user will invest more energy to overcome obstacles such as problems with using the system. Discretization means that the user can choose the part of the system that fulfills his/her requirements and use it for his/her purposes. The relation among the task, system and user is illustrated in Fig. 2.2.

Fig. 2.2 Overlapping relation among the task, system and user (USER: Knowledge, Motivation, Discretization; TASK: Frequency, Openness; SYSTEM: Ease of learning, Ease of use, Task match)

Hence, considering all of the given independent variables, usability should allow a seamless use of the system and give the user working satisfaction through a design that is efficient, effective, safe, easy to learn, easy to understand, easy to remember and easy to use [17].

References
1. Preece J, Rogers Y, Keller L, Davies G, Benyon D (1993) In: Preece J (ed) A guide to usability, “human factors in computing”. Addison-Wesley, Wokingham
2. Carroll JM (2002) Human-computer interaction in the new millennium. Addison-Wesley, New York
3. Head AJ (1999) Design wise. Thomas H, Hogan Sr, Medford
4. Booth P (1989) An introduction to human-computer interaction. Lawrence Erlbaum Associates Publishers, Hove, East Sussex
5. Issa T, Isaias P (2015) Sustainable design, chapter usability and human computer interaction (human-computer interaction). Springer, London
6. Preece J, Rogers Y, Benyon D, Holland S, Carey T (1994) Human computer interaction. Addison-Wesley, Wokingham
7. Dix A, Finlay J, Abowd G, Beale R (2004) Human-computer interaction, 3rd edn. Pearson Education Limited, Harlow
8. Kaptelinin V (2001) Activity theory: implications for human-computer interaction. In: Context and consciousness: activity theory and human-computer interaction. MIT Press, Cambridge, Massachusetts
9. Moran TP (1981) The command language grammar: a representation for the user interface of interactive computer systems. Int J Man Mach Stud 15(1):3–50
10. Rautaray SS, Agrawal A (2015) Vision based hand gesture recognition for human computer interaction: a survey. Artif Intell Rev 43(1):1–54
11. Kanniche M (2009) Gesture recognition from video sequences. Ph.D. thesis, University of Nice, Nice, France
12. Benyon D, Turner P, Turner S (2005) Designing interactive systems: a comprehensive guide to human-computer interaction and interaction design, 2nd edn. Pearson Education Limited, Edinburgh
13. Rhodes JS (2000) Usability can save your company. http://webword.com/moving/savecompany.html
14. Frøkjær E, Hertzum M, Hornbæk K (2000) Measuring usability: are effectiveness, efficiency, and satisfaction really correlated? In: Proceedings of SIGCHI conference on Human Factors in Computing Systems (CHI '00). ACM, New York, NY, USA, pp 345–352
15. Shneiderman B, Plaisant C (2004) Designing the user interface: strategies for effective human-computer interaction, 4th edn. Pearson/Addison-Wesley, Boston
16. Nielsen J (1994) Usability engineering. Morgan Kaufman, San Francisco
17. Issa T (2008) Development and evaluation of a methodology for developing websites. Ph.D. thesis, Curtin University, Western Australia

Chapter 3

Human Information Processing (HIP)

Abstract This chapter explains the elements of human interactive proof. As a starting point, it uses a set of protocols which are able to authenticate humans as opposed to computers. Furthermore, it defines the properties that a human interactive proof should satisfy. It also states the minimal human success rate for passing the given set of protocols. At the end, it lists some applications where human interactive proofs can be used.

3.1 Human Interactive Proof

A human interactive proof is a set of protocols that lets a human prove something about himself/herself to a computer [1]. Primarily, it addresses the problem of authenticating humans to a computer. It can be used to authenticate a human (vs. a machine), a particular person (vs. anyone else), an adult (vs. a child), etc. [2]. Hence, it is closely related to cryptographic security. An identification protocol can be represented as a pair of probabilistic interactive programs (H, C), which share the input x, such that the following conditions are valid [3]:
• For all auxiliary inputs x, $P[\{H(x), C(x)\} = \mathrm{accept}] > 0.9$, i.e. 90%,
• For each pair a and b where $a \neq b$, $P[\{H(a), C(b)\} = \mathrm{accept}] < 0.1$, i.e. 10%.
Here, P denotes probability. Note that when {H(x), C(x)} = accept, H verifies his/her identity to C, i.e. H authenticates to C.

Accordingly, a human interactive proof is a set of tests based on discrimination between actions executed by humans and activities performed by computers [4]. It follows that a human interactive proof is a puzzle challenge that can be easily solved by humans and is almost unsolvable for computers. Its aim is to discourage script attacks in which computers pretend to be human users. At the same time, the human interactive proof should be easy for humans to solve so that they are not discouraged from using the service. Hence, there is an important tradeoff.
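The two acceptance conditions of the identification protocol can be checked empirically by running a protocol many times and counting how often it accepts. The sketch below is only an illustration under stated assumptions: the toy challenge-response rule, the 2% human slip probability and the names `run_protocol` and `acceptance_rate` are hypothetical and are not the protocol of [3], and the toy rule offers no real security.

```python
import random


def run_protocol(human_secret, computer_secret, rng, slip_prob=0.02):
    """One round of a toy challenge-response exchange; True means 'accept'.

    The computer C sends a random challenge and expects a value derived
    from its copy of the secret. The human H answers from his/her own
    secret and occasionally makes a slip (assumed probability slip_prob).
    """
    challenge = rng.randrange(1000)
    expected = (challenge + computer_secret) % 1000
    answer = (challenge + human_secret) % 1000
    if rng.random() < slip_prob:          # simulated human mistake
        answer = (answer + 1) % 1000
    return answer == expected


def acceptance_rate(a, b, trials=2000, seed=0):
    """Estimate P[{H(a), C(b)} = accept] from repeated rounds."""
    rng = random.Random(seed)
    accepted = sum(run_protocol(a, b, rng) for _ in range(trials))
    return accepted / trials


same_input = acceptance_rate(42, 42)    # shared input x: should exceed 0.9
different = acceptance_rate(42, 7)      # a != b: should stay below 0.1
print(same_input, different)
```

With a shared secret the estimate stays close to one minus the slip probability, while mismatched secrets are rejected, which is the qualitative behavior the two conditions require.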


The human interactive proof should meet the following requirements [5]:
• Differentiate humans from computers,
• Differentiate one category of humans from the other ones,
• Differentiate a given human from one category of humans.
To achieve this, the computer should present a puzzle test that one group of humans can solve, while the other cannot. The obtained result should be validated by the computer, and the verification process should be freely available online [6]. Also, the following properties should hold [6]:
• The test challenge should be automatically generated and graded,
• The judgement is left to the computer,
• The test can be quickly and easily solved by human users,
• The dialog between human users and computers should be relatively short,
• The test should be accepted by almost all humans regardless of their demographic differences, with only a small rejection rate,
• The test should remain resistant to computer bot attacks for many years in spite of technological advances,
• The test's algorithm should be known and publicly available as open source.

Basically, the human interactive proof is based on a principle like that of the standard Turing test. However, differences exist. In this case, it is the computer that has to tell humans and computers apart using human interactive proofs. The main question is therefore: is it possible to create a HIP so user friendly that it can be easily solved by humans and is very difficult for computers to solve? The problem can be settled by introducing a standard Turing test with some modifications. In contrast to the Turing test, the computer is the interrogator and the judge that differentiates between humans, i.e. computer users, and computers represented by scripts, i.e. computer bot programs (bots). Hence, it is necessary to design a puzzle program that is able to distinguish between computer users and computer bot programs. From the perspective of the standard Turing test, this task is the reverse Turing test. To be successful, it should satisfy the following criteria:
• Computer users can successfully resolve the puzzle program in at least 90% of cases (typically called the human success rate) [7–9],
• Computer bots should succeed in solving the task in less than 0.1% of cases.
The aim of setting such success levels is to discourage computer bots from using the protected services.
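The two success criteria above can be expressed as a simple check over measured pass counts. This is a minimal sketch; the function name `meets_hip_criteria` and the example figures are made up for illustration, while the thresholds (at least 90% for humans, below 0.1% for bots) are the ones quoted in the text.

```python
def meets_hip_criteria(human_passes, human_trials, bot_passes, bot_trials,
                       min_human_rate=0.90, max_bot_rate=0.001):
    """Return True when both success-rate criteria are satisfied.

    human_passes / human_trials must be at least min_human_rate, and
    bot_passes / bot_trials must be strictly below max_bot_rate.
    """
    if human_trials <= 0 or bot_trials <= 0:
        raise ValueError("trial counts must be positive")
    human_rate = human_passes / human_trials
    bot_rate = bot_passes / bot_trials
    return human_rate >= min_human_rate and bot_rate < max_bot_rate


# Hypothetical measurements: 93% human success, 0.03% bot success.
print(meets_hip_criteria(930, 1000, 3, 10000))
# Same humans, but bots succeed in 0.5% of attempts.
print(meets_hip_criteria(930, 1000, 50, 10000))
```

A meaningful bot estimate needs many more trials than the human one, because the threshold is three orders of magnitude smaller.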
Nowadays, human interactive proofs are used to protect many types of services, like [10]:
• E-mail (against spamming),
• Online registration,
• Ticket/Event registration,
• Online voting,
• Login,
• Chat rooms,
• Weblogs.

All of the above services can be protected by specific challenge protocols that allow humans to authenticate securely without requiring forensic elements such as biometric data, an electronic key or any other physical evidence. Such a program is closely related to the elements of the standard Turing test. But, because the judge is a program, it is called a reverse Turing test. The commonly used name for this program is Completely Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) [2].

References
1. Hopper N (2001) Security and complexity aspects of human interactive proofs. In: Proceedings of the first HIP conference, pp 1–4
2. von Ahn L, Blum M, Hopper NJ, Langford J (2003) CAPTCHA: using hard AI problems for security. In: Proceedings of advances in cryptology. Eurocrypt 2003. LNCS, vol 2656. Springer, Berlin, pp 294–311
3. Hopper NJ, Blum M (2001) Secure human identification protocols. In: Boyd C (ed) Advances in cryptology - ASIACRYPT. LNCS, vol 2248. Springer, Berlin
4. Basso A, Bergadano F (2010) Anti-bot strategies based on human interactive proofs. In: Stavroulakis P, Stamp M (eds) Handbook of information and communication security. Springer
5. Dhamija R, Tygar JD (2005) Phish and HIPs: human interactive proofs to detect phishing attacks. In: Baird H, Lopresti D (eds) Proceedings of human interactive proofs: second international workshop (HIP 2005). Springer, Berlin, pp 127–141
6. First Workshop on Human Interactive Proofs (2002). Available http://www.aladdin.cs.cmu.edu/hips/events/
7. Chellapilla K, Simard P (2004) Using machine learning to break visual human interaction proofs (HIPs). Advances in neural information processing systems 17. Neural information processing systems (NIPS '2004). MIT Press
8. Mori G, Malik J (2003) Recognizing objects in adversarial clutter: breaking a visual CAPTCHA. In: Proceedings of the 2003 IEEE computer society conference on Computer vision and pattern recognition, 18–20 June 2003, Madison, Wisconsin, pp 134–141
9. Goodman JT, Rounthwaite R (2004) Stopping outgoing spam. In: Proceedings of the 5th ACM conference on electronic commerce, 17–20 May 2004, New York, NY, USA
10. Chellapilla K, Larson K, Simard P, Czerwinski M (2005) Designing human friendly human interaction proofs (HIPs). In: Proceedings of the SIGCHI conference on human factors in computing systems (CHI '05). ACM, New York, pp 711–720

Chapter 4

CAPTCHA Basics

Abstract This chapter describes the basic concepts underlying the CAPTCHA test. Specifically, the aim of the test is described, together with the main tasks of the CAPTCHA and the requirements that the test has to meet in order to guarantee a high security level. The CAPTCHA test is framed inside the human–computer interaction model and strongly connected with the concept of human interactive proofs, which are used to make a differentiation between the human users and computer bot programs.

4.1 Definition of CAPTCHA

CAPTCHA, which is a puzzle program to be solved, is closely related to three main elements: (i) the Turing test, (ii) human–computer interaction, and (iii) human interactive proofs. CAPTCHA is designed as a task in the form of a “challenge” program, which is used to authenticate computer users. Its main task is to prevent unauthorized access to specific websites. Hence, it should be easy for authorized users to solve and hard for unauthorized users. Commonly, the unauthorized users are computer programs that attack websites, so the main responsibility of the CAPTCHA is to distinguish humans (users) from computers (computer programs). A computer program that tries to gain malicious unauthorized access is called a computer bot. CAPTCHA was first introduced in 2000 [1], and can be seen as a cryptographic program which is easily solved by humans and hard for computers to solve. Although it looks like a simple program barrier, it is related to a complex programming question linked to security. Basically, the phenomena that concern the users lie in the areas of intelligence, visual communication and cognitive psychology. Hence, CAPTCHA is described as a test that can differentiate humans from computers.

© Springer Nature Switzerland AG 2020 D. Brodić and A. Amelio, The CAPTCHA: Perspectives and Challenges, Smart Innovation, Systems and Technologies 162, https://doi.org/10.1007/978-3-030-29345-1_4


Basically, the CAPTCHA is a program created to differentiate humans from bots when they log in to a website [2]. The bot, in turn, is a program that tries to emulate human users. It includes elements of AI as well as the ability of automated reasoning. In that sense, the CAPTCHA is related to the Turing test and to human interactive proofs, and it also depends on human–computer interaction. However, some differences exist. Unlike the Turing test, the CAPTCHA is administered by an examiner that is integrated into the CAPTCHA itself. Hence, the CAPTCHA can be seen as a rephrased Turing test: a test program that poses a given type of task, which is more suitable for humans than for bots. If the response to the CAPTCHA is correct, then the program classifies the user as a human. Figure 4.1 shows a sample of a CAPTCHA test based on the recognition of text, with only letters (a) and only numbers (b).

The human–computer interaction elements shape the design and development of the CAPTCHA. These elements depend heavily on the technology currently in use, such as laptop computers, tablet computers and smartphones, which have become widespread. They are increasingly popular in modern society, also for e-business purposes, not least because long battery life makes them easy to carry around. Also, both tablets and smartphones incorporate a touch screen, which has completely changed the approach to human–computer interaction on these portable devices.
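The challenge and response principle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alphabet, the default length of six characters, the case-insensitive comparison and the function names `make_challenge` and `classify` are all hypothetical, and the rendering of the challenge as a distorted image is left out.

```python
import secrets
import string

# Hypothetical alphabet: uppercase letters and digits without look-alike characters.
ALPHABET = "".join(
    c for c in string.ascii_uppercase + string.digits if c not in "O0I1"
)


def make_challenge(length: int = 6) -> str:
    """Return a random string that would be shown to the user as a distorted image."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def classify(expected: str, response: str) -> str:
    """Classify the respondent as 'human' on a correct answer, otherwise as 'bot'."""
    # Assumption: surrounding whitespace and letter case are ignored.
    given = response.strip().upper().encode()
    return "human" if secrets.compare_digest(given, expected.upper().encode()) else "bot"
```

In such a sketch the server would keep the value returned by `make_challenge()` and later call `classify(expected, user_input)` on the submitted text.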

Fig. 4.1 An example of text-based CAPTCHA: (a) with only letters, and (b) with only numbers


4.2 Tasks of CAPTCHA

The aim of the CAPTCHA is to stop the attacks made by bots. Current research on CAPTCHA focuses on developing test programs that are easy for people to solve and pose a hard problem to bots. The main tasks of the CAPTCHA are the following [3]:

• Prevention of spam on forums and in e-mail,
• Prevention of the opening of a large number of accounts on sites that offer free services, such as Gmail and Yahoo,
• Protection of user accounts from attacks that extract user passwords,
• Validation of online surveys through a questionnaire that distinguishes humans from bots,
• Protection of online polls.

A CAPTCHA that incorporates a high level of security has to meet the following requirements:

• The solution of the CAPTCHA must not be conditional, i.e. it should not depend on the user's language and/or age. Consequently, it should be as intuitive as possible,
• Solving the CAPTCHA should be easy for humans and hard for bots, in order to differentiate humans from bots. Also, humans should complete it in no longer than 30 s [4], with a success rate of at least 90% [5],
• Creating the CAPTCHA must not violate user privacy. That is, it must not rely on user-related data.
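The two numeric requirements above (at most 30 s and a success rate of at least 90%) can be checked against logged attempts. The following Python sketch is one possible reading of those thresholds: the `Attempt` record and the function name are hypothetical, and it assumes that the time limit is applied to the slowest successful attempt.

```python
from dataclasses import dataclass

MAX_SOLVE_SECONDS = 30.0  # upper bound on solving time given in the text
MIN_SUCCESS_RATE = 0.90   # minimum human success rate given in the text


@dataclass
class Attempt:
    seconds: float  # time the user needed for this attempt
    solved: bool    # whether the answer was correct


def meets_usability_targets(attempts: list[Attempt]) -> bool:
    """Check logged human attempts against both thresholds."""
    if not attempts:
        return False
    success_rate = sum(a.solved for a in attempts) / len(attempts)
    # Assumption: only successful attempts count towards the time limit.
    slowest = max((a.seconds for a in attempts if a.solved), default=float("inf"))
    return success_rate >= MIN_SUCCESS_RATE and slowest <= MAX_SOLVE_SECONDS
```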

References
1. von Ahn L, Blum M, Hopper NJ, Langford J (2003) CAPTCHA: using hard AI problems for security. In: Proceedings of theory and applications of cryptographic techniques: 22nd international conference (EUROCRYPT). Springer, Berlin, pp 294–311
2. von Ahn L, Blum M, Langford J (2004) Telling humans and computers apart automatically. Commun ACM 47(2):57–60
3. CAPTCHA. http://www.captcha.net
4. Rui Y, Liu Z (2004) ARTiFACIAL: automated reverse Turing test using FACIAL features. Multimed Syst 9(6):493–502
5. Chellapilla K, Larson K, Simard P, Czerwinski M (2005) Designing human friendly human interaction proofs (HIPs). In: Proceedings of SIGCHI conference on human factors in computing systems, pp 711–720

Chapter 5

Characteristics of CAPTCHA

Abstract This chapter presents the core elements that characterize the analysis and study of a CAPTCHA test. Each element is introduced and described by its main features. Then, the advantages and limitations of the CAPTCHA test are reviewed in the context of these core elements, through a short survey of relevant works from the literature and of their limitations. Finally, a summary of the main open questions about the presented literature is provided.

5.1 CAPTCHA Elements

The open questions related to the CAPTCHA test concern its three elements: (i) security, (ii) usability, and (iii) practicality [1].

Security represents the way the CAPTCHA is used to protect websites from unauthorized access. It includes the complex programming elements that make the CAPTCHA secure. In other words, the CAPTCHA should be programmed in a way that makes it almost unsolvable by a computer.

Usability refers to the way users solve the CAPTCHA and find the right solution to the puzzle. This seems to be an easy problem, but many elements take part in it. To uncover these elements, intelligence, visual communication and cognitive psychology should be considered.

Practicality represents the way the CAPTCHA is implemented. The CAPTCHA should be easily interpreted by any web browser on computers, tablets and/or smartphones. It is worth noting that the CAPTCHA can be regarded as a cryptography problem. Also, it has to be open and freely accessible. Hence, any user can examine the CAPTCHA programming code.

© Springer Nature Switzerland AG 2020 D. Brodić and A. Amelio, The CAPTCHA: Perspectives and Challenges, Smart Innovation, Systems and Technologies 162, https://doi.org/10.1007/978-3-030-29345-1_5


Usability is affected by the following components [2]:

(i) learnability,
(ii) efficiency,
(iii) memorability,
(iv) errors, and
(v) satisfaction.

Learnability represents the users' ability to accomplish basic tasks when they see the CAPTCHA for the first time. Efficiency is linked to the response time in solving the CAPTCHA once the user has learned its design. Memorability determines the users' ability to solve a given CAPTCHA again after some period of time. Errors describes the errors that users make while solving a CAPTCHA. It includes:

(i) how many errors appear,
(ii) what kind of errors occur,
(iii) how severe the errors are, and
(iv) how easily the errors can be overcome.

Satisfaction is linked to how pleasant the users find the CAPTCHA. It is closely connected to the complexity of solving the CAPTCHA. Accordingly, the CAPTCHA should be:

(i) intuitive,
(ii) easy to understand,
(iii) easy to solve,
(iv) easy to remember, and
(v) hard to get wrong.

5.2 Advantages and Limitations of CAPTCHA

The related works on CAPTCHA often employ statistical approaches. They can be divided according to the property they address [1]: (i) security, (ii) practicality, and (iii) usability.

Usually, the main concerns in creating and using a CAPTCHA lie in the domain of its security. Hence, the majority of related works have addressed that problem. It is a central problem of the CAPTCHA, but not the only important one. The first CAPTCHAs were very vulnerable to bot attacks, so many researchers have proposed techniques to improve their security. Reference [3] has proposed a few techniques that apply to handwritten text in the CAPTCHA. They are:

(i) doubling of text,
(ii) different orientation,
(iii) mirroring of each word in the text, and
(iv) overlapping text with some curved text lines.

Also, a model of scattering has been proposed in [4] to improve the text-based CAPTCHA. More recent work has proposed a CAPTCHA technique based on a chaotic logistic map and a projective S-box [5]. Improvements to the security of the CAPTCHA have been proposed in [6–8]. Specifically, Ref. [6] has suggested the extraction of image fragments as well as the change of the image orientation. Also, Ref. [7] has proposed a complex but efficient method called ARTiFACIAL for image-based facial CAPTCHA, which transforms the facial elements in the image. Another interesting approach to creating a new CAPTCHA has been introduced in [8]. It uses the so-called AgeCAPTCHA, which extracts images from a public image database. Each image is then cropped to a specific size in order to obtain a rectangle that contains a face of indeterminate age.

Practicality is very important from the programmer's point of view. It has no direct bearing on the users of the CAPTCHA. However, some of the proposed solutions provide a detailed way to create the CAPTCHA [5–7].

Usability, in contrast, directly concerns the users of the CAPTCHA and remains one of its important problems. Unfortunately, it is rarely observed and tested. In this context, Ref. [9] has proposed a user-friendly CAPTCHA scheme based on human appearance characteristics. Still, it reached human success rates of 62 and 83%, which is not satisfactory. Also, Ref. [10] has conducted experiments on a small population of twenty Internet users to obtain their response to different types of CAPTCHA. The authors obtained statistically significant differences between CAPTCHAs for all dependent variables except the task completion time. The usability of the CAPTCHA has also been studied in Ref. [11]. Although it explores different age groups, the population is too small, i.e. only 24 samples. Also, this study uses only one demographic factor (the user's age) for the analysis of the CAPTCHA's solution time. Finally, only one type of CAPTCHA is under consideration.

A wider population of 107 participants has been tested on their ability to solve different CAPTCHAs [12]. However, this study is limited to university students aged between 17 and 26 years. A further study has introduced a new type of CAPTCHA called AgeCAPTCHA [8]. It starts from publicly available images and extracts faces of a given age by means of face detection and age estimation algorithms. At the end, the faces are cropped, which reduces the attacks. The CAPTCHA is tested on 267 participants. Although this CAPTCHA brings a higher security level, it is characterized by a longer solving time compared to other types of CAPTCHA.


One of the most recent studies has used two different experiments [13]. The first experiment includes 131 participants, who are examined according to their cognitive preference for verbal or image-based content. The second experiment includes 125 participants and examines the users' speed in successfully solving different CAPTCHAs. Although the results of this study concerning users' individual differences are quite interesting, they cannot be applied to a wider population because the tested sample consists only of students. Accordingly, the results of this study lack generality. Finally, an advanced statistical analysis has been introduced in [14, 15]. It uses association rules to evaluate how the response time to different types of CAPTCHA depends on the co-occurrence of some demographic factors of the users.

To summarize, the main limitations of the current state-of-the-art approaches to the analysis of the CAPTCHA's usability are the following:

• The small sample population, i.e. the low number of users involved in the experiment (which limits the statistical significance of the results),
• The limited number of demographic factors that are tested to evaluate the solution time to the CAPTCHA (e.g. the age),
• The lack of generality of the analysis (e.g. only university students aged between 17 and 26 years),
• The reduced number of considered CAPTCHA types (e.g. mainly text-based CAPTCHA).

In the rest of this book, multiple types of CAPTCHA are described in detail, and the focus is on methods for the analysis of the CAPTCHA usability that overcome the above limitations.

References
1. Baecher P, Fischlin M, Gordon L, Langenberg R, Lutzow M, Schroder D (2010) CAPTCHAs: the good, the bad, and the ugly. In: Proceedings of GI-Sicherheit. Lecture notes in informatics, vol 170, pp 353–365
2. Nielsen J (2003) Usability 101: introduction to usability. http://www.useit.com/alertbox/20030825.html
3. Rusu A, Govindaraju V (2005) Visual CAPTCHA with handwritten image analysis. In: Proceedings of HIP. Lecture notes in computer science, vol 3517. Springer, pp 42–52
4. Baird HS, Riopka T (2005) ScatterType: a reading CAPTCHA resistant to segmentation attack. In: Proceedings of document recognition and retrieval XII. SPIE-IS&T electronic imaging, SPIE, vol 5676, pp 197–207
5. Khan M, Shah T, Batool SI (2016) A new implementation of chaotic S-boxes in CAPTCHA. SIViP 10(2):293–300
6. Kim JW, Chung WK, Cho HG (2010) A new image-based CAPTCHA using the orientation. Visual Comput 26(6):1135–1143
7. Li Q (2015) A computer vision attack on the ARTiFACIAL CAPTCHA. Multimed Tools Appl 74(13):4583–4597
8. Kim J, Yang J, Wohn K (2014) AgeCAPTCHA: an image-based CAPTCHA that annotates images of human faces with their age groups. KSII Trans Internet Inf Syst 8(3):1071–1092
9. Moran TP (1981) The command language grammar: a representation for the user interface of interactive computer systems. Int J Man Mach Stud 15(1):3–50
10. Madathil GF, Alapatt JS, Greenstein JS, Madathil KC (2010) An investigation of the usability of image-based CAPTCHAs. In: Proceedings of the human factors and ergonomics society annual meeting, vol 54, no 16, pp 1249–1253
11. Lee YL, Hsu CH (2011) Usability study of text-based CAPTCHAs. Displays 32(2):81–86
12. Belk M, Germanakos P, Fidas C, Spanoudis G, Samaras G (2013) Studying the effect of human cognition on text and image recognition CAPTCHA mechanisms. In: Proceedings of HAS/HCII. Lecture notes in computer science, vol 8030, pp 71–79
13. Belk M, Fidas C, Germanakos P, Samaras G (2015) Do human cognitive differences in information processing affect preference and performance of CAPTCHA? Int J Hum Comput Stud 84:118
14. Brodić D, Amelio A, Draganov IR (2016) Response time analysis of text-based CAPTCHA by association rules. In: Proceedings of 17th international conference on artificial intelligence: methodology, systems, applications AIMSA. Lecture notes in computer science, vol 9883. Springer, pp 78–88
15. Brodić D, Amelio A (2016) Analysis of the human-computer interaction on the example of image-based CAPTCHA by association rule mining. In: Proceedings of 5th international workshop on symbiotic interaction. Lecture notes in computer science, vol 9961. Springer. https://doi.org/10.1007/978-3-319-57753-14

Chapter 6

Types of CAPTCHA

Abstract This chapter introduces a categorization of the main types of CAPTCHA. Then, each category of CAPTCHA is described and some examples are shown. The main advantages and problems related to the different categories of CAPTCHA are briefly discussed. This serves as an introductory overview for the subsequent chapters, where some types of CAPTCHA belonging to this categorization are described in more detail.

6.1 Categorization of CAPTCHA

The CAPTCHA was first designed in 1997 for AltaVista. Its first duty was to prevent the automatic adding of a Uniform Resource Locator (URL) to the database of the search engine [1]. As the tasks of the CAPTCHA have grown, the number of CAPTCHA varieties has expanded rapidly in recent years. Currently, the types of CAPTCHA are divided into the following main groups [2]:

• Text-based CAPTCHA,
• Image-based CAPTCHA,
• Audio-based CAPTCHA,
• Video-based CAPTCHA,
• Other CAPTCHA types.

The above is the typical division into CAPTCHA types, although many new elements have been introduced in the different types of CAPTCHA.

Text-based CAPTCHA is the most widespread CAPTCHA type. It asks the user to decipher text that is usually distorted in different ways. Unfortunately, this type of CAPTCHA can be successfully attacked by bots due to the existence of good decoders (see Fig. 4.1 in Chap. 4).

Image-based CAPTCHA is usually considered the most advanced and safest type of CAPTCHA. It requires the users to find and point to a desired image in a list of images. Because it is based on image details, it is an extremely difficult task for a bot to solve. Figure 6.1 shows an example of the image-based CAPTCHA.

© Springer Nature Switzerland AG 2020 D. Brodić and A. Amelio, The CAPTCHA: Perspectives and Challenges, Smart Innovation, Systems and Technologies 162, https://doi.org/10.1007/978-3-030-29345-1_6


Fig. 6.1 An example of image-based CAPTCHA

Fig. 6.2 An example of FaceDCAPTCHA [2]

As an extension to image-based CAPTCHA, the FaceDCAPTCHA is used [2]. It is a CAPTCHA that incorporates elements of face detection. It is one of the newest CAPTCHA types and offers a high level of security. It exploits research findings about the human brain, which is very effective at natural face segmentation even against complex backgrounds. Figure 6.2 shows an example of the FaceDCAPTCHA.

Video and audio-based CAPTCHAs rely on characters reproduced in audio form that the user has to input. Although this type of CAPTCHA is typically attacked in approximately 70% of cases, its development and innovation are essential for blind users. Figure 6.3 shows an example of the video and audio-based CAPTCHA.

Other types of CAPTCHA are those CAPTCHAs that do not fit into the previous categories. Figure 6.4 shows such types of CAPTCHA.

Basically, the CAPTCHA takes advantage of the human ability to read printed or handwritten text and to recognize speech, images and faces. The works on CAPTCHA have mainly studied it from the safety and security standpoint, ignoring the difficulties that users have in solving the task. Although a CAPTCHA protects user accounts and passwords, it is often a serious obstacle not just to bots, but also to the humans who have to solve it.

Fig. 6.3 An example of (a) video-based and (b) audio-based CAPTCHA


Fig. 6.4 Examples of other types of CAPTCHA: (a) QRBGS CAPTCHA [3], (b) Dice CAPTCHA [4]

References
1. Lillibridge MD, Abadi M, Bharat K, Broder A (2001) Method for selectively restricting access to computer systems. US Patent 6,195,698. http://www.google.com/patents/US6195698
2. Goswami G, Powell BM, Vatsa M, Singh R, Noore A (2014) FaceDCAPTCHA: face detection based color image CAPTCHA. Future Gener Comput Syst 31(2):59–69
3. Hernandez-Castro CJ, Ribagorda A (2010) Pitfalls in CAPTCHA design and implementation: the math CAPTCHA, a case study. Comput Sec 29(1):141–157
4. DICE CAPTCHA. http://dice-captcha.com/

Chapter 7

Direction of CAPTCHA

Abstract This chapter introduces the main challenges and novelties that have emerged in recent years for the CAPTCHA test. It focuses on the new directions of the test by answering the question: “What are the main characteristics of the new generation CAPTCHA tests?”. Accordingly, we show some important features of the modern CAPTCHA. They respond to the need for more “user-friendly” puzzles that are easier to solve and, at the same time, for more powerful systems that make bot attacks harder. These requirements are mainly met by adding interactivity, animation, motion, image visualization and fun to the CAPTCHA. In this chapter, we extend the description of some CAPTCHA types given in Chap. 6 and describe other CAPTCHA types.

7.1 Slider CAPTCHA

The slider CAPTCHA test is another type of CAPTCHA. It uses an interactive mechanism based on moving a small slider to distinguish between a human and a bot. It has been proposed on the “They Make Apps” website [1], which is a sort of Yellow Pages for finding app developers. This test shows a sign-up form where the user can insert a personal email and password. Just below the form, a small slider appears with a cursor initially positioned on its left. The user is asked to fill in the form and to show his/her human side by moving the cursor along the slider from the left to the right, until the submit text label is reached. The slider CAPTCHA is reported in Fig. 7.1.

Obviously, this type of CAPTCHA is quite vulnerable to bot attacks, because it is not difficult to design a script that automatically moves the cursor from the left to the right of the slider. Also, moving the cursor can be particularly critical for people with special needs [2].

A more complex slider CAPTCHA test, which uses multiple cursors and requires the solution of an electronics problem, can be found in the comments section of the Adafruit blog [3]. It shows a resistor and a set of sliders representing the band levels of the resistor. Each slider is positioned in correspondence with one color band of the resistor. The user is asked to move the cursor of each slider according to the color band of the resistor [2]. If the user answers the test correctly, the comment is successfully submitted to the system. Figure 7.2 shows the Adafruit blog's slider CAPTCHA.

© Springer Nature Switzerland AG 2020 D. Brodić and A. Amelio, The CAPTCHA: Perspectives and Challenges, Smart Innovation, Systems and Technologies 162, https://doi.org/10.1007/978-3-030-29345-1_7

Fig. 7.1 The slider CAPTCHA on the “They Make Apps” website: (a) the CAPTCHA starting interface, and (b) the status of the test during the moving of the cursor

Fig. 7.2 The Adafruit blog’s slider CAPTCHA
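To make the resistor-based slider test more concrete, the following Python sketch checks a set of slider positions against the color bands of a resistor. It is not the code of the Adafruit blog: the function names are hypothetical, the check assumes that each slider simply selects a color, and only the standard digit colors are covered (gold and silver multipliers and the tolerance band are ignored).

```python
# Standard resistor color code for the digit bands.
DIGITS = {
    "black": 0, "brown": 1, "red": 2, "orange": 3, "yellow": 4,
    "green": 5, "blue": 6, "violet": 7, "grey": 8, "white": 9,
}


def resistance_ohms(band1: str, band2: str, multiplier: str) -> int:
    """Resistance encoded by two digit bands and one multiplier band."""
    return (10 * DIGITS[band1] + DIGITS[band2]) * 10 ** DIGITS[multiplier]


def check_sliders(resistor_bands, slider_colors) -> bool:
    """Accept the answer only if every slider matches its color band."""
    return list(slider_colors) == list(resistor_bands)
```

For instance, under these assumptions the bands yellow, violet and red encode 4700 ohms, and the test is passed only when the three sliders are set to exactly those colors.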


7.2 NoCAPTCHA reCAPTCHA

In order to simplify the interaction while ensuring its effectiveness, the NoCAPTCHA reCAPTCHA was introduced by Google Inc. in 2014 [4]. It is characterized by a small panel with a tick box associated with the label “I'm not a robot”. The user ticks the box with the mouse, confirming that he/she is not an automatic program (bot) trying to access the protected website. According to the direction and position of the tick inside the tick box, the test is able to discriminate between a human and a bot. In fact, unlike a human, a bot tends to click right in the middle of the tick box. The NoCAPTCHA test has a sophisticated mechanism in its API behind the tick of the box. Nonetheless, because of its ease of use, it is particularly practical on different desktop and mobile platforms.

In order to improve the security of the test, when the tick does not clearly separate the human from the bot, a further CAPTCHA is shown and has to be solved. The test thus merges the characteristics of another CAPTCHA with the ticking of the box. Different types of CAPTCHA can be used for this purpose [4]. A first type can be an image-based CAPTCHA composed of an image depicting a house number and a text field. The aim is to recognize the number in the image and type it in the text field. Another type can be an image-based CAPTCHA showing a panel that contains an image list and a reference image. The user is required to select the images that are similar to the given reference image. Conceiving the CAPTCHA test in this way is more intuitive and faster for the users of mobile devices, who do not have to recognize and type a distorted sequence of letters or numbers on e.g. a mobile phone. Figure 7.3 shows different samples of NoCAPTCHA reCAPTCHA: (a) with tick box only, (b) with tick box and image-based CAPTCHA with typing of the text, and (c) with tick box and image list.

Fig. 7.3 NoCAPTCHA reCAPTCHA samples: (a) with tick box only, (b) with tick box, image and text field, and (c) with tick box and image-based CAPTCHA with image list
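The remark that a bot tends to click right in the middle of the tick box can be turned into a toy heuristic. The Python sketch below is purely illustrative and is not Google's mechanism, which is not described in the text: the box size, the tolerance and both function names are assumptions made for this example.

```python
import math


def looks_scripted(click_x: float, click_y: float,
                   box_size: float = 28.0, tolerance: float = 0.5) -> bool:
    """True when the click lands within `tolerance` pixels of the box center."""
    center = box_size / 2  # coordinates are relative to the top-left corner
    return math.hypot(click_x - center, click_y - center) <= tolerance


def next_step(click_x: float, click_y: float) -> str:
    """Decide whether to accept the tick or to fall back to a further CAPTCHA."""
    if looks_scripted(click_x, click_y):
        return "show fallback CAPTCHA"
    return "accept as human"
```

A real system would combine many more signals; the sketch only shows how an unclear result can lead to the additional CAPTCHA mentioned above.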

7.3 Invisible reCAPTCHA

One of the latest versions of the CAPTCHA test is the Invisible reCAPTCHA, introduced by Google Inc. in 2016 [5]. It is not invasive for the user and does not require any direct interaction with the test. Unlike the NoCAPTCHA reCAPTCHA, it has no tick box for the user to interact with; the interaction of the user with the test is “passive”. To discriminate between a human and a bot, this CAPTCHA registers the movements of the users when they submit a form on the website, the data history that Google captured during their exploration of the website, the movements of the mouse, the IP address of the users, etc. It is activated when the user clicks on a button located on the web page, or by a JavaScript API call. If the traffic detected on the website is suspicious and the distinction between a human and a bot is not clear, the test asks the user to solve a traditional CAPTCHA test.

Figure 7.4 shows two samples of the Invisible reCAPTCHA test. They are characterized by the reCAPTCHA logo placed at a specific location of the web page, which notifies that the website is protected by the reCAPTCHA technology. In both cases, a form has to be filled in, after which the user presses the submit button. The Invisible reCAPTCHA test runs in the background in both cases. No CAPTCHA test is explicitly presented to the user.

Fig. 7.4 Two Invisible reCAPTCHA test samples


7.4 Image-Based CAPTCHA

Although the Invisible reCAPTCHA avoids the direct interaction of the user with the CAPTCHA test, most websites still use more traditional CAPTCHAs to increase their security. In particular, the image-based CAPTCHA is one of the most widespread types of CAPTCHA, using aspects of image processing and computer vision in the design of the test. As introduced in Chap. 6, the user has to solve a task of object recognition, motion tracking, image retrieval or image segmentation. The test is based on the principle that a human is more apt to deal with such tasks than a bot. If the user correctly solves the proposed task, then the system classifies him/her as a human. Otherwise the user is classified as a bot.

Different types of image-based CAPTCHA are available on the web, proposing multiple tasks in the image processing and computer vision fields [6, 7]. A first type is the Animals in the wild CAPTCHA. In this test, an image collection is presented to the user, who is asked to recognize the image depicting an animal in the wild. Figure 7.5 shows a sample of this CAPTCHA test. A second type is the House number CAPTCHA. It is characterized by a collection of images showing a set of numbers or a single number located in different contexts, with a complex background or distorted. The user is asked to recognize the image showing a house number. Figure 7.6 illustrates this type of CAPTCHA test.

Fig. 7.5 A sample of the Animals in the wild CAPTCHA [7]

Fig. 7.6 A sample of the House number CAPTCHA [7]

Other types of image-based CAPTCHA require the user to solve a facial expression recognition task. The tests are usually made more complex by adding specific backgrounds or other details to the images. A first example is the Old woman CAPTCHA, where the user is asked to select the image depicting the face of an old woman from a collection of images showing human faces. Figure 7.7 shows an Old woman CAPTCHA sample. Another example is the Worried face CAPTCHA, where the user is asked to choose the image showing a worried face among a set of images of human faces in different states of mind. Also, in the Surprised face CAPTCHA the user has to recognize a surprised face and select the correct image in a collection of human faces. Figure 7.8 reports a sample of the Worried face and Surprised face CAPTCHA.

Fig. 7.7 A sample of the Old woman CAPTCHA [7]

Fig. 7.8 Two samples of image-based CAPTCHA: (a) Worried face CAPTCHA, and (b) Surprised face CAPTCHA [7]

The Animated character CAPTCHA is another type of image-based CAPTCHA. It is characterized by an image collection of human poses and faces, where one image depicts an animated character. The user is asked to select the image showing the animated character in the image collection. Figure 7.9 shows a sample of the Animated character CAPTCHA.

A particularly challenging image-based CAPTCHA test is the Picture of the CAPTCHA. In this test, an image collection is shown to the user. The proposed images cover different subjects, including colored numbers and letters with some type of distortion and complex background, natural landscapes, human subjects and faces, animals, computer desktop screens, etc. One of these images shows the picture of a CAPTCHA test. The user is required to find that image in the proposed collection. Figure 7.10 illustrates the Picture of the CAPTCHA test.

Fig. 7.9 A sample of the Animated character CAPTCHA [7]

Fig. 7.10 A sample of the Picture of the CAPTCHA test [7]

A useful website has recently been developed for experimentation and analysis purposes, collecting the aforementioned image-based CAPTCHA tests [8] (http://captchasamples.altervista.org/index.html). It was launched in May 2017 as a joint project between the University of Belgrade, Technical Faculty in Bor, Serbia and the DIMES, University of Calabria, Italy. The website is divided into sections according to the different types of CAPTCHA. Starting from the home page, the Internet user can access the different sections to test the image-based CAPTCHA types. If the user correctly solves the test, the login page is simulated. Otherwise, a text label on the web page notifies the user that the answer is incorrect and asks to try the test again. After a correct answer, a text label reports the time required by the user to solve the test (in seconds) and the number of attempts needed to solve it correctly. Figure 7.11 shows an overview of the website and its main characteristics.

In the past years, other types of image-based CAPTCHA have been introduced in the literature. In 2007 Microsoft presented the Asirra CAPTCHA [9]. This system asks the user to identify all the pictures of cats in a collection of 12 images of cats and dogs. Experiments showed that humans solve Asirra 99.6% of the time in under 30 s. Also, a bot is expected to have a chance of 1/54,000 of solving it. Figure 7.12 shows the Asirra CAPTCHA.


Fig. 7.11 Overview of the CAPTCHA samples website: (a) home page, (b) text label visualized in the case of a wrong answer to the test, and (c) text label with the time required by the user for correctly solving the CAPTCHA test and the number of attempts
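The two quantities reported by the CAPTCHA samples website, the solving time and the number of attempts, can be recorded with very little code. The following Python sketch is a hypothetical illustration and not the code of the website: the class name, the message texts and the injectable `clock` parameter (used here to make the behavior testable) are all assumptions.

```python
import time


class SolveTracker:
    """Track the attempts and the elapsed time for one CAPTCHA test."""

    def __init__(self, answer, clock=time.monotonic):
        self.answer = answer
        self.clock = clock
        self.started = clock()  # moment at which the test is shown
        self.attempts = 0

    def submit(self, response) -> str:
        """Register one attempt and return the label to display."""
        self.attempts += 1
        if response != self.answer:
            return "Incorrect answer, please try again."
        elapsed = self.clock() - self.started
        return f"Solved in {elapsed:.1f} s after {self.attempts} attempt(s)."
```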


Fig. 7.12 Microsoft Asirra CAPTCHA [9]
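As a rough baseline for a test with 12 images and two classes, one can compute the chance that a solver labels every image correctly. The Python sketch below assumes that the images are classified independently with the same accuracy, which is a simplification introduced here; it does not reproduce the 1/54,000 figure quoted from [9], which rests on assumptions of that work that are not restated in this chapter.

```python
from fractions import Fraction


def chance_all_correct(per_image_accuracy: Fraction, images: int = 12) -> Fraction:
    """Probability of labeling all images correctly under independence."""
    return Fraction(per_image_accuracy) ** images


# Blind guessing between cat and dog on each of the 12 images.
blind_guess = chance_all_correct(Fraction(1, 2))
```

Under these assumptions blind guessing succeeds once in 4096 tries, since (1/2)^12 = 1/4096.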

Fig. 7.13 The principle underlying the What’s Up CAPTCHA [10]. Images on the left are randomly rotated. Images on the right are the corresponding ones in their upright position

In 2009 Google Inc. introduced the What's Up CAPTCHA, which is based on the identification of the image's upright orientation [10]. It is composed of a set of images that are shown to the user randomly rotated. The test asks the user to bring the rotated images back to their upright position. It is based on the principle that rotating an image to its upright position is a difficult task for bots, while humans can reach a success rate of up to 90%. Experiments demonstrated that users prefer rotating the images to deciphering text. This is particularly useful in mobile environments, where the rotation can easily be performed on the touch screen. Figure 7.13 shows the principle underlying the What's Up CAPTCHA.
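Checking an answer to such a rotation test amounts to measuring how far the final orientation is from upright. The Python sketch below shows one way to do this; the tolerance of 15 degrees and the function names are assumptions made for illustration and are not taken from [10].

```python
def angular_error(angle_deg: float) -> float:
    """Smallest absolute deviation from upright (0 degrees), between 0 and 180."""
    a = angle_deg % 360.0
    return min(a, 360.0 - a)


def rotation_check(initial_deg: float, user_rotation_deg: float,
                   tolerance_deg: float = 15.0) -> bool:
    """Accept when the user's rotation brings the image close enough to upright."""
    final = initial_deg + user_rotation_deg
    return angular_error(final) <= tolerance_deg
```

Working modulo 360 means that an image shown at 350 degrees and turned by a further 20 degrees is treated as 10 degrees away from upright, not 370.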


Fig. 7.14 A sample of the AgeCAPTCHA interface [11]

The AgeCAPTCHA was proposed by Kim et al. [11]. It asks the user to assign each of a set of test images depicting human faces to one of eight age categories. First, a panel is presented to the user, containing a randomly selected test image and a set of labels identifying the age categories. The user is asked to select the appropriate age category for the test image. If the user provides the correct answer, the test continues by presenting further test images to be categorized. Otherwise, a new set of test images is selected and the test starts again from the beginning. The test is completed when all the test images have been correctly categorized by the user. According to the user’s answers, each test image is annotated with one of the eight age categories. Figure 7.14 illustrates the AgeCAPTCHA interface.
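The control flow of the test described above (continue on a correct answer, restart with a new image set on a wrong one) can be sketched as follows. This is a minimal illustration and not the implementation of [11]: it assumes that a reference age category is available for every test image, and the names `run_age_test`, `ask_user` and the `max_rounds` limit are hypothetical.

```python
from typing import Callable, Iterator, Sequence, Tuple

Image = str  # hypothetical image identifier
LabelledImage = Tuple[Image, int]  # (image, reference age category 0..7)


def run_age_test(image_sets: Iterator[Sequence[LabelledImage]],
                 ask_user: Callable[[Image], int],
                 max_rounds: int = 3) -> bool:
    """Return True once the user categorizes a whole image set correctly."""
    for _ in range(max_rounds):
        test_set = next(image_sets, None)
        if test_set is None:
            return False  # no more image sets to propose
        # all() stops at the first wrong answer, so the remaining images
        # of a failed set are never shown and a fresh set is used next.
        if all(ask_user(image) == category for image, category in test_set):
            return True
    return False
```

The cap on the number of rounds is an addition of this sketch; the description above does not state whether the number of restarts is limited.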

7.5 Social Recognition CAPTCHA

In recent years, the CAPTCHA test has also spread to social media platforms, where it has been combined with social networking technology to create the so-called “social authentication” [2]. This idea was proposed by Facebook in January 2011 in order to verify the authenticity of accounts. This type of CAPTCHA test shows the user a few images of friends randomly selected from the Facebook friends list. The user is then asked to recognize the name of the friend shown. A list of possible names to be associated with each image is displayed right next to the corresponding image. After identifying the name of the friend shown in the image, the user clicks on


Fig. 7.15 A sample of social recognition CAPTCHA on Facebook

the submit button to receive confirmation of a correct or wrong recognition. The main difference from the other types of CAPTCHA tests is that the social recognition CAPTCHA is meant to prevent a human hacker from accessing the system. Hence, the main actors are the Facebook account holder and the human hacker. The main limitation of social authentication is connected to human factors concerning the way friendship is conceived on Facebook. In particular, users sometimes send friend requests to people they do not know, or they have hundreds of friends who are difficult to keep track of and remember. As a consequence, the authentication can sometimes fail because the friends depicted in the proposed images are not recognized. Figure 7.15 shows a sample of social recognition CAPTCHA on Facebook.

7.6 Game-Based CAPTCHA

The evolution of the CAPTCHA test led to the design and implementation of new game-based CAPTCHA types, where interaction and image-based characteristics are merged into a single solution. They pose new challenges for the user, who is asked to solve games or logical tasks. These tasks are very hard for bots to solve, while they are natural for human intelligence. Furthermore, they are usually fun and user-friendly, which makes solving the CAPTCHA test more appealing for users. The first example of game-based CAPTCHA is the FunCAPTCHA, which takes its name from an Australian startup founded by Kevin Gosschalk and Matthew Ford in


January 2013 [12]. It differs from the other types of CAPTCHA because it dynamically adapts its security level according to the number of users and their interaction, e.g. by checking the number of attempts to reduce the risk of brute-force attacks, or by considering the users’ past history with the FunCAPTCHA. The user plays a game characterized by a set of images appearing in different poses, angles, states or textures in real time. The game is sometimes almost complete, and the user is asked to make a few moves to complete it and obtain its solution. If the user solves the game correctly, he/she is classified as a human; otherwise he/she is classified as a bot. Sometimes the user is allowed further trials when the solution to the game is not correct. In the end, the system evaluates the user’s solutions to classify him/her as a human or a bot.

Different types of games may characterize the FunCAPTCHA. A first sample is shown in Fig. 7.16, where the user is asked to rotate an image that appears wrongly oriented inside a panel until it is correctly oriented. If the user is not able to solve the task correctly, the system allows another try. A second type of game is illustrated in Fig. 7.17, where a set of eight 3D images is shown to the user. Only one image depicts the face of a woman, while the other seven images depict the face of a man in different poses and angles. The user is asked to move the image of the woman into the middle. A third type of game is reported in Fig. 7.18, illustrating three tic-tac-toe scenarios. The user has to complete the series of crosses that wins the game [13]. In particular, a series of three crosses is required along the vertical direction on the left, along the main diagonal in the center, and along the secondary diagonal on the right. A last sample of the game requires the solution of a matching problem between objects. In particular, a set of different objects is shown to the user inside green panels: a tennis ball, a pig, and a tablet on the left, and a racket, a baseball and a helmet on the right. The user is asked to match the tennis ball to the sports equipment [13]. Figure 7.19 shows this test.

Another example of game-based CAPTCHA is the Sweet CAPTCHA [14]. An interactive panel with a set of images on the left and a target image on the right

Fig. 7.16 A first type of game for FunCAPTCHA. The user is asked to rotate the image in the right way


Fig. 7.17 A second type of game for FunCAPTCHA. The user is asked to move the image of the woman into the middle

Fig. 7.18 A third type of game for FunCAPTCHA. The user is asked to complete three series of crosses in the tic tac toe

is shown to the user in a user-friendly manner. The aim is to select the correct image on the left and drag it toward the target image on the right. If the two images are associated correctly, the user is classified as a human; otherwise he/she is classified as a bot. Figure 7.20 illustrates a first sample of Sweet CAPTCHA. It is composed of four images on the left and a target image depicting a drum on the right. The user is asked to drag the sticks from the left to the drum on the right. A second sample of Sweet CAPTCHA is depicted in Fig. 7.21, showing a sort of puzzle that the user is asked to solve. In particular, a panel is shown containing four images of different objects on the left and the target image of an incomplete object


Fig. 7.19 A fourth type of game for FunCAPTCHA. The user is asked to match the tennis ball to the sports equipment

Fig. 7.20 A first sample of Sweet CAPTCHA. The user is asked to drag the sticks on the left to the drum on the right

Fig. 7.21 A second sample of Sweet CAPTCHA. The user is asked to drag the missing part on the left to its place on the right


on the right. The user’s aim is to drag the missing part, shown in one of the images on the left, to its place inside the object on the right. A CAPTCHA test requiring the solution of a logical task is the Dice CAPTCHA. It was created by Gregory Ravichbach and developed by studio Redorigami in 2010 as an open source project [15]. It is characterized by a logical task based on a dice game. If the user finds the correct solution to the task, he/she is considered a human; otherwise he/she is considered a bot. Dice CAPTCHA is available in two versions: (i) Homo-sapiens Dice CAPTCHA, which asks the user to compute the sum of the numbers shown on the dice, and (ii) All-the-rest Dice CAPTCHA, which only asks the user to report the numbers shown on the dice [16]. Figure 7.22 shows a sample of Homo-sapiens Dice CAPTCHA. A panel containing four dice is shown to the user. The user is asked to roll the dice by pressing the “Roll” button. After that, the user should compute the sum of the numbers appearing on the dice, enter the sum in the text field and press the “Go” button. Figure 7.23 shows a sample of All-the-rest Dice CAPTCHA. A panel showing three dice appears to the user. The user is asked to roll the dice by pressing the “Roll” button. After that, the user should enter the digits shown on the dice in the text field and then press the “Go” button. Other interesting game-based CAPTCHAs requiring the solution of logical tasks are mainly connected with the human ability to detect the shape of an object or to recognize an object from some points in a background. These tasks can be particularly hard for bots, but fairly intuitive for humans. If the task is

Fig. 7.22 A sample of Homo-sapiens Dice CAPTCHA

Fig. 7.23 A sample of All-the-rest Dice CAPTCHA


correctly accomplished by the user, the system recognizes the user as a human; otherwise the user is categorized as a bot. The Motion CAPTCHA test is an innovative type of CAPTCHA based on reproducing a given shape shown inside a box [13]. Figure 7.24 illustrates a sample of this test. Figure 7.24a shows the initial phase of the test, where the user is asked to draw the shape in the box in order to submit the form he/she has filled in. Usually the form is displayed just above the CAPTCHA test. Figure 7.24b shows the test successfully solved by the user. Accordingly, the box is framed in green and the label “Captcha passed!” is displayed inside the box. After successfully solving the test, the user can press the “Submit” button.
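Returning to the Dice CAPTCHA described earlier in this section, its two versions differ only in the answer expected from the user. The following Python sketch illustrates that difference under simple assumptions: the function names and the variant strings are hypothetical and are not taken from the Dice CAPTCHA project [15].

```python
import random
from typing import List


def roll_dice(count: int, rng: random.Random) -> List[int]:
    """Simulate pressing the "Roll" button for `count` six-sided dice."""
    return [rng.randint(1, 6) for _ in range(count)]


def expected_answer(dice: List[int], variant: str) -> str:
    """Return the text the user is expected to type for the given variant."""
    if variant == "homo-sapiens":
        # The user must add up the numbers shown on the dice.
        return str(sum(dice))
    if variant == "all-the-rest":
        # The user only copies the digits shown on the dice, in order.
        return "".join(str(value) for value in dice)
    raise ValueError(f"unknown variant: {variant}")


def verify(dice: List[int], variant: str, typed: str) -> bool:
    """Check the content of the text field when "Go" is pressed."""
    return typed.strip() == expected_answer(dice, variant)
```

For the dice 3, 5, 1 and 6, the first variant expects the sum 15, while for the dice 2, 4 and 6 the second variant expects the digits 246.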

Fig. 7.24 A sample of Motion CAPTCHA: a the initial panel, and b the solution to the test


In [17] a set of animated CAPTCHA tests has been presented and analyzed, where an advertisement is delivered through the task of solving the CAPTCHA test. They are a sort of animated puzzle, characterized by a series of panels automatically appearing and disappearing in sequence. The user sees a panel where some randomly selected letters of the alphabet are shown inside a normal CAPTCHA panel. Then, the letters are shown inside a different panel containing an advertisement. The panels repeat in the same order in an infinite loop. The user’s aim is to type the letters appearing in sequence inside the panels. If the user correctly recognizes the letters shown, he/she is considered a human; otherwise he/she is considered a bot. Two samples of the animated CAPTCHA test are shown in Fig. 7.25.

Another similar CAPTCHA test is characterized by animation and the solution of a logical task [17]. In particular, the image of an advertisement is first shown to the user. After that, a sequence of images showing the components of a math expression appears. In the end, the image of an advertisement is shown. These images appear and disappear in sequence in an infinite loop. The user has to type the correct solution of the math expression in order to demonstrate that he/she is a human. Figure 7.26 shows a sample of animated math CAPTCHA.

The animated CAPTCHA test is more secure than the traditional image-based CAPTCHA test against attacks made by bots. This is mainly because the letters to be typed are shown inside the image of an advertisement, which contains other letters as well. In this way, an automatic program will be confused by the different sets of letters and will not be able to recognize which letters of the animation should be typed. Another important aspect is that the animated puzzles are randomly generated. Accordingly, the automatic program must first recognize the type of animated puzzle, which makes the attack harder. In terms of usability, the animated CAPTCHA test is more fun than the image-based

Fig. 7.25 Two samples of the animated CAPTCHA


Fig. 7.26 A sample of the animated math CAPTCHA [17]

CAPTCHA test and more intuitive to solve for users of different age groups. Also, it is a very good medium for delivering advertisements through the solution of animated puzzles. Other types of CAPTCHA are characterized by visual effects puzzles [17]. A first example shows a specific advertisement on the faces of a cube. In particular, one face depicts the brand of the advertisement. To be identified as a human, the user is asked to rotate the cube, recognize the text of the brand and type it correctly (see Fig. 7.27). Other visual effects puzzles can be characterized by advertisement scenes rendered in 3D on the screen, together with some text moving across the screen or positioned on an object rotating on the screen. The user’s aim is to find and correctly type the moving text in order to be classified as a human. Figure 7.28 shows two samples of other visual effects puzzles. Finally, the Interactive game-based CAPTCHA is a special type of CAPTCHA where the user is asked to play an interactive game containing some details of the advertisement [17]. An example is a game featuring a flight simulator which advances over a road depicting the brand of an advertisement. Different balloons appear in the sky over the road, also showing the brand of the advertisement. The user’s aim is to hit three balloons by clicking the mouse, while moving

Fig. 7.27 A first sample of the CAPTCHA test with visual effects puzzles [17]


Fig. 7.28 Two samples of CAPTCHA with other visual effects puzzles [17]: a an advertisement is rendered in 3D inside the screen together with the moving text, and b, c the text positions on a rotating sphere

the flight simulator over the road. If the user hits the three balloons, the CAPTCHA test is considered solved (see Fig. 7.29). In terms of security, the Visual effects and Interactive game-based CAPTCHA tests are more robust than the image-based CAPTCHA tests because their interactivity makes them harder for bots to solve. In terms of usability, these CAPTCHA tests are considered more fun and more interesting by both young and older people of different occupations. Also, they have a more positive influence on users in terms of advertisement.


Fig. 7.29 A sample of Interactive game-based CAPTCHA [17]: a the flight simulator is positioned on the road depicting an advertisement, and a balloon showing the advertisement is depicted in the sky just above the road, b the first balloon is hit by the user, and c other two balloons are shown to be hit by the user

References

1. They Make Apps. http://theymakeapps.com
2. Bushell D (2011) In search of the best CAPTCHA. https://www.smashingmagazine.com/2011/03/in-search-of-the-perfect-captcha/
3. Adafruit Blog. https://blog.adafruit.com
4. Shet V (2014) Are you a robot? Introducing “NoCAPTCHA reCAPTCHA”. https://security.googleblog.com/2014/12/are-you-robot-introducing-no-captcha.html
5. Invisible reCAPTCHA. https://www.google.com/recaptcha/intro/comingsoon/invisible.html
6. Brodić D, Amelio A (2016) Analysis of the human-computer interaction on the example of image-based CAPTCHA by association rule mining. In: Proceedings of international workshop on symbiotic interaction (Symbiotics ‘16). Lecture notes in computer science 9961. Springer, Berlin, pp 38–51
7. Brodić D, Amelio A, Janković R (2016) Exploring the influence of CAPTCHA types to the users response time by statistical analysis. Multimed Tools Appl 77(10):12293–12329
8. CAPTCHA samples (2017). http://captchasamples.altervista.org
9. Elson J, Douceur JR, Howell J, Saul J (2007) Asirra: a captcha that exploits interest-aligned manual image categorization. In: Proceedings of ACM conference on computer and communications security, vol 7. ACM, New York, pp 366–374
10. Gossweiler R, Kamvar M, Baluja S (2009) What’s up CAPTCHA?: A CAPTCHA based on image orientation. In: International conference on World Wide Web. ACM, New York, pp 841–850
11. Kim J, Yang J, Wohn K (2014) AgeCAPTCHA: an image-based CAPTCHA that annotates images of human faces with their age groups. KSII Trans Internet Inf Syst 8:1071–1092
12. FunCAPTCHA. https://www.funcaptcha.com
13. House of CAPTCHA. http://techisquest.blogspot.it/p/1.html
14. Sweet CAPTCHA. http://sweetcaptcha.com
15. Dice CAPTCHA. http://dice-captcha.com
16. Engelhardt Ari E (2013) Dice CAPTCHA—A CAPTCHA Add-On with a twist. http://www.datapersona.org/2013/01/18/dice-captcha-a-captcha-add-on-with-a-twist/
17. Aggarwal S (2013) Animated captchas and games for advertising. In: International conference on World Wide Web. ACM, New York, pp 1167–1174

Chapter 8

CAPTCHA Programming

Abstract This chapter presents the most advanced algorithms which are used to successfully design a CAPTCHA test. In the first part, the most important techniques which are employed for designing a text CAPTCHA are described. In fact, although it is the most attacked type of CAPTCHA, it is still used in many web sites. Then, the attention will be moved toward the design of the image-based CAPTCHA, for which different techniques of image transformation are usually employed. The third part will be dedicated to algorithms and methods for designing other types of CAPTCHA, such as text-based reCAPTCHA, NoCAPTCHA reCAPTCHA, and game-based CAPTCHA. Finally, the last part of the chapter provides a practical miniguide on how to design a simple text and image-based CAPTCHA in JavaScript and PHP.

8.1 Designing the Text-Based CAPTCHA

Bots designed to attack the text CAPTCHA test carry out three main tasks [1]:

1. Pre-processing,
2. Segmentation, and
3. Classification.

The first task consists of pre-processing the text CAPTCHA to make it easier to solve. In particular, it applies color removal techniques or noise reduction approaches to the image representing the letters and/or numbers of the CAPTCHA. After that, a segmentation approach is used to separate the letters and/or numbers composing the CAPTCHA. For this purpose, an image clustering algorithm can be employed on the image of the CAPTCHA. Finally, a classifier is applied to the segmented characters in order to recognize each of them. Approaches like Support Vector Machines or artificial neural networks are adopted for recognizing each character composing the CAPTCHA [1].
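The first two tasks can be illustrated with a toy Python sketch operating on a grayscale image stored as nested lists. It is only an illustration under stated assumptions: the fixed threshold of 128 is arbitrary, and segmentation is done with a simple column projection (splitting at columns that contain no ink) instead of the image clustering mentioned above. The classification task is left out.

```python
from typing import List, Tuple

Gray = List[List[int]]    # rows of 0..255 grayscale values
Binary = List[List[int]]  # rows of 0/1 values, 1 = ink


def binarize(image: Gray, threshold: int = 128) -> Binary:
    """Pre-processing: map dark pixels to 1 (ink) and light pixels to 0."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def segment_columns(binary: Binary) -> List[Tuple[int, int]]:
    """Segmentation: return (start, end) column ranges containing ink.

    `end` is exclusive. Each range is a candidate character.
    """
    width = len(binary[0]) if binary else 0
    ink = [any(row[x] for row in binary) for x in range(width)]
    segments: List[Tuple[int, int]] = []
    start = None
    for x, has_ink in enumerate(ink):
        if has_ink and start is None:
            start = x                    # a new character begins
        elif not has_ink and start is not None:
            segments.append((start, x))  # an empty column ends it
            start = None
    if start is not None:
        segments.append((start, width))
    return segments
```

A segmenter this naive only works when characters are separated by empty columns, which is exactly the property that a CAPTCHA designer can take away.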

© Springer Nature Switzerland AG 2020 D. Brodić and A. Amelio, The CAPTCHA: Perspectives and Challenges, Smart Innovation, Systems and Technologies 162, https://doi.org/10.1007/978-3-030-29345-1_8


Figure 8.1 shows an overview of the attacking process. Considering that an attack on the text CAPTCHA is characterized by these three tasks, an algorithm designed for creating a text CAPTCHA should cover the following two important aspects [1]:

1. An anti-segmentation technique, and
2. An anti-recognition technique.

An anti-segmentation technique aims to make the segmentation of the letters and/or numbers in the image of the CAPTCHA test harder. An anti-recognition technique should prevent the recognition of the letters and/or numbers. It has been demonstrated that most of the successful attacks on the text CAPTCHA are due to a reliable segmentation of the letters and/or numbers composing the test. Accordingly, the text CAPTCHA should be designed so as to make it harder for bots to separate the letters and/or numbers. However, its efficacy is also deeply linked with the design of reliable features and anti-recognition techniques. According to these characteristics, the text CAPTCHA can be designed in different ways, depending on the anti-segmentation and anti-recognition techniques adopted. Multiple anti-segmentation techniques can be used in the design process, such as [1]:

• Complex background,
• Extra lines,
• Collapsing, and
• Random noise.

The complex background consists of adding a complex texture or image motif as a background to the letters and/or numbers of the test. It aims to confuse the bot in the case of an attack. The extra lines are lines drawn over the letters and/or numbers of the test in order to make their segmentation harder. These can be small lines across the text of the CAPTCHA, or large lines crossing the whole CAPTCHA image. The collapsing eliminates the space between the letters and/or numbers in order to prevent their segmentation. The random noise consists of adding random noise to the image representing the letters and/or numbers of the test. It is important that the added noise has the same color as the text, otherwise it can be easily removed by an automatic program. Figure 8.2 shows different samples of text CAPTCHA with letters and numbers where anti-segmentation techniques are applied. In particular, in Fig. 8.2a a complex background, composed of smaller letters of different colors, is added to the text in

Fig. 8.1 Overview of the attacking process to a text CAPTCHA. From the left to the right: original text CAPTCHA image, pre-processing with color removal, segmentation and final recognition [1]


Fig. 8.2 Samples of alphanumeric text CAPTCHA where anti-segmentation techniques are applied: a complex background, b extra lines, c collapsing, and d random noise

order to confuse the segmentation of letters and numbers. By contrast, in Fig. 8.2b multiple extra-large lines are drawn over the letters and numbers, which prevents their correct segmentation. In Fig. 8.2c a collapsing technique is employed, where all the letters and numbers are positioned right next to each other without any space. Finally, in Fig. 8.2d some random noise is added around the text. Furthermore, different anti-recognition techniques can be adopted in the design process of the text CAPTCHA, such as [1]:

• Multi-fonts,
• Variable font-size,
• Distortion,
• Blurring,
• Tilting, and
• Waving.

All these techniques have the aim of confusing the classifier in the recognition of the letters/numbers of the test.


In particular, the multi-fonts technique uses multiple fonts for the letters and/or numbers of the test. The variable font-size applies a variable size to the font used. The distortion consists of creating distorted or unaligned letters and/or numbers in the test. The blurring shows the content of the text blurred. The tilting rotates the components of the text by different angles. Finally, the waving is a type of distortion which rotates the letters and/or numbers of the test in a wave-like manner. Figure 8.3 illustrates two samples of text CAPTCHA, with only letters and with only numbers, where the distortion anti-recognition technique is used. In Fig. 8.3a the letters are visibly distorted in different ways toward the left and toward the right. Also, in this specific case some random noise is added to the image representing the text. In Fig. 8.3b the letters are distorted in a wave-like manner, and all the letters are slightly rotated to the right. Also, some extra lines are added over the letters. Figure 8.4 reports different anti-recognition techniques applied to some CAPTCHA texts.

A website for generating customized text CAPTCHA tests is currently available online [2] (https://captcha.com/captcha-examples.html). BotDetect CAPTCHA provides a professional text CAPTCHA generation service which is customized to the user’s needs. Accordingly, the user can select the image size of the CAPTCHA, the language, the type and number of letters and/or numbers to be used, the format of the background, the type of distortion or waving to be adopted, and the type of noise to be added to the text of the CAPTCHA. Furthermore, it is possible to select the audio features to be associated with the selected CAPTCHA. All these characteristics are available via the BotDetect CAPTCHA generator, which can be integrated into a website through ASP.NET, Java, and PHP based web forms.
Figure 8.5 shows the demo panel with all the customizable features which can be selected via the BotDetect CAPTCHA generator.

Fig. 8.3 Samples of text CAPTCHA where distortion anti-recognition technique is applied: a with only letters, and b with only numbers
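Some of the techniques of this section can be tried on a toy bitmap where 1 marks a pixel in the text color. The Python sketch below covers collapsing, extra lines and random noise (anti-segmentation), plus a simple form of waving (anti-recognition). It is an illustration only: all function names and parameters are hypothetical, the noise is drawn directly in the text color as recommended above, and the waving shifts each column vertically along a sine curve, which approximates the wave-like effect without rotating the characters.

```python
import math
import random
from typing import List

Bitmap = List[List[int]]  # 1 = pixel in the text colour, 0 = background


def compose(glyphs: List[Bitmap], gap: int) -> Bitmap:
    """Place equally tall glyphs side by side; gap=0 models collapsing."""
    rows: Bitmap = [[] for _ in glyphs[0]]
    for index, glyph in enumerate(glyphs):
        for y, row in enumerate(rows):
            if index > 0:
                row.extend([0] * gap)
            row.extend(glyph[y])
    return rows


def add_extra_line(bitmap: Bitmap, y: int) -> Bitmap:
    """Draw a horizontal line across the whole image at row y."""
    return [[1] * len(row) if i == y else list(row)
            for i, row in enumerate(bitmap)]


def add_random_noise(bitmap: Bitmap, density: float,
                     rng: random.Random) -> Bitmap:
    """Turn each background pixel into ink with probability `density`."""
    return [[1 if px or rng.random() < density else 0 for px in row]
            for row in bitmap]


def wave(bitmap: Bitmap, amplitude: int, period: float) -> Bitmap:
    """Shift every column up or down following a sine curve."""
    height, width = len(bitmap), len(bitmap[0])
    out = [[0] * width for _ in range(height + 2 * amplitude)]
    for x in range(width):
        shift = round(amplitude * math.sin(2 * math.pi * x / period))
        for y in range(height):
            out[y + amplitude + shift][x] = bitmap[y][x]
    return out


def count_ink_runs(bitmap: Bitmap) -> int:
    """Groups of adjacent inked columns, as a naive segmenter sees them."""
    ink = [any(column) for column in zip(*bitmap)]
    return sum(1 for x, has_ink in enumerate(ink)
               if has_ink and (x == 0 or not ink[x - 1]))
```

With three 2-pixel-wide block glyphs, `count_ink_runs(compose(glyphs, 2))` finds three separate groups, while collapsing (`gap=0`) or a single extra line merges them into one group, so a column-based segmenter no longer finds the character boundaries.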


Fig. 8.4 Different types of anti-recognition techniques applied on the texts of the CAPTCHA

Fig. 8.5 The demo panel of the BotDetect CAPTCHA generator


8.2 Designing the Image-Based CAPTCHA

Unlike the text CAPTCHA, the algorithms and techniques for designing an image-based CAPTCHA are characterized by image processing and computer vision tasks. The specific task to be used mainly depends on the aim of the test. In fact, in Chap. 7 different types of image-based CAPTCHA with different characteristics have been presented and described. As for the text CAPTCHA, the main objective of the techniques used for designing the image-based CAPTCHA is to prevent attacks made by bots. These attacks may apply machine vision and pattern recognition approaches to solve the test automatically [3]. However, the successful application of a pattern recognition task to a collection of images coming from a very large domain and with an increasing level of complexity, in terms of background, color and geometric transformations, still remains an open challenge. One of the most typical tasks performed by an automatic program is visual concept detection. Despite the noticeable progress in the computer vision field, automatic programs are still less able than humans to solve this task successfully.

Further attacks employ image similarity algorithms for comparing a set of images. This is typical of image-based CAPTCHA tests where the user is asked to recognize a specific image belonging to a given category among a collection of images [3]. The attack consists of collecting a large set of the images shown by the test and categorizing them. Every time the test requires a new image recognition task, the collected images belonging to the category given by the test are matched against each image of the test until a correspondence is found. Based on the aforementioned problems, the techniques for designing a reliable image-based CAPTCHA are characterized by color and geometric image transformations, such as [3]:

• Resizing,
• Rotation,
• Flipping,
• Controlled distortion, and
• Shade modification.

These image transformations can be applied to every image selected to appear in the test. The resizing modifies the size of the images by a random percentage with respect to the original size. The rotation rotates the images by a random angle. The flipping mirrors the images about the vertical axis. The controlled distortion is a more complex transformation which distorts the images to make content recognition, image matching and object segmentation harder. Finally, the shade modification changes a given number of the pixels composing the shade in the images by a random percentage of their original color. Consequently, if the same image is chosen multiple times, it will appear different each time. This last transformation is useful against attacks based on color recognition.
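Three of these transformations (resizing, flipping and shade modification) can be sketched in Python on an image stored as nested lists of RGB tuples. This is a minimal illustration and not the method of [3]: the function names, the nearest-neighbour resizing and all parameter ranges are assumptions of the example, and rotation and controlled distortion are left out for brevity.

```python
import random
from typing import List, Tuple

Pixel = Tuple[int, ...]  # (R, G, B), each 0..255
Image = List[List[Pixel]]


def flip(image: Image) -> Image:
    """Mirror the image about its vertical axis."""
    return [list(reversed(row)) for row in image]


def resize(image: Image, scale: float) -> Image:
    """Nearest-neighbour resize by the given scale factor."""
    old_h, old_w = len(image), len(image[0])
    new_h = max(1, round(old_h * scale))
    new_w = max(1, round(old_w * scale))
    return [[image[min(old_h - 1, int(y / scale))]
                  [min(old_w - 1, int(x / scale))]
             for x in range(new_w)]
            for y in range(new_h)]


def modify_shade(image: Image, fraction: float, max_change: float,
                 rng: random.Random) -> Image:
    """Scale the colour of a random share of the pixels by up to
    +/- max_change of their original value."""
    out = [list(row) for row in image]
    for y, row in enumerate(out):
        for x, pixel in enumerate(row):
            if rng.random() < fraction:
                factor = 1.0 + rng.uniform(-max_change, max_change)
                out[y][x] = tuple(min(255, max(0, round(c * factor)))
                                  for c in pixel)
    return out


def randomize(image: Image, rng: random.Random) -> Image:
    """Apply the three transformations with random parameters
    (the ranges are hypothetical)."""
    image = resize(image, rng.uniform(0.8, 1.2))
    if rng.random() < 0.5:
        image = flip(image)
    return modify_shade(image, fraction=0.3, max_change=0.1, rng=rng)
```

Calling `randomize` twice on the same stored image generally yields two different pixel grids, which is the property that hinders attacks based on matching previously collected images.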


Figure 8.6 reports a sample of image-based CAPTCHA where rotation and controlled distortion are applied to the images shown. It requires an image recognition and categorization task to be solved. In particular, the user is asked to find the image depicting the bird among the collection of proposed images. Figure 8.7 reports another sample of image-based CAPTCHA where rotation is applied to the images shown. In particular, images D and T are clearly rotated to the left. The user has to recognize and categorize the images shown on the panel, in order to find the images of the food, the boat and the train. Three other important techniques for designing robust image-based CAPTCHAs, not linked with color and geometric image transformations, have been used in the literature [3]:

• Variable and large domain of images,
• Complexity of the images, and
• Pool of categories.

The images to be shown by the CAPTCHA test should be selected from a large domain of images which also changes over time. This ensures that the automatic recognition of objects and visual concepts remains a difficult task.

Fig. 8.6 A sample of image-based CAPTCHA where rotation and controlled distortion are visible on the images [3]

Fig. 8.7 Another sample of image-based CAPTCHA where rotation is applied on some of the visualized images


Fig. 8.8 A sample of image-based CAPTCHA where the visualized images belong to different domains and are quite complex in their composition

Likewise, the need to apply pattern recognition techniques to complex images discourages automatic attacks. Figure 8.8 shows a sample of image-based CAPTCHA where the images shown are quite complex in their composition. Furthermore, the images belong to different domains. This image-based CAPTCHA poses an image recognition task, which asks the user to recognize the image of the flower among a set of selected images.

A further interesting method for preventing attacks made by bots is the pool of categories [3]. It consists of associating only a subset of image categories, called the pool of categories, with the requests coming from a given IP address. Accordingly, each time the image-based CAPTCHA test is accessed from the same IP address, the images shown are selected only from this pool. This method hinders potential attacks on an image-based CAPTCHA test which are based on image categorization. In fact, the initial attacking phase of selecting and categorizing the images shown by the test must be repeated each time the pool changes. A change of the pool may occur every time the test is completed. Hence, this approach makes it very difficult to categorize the images automatically.

In the case of an image-based CAPTCHA which requires a human face recognition task to be solved, a secure CAPTCHA can be designed by adopting two other important strategies [4]:

• Natural variation in faces, and
• Illumination changes.

A natural variation in a face can be a variation of expression characterizing a specific context or state of mind. An illumination change can occur on a face when it is in different poses or in different contexts. While the face recognition task is


easy for humans to solve even if natural variations or illumination changes occur in the images of the test, automatic programs are particularly penalized by these image variations [4]. These strategies can be further improved by adding a complex background to the faces and by applying to the faces a controlled distortion based on the optimization of a set of parameters. This optimization guarantees a compromise between the ease of solving the CAPTCHA for the user and the difficulty of the test for bots [4]. Figure 8.9 shows a sample of multiple human faces with different types of natural variations, illumination changes, controlled distortions and complex backgrounds.

In [5] Datta et al. discussed the importance of different types of distortions for the image-based CAPTCHA. They typically affect the correct extraction of low-level features from the images, which is the key to their recognition by automatic programs. Different types of distortions can influence the extraction of different types of features. In particular, the main atomic distortions are the following:

• Luminance. Modifications in the brightness of the image have a negative impact on its recognizability. They can be obtained by multiplying the RGB components of the image by a scale factor.
• Color quantization. It consists of reducing the number of color levels of the image. This can be done by transforming the image pixels from the RGB to the CIE-LUV color space. Then, K-means clustering is applied to the color points in order to reduce the color levels of the image. The new color levels are the centers of the clusters computed by K-means. This process introduces a loss of information which makes image recognition harder.

Fig. 8.9 Different sample human faces with natural variations, controlled distortions, illumination variations, and complex background [4]

• Dithering. It is based on using a limited set of colors to create an illusion of color depth in the image. To make it more robust to mean filters, which can remove its effects, the dithering is applied locally on image regions. After partitioning the image, a set of colors is chosen for each region and used for dithering that partition. Hence, the image is dithered differently in each region. This makes the automatic application of image segmentation approaches, and thus image recognition, difficult.
• Cutting and Re-scaling. It performs a cut along one of the four sides of the image, which is randomly selected. Then, it re-scales the image to its original size. This approach creates some difficulties in image recognition, because the main elements are usually located in the central part of the image.
• Line and Curve noise. It adds strong noise to the image, in terms of lines, sinusoids and other thick and complex curves. They are colored by reducing their RGB components in order to create dark effects close to black. Curves and lines are added at random positions and in a random number. This noticeably hampers image recognition.

Figure 8.10 shows four types of atomic distortions: luminance, color quantization, line and curve noise, and dithering. The y factor corresponds to the luminance scaling factor in the case of luminance, to the quantization level in the case of color quantization, to the noise density in the case of line and curve noise, and to the dithering level in the case of dithering. However, it has been demonstrated that these atomic distortions are not very effective against automatic attacks if they are used separately [5]. Consequently, it is suggested to use them in a composite manner, while also varying their parameters. Figure 8.11 illustrates a sample of composite distortion which is applied to two images of the CAPTCHA. For each image, a color quantization is applied with the K parameter of the K-means algorithm set to 15. After that, dithering is applied to the quantized image. Then, line noise with randomly spaced lines is added. Finally, a cut/re-scale of 10–20% is performed on a randomly selected side of the image.
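
To make the idea of chaining distortions more concrete, the following minimal JavaScript sketch applies some of the distortions of this section to a tiny in-memory image. It is only an illustration under stated assumptions: the image model (an object with width, height and a row-major list of RGB triples) and all function names are hypothetical, and the color quantization is replaced by a much simpler uniform per-channel quantization rather than K-means in the CIE-LUV space. Dithering and line noise are left out.

```javascript
// Image model (an assumption of this sketch): { width, height, pixels },
// where pixels is a row-major array of [r, g, b] triples in 0..255.
const clamp = (v) => Math.max(0, Math.min(255, Math.round(v)));

// Luminance: scale every RGB component by the same factor.
function scaleLuminance(img, factor) {
  return { ...img, pixels: img.pixels.map((p) => p.map((c) => clamp(c * factor))) };
}

// Simplified stand-in for color quantization: uniform levels per channel.
// (The text describes K-means in CIE-LUV, which is NOT reproduced here.)
function quantizeUniform(img, levels) {
  const step = 255 / (levels - 1);
  return {
    ...img,
    pixels: img.pixels.map((p) => p.map((c) => clamp(Math.round(c / step) * step))),
  };
}

// Cut a fraction of the image along one side, then re-scale the remaining
// part to the original size with nearest-neighbour sampling.
function cutAndRescale(img, side, fraction) {
  const { width, height, pixels } = img;
  const cutX = side === "left" || side === "right" ? Math.floor(width * fraction) : 0;
  const cutY = side === "top" || side === "bottom" ? Math.floor(height * fraction) : 0;
  const x0 = side === "left" ? cutX : 0; // first kept column
  const y0 = side === "top" ? cutY : 0; // first kept row
  const w = width - cutX;
  const h = height - cutY;
  const out = [];
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      const sx = x0 + Math.min(w - 1, Math.floor((x * w) / width));
      const sy = y0 + Math.min(h - 1, Math.floor((y * h) / height));
      out.push(pixels[sy * width + sx].slice());
    }
  }
  return { width, height, pixels: out };
}

// Composite distortion: quantize, then cut 10-20% on a random side.
function compositeDistortion(img, rng = Math.random) {
  const sides = ["left", "right", "top", "bottom"];
  const side = sides[Math.floor(rng() * sides.length)];
  const fraction = 0.1 + rng() * 0.1;
  return cutAndRescale(quantizeUniform(img, 4), side, fraction);
}
```

Passing the random source as a parameter (rng) is a design choice of the sketch: it allows the same composite distortion to be reproduced when a fixed generator is supplied.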

8.3 Designing the Other CAPTCHA Types

The algorithms for designing other types of CAPTCHA are varied and include the procedure for creating the reCAPTCHA test [6]. In the following, we consider the traditional reCAPTCHA, which includes text strings and words. The procedure starts with an image of some scanned text, which is analyzed by two different OCR programs. The outputs of the two programs are compared with each other by a string-matching algorithm and checked against an English dictionary. The words which are transcribed differently by the two programs or which are not present in the English dictionary are marked as “suspicious”. After that, an image is created which includes a suspicious word and another word for which the answer is known. Both words are distorted and shown to the user as a reCAPTCHA. The word which

Fig. 8.10 Four types of atomic image distortions: (a) luminance, (b) color quantization, (c) line and curve noise, and (d) dithering [5]

is already known is referred to as the “control” word, while the new word is referred to as the “unknown” word. For each reCAPTCHA, one unknown and one control word are shown together in a random order. A frequency normalization is applied to the control words, such that the least frequent word has the same probability of appearing as the most frequent word. Furthermore, the vocabulary of the control words is very large, containing more than 100,000 elements. Finally, only words which both OCR programs fail to recognize correctly are selected as control words. These aspects are of primary importance for lowering the probability that an automatic program may randomly guess the words.

Every suspicious word is shown to different users as an unknown word with random distortions. If the user correctly recognizes the accompanying control word, then his/her answer to the suspicious word is considered as plausible for the unknown word. The unknown word becomes a control word if three humans give the same answer, which differs from the answers of the OCR programs. If the answers of the humans do not match, the unknown word is shown to more humans and the answer is established by majority voting, so that the word can be selected as a control word in the next challenge. If a word is considered unclear, a button is provided to the user for changing the pair of displayed words.
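
The procedure described so far can be sketched in JavaScript as follows. This is a minimal illustration and not the actual reCAPTCHA implementation: all function names are hypothetical, the two OCR outputs are assumed to be word lists aligned by position (a simplification of the string-matching step), the comparison of the human answers with the OCR outputs is left out, and the rule used for majority voting (a strict majority of all collected answers) is an assumption, since the text does not specify when the voting stops.

```javascript
// Mark as "suspicious" the words on which the two OCR outputs disagree or
// which are missing from the dictionary (a Set of lowercase words).
function markSuspicious(ocrA, ocrB, dictionary) {
  const suspicious = [];
  const n = Math.max(ocrA.length, ocrB.length);
  for (let i = 0; i < n; i++) {
    const a = ocrA[i];
    const b = ocrB[i];
    if (a !== b || !dictionary.has(String(a).toLowerCase())) {
      suspicious.push({ index: i, ocrGuesses: [a, b] });
    }
  }
  return suspicious;
}

// An answer for the unknown word is kept only if the user typed the
// control word correctly; otherwise null is returned.
function plausibleAnswer(response, controlWord) {
  return response.control === controlWord ? response.unknown : null;
}

// Resolve an unknown word from the plausible answers collected so far.
// Returns the agreed spelling, or null if more humans must be asked.
function resolveUnknown(answers, required = 3) {
  if (answers.length < required) return null;
  const first = answers.slice(0, required);
  if (first.every((a) => a === first[0])) return first[0];
  if (answers.length === required) return null; // no agreement yet
  // Assumed rule: strict majority over all answers collected so far.
  const counts = new Map();
  for (const a of answers) counts.set(a, (counts.get(a) || 0) + 1);
  let best = null;
  let bestCount = 0;
  for (const [answer, count] of counts) {
    if (count > bestCount) {
      best = answer;
      bestCount = count;
    }
  }
  return bestCount > answers.length / 2 ? best : null;
}
```

In this sketch, a word returned by resolveUnknown would be the one promoted to a control word for later challenges.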

Fig. 8.11 Two samples of composite distortion [5]

If six users decide to reject a word before any spelling of it has been guessed, the word is considered unreadable and is discarded.

The reCAPTCHA includes three different types of distortions for the words [6]:
• Natural distortions,
• Noise, and
• Artificial transformations.
The natural distortions are mainly linked to the fading of the texts over time. Furthermore, the scanned text is subject to noise. Finally, some artificial transformations can be used, similar to those used in the traditional CAPTCHA, to make text recognition harder even for the most sophisticated OCR systems (see Sect. 8.1). Figure 8.12 shows two samples of reCAPTCHA where a mixture of noise and artificial transformations is applied to the text.

Other types of advanced CAPTCHA, such as the NoCAPTCHA reCAPTCHA, are characterized by a level of encryption which makes it quite difficult to understand their underlying mechanism. In the particular case of the NoCAPTCHA reCAPTCHA, it seems that the system adopts a technology called Google’s Botguard, which was used for anti-spam and bot detection in Gmail [7].

Fig. 8.12 Two samples of reCAPTCHA where a mixture of noise and artificial distortions is applied to the text

Also, the NoCAPTCHA code and the encryption key are periodically changed, so that it is difficult for a bot to successfully attack the system. According to AdTruth’s lead engineer Marcos Perona, Botguard first checks whether a Google cookie is present on the user’s machine. After that, the NoCAPTCHA moves that cookie from Google to the web browser. Then, it records various information about the user’s browser window at the time of using the NoCAPTCHA, such as [7]:
• Size and resolution of the screen, date, languages, JavaScript objects and browser plug-ins,
• IP address,
• CSS features of the web page,
• Statistics on the number of mouse and touch events.
Also, the NoCAPTCHA will find and use other cookies coming from other Google services used by the user, such as Gmail, Search, Analytics, etc. The aim is to find specific patterns in the collected data which may distinguish human behavior from bot behavior. All this information is encrypted and sent to Google.

According to AdTruth’s study of the problem, the different scenarios linked to the use of the NoCAPTCHA are mainly associated with the ability of the system to retrieve the needed information. In particular, three use-cases can be identified:
• If the system finds the Google cookies and all data about the mouse events, IP address, etc. can be successfully obtained, the NoCAPTCHA will ask the user to prove that he/she is a human by ticking the typical checkbox.
• If no information can be retrieved about the Google cookies (because they have been deleted by the user), the system will ask the user to solve a two-word CAPTCHA.

Fig. 8.13 A sample of interaction between the legitimate user and the web server [9]

• If the user has activated an anti-tracking plugin, the system will ask the user to solve a two-word CAPTCHA regardless of the status of the cookies.

In Chap. 7, different types of game-based CAPTCHA have been described. For some of these CAPTCHA tests, the animation is realized by using the GIF technology [8]. First of all, a set of appropriate images and text strings should be carefully selected. All the obtained material should be stored in a dataset which will be used by the CAPTCHA system for realizing the animation. Some techniques for text randomization can also be adopted for automatically modifying the position and the character of the text. The animated GIF is generated by integrating the text images and the object images as frames.

Some other interactive game-based CAPTCHA tests are developed by using Flash and HTML5 with JavaScript, which allow the game to be downloaded to the user’s computer and executed locally [9]. In order to obscure the solution of the game, e.g. the object to be moved from one side of the screen to the other, the game code should be encrypted by the server. Also, a mechanism to discriminate between legitimate user gameplay and human-solver gameplay in a relay attack can be designed [9]. It consists of interacting with the web server which sends the game code to the user’s machine. If the user successfully solves the game, the log of the user’s mouse interactions is sent back to the web server, which analyzes it by using a detection algorithm and answers by accepting or rejecting the user. The interaction between the user and the web server takes place over a secure communication channel (e.g. SSL/TLS) [9]. Figure 8.13 shows the flowchart of this mechanism performed on a legitimate user.
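
Returning to the three NoCAPTCHA use-cases listed in this section, they can be summarized as a small decision function. The JavaScript sketch below only restates the reported behavior and is not Google's code: the function name, the field names and the returned labels are hypothetical. The case in which the cookies are found but the other data cannot be collected is not covered by the three use-cases, so falling back to the two-word CAPTCHA there is an assumption.

```javascript
// Decide which challenge is shown, following the three reported use-cases.
// All names are hypothetical and serve only to illustrate the decision.
function chooseChallenge({ hasGoogleCookies, telemetryAvailable, antiTrackingPlugin }) {
  // An anti-tracking plugin leads to the two-word CAPTCHA,
  // regardless of the status of the cookies.
  if (antiTrackingPlugin) return "two-word";
  // Deleted or missing Google cookies lead to the two-word CAPTCHA.
  if (!hasGoogleCookies) return "two-word";
  // Cookies found and mouse events, IP address, etc. collected: checkbox.
  if (telemetryAvailable) return "checkbox";
  // Assumption: without the remaining data, use the harder test.
  return "two-word";
}
```

For example, chooseChallenge({ hasGoogleCookies: true, telemetryAvailable: true, antiTrackingPlugin: false }) selects the checkbox, while any call with antiTrackingPlugin set to true selects the two-word CAPTCHA.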

8.4 How to Create a Simple CAPTCHA

A simple text or image-based CAPTCHA including text and/or numbers can be implemented using web-oriented languages and libraries, such as JavaScript [10], PHP [11] and jQuery [12]. A first example is the implementation of the CAPTCHA test given

in Fig. 8.14. It is a simple text CAPTCHA which is composed of four digits [13]. The user has to read the digits shown in the text field at the top and type them into the text box at the bottom. Finally, the user is required to press the Submit button. The system analyzes the answer provided by the user and classifies him/her as a human if the typed digits are identical to the digits shown in the text field. Otherwise, the user is considered a bot. The user can also press the Refresh button to change the digits displayed in the text field. The JavaScript code for generating this text CAPTCHA is as follows:

[Listing: HTML page “JavaScript Captcha Example” with the Refresh and Submit buttons]
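
As a minimal sketch of the logic behind such a four-digit text CAPTCHA, the two steps can be written as plain JavaScript functions which also run outside a browser. The names generateCaptcha and check and the element ids captcha and inputText follow the description in this section; the optional rng parameter, the return values and the way of drawing the digits are assumptions of the sketch, and the wiring to the page is only indicated in comments.

```javascript
// Generate four random digits, each between 1 (included) and 10 (excluded),
// convert each one to a string and join them into a single CAPTCHA string.
function generateCaptcha(rng = Math.random) {
  let captcha = "";
  for (let i = 0; i < 4; i++) {
    const digit = Math.floor(rng() * 9) + 1; // integer in 1..9
    captcha += digit.toString();
  }
  return captcha;
}

// Compare the digits typed by the user with the displayed digits.
function check(captcha, inputText) {
  return inputText === captcha;
}

// Possible wiring in a page (assumed, shown as comments only):
//   document.getElementById("captcha").value = generateCaptcha();
//   const passed = check(document.getElementById("captcha").value,
//                        document.getElementById("inputText").value);
//   alert(passed ? "The two strings are equal" : "The two strings are not equal");
```

Keeping the generation and the comparison free of page access makes both steps easy to try in isolation, e.g. check(generateCaptcha(() => 0), "1111") evaluates to true.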

The JavaScript functions are embedded inside an HTML page whose title is “JavaScript Captcha Example”. The function generateCaptcha() generates the CAPTCHA test when the page is loading; this is achieved by calling it in the onload attribute of the body tag. Inside generateCaptcha(), a set of four numbers in the range between 1 (included) and 10 (excluded) is randomly generated. After that, each number is converted into a String by invoking the method toString() on it. Finally, a single String captcha is generated containing the sequence of the four numbers, and it is assigned to the CAPTCHA text field.

The check() function verifies the digits entered by the user, i.e. the input typed into the text box. If it is equal to the digits displayed in the text field, an alert is shown notifying the user that the two text strings are equal (the test is passed). Otherwise, an alert notifies the user that the two text strings are not equal (the test is not passed). Pressing the Submit button invokes check(). By contrast, pressing the Refresh button invokes generateCaptcha(), which generates a new text CAPTCHA with new randomly generated numbers. Finally, captcha and inputText are two text fields. The first one cannot be manually modified by the user, because it contains the digits randomly generated by generateCaptcha(). The second one is the text field which contains the digits entered by the user.

A second example is the implementation of an image-based CAPTCHA in PHP in which text and numbers are also displayed. It is more complex than the previous example and is shown in Fig. 8.15 [14]. The user sees a CAPTCHA code composed of a sequence of six colored letters and numbers depicted on a background image. The background should make attacks by automatic programs harder. The CAPTCHA code is shown at the top of the panel. The user has to enter the displayed CAPTCHA code into the text box at the bottom. Then, the Verify Captcha button should be pressed for the final verification. If the user correctly enters the numbers and text of the CAPTCHA code into the text box, he/she is considered a human. Otherwise, he/she is considered a

Fig. 8.15 A sample of image-based CAPTCHA which also includes text and numbers [14]

bot. The implementation of this image-based CAPTCHA with text and numbers is realized in PHP and jQuery, and is composed of two files:
• The index.php file, and
• The captcha.php file.
The index.php file contains the code for loading the image of the CAPTCHA and the text box where the user should type the displayed code. The captcha.php file contains the code for generating the image of the CAPTCHA. The index.php file is reported below: