LOAD MODELLING AND GENERATION IN IP-BASED NETWORKS 978-3-658-19102-3, 3658191023, 978-3-658-19101-6

Andrey Kolesnikov proposes a unified approach and corresponding tools for the modelling and effective generation of loads in IP-based networks.


English, 325 pages, 2017


Table of contents:
Front Matter, pages i-xxi
Front Matter, pages 1-1
Introduction (Andrey Kolesnikov), pages 3-15
Foundations and Research Field (Andrey Kolesnikov), pages 17-60
Front Matter, pages 61-61
A Formal Workload Description Technique (Andrey Kolesnikov), pages 63-97
Examples of Load Models for Different Traffic Sources (Andrey Kolesnikov), pages 99-142
Front Matter, pages 143-143
Architecture of the Unified Load Generator (Andrey Kolesnikov), pages 145-165
Distributed UniLoG Architecture (Andrey Kolesnikov), pages 167-176
Load Generation at Network Layer Service Interfaces (Andrey Kolesnikov), pages 177-194
Load Generation at Transport Layer Service Interfaces (Andrey Kolesnikov), pages 195-216
Generation of Web Workloads (Andrey Kolesnikov), pages 217-247
Front Matter, pages 249-249
Estimation of QoS Parameters for RTP/UDP Video Streaming in WLANs (Andrey Kolesnikov), pages 251-261
Estimation of QoS Parameters for RTP/TCP Video Streaming in WLANs (Andrey Kolesnikov), pages 263-280
Front Matter, pages 281-281
Summary and Outlook (Andrey Kolesnikov), pages 283-286
Back Matter, pages 287-316


Andrey Kolesnikov

Load Modelling and Generation in IP-based Networks
A Unified Approach and Tool Support

With a foreword by Prof. Dr. B. E. Wolfinger

Andrey Kolesnikov, Hamburg, Germany

Dissertation with the aim of achieving a doctoral degree at the Faculty of Mathematics, Informatics and Natural Sciences, Department of Computer Science, University of Hamburg, 2017.

ISBN 978-3-658-19101-6
ISBN 978-3-658-19102-3 (eBook)
DOI 10.1007/978-3-658-19102-3
Library of Congress Control Number: 2017951224

Springer Vieweg
© Springer Fachmedien Wiesbaden GmbH 2017

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Printed on acid-free paper

This Springer Vieweg imprint is published by Springer Nature
The registered company is Springer Fachmedien Wiesbaden GmbH
The registered company address is: Abraham-Lincoln-Str. 46, 65189 Wiesbaden, Germany

Foreword

The number of users demanding computer network services continues to increase at an enormous rate. This is true, in particular, for mobile networks and for the Internet. Moreover, the traffic generated per user is growing significantly: a study by Cisco estimated the global annual amount of IP traffic in the Internet at more than 1 ZB (= 10^21 bytes) by the end of 2016. The data rates required by the data streams to be transmitted also tend to become increasingly challenging because, e.g., video communication is becoming ever more popular, accompanied by a strong increase in the video quality demanded by users. Of course, the performance of network components will also continue to increase at an astonishingly high rate, as in the case of computer hardware (multicore processors), switching nodes (optical switches) and, in particular, data transmission media and techniques (in both optical and radio transmission). Nevertheless, the extremely strong growth in the traffic to be transmitted can be expected to lead to numerous bottlenecks even in the high-speed networks which are currently emerging. Therefore, performance evaluation of communication systems and computer networks will certainly not become superfluous; on the contrary, methods and tools will be needed which allow one to analyse (e.g., by means of measurements) how computer networks react to well-defined background loads or traffic peaks. To generate such (artificial but sufficiently realistic) background loads or traffic, dedicated load generators are indispensable, and they should be applicable in a flexible and rather general manner. The elaboration of such a broadly applicable load (or traffic) generator, called UniLoG, has been the goal underlying the research documented in this book.
In order to generate highly realistic traffic in computer networks, the UniLoG approach is as follows: the load generator produces sequences of requests (representing the load), which are handed over at a service interface within a computer network in the same manner as they would be handed over by a real service user at this interface to the component providing the service. As a consequence of the execution of the requests, communication by means of data units (e.g., video frames, TCP segments, IP packets or Ethernet frames) is initiated, which finally represents the traffic in the network.

The major contributions of this publication, which represents the PhD thesis of the author, are impressive. The results achieved comprise:

• a formal description technique (LoadSpec), which allows one to describe load in a unified manner based on a sequence of abstract requests that is independent of the concrete interface underlying the load generation,
• a variety of load models (e.g. for voice, video and Web traffic), specified by means of the author's formal description technique, with considerable effort spent on a realistic parameterization of all models elaborated,
• the design of a highly modular architecture for the UniLoG load generator and a full, very efficient implementation of this tool,
• a geographically distributed version of UniLoG, based on the manager-agent paradigm,
• the realization of various adapters for very different interfaces of a computer network, such as the service interfaces of IPv4, TCP, UDP and HTTP, which successfully demonstrates a unique feature of UniLoG, namely that it can be used to generate load at all interfaces of a complete network protocol stack (apart from the Physical Layer),
• various case studies which, e.g., demonstrate that UniLoG can indeed be used to generate highly complex background loads in a very realistic manner.

This innovative research report not only contains many conceptually and theoretically interesting ideas, but its results are also practically relevant. Among others, they should be a valuable source of information for Internet Service Providers (ISPs) or telecoms which are responsible for providing efficient network services in large and complex networks.
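The request hand-over principle described above can be sketched in a few lines. The sketch below is hypothetical (all class and function names are illustrative, not UniLoG's actual API): a load model emits interface-independent abstract requests, and an interface-specific adapter turns each request into a concrete service call.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class AbstractRequest:
    time: float                       # scheduled hand-over instant (seconds)
    kind: str = field(compare=False)  # abstract request type, e.g. "SendDataBlock"
    params: dict = field(compare=False, default_factory=dict)

def cbr_model(n, interval, size):
    """A trivial load model: n equally spaced SendDataBlock requests."""
    return [AbstractRequest(i * interval, "SendDataBlock", {"length": size})
            for i in range(n)]

class RecordingAdapter:
    """Stand-in for a real adapter (e.g. one issuing UDP sendto() calls);
    it merely records which concrete calls would be made."""
    def __init__(self):
        self.calls = []

    def execute(self, req):
        if req.kind == "SendDataBlock":
            self.calls.append(("send", req.params["length"]))

def generate(requests, adapter):
    """Hand the requests over to the adapter in time order."""
    queue = list(requests)
    heapq.heapify(queue)
    while queue:
        req = heapq.heappop(queue)
        adapter.execute(req)  # a real generator would wait until req.time

adapter = RecordingAdapter()
generate(cbr_model(3, 0.02, 160), adapter)
print(adapter.calls)  # [('send', 160), ('send', 160), ('send', 160)]
```

Because the load model only produces abstract requests, exchanging the adapter (IPv4, TCP, UDP, HTTP, ...) changes the interface at which load is generated without touching the model itself.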
Moreover, the results achieved should also be of significant relevance to researchers, developers of new network and distributed application services, network administrators, etc., who might be interested, e.g., in analysing what impact new services, changes in user behaviour, or increasing load could have on network performance and the users' quality of experience (QoE).

Hamburg, June 2017

Bernd E. Wolfinger

Preface

The accurate and realistic modelling and generation of network workload, which may consist of a mix of many complex traffic sources, is a difficult and challenging task. The analysis and generation of network workload, in particular in large-scale networks, can be complicated by the heterogeneity and large number of network devices and protocols in use, as well as by the different types of applications and services, which may evolve strongly over time. Furthermore, the purpose of workload modelling, and therefore the objectives of the corresponding experimental tests and case studies, may vary, e.g., from performance evaluation to analyses of network neutrality and security mechanisms. Therefore, in order to keep up with perpetually emerging new requirements and the corresponding technical challenges, the networking research community needs to continuously improve the methods and tools used for workload modelling and generation. In this thesis, a unified approach to workload modelling and generation with general applicability in IP-based networks is elaborated, and a set of corresponding tools for the specification and generation of synthetic workloads is developed. The architecture of the Unified Load Generator UniLoG proposed and implemented in the thesis can be used for the generation of realistic workloads and traffic according to various workload and traffic models at different (e.g., application, transport, and network) service interfaces in IP-based networks. The proposed UniLoG architecture provides a high degree of flexibility, extensibility, and scalability in the workload generation process. Further, a set of concrete workload models for selected types of traffic sources (such as VoIP, video, and Web traffic) is elaborated and provided for load generation with UniLoG.
Several experimental results related to the study of "hot topics", such as performance and QoS analysis of video streaming applications, are presented and emphasize how the proposed UniLoG load generator advances the state of the art in workload modelling and generation.

Contents

Part I  Introduction and Fundamentals . . . 1

1 Introduction . . . 3
  1.1 Motivation . . . 3
  1.2 Objectives and Scope of the Thesis . . . 8
  1.3 Structure of the Thesis . . . 10

2 Foundations and Research Field . . . 17
  2.1 Workload Modelling and Specification Techniques . . . 18
    2.1.1 Selected Workload Modelling Techniques . . . 23
    2.1.2 Selected Workload Specification Techniques . . . 27
  2.2 State-of-the-Art in Workload Generation . . . 30
    2.2.1 Web Workload and Traffic Generation . . . 34
    2.2.2 Traffic Generation at Transport Layer Service Interfaces . . . 40
    2.2.3 Traffic Generation at Network Layer Service Interfaces . . . 44
    2.2.4 Traffic Generation at Data Link Layer Service Interfaces . . . 47
    2.2.5 Workload Tests in Research and Industry . . . 55
  2.3 A Unified Approach to Workload Modelling and Generation in Computer Networks . . . 56

Part II  Workload Specification and Modelling . . . 61

3 A Formal Workload Description Technique . . . 63
  3.1 The Basic Concept of User Behaviour Automata . . . 63
  3.2 Generalisation of the Basic Concept of User Behaviour Automata . . . 68
    3.2.1 Definition of the Generalised User Behavior Automaton (UBA) . . . 69
    3.2.2 Specification of Transitions between Elementary States . . . 73
    3.2.3 Aggregation of User States into Macro-States . . . 76
  3.3 XML Schema Definition for the UBA Components . . . 77
    3.3.1 XSD Simple Elements . . . 79
    3.3.2 XSD Complex Elements . . . 81
    3.3.3 Introduction to the UBA Schema . . . 81
  3.4 Description of Abstract Requests and System Reactions . . . 84
    3.4.1 Relevant Abstract Request Types . . . 85
    3.4.2 Semantics of Abstract Request Types . . . 86
    3.4.3 Definition of Abstract Request Types in the UBA Schema . . . 89
  3.5 Specification of Values for UBA Parameters . . . 91
  3.6 Syntax Rules for Context Expressions . . . 94
  3.7 Specification of Complex User Environments . . . 95

4 Examples of Load Models for Different Traffic Sources . . . 99
  4.1 Models for Speech Traffic Sources . . . 99
    4.1.1 Voice Codecs with Constant Bit Rate . . . 100
    4.1.2 Voice Codecs with Silence Detection . . . 102
  4.2 Modelling of Video Traffic Sources . . . 108
    4.2.1 Modelling of the GOP Structure . . . 109
    4.2.2 Statistical Characterization and Modelling of Frame Lengths . . . 113
    4.2.3 Partitioning into Shot Classes . . . 126
  4.3 Modelling of Web Workloads with UniLoG . . . 131
    4.3.1 UniLoG Approach for Web Workload Modelling and Generation . . . 134

Part III  Workload Generation . . . 143

5 Architecture of the Unified Load Generator . . . 145
  5.1 Basic Requirements . . . 145
    5.1.1 Functional Requirements . . . 145
    5.1.2 Non-functional Requirements . . . 146
  5.2 Overview of the UniLoG Architecture . . . 150
  5.3 Generator Functionality . . . 154
  5.4 Adapter Functionality . . . 155
  5.5 Real-time Requirements of Requests . . . 158
    5.5.1 Impact of Multitasking . . . 158
    5.5.2 Latency Introduced by System Calls . . . 160
    5.5.3 Latency Introduced by UniLoG Components . . . 162
    5.5.4 Intrinsic Model Factors . . . 163
  5.6 UniLoG Design Extensions for Multi-Core Platforms . . . 163

6 Distributed UniLoG Architecture . . . 167
  6.1 Prerequisites and Requirements . . . 167
  6.2 System Architecture for Distributed Load Generation . . . 172
  6.3 Implementation Aspects . . . 175

7 Load Generation at Network Layer Service Interfaces . . . 177
  7.1 Application Scenarios and Requirements . . . 177
  7.2 Design of the UniLoG.IPv4 Adapter . . . 179
    7.2.1 Abstract IP Requests . . . 182
    7.2.2 Types of Abstract System Reactions . . . 186
  7.3 Performance Evaluation . . . 188

8 Load Generation at Transport Layer Service Interfaces . . . 195
  8.1 Design and Architecture of the UniLoG.TCP Adapter . . . 196
    8.1.1 Supported Types of Abstract Requests . . . 196
    8.1.2 TCP Load Receivers . . . 202
    8.1.3 Supported Types of Abstract System Reactions . . . 203
    8.1.4 Types of Supported Traffic Matrices . . . 205
  8.2 Performance Evaluation . . . 205
  8.3 Aspects of the UniLoG.UDP Adapter Implementation . . . 215

9 Generation of Web Workloads . . . 217
  9.1 Architecture of the UniLoG.HTTP Adapter . . . 217
  9.2 Implementation Aspects . . . 220
    9.2.1 Browser Imitation . . . 222
    9.2.2 Browser Integration . . . 224
  9.3 Construction of the Pool of Web Sites . . . 227
    9.3.1 Estimation of Abstract Workload Characteristics . . . 229
    9.3.2 Measurement Results . . . 240

Part IV  Applications of the UniLoG Load Generator . . . 249

10 Estimation of QoS Parameters for RTP/UDP Video Streaming . . . 251
  10.1 Experimental Network . . . 252
  10.2 Configuration of the Background Load . . . 255
  10.3 Streaming Quality Metrics . . . 257
  10.4 Results and Discussion . . . 258

11 Estimation of QoS Parameters for RTP/TCP Video Streaming . . . 263
  11.1 Experimental Network . . . 265
  11.2 Configuration of the TCP Background Load . . . 268
  11.3 Measurement Results and Discussion . . . 269
    11.3.1 Streaming in the Experimental Network without Background Load . . . 270
    11.3.2 Streaming in the Experimental Network under Background Load . . . 273
  11.4 Conclusions . . . 278

Part V  Results and Conclusions . . . 281

12 Summary and Outlook . . . 283
  12.1 Summary of Results . . . 283
  12.2 Outlook on Future Work . . . 286

A Context Expression Functions . . . 287

Bibliography . . . 291

List of Acronyms

ACPI  Advanced Configuration and Power Interface
AJAX  Asynchronous JavaScript and XML
AMR  Adaptive Multi-Rate
API  Application Programming Interface
ATM  Asynchronous Transfer Mode
AQM  Active Queue Management
BBB  Big Buck Bunny
BMAP  Batch Markovian Arrival Process
BRAS  Broadband Remote Access Server
CA  Certification Authority
CBR  Constant Bit Rate
CCP  Compression Control Protocol
CDF  Cumulative Distribution Function
CDN  Content Delivery Network
CET  Central European Time
CIFS  Common Internet File System
COM  Component Object Model
COTS  Commercial off-the-shelf
CPU  Central Processing Unit
CSRF  Cross-Site Request Forgery
CSS  Cascading Style Sheet
DCCP  Datagram Congestion Control Protocol
DNS  Domain Name System
DOM  Document Object Model
DPDK  Data Plane Development Kit
DSCP  Differentiated Service Code Point
DSL  Digital Subscriber Line
ECDF  Empirical Cumulative Distribution Function
ECN  Explicit Congestion Notification
EPMF  Empirical Probability Mass Function
EQ  Event Queue
FCFS  First-Come, First-Served
FDT  Formal Description Technique
FEC  Forward Error Correction
FSM  Finite State Machine
FTP  File Transfer Protocol
GOP  Group of Pictures
GSM  Global System for Mobile Communications
GUI  Graphical User Interface
HDTV  High Definition Television
HMM  Hidden Markov Model
HPET  High Precision Event Timer
HTML  Hypertext Markup Language
HTTP  Hypertext Transfer Protocol
HTTPS  HTTP Secure
IANA  Internet Assigned Numbers Authority
IAT  Inter-arrival Time
ICMP  Internet Control Message Protocol
IDPS  Intrusion Detection and Prevention System
iLBC  Internet Low Bit Rate Codec
IP  Internet Protocol
IPv4  Internet Protocol Version 4
IPv6  Internet Protocol Version 6
IPTV  Internet Protocol Television
iSAC  internet Speech Audio Codec
ISDN  Integrated Services Digital Network
ISP  Internet Service Provider
ITU  International Telecommunication Union
JSON  JavaScript Object Notation
LAN  Local Area Network
MAC  Medium Access Control
MIME  Multipurpose Internet Mail Extensions
MLE  Maximum Likelihood Estimator
MMPP  Markov Modulated Poisson Process
MPEG  Moving Picture Experts Group
MPI  Message Passing Interface
MTU  Maximum Transmission Unit
NALU  Network Abstraction Layer Unit
NAT  Network Address Translation
NIC  Network Interface Card
NNTP  Network News Transfer Protocol
NPT  Network Port Translation
NTP  Network Time Protocol
OS  Operating System
OSI  Open Systems Interconnection
PCAP  Packet Capture
PCM  Pulse Code Modulation
PDF  Probability Density Function
PDU  Protocol Data Unit
POP3  Post Office Protocol Version 3
PPBP  Poisson Pareto Burst Process
PPP  Point-to-Point Protocol
PTP  Precision Time Protocol
QoS  Quality of Service
QPC  QueryPerformanceCounter
RQ  Request Queue
RMI  Remote Method Invocation
RTCP  RTP Control Protocol
RTP  Real-Time Transport Protocol
RTSP  Real-Time Streaming Protocol
SAP  Service Access Point
SCTP  Stream Control Transmission Protocol
SDP  Session Description Protocol
SDU  Service Data Unit
SIP  Session Initiation Protocol
SMB  Server Message Block
SMTP  Simple Mail Transfer Protocol
SNMP  Simple Network Management Protocol
SNTP  Simple Network Time Protocol
SOA  Service-oriented Architecture
SUT  System Under Test
TCP  Transmission Control Protocol
TLS  Transport Layer Security
ToS  Type of Service
TSC  Time Stamp Counter
TTL  Time to Live
UBA  User Behavior Automaton
UDP  User Datagram Protocol
URL  Uniform Resource Locator
VAD  Voice Activity Detection
VBR  Variable Bit Rate
VoD  Video on Demand
VoIP  Voice over Internet Protocol
VPN  Virtual Private Network
W3C  World Wide Web Consortium
WLAN  Wireless Local Area Network
WWW  World Wide Web
XML  Extensible Markup Language
XSD  XML Schema Definition

List of Figures

2.1 Unified approach to workload modelling illustrated for the case of modelling at the IPv4 network service interface . . . 58
3.1 A UBA on the level of macro-states . . . 64
3.2 Modelling of a user at the Web browser command interface by means of a UBA . . . 66
3.3 Modelling of a user at the HTTP service interface by means of a UBA . . . 67
3.4 The classification of data types supported by XML Schema . . . 80
3.5 UBA schema definition file . . . 82
3.6 An example of a UBA model file including a reference to the UBA schema . . . 84
3.7 Mapping of the abstract request types from the modelling domain onto the system calls at the TCP socket interface . . . 88
3.8 Specification of valid abstract request types in the schema . . . 90
3.9 Definition of the request type SendDataBlock to model the TCP socket send() call . . . 91
3.10 Definition of the complex type ValueSpec for the specification of values for different UBA parameters . . . 92
4.1 A UBA model of the G.711 output voice stream, control information ignored . . . 102
4.2 A UBA model of the G.711 output voice stream, control information included . . . 103
4.3 Output stream of the G.723.1 codec with silence intervals and talkspurts interrupted by short breaks . . . 104
4.4 A UBA describing an ON/OFF model for the voice stream from the G.723.1 codec . . . 105
4.5 A simple UBA model using a chain of six R-states to represent the IBBPBB... sequence of H.264-coded video frames . . . 110
4.6 A universal UBA model using three R-states and conditional state transitions to model different possible GOP structures . . . 111
4.7 IP throughput of the BBB video stream at the client side (RTSP streaming using RTP over UDP in a 100 Mbit/s Fast Ethernet) . . . 114
4.8 I-frame lengths (in bytes) from BBB video: EPMF, fit by means of the Log-normal and Gamma PDFs . . . 115
4.9 I-frame lengths (in bytes) from BBB video: ECDF, fit by means of the Log-normal and Gamma CDFs . . . 116
4.10 P-frame lengths (in bytes) from BBB video: EPMF, fit by means of the Log-normal and Gamma PDFs . . . 120
4.11 P-frame lengths (in bytes) from BBB video: ECDF, fit by means of the Log-normal and Gamma CDFs . . . 121
4.12 B-frame lengths (in bytes) from BBB video: EPMF, fit by means of the Log-normal and Gamma PDFs . . . 123
4.13 B-frame lengths (in bytes) from BBB video: ECDF, fit by means of the Log-normal and Gamma CDFs . . . 124
4.14 Partitioning of the GOPs into shot classes using geometric boundaries . . . 127
4.15 A universal UBA model for H.264-coded video sources with two shot classes . . . 129
4.16 Modelling approach MA1 . . . 132
4.17 Modelling approach MA2 . . . 132
4.18 Modelling approach MA3 . . . 133
4.19 Retrieval of multiple pages from the Web server www.foo.com and the corresponding UBA model . . . 134
5.1 Overview of the basic UniLoG architecture . . . 150
6.1 Architecture of the system for distributed load generation on the basis of the UniLoG load generator . . . 173
7.1 Architecture of the UniLoG.IPv4 adapter . . . 179
7.2 Example of the abstract request type InjectIPPacket to model the generation of IPv4 packets at the network service interface . . . 183
7.3 Experiment duration (time to generate 10·10^6 IPv4 send() requests) for different IPv4 payload sizes . . . 190
7.4 The rate of blocking IPv4 send() requests achievable for different IPv4 payload sizes . . . 191
7.5 The data rate of the IPv4 flow achievable on a Gigabit Ethernet link for different IPv4 payload sizes . . . 191
8.1 Architecture and basic components of the UniLoG.TCP adapter . . . 197
8.2 Definition of the abstract TCP request type TCPOpenRequest(localTCPPort, remoteIPAddress, remoteTCPPort) to model the generation of active TCP connection requests . . . 198
8.3 Definition of the abstract TCP request type TCPSendRequest(payloadBuffer, payloadLength) to model the transmission of user data from the payloadBuffer array . . . 199
8.4 Definition of the abstract TCP request type TCPCloseRequest(localTCPPort, remoteIPAddress, remoteTCPPort) to model the connection close requests . . . 201
8.5 Experiment duration (the time required to generate a total of 10·10^6 TCP send() requests) for different TCP payload sizes . . . 209
8.6 The rate of blocking TCP send() requests achievable for different TCP payload sizes . . . 210
8.7 The data rate of the TCP stream achievable on a Gigabit Ethernet link for different TCP payload sizes . . . 210
9.1 Architecture and basic components of the UniLoG.HTTP adapter . . . 218
9.2 Algorithm used to find the best matching HTTP request from the pool . . . 219
9.3 Modules involved into the analysis of induced HTTP traffic and estimation of the abstract Web workload characteristics . . . 232
9.4 List of the connections established to the server(s) which are involved into the delivery of the page www.example.com . . . 238
10.1 Experimental network: 100 Mbit/s Fast Ethernet, 54 Mbit/s IEEE 802.11g WLAN, transmission of the BBB video stream by means of RTP over UDP . . . 253
10.2 Quality metrics for the BBB video stream observed under different levels of background loads . . . 260
11.1 Experimental network: 1 Gbit/s Gigabit Ethernet, 54 Mbit/s IEEE 802.11g WLAN, transmission of the BBB video stream by means of RTP over TCP . . . 266
11.2 IP throughput and streaming statistics of the BBB video stream (no background load, TCP receive buffer size RCVBUFF in the VoD client set to 17520 Byte) . . . 271
11.3 IP throughput and streaming statistics of the BBB video stream (no background load, TCP receive buffer size RCVBUFF in the VoD client increased to 65535 Byte) . . . 272
11.4 IP throughput of the BBB video stream . . . 274
11.5 Jitter values in the BBB video stream . . . 275
11.6 Number of RTP sequence errors in the BBB video stream . . . 276
11.7 Number of lost RTP packets in the BBB video stream . . . 277
11.8 Duplicate TCP segments in the BBB video stream . . . 278

List of Tables

4.1 Packet Types in the G.711 and the G.729.1 Codecs . . . 100
4.2 Measurements of ON/OFF phase durations (classical approach and approach from [MBM09]) using the set of typical telephone conversations available in [BAS96] . . . 107
4.3 Geometric partitioning of the BBB video into n shot classes for n = 2, 3, ..., 7 . . . 128
4.4 Inter-shot class transition probability matrix P for n = 7 shot classes S1–S7 . . . 130
6.1 Control commands supported by the UniLoG load agents . . . 174
8.1 TCP send() request rate and utilization of the Gigabit Ethernet link measured during the generation of 10·10^6 requests using different sizes of the TCP send buffer . . . 208
9.1 Summary of measurement results for abstract Web workload characteristics of Web pages . . . 242
9.2 Number of pages in different classes of inducedServerLoad and inducedServerProcessingTime . . . 244
10.1 Parameters of the IP background load streams in the case study . . . 256
11.1 Parameters of the TCP background load streams . . . 269

Part I.

Introduction and Fundamentals

1. Introduction

1.1. Motivation

The trend towards the convergence and integration of media and communication services on the basis of the IP protocol, the variety of different types of load sources, and the rapid growth in the number of network users, applications and end systems lead to complex traffic loads in communication and computer networks [Lea10]. According to the Cisco Visual Networking Index (Cisco VNI) global IP traffic forecast [VNI15], this trend will continue in the coming years. Thus, global IP traffic will reach 1.1 zettabytes per year, or 88.4 exabytes (one billion gigabytes) per month, in 2016. By 2019, global IP traffic is expected to pass a new milestone of 2.0 zettabytes per year, or 168.0 exabytes per month. It is both important and challenging that busy-hour IP traffic in the Internet (observed during the busiest 60-minute period in a day) is growing more rapidly than average Internet traffic. Busy-hour Internet traffic increased 34 percent in 2014, compared with 26 percent growth in average traffic per year, and will reach 1.7 petabits per second (Pbps) by 2019 [VNI15]. Internet video traffic continues to be a major area of traffic volume growth and may exceed the 80 percent mark of all global consumer Internet traffic in 2019 (up from 64 percent in 2014 according to Cisco VNI). Further, there is a continuing shift of Internet usage towards mobile and Wi-Fi devices, with mobile device users expecting a video quality similar to what they get on wired PCs or TV sets. By 2019, wired devices may account for 33 percent of IP traffic, while Wi-Fi and mobile devices may account for 67 percent of IP traffic. The complex traffic loads offered to communication networks and services by an increasing number of application and service users may cause transmission congestion, which would in turn lead to unsatisfactory provisioning of Quality of Service (QoS) for network applications.
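The per-month and per-year figures quoted above are mutually consistent, as a quick unit-conversion check shows (using Cisco's decimal units, where 1 ZB = 1000 EB = 10^21 bytes):

```python
# Sanity check of the quoted Cisco VNI traffic figures:
# 88.4 EB/month and 168.0 EB/month should match roughly
# 1.1 ZB/year and 2.0 ZB/year, respectively.
EB_PER_ZB = 1000.0

yearly_2016 = 88.4 * 12 / EB_PER_ZB   # ZB per year
yearly_2019 = 168.0 * 12 / EB_PER_ZB  # ZB per year

print(round(yearly_2016, 2))  # 1.06 -> quoted (rounded) as 1.1 ZB
print(round(yearly_2019, 2))  # 2.02 -> quoted (rounded) as 2.0 ZB
```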
© Springer Fachmedien Wiesbaden GmbH 2017
A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_1

A solid understanding of load characteristics in the real world is a basic prerequisite to investigate the reasons why the performance, the architecture, and the QoS of applications are significantly influenced by specific traffic scenarios. Therefore, measurement, characterization, and modelling of real workloads are very important tasks driving the design of new cost-effective applications and services as well as the optimization of existing ones. In general, workload characterization is concerned with identifying the basic components that compose the target workload, which depend both on the nature of the observed application or service and on the purpose of the characterization (i.e., on the particular research objectives). Workload modelling then consists of building a representation that mimics the real workload under study, based on the identified components [AlA11]. For instance, modelling of user behaviour during the design of networked systems can focus on predicting individual user behaviour (e.g., in order to provide personalized services) or concentrate on global system behaviour (e.g., in order to evaluate important system performance characteristics). Realistic workload models can support cost-effective capacity planning, dimensioning, and management decisions, and drive performance analysis and optimization studies. Furthermore, they can assist in generating synthetic workloads for experimental purposes. Analysis of the performance and behaviour of networks and their offered services under various load scenarios has therefore become a very important issue for owners of large networks, in particular during network planning and administration. Networking research has long relied on network simulation as the primary means to analyse and demonstrate the effectiveness of proposed algorithms and mechanisms. Possible approaches are to construct a network testbed and conduct experiments with actual network hardware and software, or to simulate the network hardware and software and conduct experiments by means of such simulations [CCG04].
In these cases, the experimenter proceeds by simulating the use of the real or simulated network by a population of users with applications such as, e.g., Web browsing, video streaming, electronic mail, file transfer, or peer-to-peer file sharing. At this point, synthetic (or artificial) loads are of significant importance, because real applications typically cannot be controlled to the extent that they produce exactly a predefined load or traffic required for experimentation. Synthetic network load is usually created according to a model of how the corresponding applications or users behave at specified network interfaces and refers to loads which are induced into the network, e.g., in order to reflect the behaviour of additional users, sessions, connections, etc., or to generate a dedicated background load or a peak load for the network. Accordingly, synthetic (or artificial) load and traffic generators denote the

1.1. Motivation

5

corresponding hardware and software components used to create synthetic loads and inject synthetic traffic at different interfaces in real or simulated networks according to a workload or traffic model specified by the experimenter. At this point, an appropriate representation of the workload model to be used for traffic generation becomes a very important issue. First, the representation of the workload model must allow its execution (e.g., in a software traffic generator) and should, therefore, be sufficiently precise, formal, and complete. Second, considering the plethora of workload characterization and modelling studies for different applications and services with their respective objectives, the elaboration of a generally applicable formal workload description technique which provides an appropriate executable representation for different (types of) workload models is of particular value. The integration of such a formal workload description (workload specification) technique into a load generator would provide enhanced flexibility and not restrict the use of the load generator to a single, specific implemented model. Traffic generation for experimental networking research has traditionally been classified into packet-level and source-level traffic generation. Classical packet-level traffic generation has usually been associated with the generation of traffic at lower network interfaces (i.e., below transport service interfaces such as the Transmission Control Protocol (TCP)). It can be performed, e.g., by a simple packet-level replay, i.e., reproducing the exact arrivals and sizes of every packet observed traversing a real link. Alternatively, it can refer to injecting packets in a manner that preserves some set of statistical properties and analytical characteristics (of the packets from the real link) which are considered fundamental or relevant for a specific experiment.
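The exact packet-level replay just described can be illustrated by a minimal sketch (the trace format, destination, and UDP transport are invented for this example; this is not the TCPreplay implementation): given a trace of (time offset, payload) pairs, the generator sleeps until each recorded send instant and reinjects the packet.

```python
import socket
import time

def replay(trace, dest, sock=None):
    """Replay a trace of (offset_seconds, payload_bytes) pairs over UDP,
    preserving the recorded inter-packet timing as closely as possible."""
    sock = sock or socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    start = time.monotonic()
    for offset, payload in trace:
        # wait until the recorded send instant relative to the replay start
        delay = offset - (time.monotonic() - start)
        if delay > 0:
            time.sleep(delay)
        sock.sendto(payload, dest)

# Example trace: three packets, 10 ms apart.
trace = [(0.00, b"a" * 100), (0.01, b"b" * 200), (0.02, b"c" * 300)]
# replay(trace, ("127.0.0.1", 9000))
```

Note that `time.sleep` on a general-purpose operating system only approximates the recorded timing; precise real-time replay is one of the engineering challenges mentioned below.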
The classical packet-level replay is a conceptually simple and straightforward technique which, however, may involve a number of engineering challenges. Implementations range from free software tools such as TCPreplay [TCPrp] to commercial large-scale, feature-rich, and high-performance hardware appliances like, e.g., the Spirent AX/4000 [AX4000] or the Ixia Optixia series of test modules for 40 and 100 Gigabit Ethernet [IXIA]^1. The technique is often very useful in situations where the traffic to be generated is invariant of the experimental network configuration and is unresponsive to the experimental and network conditions. For example, the TCPreplay tool was originally designed to replay malicious traffic patterns to Intrusion Detection and Prevention Systems (IDPSs), and an evolution of it includes capabilities to replay malicious traffic to Web servers. However, packet replay techniques may be restricted in their flexibility since the researcher has to limit his experiments to the publicly available traces (e.g., from the traffic archives of the MAWI [MAWI], CAIDA [CAIDA], or WAND [WITS] research groups) and their respective characteristics. Further (and most important), Paxson and Floyd argued already in their influential work [PaF95] that classical packet-level traffic generation breaks the end-to-end feedback loop (“closed loop”) in adaptive network protocols (such as TCP or RTP/RTCP). Thus, the resulting “open-loop” traffic does not realistically react to the experimental conditions and fails to preserve an essential “closed-loop” property of Internet traffic. Therefore, significant effort has been required to introduce network dependency and responsiveness into the structural models used for realistic packet-level traffic generation (cf., e.g., [SoB04, ViV09]). In contrast, classical source-level traffic generation has been associated with the development of realistic models of application behaviour (such as file transfer, electronic mail, Web, voice, and video traffic, or peer-to-peer file sharing). Floyd and Paxson [FlP01] stressed the particular importance of using source-level traffic generators layered over real or simulated TCP implementations in order to produce application-dependent but network-independent synthetic traffic that corresponds to a valid, contemporary model of application or user behaviour. This modelling paradigm allows “closed-loop” traffic generation and is, therefore, far more realistic and applicable to a wider range of scenarios. However, single application models may not be representative of real traffic mixes as created by a large number of different applications and service users, e.g., in the Internet.

^1 A first test solution for 400 Gigabit Ethernet had been provided by Ixia at the time of writing this thesis, cf. [IXIA2].
Therefore, the scalability of the traffic generation process becomes a very important issue. Further, the composition of the traffic mixes to be generated may change, and the individual applications may also evolve in the way they interact with the network, so that the models used have to be modified as a consequence. Despite the respective strengths and weaknesses of the classical methods for traffic generation, one can ask whether a combination of these methods in a kind of hybrid approach is possible. For example, can we combine simple and realistic traffic generation (from the classical packet-level methods at lower network interfaces) with flexibility and responsiveness to network conditions (as is usually the case with the source-level methods at the application layer interfaces) and provide enhanced realism, scalability, and performance in a single coherent approach? These questions provide the main motivation for this dissertation.


A number of dedicated model-based load and traffic generators exist in academic research, e.g., for Web traffic (Surge [BaC98] and Guernica [OSPG09] in real testbeds, or [LAJ07] in network simulations), UDP and TCP streams (ITG [APV04], BRUTE [BGPS05], Swing [ViV09]), or IP traffic loads (Harpoon [SoB04], BRUNO [APF08a]). We will describe the existing workload and traffic generators in Sec. 2.2 in more detail. The existing solutions usually address their specific modelling objectives and, therefore, often do not provide adequate flexibility in case the underlying model is to be modified or a completely different (type of) model is to be used. An interesting approach to traffic and workload generation, from the point of view of the supervising professor and the author of this PhD thesis, would be the provisioning of a unified tool which allows the researcher to:

• generate load as sequence(s) of requests at an interface which can be chosen, dependent on the kind of study the experimenter is carrying out, from a set of service interfaces supported by the tool,

• use different (types of) workload models for the behaviour of application or service users,

• combine different modelling methodologies (e.g., the use of packet traces or analytical modelling of load characteristics for the traffic generation process),

• provide sufficient scalability in order to generate traffic mixes representative of a large number and different types of application or service users,

• if needed, support the generation of traffic loads which depend on changing network conditions.

This approach has been followed by the author of this thesis during the development of the Unified Load Generator UniLoG [KoK10, KoW11, Kol12].
The basic principle underlying the design and elaboration of UniLoG has been to start with a formal description of an abstract load model by means of a finite User Behavior Automaton (UBA)^2 and thereafter to use an interface-dependent adapter to map the abstract requests to the concrete requests as they are “understood” by the service-providing component at the real interface in question.

^2 As it has been introduced in [WoK90] and used later, e.g., in [Wol99, Bai99, Con06].
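As a deliberately simplified illustration of this principle (the state names, request types, and adapter mapping below are invented for the example and do not reproduce the actual UniLoG components), a two-state UBA emitting abstract requests, which an interface-specific adapter then maps to concrete actions, could look as follows:

```python
import random

# Abstract load model: a two-state user (Active / Idle) with transition probabilities.
TRANSITIONS = {
    "Active": [("Active", 0.7), ("Idle", 0.3)],
    "Idle":   [("Active", 0.5), ("Idle", 0.5)],
}

def uba_requests(n, state="Active", rng=random.Random(42)):
    """Generate n abstract requests: 'SEND' while Active, 'WAIT' while Idle."""
    for _ in range(n):
        yield {"type": "SEND" if state == "Active" else "WAIT",
               "size": rng.randint(100, 1500)}
        states, weights = zip(*TRANSITIONS[state])
        state = rng.choices(states, weights=weights)[0]

def udp_adapter(abstract):
    """Map an abstract request to a concrete (here merely hypothetical) UDP action."""
    if abstract["type"] == "SEND":
        return f"udp.send({abstract['size']} bytes)"
    return "sleep()"

for req in uba_requests(3):
    print(udp_adapter(req))
```

The key design point is the separation of concerns: the automaton stays interface-independent, while swapping the adapter retargets the same abstract model to another service interface.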


Using this basic principle, the author aims to combine load measurement, modelling, and generation into a single, coherent approach. The main goal of this thesis is to provide the corresponding building blocks (“ingredients”) required to fulfil this task.

1.2. Objectives and Scope of the Thesis

The main focus of this dissertation is on the elaboration of a unified approach to the modelling and generation of synthetic (artificial) workloads at different service interfaces in IP-based networks, and on providing the corresponding tool support. From this main objective, the following goals arise for the thesis:

• Elaboration of the detailed architecture of a tool (Unified Load Generator, UniLoG) for workload and traffic generation, and its implementation on the basis of an operating system without explicit real-time extensions, using commodity hardware with either single-core or multi-core CPUs.

• Development of a formal workload description technique and its integration into the UniLoG load generator in order to provide for its general applicability. An example of such a technique is the method based on the concept of a finite User Behavior Automaton (UBA), first introduced in [WoK90] and used later in the research of the TKRN working group at the University of Hamburg. In this thesis, we will use the basic concept of the UBA as a foundation and systematically rework it, elaborating a series of extensions and generalisations of the original concept required to provide executable representations of different state-of-the-art workload models (partly developed by us, partly taken from the literature). A corresponding LoadSpec tool for the specification of workload models using the extended and generalized UBA concept is to be implemented and should become an integrated part of the UniLoG load generator.

• Demonstration of the application of the proposed unified approach to the modelling of different types of workloads and traffic sources in Internet Protocol (IP)-based networks, and presentation of the use of the formal UBA-based workload description technique for the specification of concrete workload models to be used for load generation in UniLoG. In particular, the use of the formal workload description technique is to be shown for the specification of 1) a selected set of existing workload models chosen from the literature (e.g., for voice and video traffic sources), and 2) the workload models developed by us in the scope of this thesis (e.g., for the transmission of H.264-coded video streams and Web traffic).

• Design, on the basis of the proposed UniLoG architecture, of an enhanced and scalable system for geographically distributed load generation, consisting of load agents, which provide the load generation service to the experimenter, and a management station, which enables the experimenter to remotely configure, control, and monitor the load generators from one central point.

• Provisioning of a set of interface-specific adapters for the generation of concrete loads and the injection of the corresponding traffic at network layer (e.g., IP), transport layer (e.g., UDP and TCP), or application layer (e.g., HTTP) service interfaces, including detailed analyses of the performance characteristics of these new adapters.

• Conducting a series of comprehensive Quality of Service (QoS) studies for multimedia applications in various realistic scenarios in wired and wireless networks in order to demonstrate the practical use of UniLoG for the generation of different types of loads.

It should be noted that the main focus of this dissertation is on the formal description (specification) and generation of workloads, while the development of concrete models for the user behaviour of different applications and services is not intended to be the primary field of our activity. We argue that, due to the integration of the formal workload description technique into the UniLoG load generator, we will be able to build executable representations (in the form of the corresponding UBAs) not only of the workload models developed in this thesis, but also of different (types of) workload models existing in the research community. In this way, we can provide enhanced flexibility and general applicability of our load generator and will not restrict its use to a set of specific predefined models shipped with it.
The expected outcomes of the current research project include, among others, the following results:

• With the embedded formal automata-based load specification technique, the UniLoG approach provides a high level of abstraction and flexibility during load modelling. A major contribution of this thesis is expected in the ability to confirm that an astonishingly precise and effective generation of traffic loads in real time can be guaranteed at this high level of abstraction.


• From the geographically distributed load generation facility incorporated in the new architecture, significant gains are expected in situations where centralized load generation may not satisfy the experimenter’s goals, e.g., the generation of traffic streams with very high bit rates (in the order of many Gbit/s) at a selected target link, network node, or subnet, the generation of complex traffic mixes resulting from a large number of different load sources in irregularly meshed networks, or the testing of the effectiveness of security mechanisms.

• In the context of the comprehensive case studies for various practically relevant scenarios in wired and wireless networks, we expect to obtain, e.g., concrete results for the QoS parameters of live video streaming under various background loads and to demonstrate the practical use and the potential application fields of load generators.

Finally, in combination with the integrated workload specification technique, a predefined set of user behaviour models, the adapters, and the distributed load generation facility, UniLoG is expected to provide a highly universal and effective tool for load generation in IP-based networks.

1.3. Structure of the Thesis

The structure of the thesis follows from its main objective and the particular goals defined in Sec. 1.2. The thesis comprises twelve chapters organized in the following five parts:

Part I “Introduction and Fundamentals”

Chapter 1 “Introduction”: In the introductory Chapter 1 we first present the extensive motivation for this dissertation in Sec. 1.1 and define the main goals and objectives of the thesis in Sec. 1.2. The organisation of the thesis in five parts consisting of a total of twelve chapters is described in the current Sec. 1.3.

Chapter 2 “Foundations and Research Field”: In Chapter 2 we start with an overview of the fundamental research work in the area of workload modelling and specification. In Sec. 2.1 we first present some prominent examples of workload models for selected applications and services in IP-based networks (e.g., voice and video applications, Hypertext Transfer Protocol (HTTP) and File Transfer Protocol (FTP) applications, etc.) and then move on to the existing methods for the formal description (specification) of such workloads and workload models, which are indispensable for the generation of realistic load and traffic both in network simulations and in real network testbeds. An overview of the state of the art in workload and traffic generation, along with a discussion of the existing software-based traffic generators, is given in Sec. 2.2. Finally, the unified approach to workload modelling and generation in computer networks first introduced in [WoK90], along with the basic concept of the UBA used in this thesis for the formal description (specification) of workloads, is presented in Sec. 2.3.

Part II “Workload Specification and Modelling”

Chapter 3 “A Formal Workload Description Technique”: In Sec. 3.1 we present the basic concept of the UBA for the specification of workloads as it has been introduced in [WoK90] and later used in [Bai99, Wol99, Con06]. In Sec. 3.2 we present our proposal for the generalisation and extension of the basic UBA concept, introducing elementary states and the context in the UBA, along with an aggregation function for the elementary states and context expressions as a means to describe context changes in the automaton. Our next proposal, to use XML Schema Definition for the specification of UBA components, is described in Sec. 3.3, and the concrete application of Extensible Markup Language (XML) Schema for the specification of the abstract request and reaction types, the values of different UBA parameters, and context expressions is illustrated in Sec. 3.4, Sec. 3.5, and Sec. 3.6, respectively. Finally, we close this chapter by introducing a means for the description of complex user environments, which may consist of different numbers and different types of service users, in Sec. 3.7.

Chapter 4 “Examples of Load Models for Different Traffic Sources”: In this chapter we present applications of the proposed unified approach to the modelling of different types of workload and traffic sources in IP networks. We remark that all UBA models presented in Chapter 4 have been constructed using the LoadSpec tool for workload modelling and specification developed by the author of this thesis.


We start with voice traffic sources in Sec. 4.1 and describe the construction of the corresponding UBA models for different voice codecs generating Constant Bit Rate (CBR) traffic (e.g., G.711 and G.729.1) and Variable Bit Rate (VBR) traffic (e.g., G.723.1 and iLBC). In Sec. 4.2 we deal with video traffic sources and start with the construction of a universal UBA model for the different Group of Pictures (GOP) structures of video frames used by the H.264 encoder. Thereafter, we conduct a comprehensive statistical analysis and characterisation of the frame lengths of a chosen VBR H.264-encoded video in order to provide concrete values for the parameters of our proposed universal UBA model. At the end of Sec. 4.2 we demonstrate an extension of the model to consider fragments of different motion intensity in the original video by means of corresponding shot classes. In Sec. 4.3 we present the application of the proposed unified approach to the modelling of Web traffic and Web server workloads. In particular, we identify the prevalent characteristics of Web workloads, give their exact definitions as they will be used in the thesis, and show how they can be incorporated into a UBA model for Web service users. The corresponding methods for the estimation of concrete values for the specified Web workload characteristics are described later in Sec. 9.3 in Chapter 9.
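As a minimal illustration of what a CBR voice source amounts to, consider G.711 at 64 kbit/s: with an assumed packetisation interval of 20 ms, each packet carries 64000 × 0.020 / 8 = 160 payload bytes. The packet schedule of such a source can be sketched as:

```python
def g711_schedule(duration_s, interval_s=0.020, rate_bps=64_000):
    """Return the (send_time_s, payload_bytes) schedule of a CBR voice source."""
    payload = int(rate_bps * interval_s / 8)   # 160 bytes for G.711 at 20 ms
    n = round(duration_s / interval_s)         # number of packets in the interval
    return [(i * interval_s, payload) for i in range(n)]

sched = g711_schedule(0.1)       # 100 ms of audio
print(len(sched), sched[0][1])   # 5 packets of 160 bytes each
```

A VBR codec such as G.723.1 or iLBC would instead draw the payload size (or suppress packets during silence periods) from a distribution, which is exactly the kind of behaviour the UBA states and transitions capture.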

Part III “Workload Generation”

Chapter 5 “Architecture of the Unified Load Generator”: In this chapter we describe the design and architecture of the Unified Load Generator UniLoG, which has been elaborated by the author of this thesis according to the unified approach to load modelling and generation presented in Sec. 2.3 of Chapter 2. We start with the basic functional and non-functional requirements for such a load generator in Sec. 5.1, give an outline of its general architecture in Sec. 5.2, and then discuss the responsibilities of its main components, the Generator of abstract requests (in Sec. 5.3) and the Adapter (in Sec. 5.4). In Sec. 5.5 we explicitly discuss the real-time requirements of the load generation process and present the corresponding solutions developed by us for implementing the UniLoG architecture on the basis of an operating system without explicit real-time extensions. Finally, in Sec. 5.6 we address the scalability of the proposed architecture with respect to multiprocessing on current multi-core systems, also in combination with network interface cards implementing multiple independent receiving and transmitting (RX/TX) queues.


Chapter 6 “Distributed UniLoG Architecture”: In this chapter we first explain the typical application scenarios of a system for geographically distributed load generation and specify the corresponding requirements for such a system in Sec. 6.1. In Sec. 6.2 we present our design of the system for distributed load generation, based on the architecture of the UniLoG load generator presented in Chapter 5, and address the important aspects of the implementation of its main components in Sec. 6.3.

Chapter 7 “Load Generation at Network Layer Service Interfaces”: In this chapter we describe the design of an adapter of the UniLoG load generator for the Internet Protocol Version 4 (IPv4) network service interface (we call it UniLoG.IPv4). We begin with an overview of the typical application scenarios for load and traffic generators at the IPv4 interface and determine the corresponding requirements for such a load generator in Sec. 7.1. The detailed architecture of the UniLoG.IPv4 adapter and the supported types of abstract requests and system reactions are presented in Sec. 7.2. Finally, in Sec. 7.3 we describe the experiments conducted in order to systematically analyse the main performance characteristics of the new adapter for different packet lengths. We discuss the obtained measurement results and compare the performance characteristics of UniLoG.IPv4 with those of other IP traffic generators available in the research community.

Chapter 8 “Load Generation at Transport Layer Service Interfaces”: We begin this chapter with an extensive motivation for the development of specific UniLoG adapters for transport service interfaces, with respect to the adapter developed for the IPv4 network service interface presented in Chapter 7. In Sec. 8.1 we describe the architecture of the UniLoG.TCP adapter, the required TCP load receiver components, the supported types of abstract requests and system reactions, as well as the possible types of traffic matrices which can be generated with UniLoG.TCP. In Sec. 8.2 we describe the experiment series and discuss the results of the measurements conducted in order to obtain the performance characteristics of the UniLoG.TCP adapter. In particular, we analyse the influence of the socket send buffer size and the Nagle algorithm used in the TCP sender on the performance metrics of the adapter. As the architecture of the UniLoG adapter for the User Datagram Protocol (UDP) service interface is quite similar to that of the UniLoG.TCP adapter, we briefly


explain only the key differences between these adapters in Sec. 8.3. Finally, we mention the different versions of the UDP adapter (among others, its prototypical implementation on top of a real-time operating system) provided by the author in the scope of this thesis.

Chapter 9 “Generation of Web Workloads”: In this chapter we continue the elaboration of the UniLoG approach to the generation of Web traffic and Web server loads initiated in Sec. 4.3, where we introduced the basic modelling concept and the main abstract characteristics of Web workloads. In Sec. 9.1 we explain the architecture of the UniLoG.HTTP adapter and our proposal to use a pool of Web pages with predetermined characteristics for the generation of Web traffic and Web server loads. In Sec. 9.2 we present two possible methods to provide such an adapter, by imitating the behaviour of a real HTTP user agent (browser imitation method) or by integrating a real HTTP user agent directly in the adapter (browser integration method). In Sec. 9.3 we describe the procedure for the construction of a comprehensive, representative, and stable pool of Web pages required to use UniLoG.HTTP for load generation. In particular, we explain the algorithms for the estimation of values for abstract Web workload characteristics in Sec. 9.3.1, present concrete measurement results obtained for a set of popular Web pages using the browser integration method, and compare them to the measurement results obtained for the same set of Web pages using the browser imitation method in Sec. 9.3.2.

Part IV “Applications of the UniLoG Load Generator”

In this part of the thesis we demonstrate the concrete use of the UniLoG load generator in various practically relevant scenarios. For both case studies, presented in Chapter 10 and Chapter 11, we picked a video streaming scenario of the kind that frequently occurs in home or small business networks. The case studies use different configurations of the experimental network and different transport techniques for the video stream, so that different types of background loads have to be generated using the appropriate UniLoG adapters.

Chapter 10 “Estimation of QoS Parameters for RTP/UDP Video Streaming in WLANs”: In this case study we conduct experiments with a Real-Time Streaming Protocol (RTSP) streaming application using the Real-Time Transport


Protocol (RTP) over UDP packet stream to transmit an H.264-encoded video under different IPv4 background loads. We describe the configuration of the experimental network and the IPv4 background loads used in the experiments, and define the quantitative metrics used to characterise the transmission quality of the video stream. Finally, we discuss the obtained measurement results for each particular video streaming quality characteristic.

Chapter 11 “Estimation of QoS Parameters for RTP/TCP Video Streaming in WLANs”: In this case study we conduct experiments with the same H.264-encoded video, but we use an RTP over TCP packet stream to transmit the video in an experimental network with a different configuration than in the case study presented in Chapter 10. Therefore, the number of additional background TCP load streams (generated by means of the UniLoG load generator in combination with the TCP adapter) and the aspect of TCP fairness in the experimental network become important in this case study. We present the obtained measurement results for the video streaming quality characteristics and provide a final discussion which may be interesting for providers of video streaming services in practice.

Part V “Results and Conclusions”

Chapter 12 “Summary and Outlook”: In this final chapter we summarise the obtained results and emphasize the main contributions of this thesis to the research field of workload modelling and generation in IP-based networks. We complete the thesis with an outlook on potential topics for (our) future work at the end of Chapter 12.

2. Foundations and Research Field

The concept of workload is one of the most fundamental aspects in the performance evaluation of modern computer and communication systems. While a precise definition of a system’s workload is elusive, a commonly accepted definition considers the amount of requests offered to a system by its users during some specific period of time [Fer72, Fer84]. For instance, if the System Under Test (SUT) is a stand-alone Web server, then its workload consists of all queries submitted to the server during an observation interval. A similar definition of a workload can be found in [MAD04], where a system’s workload is defined as the set of all inputs that the system receives from its environment. In the context of workloads for computer networks, the system is usually represented by the communication system serving the requests with their corresponding resource demands. It becomes apparent that the workload induced in the communication system is mainly characterized by the requests offered to it. So, for the purpose of workload modelling, the requests which may be created by the environment and handed over to the system, along with their resource demands, have to be described precisely. A solid understanding of how users typically behave when interacting with an application, and of the workload patterns derived from such behaviour, helps the researcher to identify not only users’ needs but also application features that are particularly useful and attractive, as well as possible system vulnerabilities. Moreover, the performance optimization, tuning, and management of a system with many clients and complex server and network infrastructures, which is typical of many popular Internet applications like, e.g., YouTube or online social networks, depend heavily on an accurate knowledge of the workload typically experienced by such a system.
Therefore, the measurement, characterization, modelling, and generation of real workloads are the key steps driving the design of new cost-effective network applications and services as well as the optimization of existing ones [AlA11].

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_2


Workload characterization: consists of identifying the basic components that compose the target workload, which depend both on the nature of the target application and on the purpose of the characterization.

Workload modelling: consists of building a representation that mimics the real workload under study, based on the identified components.

Workload measurement: is a key step for all tasks in performance engineering and relates to gathering representative datasets to support the characterization task, helping the researcher to obtain workload parameters and establish a link between the real workload and its model.

Workload generation: consists of the injection of requests, flows, or packets into a network in a controlled manner. At this point, synthetic (or artificial) workload and traffic generators can be used to create synthetic loads and inject synthetic traffic at different interfaces in real or simulated networks, according to a workload or traffic model specified by the experimenter.

An understanding of the important characteristics of offered workloads can help to improve the design and construction of efficient network systems and mechanisms. Realistic workload models can further support cost-effective capacity planning, network and service dimensioning, and management decisions, as well as performance analysis and optimization studies. Furthermore, workload and traffic models are very often indispensable for generating synthetic workloads and traffic for experimental purposes during load testing in the networking research community and industry. We discuss this issue in more detail in the next section.

2.1. Workload Modelling and Specification Techniques

A large number of workload and traffic models for different sources of load and traffic have been proposed in the networking research community. The studies may differ, among others, in:

The type of workload or traffic sources: the characterization, modelling and simulation of network workloads can be related to different (types of) network applications or services, e.g.:

• Web workloads have been studied, e.g., in [BMS11, BMS14] in terms of different content and service complexity metrics. Characteristics of the
resulting Web traffic have been investigated, e.g., in [IhP11] in order to improve the service response time and to evaluate the effectiveness of caching and intermediary systems. The author in [Cha10] used active measurements to obtain a set of different characteristics of Web workloads and traffic in order to assess the efficiency of client-side caching for modern Web sites. The study [SAMFU12] discovered potential pitfalls in the analysis and modelling of Web/HTTP traffic, such as the non-consideration of persistent connections or pipelined requests, and mismatches between the values reported in the request headers and the actual content type and data volume being transmitted. The authors in [CaM10] analysed the Web traffic intensity and its temporal variability using Web server logs.

• Voice traffic from Voice over Internet Protocol (VoIP) applications using different types of voice codecs has been analysed in a series of studies [MSS05, PEA05, HGB06, MBM09, HHCW10]. The large-scale study in [BMPR10] presents results of VoIP traffic measurements and analyses at a backbone link of a commercial ISP in Italy.

• Video traffic from live video streaming applications has been studied, e.g., in [BMW05] using the UDP protocol for the delivery of video packets, or in [BBM10] for the delivery of real-time video streams using the TCP protocol. A large-scale study of video streaming applications in operational networks presented, e.g., in [EGRSS11] may help to understand such important video streaming characteristics as the use of adaptive bit rate streaming protocols, the achieved streaming rates, and details of the user behaviour (e.g., the content popularity or the number of cancelled video sessions). The results of the study may be used, e.g., in order to identify potentials for object caching during the delivery of the video content.
Further, a number of models for MPEG-like encoded VBR video traffic sources, considering the different types of frames in the video stream, have been proposed [Ros95, SRS03].

• The live (multicast) TV component of the Internet Protocol Television (IPTV) service has been studied, e.g., in [CRCM08, QGL09, GJR11] with respect to user access patterns, channel popularity, or the channel switching dynamics of the users in such a system. Corresponding analyses of the Video on Demand (VoD) service component (where the user access patterns have a direct impact on the performance of the VoD servers) have been presented in [GJCG13]. An extensive empirical analysis of access patterns and user behaviour in a large centralized VoD system at China Telecom has been conducted in an earlier work [YZZZ06]. Further,
results of statistical analysis and modelling of VoD and VoIP workload characteristics in a nationwide commercial IP network in Korea have been presented in [CSK11].

• Analyses of SMTP and POP3 email traffic have been conducted, e.g., in [OhC05, AcP12]. Further, models to study the evolution of email networks in 3G mobile network scenarios have been presented in [SKR07].

The purpose of the study: one of the major objectives of workload characterisation and modelling studies is the identification and analysis of the basic components that compose the target workload. Understanding the key workload features can significantly contribute to the design and development of efficient network applications, services, and systems. For example, analyses of video content popularity [YZZZ06, CKR09] can help to improve the corresponding techniques for popularity prediction and to support the design of more effective caching and content delivery strategies [HLR07, QGL09, EGT11]. Realistic workload and traffic models can be used in performance analysis and optimization studies and support capacity planning and dimensioning decisions for different network applications and services. For example, traffic models for HTTP, FTP, near real-time video streaming, VoIP, gaming, and live video streaming sources have been used in the specifications of standards proposals for different network technologies, e.g., for the performance evaluation of cdma2000 systems [3GPP2] or the multihop relay system in the IEEE 802.16 broadband wireless access systems [IEEE802.16].

The studies may also focus on the investigation of temporal variations in the network workload and on analyses of the distinct hourly, daily and weekly patterns which may be present in the corresponding traffic [SPT07]. On the one hand, when the workload is analysed over a period of great variability and treated as a static snapshot, the analysis will reflect an "average" behaviour which might not accurately describe the workload experienced by the network at any time interval.
On the other hand, a sound workload characterization should be performed over time periods of approximate stability, to avoid introducing spurious effects due to the aggregation of multiple workloads [AlA11]. For example, Gaussian traffic models, which take only the stationary distribution of the traffic rate into account, have been used in [Has06] to bound the probability of overload on network links and other network resources.
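To illustrate the idea behind such a stationary Gaussian model, the overload probability of a link can be approximated with the Gaussian tail. The following sketch uses purely illustrative numbers and does not reproduce the specific procedure of [Has06]:

```python
from math import erf, sqrt

def gaussian_overload_probability(mu, sigma, capacity):
    """P(rate > capacity) for a stationary Gaussian traffic rate
    with mean mu and standard deviation sigma (all in the same unit)."""
    z = (capacity - mu) / sigma
    # complementary CDF of the standard normal distribution
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

# A 100 Mbps link carrying traffic with mean 70 Mbps and std 10 Mbps:
# z = 3, so the overload probability is about 0.00135
p = gaussian_overload_probability(mu=70.0, sigma=10.0, capacity=100.0)
```

Such a bound is only as good as the stationarity assumption, which is exactly why the workload should be characterized over periods of approximate stability.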


Further, the workload studies may be concerned with the evolution of network applications and services, which results in permanently changing workload patterns. A thorough understanding of the dynamic workload properties can be exploited in order to optimize the system performance. For example, understanding the evolution of the user behaviour and of the workload offered to YouTube and similar Web 2.0 video sharing services is crucial to evaluate the data rate requirements and scalability of such sites [AbS10]. Furthermore, analyses of the distribution of the popular video files suggest that proxy caching of popular YouTube videos can reduce the network traffic and increase the scalability of the YouTube Web site.

The workload modelling studies can also focus on the identification of qualitative patterns (also called invariants) that may hold across different workloads of the same target application or different applications of the same type (e.g., file transfer, Web traffic, video streaming, etc.) and may provide a valuable and accurate insight into the application design, optimization, and management [FGV06]. Finally, workload and traffic models can be used for the generation of network workloads and the corresponding traffic in network simulations [CCG04, LAJ07] or real network testbeds [BPGP12, BDP12].

The origin of the analysed workload or traffic: for example, workload modelling studies may follow a source-based approach and concentrate on the characterization of traffic generated by different (types of) applications and services running on single hosts in the network (e.g., [DPRPV08, ViV09]). Or the studies may consider the aggregated traffic as it appears, e.g., on backbone or high-speed access links, and analyse the effects of the superimposition of multiple synthetic traffic sources (e.g., the temporal variability [Has06, LBFE09] and dependency of aggregated traffic characteristics [SPT07]).
The modelling methodology: for example, the class of the underlying stochastic model used in the development of the concrete workload or traffic model (see below).

The challenging task of workload characterization and modelling is further exacerbated by the problem of the limited availability of real representative workloads for analysis, due to privacy restrictions imposed, e.g., by service providers or governmental law authorities. For this reason, measurement-based studies may very often rely on data sampling, thus raising the issue of
a possible sampling bias and its implications for accurate workload characterization [MMV05]. Furthermore, workload modelling studies may require large-scale real-world datasets, which may be collected for different (types of) applications and across different periods of time (cf. the large-scale studies of user access patterns for live (multicast) IPTV [GJR11] or VoD applications [GJCG13]). In such large-scale scenarios, the use of (partially available) local user access patterns would have inherent limitations in the face of the country-wide or global nature of the considered applications or services. Once again, the availability of traces with real measurement data is often a strongly restricting factor, so that researchers may be inclined to rely on simulation models to conduct their design and development efforts [CSK11].

Workload characterization and modelling studies can employ a variety of different modelling techniques, ranging from conventional inferential statistics to more sophisticated methods using Markov models, Markov modulated processes [Kin90], or arrival curves [KüT06], in addition to clustering, principal-component analysis, and other data mining techniques [HMS00, Jai91]. Furthermore, there exists a series of very complex stochastic processes aimed mainly at the modelling of temporal dependencies in traffic characteristics at different time scales (e.g., Fractional Auto-Regressive Integrated Moving Average (FARIMA) [SSLL09], Fractional Sum-Difference (FSD), or Fractional Brownian Motion (FBM), cf. [GrS05]).

Generally, workload characterization and modelling is a very broad research field with a plethora of studies with their respective specific modelling purposes and objectives. However, not every workload model developed in such studies can be directly used for the generation of realistic synthetic network load or traffic for experimental purposes.
At this point, a representation of the workload or traffic (model) which can be executed in a workload or traffic generator is of particular importance. Therefore, we recall that one of the major objectives of this thesis is the elaboration of a generally applicable method for the sufficiently formal, precise, and complete description of workload models which would allow one to generate the corresponding real workloads or traffic as a sequence of requests at different service interfaces in networks. The development of workload models for concrete types of applications or services in networks is, however, not in the primary focus of this thesis.

In the following sections we present some selected well-known methods proposed for the modelling and specification of workloads in the networking research community. Some of these methods may allow one to reflect only one specific characteristic (or dimension) of workload, e.g., the packet inter-arrival
time or packet size. For example, the models based on the class of univariate Markov processes [Kin90] or arrival curves [KüT06] can, in general, support only one such dimension. This fact may represent a significant challenge for modelling (with respect to the complexity of the resulting model) because real network workloads very often possess a number of different (and possibly also dependent) characteristics.

2.1.1. Selected Workload Modelling Techniques

In this section we first present the Markov Modulated Poisson Process (MMPP) [Hef80], Poisson Pareto Burst Process (PPBP) [ZNA03], and Batch Markovian Arrival Process (BMAP) [Luc91] model classes, which can be used for the modelling of network workloads, in particular in order to describe the burstiness in the observed traffic. Next, we describe the Hidden Markov Model (HMM) [Rab89] model class which, among others, provides a means to build workload models capable of jointly taking into account the first-order statistics as well as the temporal dynamics and correlation of different network traffic characteristics, such as the inter-packet time and packet size. Finally, we briefly address techniques for modelling advanced properties of network workloads such as self-similarity and long-range dependence.

Markov Modulated Poisson Process (MMPP)

The Markov Modulated Poisson Process (MMPP) is a generalisation of the Poisson process in which the job arrival rate may change over time. The use of the MMPP for the modelling of network traffic has first been proposed in [Hef80]. An m-state MMPP can be viewed as m independent Poisson processes, where $\lambda_i$ is the arrival rate of the i-th process. An underlying continuous-time m-state Markov chain determines which of the m arrival processes is active, i.e., the one in accordance with which arrivals are generated. After the i-th arrival process is activated, it remains active for an exponentially distributed amount of time with mean $\sigma_i^{-1}$. At the end of the active period, the j-th process is chosen as the next active process with probability $p_{i,j}$, where $\sum_j p_{i,j} = 1$ and $p_{i,i} = 0$. So, an m-state MMPP can be characterized by the parameters $\lambda_i$, $\sigma_i^{-1}$, and $p_{i,j}$, where $i, j = 1, 2, \ldots, m$. The effectiveness of the MMPP as a traffic model and, in particular, its ability to capture the burstiness in the traffic have been evaluated, e.g., in the context of resource provisioning in Web applications [RCW12].
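The generative mechanism described above translates directly into a small simulation. The following sketch (with illustrative parameter values of our own, not taken from [Hef80]) produces arrival times from a two-state MMPP:

```python
import random

def simulate_mmpp(lam, sigma, p, t_end, seed=42):
    """Generate arrival times of an m-state MMPP up to time t_end.

    lam[i]   : Poisson arrival rate lambda_i of state i
    sigma[i] : rate of the exponential sojourn time in state i
               (mean sojourn time = 1/sigma[i])
    p[i][j]  : probability of switching from state i to state j
    """
    rng = random.Random(seed)
    state, t, arrivals = 0, 0.0, []
    while t < t_end:
        # the i-th arrival process stays active for an Exp(sigma_i) period
        end = min(t + rng.expovariate(sigma[state]), t_end)
        # Poisson arrivals with rate lambda_i during the active period
        while True:
            t += rng.expovariate(lam[state])
            if t >= end:
                break
            arrivals.append(t)
        t = end
        # choose the next active process according to p_{i,j}
        state = rng.choices(range(len(lam)), weights=p[state])[0]
    return arrivals

# Two-state example: a bursty state (50 arrivals/s) and a quiet one (2/s)
arrivals = simulate_mmpp(lam=[50.0, 2.0], sigma=[1.0, 0.5],
                         p=[[0.0, 1.0], [1.0, 0.0]], t_end=100.0)
```

Alternating between a high-rate and a low-rate state in this way is exactly what lets the MMPP reproduce bursty arrival patterns.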
An important issue is the estimation of parameters of an MMPP such that the job arrivals generated using this MMPP have statistical properties that are
similar to those derived from a real trace. Therefore, several algorithms have been proposed in the literature to fit an MMPP to the observed data [HeL86, DeM93, BaF07].

Poisson Pareto Burst Process (PPBP)

The use of the Poisson Pareto Burst Process (PPBP) for the modelling of aggregated traffic, as it appears, e.g., on backbone or high-speed access links or in the Internet, has first been proposed in [ZNA03]. The PPBP allows one to specify multiple overlapping bursts whose lengths follow a heavy-tailed (Pareto) distribution. The authors in [ZNA03] presented methods to map the parameters of the PPBP to a set of measurable network traffic characteristics, described a technique for fitting the PPBP to a given traffic trace, and showed the ability of the PPBP to accurately predict the queueing performance of a sample trace of aggregated Internet traffic.

Batch Markovian Arrival Process (BMAP)

The (continuous-time) Batch Markovian Arrival Process (BMAP) was proposed by Lucantoni [Luc91] as a generalization of the (simple) Markovian arrival process (introduced, e.g., in [LMN90]) that allows more than one arrival at a time. The use of the BMAP model class for the development of analytically tractable models for aggregate IP traffic, focusing on the burstiness and self-similarity properties, is presented, e.g., in [KLL03]. The use of a discrete-time version of the BMAP (called dBMAP) to characterize the long-range dependence present in traces of aggregate link traffic has been proposed, e.g., in [SPV04]. The proposed model jointly characterizes the packet arrival process and the packet size distribution of IP traffic. In particular, packet arrivals occur according to a discrete-time Markov modulated Poisson process (called dMMPP), and each arrival is further characterized by a batch whose size has a general distribution that may depend on the phase of the dMMPP describing the packet arrival process.
The authors developed a parameter fitting procedure that is capable of achieving accurate replication of the queueing behaviour of IP traffic exhibiting long-range dependence.

Hidden Markov Models (HMM)

Hidden Markov Models (HMM) have been used, e.g., in [SaV01, WWT02] for modelling the states of packet channels using the corresponding loss probabilities and end-to-end delay distributions. Further, a specific HMM has
been proposed in [DPRPV08] and used for the packet-level characterization of network traffic in terms of the Inter Packet Time (IPT) and Packet Size (PS) stochastic processes. The authors in [DPRPV08] followed a source-based approach, i.e., sessions of traffic generated by different network applications and services running on single hosts have been analysed separately (and the proposed models do not focus on aggregated traffic as it can be observed, e.g., on backbone or high-speed access links).

A Hidden Markov Model can be defined as a probabilistic function of a (hidden) Markov chain and is composed of the following two variables:

• The hidden-state variable $x_n$, whose temporal evolution follows a Markov-chain behaviour. The state at discrete time n is represented by $x_n \in \{s_1, \ldots, s_N\}$, where N is the number of states.

• The observable variable $y_n$, which stochastically depends on the hidden state. The observable at discrete time n is represented by $y_n \in \{o_1, \ldots, o_M\}$, where M is the number of observables.

The authors of [DPRPV08] adopted a specific HMM with the discrete random state variable $x_n$ introduced to account for memory and correlation phenomena between IPT and PS, which are assumed to be statistically independent given the state. The observable variable is a continuous bidimensional vector $y_n = [d_n, b_n]^T$, where $d_n$ and $b_n$ describe the IPT and the PS of the n-th packet, respectively, and are specified by means of conditionally independent (given the state) Gamma distributions. The proposed model allows one to capture important joint dynamics (in terms of both marginal distributions and auto- and cross-covariances) of IPT and PS and still remains analytically tractable. Further, the model capabilities of learning, generation, and prediction have been evaluated, and concrete realistic packet-level models have been constructed from the automated analysis of empirical traffic traces.
The approach has been applied to traffic traces of various application-layer protocols and services, e.g., Simple Mail Transfer Protocol (SMTP), HTTP, an online network game, and an instant messaging application. The obtained models have been validated by comparing the synthetically generated sequences of IPT-PS pairs with the corresponding values from the original traces. The experimental investigation conducted by the authors revealed that the proposed HMM-based models can provide acceptable results with a moderate number of states (N = 5 for SMTP and HTTP models, or N = 4 for online gaming and instant messaging models).
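To illustrate how such a model generates traffic, the following sketch draws (IPT, PS) pairs from a hidden Markov chain with Gamma emissions. The two-state parameterization below is purely illustrative and is not taken from [DPRPV08]:

```python
import random

def generate_hmm_traffic(n_packets, trans, ipt_params, ps_params, seed=7):
    """Generate (inter-packet time, packet size) pairs from an HMM
    with conditionally independent Gamma emissions per hidden state.

    trans[i][j]   : hidden-state transition probability i -> j
    ipt_params[i] : (shape, scale) of the Gamma IPT distribution in state i
    ps_params[i]  : (shape, scale) of the Gamma PS distribution in state i
    """
    rng = random.Random(seed)
    state, pairs = 0, []
    for _ in range(n_packets):
        k, theta = ipt_params[state]
        ipt = rng.gammavariate(k, theta)          # seconds
        k, theta = ps_params[state]
        ps = rng.gammavariate(k, theta)           # bytes
        pairs.append((ipt, ps))
        state = rng.choices(range(len(trans)), weights=trans[state])[0]
    return pairs

# Two-state toy model: state 0 = "bulk" (small IPT, large packets),
# state 1 = "sparse" (large IPT, small packets)
pairs = generate_hmm_traffic(
    1000,
    trans=[[0.9, 0.1], [0.2, 0.8]],
    ipt_params=[(2.0, 0.001), (2.0, 0.05)],
    ps_params=[(5.0, 280.0), (1.5, 60.0)])
```

Because consecutive packets tend to stay in the same hidden state, the generated sequence exhibits the kind of IPT/PS correlation over time that a memoryless model cannot reproduce.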


Modelling of Advanced Characteristics of Network Workloads

Investigation of the effects of advanced traffic properties such as traffic variability (i.e., the fluctuation of traffic characteristics as a function of time) has been the subject of a large number of studies in the networking research community (cf. [WTSW95, CrB96, SPT07, CMCS08, FCS08, LBFE09, SSLL09], to name a few). High variability in traffic may have, under certain conditions, a significant impact on the network performance [LTWW94, ENW96], and its understanding can help to improve the efficiency of different network techniques such as traffic-control mechanisms and QoS schemes [SPT07, LBFE09].

One of the reasons for high variability in traffic could be the long-range dependence (LRD) property of the traffic process (if such a property can be observed in the concrete traffic trace). In general, a (weakly) stationary discrete-time real-valued stochastic process $X = \{X_n, n = 0, 1, 2, \ldots\}$, with mean $\mu = E[X_n]$ and variance $\sigma^2 = E[(X_n - \mu)^2] < \infty$, is long-range dependent if $\sum_{m=1}^{\infty} r(m) = \infty$, where $r(m)$ measures the correlation between samples of X separated by m units of time. If $\sum_{m=1}^{\infty} r(m) < \infty$, then X is said to exhibit short-range dependence (SRD). Several possible causes of correlation and LRD in aggregated IP traffic have been identified, such as the inherent structure and interactions of protocol layers [MiG98] or the superimposition of traffic sources with heavy-tailed distributions of the transfer durations [CrB96, WTSW95], the latter being sufficient for the generation of self-similar traffic.

Self-similar processes are often used to build models of traffic which possess the LRD property. Self-similarity in the context of network traffic refers to the scaling of variability (i.e., burstiness) in traffic. A time series $X = \{X_t, t = 1, 2, \ldots\}$ is said to be exactly second-order self-similar if $X_t \stackrel{d}{=} m^{-H} \sum_{i=m(t-1)+1}^{mt} X_i$ for $H \in \mathbb{R}$, $1/2 < H < 1$, and $\forall m > 0$, where $\stackrel{d}{=}$ denotes equality in distribution and m is the aggregation level [CrB96]. The parameter H (called the Hurst parameter) measures the degree of self-similarity of the random processes used for modelling network traffic and represents, basically, a measure of the speed of decay of the tail of the autocorrelation function. The definition suggests a simple test for self-similarity in network traffic, called the variance-time plot. In such a test, the variance of the aggregated process $X_t^{(m)} = \frac{1}{m} \sum_{i=m(t-1)+1}^{mt} X_i$ is plotted against m on log-log axes, where the $X_i$ are measurements of traffic in bytes or packets per time unit. Linear behaviour of the plot with a slope greater than $-1$ (corresponding to $H > 1/2$) suggests non-trivial self-similarity in the random process used for
traffic modelling. For further details on self-similar processes we refer the reader, e.g., to [GrS05].

We should emphasize that in this thesis we rather follow the source-based approach to workload modelling. According to this approach, the workload model aims to characterize and analyse separately several sessions of traffic generated by different applications or services on single network hosts and does not focus on the aggregate link traffic (as it may appear, e.g., on backbone or high-speed access links or in the Internet). The possible effects of the superimposition of multiple synthetic traffic sources, e.g., the presence of self-similarity or long-range dependence in the aggregated synthetic traffic, are outside the scope of this thesis.
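The variance-time test described above can be sketched as a small estimator: aggregate the series into blocks of length m, compute the variance of the block means, and fit the log-log slope β, from which H = 1 + β/2. For i.i.d. (short-range dependent) input the slope is close to −1, i.e., the estimate should be close to H = 0.5 (all names below are our own):

```python
import math
import random

def hurst_variance_time(x, aggregation_levels):
    """Estimate the Hurst parameter H via the variance-time plot."""
    points = []
    for m in aggregation_levels:
        n_blocks = len(x) // m
        means = [sum(x[i*m:(i+1)*m]) / m for i in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((v - mu) ** 2 for v in means) / n_blocks
        points.append((math.log(m), math.log(var)))
    # least-squares slope beta of the log-log plot; Var ~ m^(2H-2)
    n = len(points)
    sx = sum(px for px, _ in points)
    sy = sum(py for _, py in points)
    sxx = sum(px * px for px, _ in points)
    sxy = sum(px * py for px, py in points)
    beta = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return 1.0 + beta / 2.0

# Sanity check on i.i.d. Gaussian noise: estimated H should be near 0.5
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(100000)]
h = hurst_variance_time(noise, [1, 2, 4, 8, 16, 32, 64])
```

Applying the same estimator to a genuinely long-range-dependent trace would yield a shallower slope and hence an estimate H > 0.5.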

2.1.2. Selected Workload Specification Techniques

In the following we introduce user behaviour graphs, finite state machines, and timed transition automata as possible means for the specification of user behaviour models.

User Behaviour Graphs

User behaviour graphs have been proposed by Ferrari [Fer84] in order to describe workloads offered to an interactive communication system whose performance can be analysed by a product-form closed queueing network model satisfying the conditions of the BCMP theorem [BASK75, Kin90]. The basic components of the workload were the possible types of user commands or interactions of the terminal users with the system. The offered workload has been described as a set of partially overlapping sequences of commands issued by terminal users.

In order to describe the behaviour of each of the m interactive terminal users of the system, a probabilistic user behaviour graph has been introduced in [Fer84]. Each node in the graph represents an interactive command type, with the exception of node 0, which is the "dormant node". Users who are not using the system reside in node 0. When a terminal session starts, the state of the terminal user becomes 1 (the "login node"). During the session, different commands are executed by the user and the corresponding nodes of the graph are visited following the arcs of the graph. At the beginning of each terminal time period, a terminal user chooses the next command based on the probabilities from the user behaviour graph. The R different possible command types modelled in the request nodes of the graph are modelled as R different classes of customers in the queueing network.
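A terminal session driven by such a probabilistic graph can be sketched as a random walk over the command nodes. The graph below (node identifiers, command types, and probabilities) is purely illustrative:

```python
import random

def simulate_session(transitions, max_steps=1000, seed=3):
    """Walk a probabilistic user behaviour graph, starting at the
    login node 1 and ending when the dormant node 0 is reached.

    transitions[node] : list of (next_node, probability) pairs
    """
    rng = random.Random(seed)
    node, visited = 1, [1]
    for _ in range(max_steps):
        nodes, probs = zip(*transitions[node])
        node = rng.choices(nodes, weights=probs)[0]
        visited.append(node)
        if node == 0:       # back to the dormant node: session ends
            break
    return visited

# Nodes: 0 = dormant, 1 = login, 2 = "list" command, 3 = "edit" command
graph = {1: [(2, 0.7), (3, 0.3)],
         2: [(2, 0.2), (3, 0.5), (0, 0.3)],
         3: [(2, 0.6), (0, 0.4)]}
session = simulate_session(graph)
```

Feeding many such simulated sessions into a queueing model (one customer class per command type, as described above) reproduces the original modelling setup of [Fer84].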


It should be noted that the interactive workload to be modelled and the workload model to be constructed have been assumed to be stationary in [Fer84]. This means that the desired workload model is not intended to reproduce any particular dynamic variations in workload characteristics, but aims at reproducing approximately similar, time-invariant distributions of the characteristics of the original workload. The representation of the workload model by means of user behaviour graphs has been used in [Fer84] in order to analyse the problem of reducing the number of command types that appear in the workload model while preserving their relative frequencies of occurrence. The author has shown that the workload model resulting from a simple aggregation of command types into command classes (which may ignore the existing sequential dependencies among different command types) may be sufficient – under certain conditions, such as a steady-state assumption and a product-form queueing model – to generate workloads with similar characteristics as in the case when the sequential dependencies are considered. User behaviour graphs have been used, e.g., by Calzarossa [CMT90] and later by Menascé [Men03, MeV00] as customer behaviour model graphs (CBMGs) and customer visit models (CVMs) for the modelling of workloads induced by Web e-business applications.

Finite State Machines

Finite state machines provide a simple and straightforward technique for the specification of workload models because they allow one to directly represent the waiting of entities for certain events (inputs) and their reaction to them (outputs), including the transition to a successor state. A finite state machine¹ can be defined (cf.
[Kön12]) as a quintuple $(S, I, O, T, s_0)$, where

• S is a finite, non-empty set of user states ($|S| < \infty$),
• I is a finite, non-empty set of inputs,
• O is a finite, non-empty set of outputs,
• $T \subseteq S \times (I \cup \{\tau\}) \times O \times S$ is a state transition function, and
• $s_0 \in S$ is the initial state of the automaton.

¹ The
terms finite state machine and automaton are used synonymously in the following.
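The quintuple can be written down directly as data. The following minimal sketch (a hypothetical two-state user that waits for a confirmation; all names are our own) represents the transitions as a deterministic lookup table for simplicity, although T is in general a relation:

```python
# A finite state machine (S, I, O, T, s0) as plain data structures
S = {"idle", "waiting"}                 # user states
I = {"send_request", "confirmation"}    # inputs (events)
O = {"request_pdu", "none"}             # outputs
T = {                                   # (state, input) -> (output, successor)
    ("idle", "send_request"): ("request_pdu", "waiting"),
    ("waiting", "confirmation"): ("none", "idle"),
}
s0 = "idle"

def step(state, event):
    """Execute one transition; undefined (state, input) pairs leave
    the state unchanged and produce the empty output."""
    output, successor = T.get((state, event), ("none", state))
    return output, successor

out, state = step(s0, "send_request")   # emits a request, moves to "waiting"
```

The table makes the limitation discussed below tangible: every distinguishable situation needs its own state, so data values or timing conditions quickly blow up the state set.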


A transition $t \in T$ is defined by the quadruple $(s, i, o, s')$, whereby $s \in S$ denotes the current state, $i \in I \cup \{\tau\}$ an input (event), $o \in O$ the associated output, and $s' \in S$ the successor state. Note that $\tau \notin I$ is a special event which designates an empty input and can be used for modelling spontaneous transitions to describe internal events. The execution of the transitions takes place simultaneously.

It should be noted that finite state machines allow one to describe only the functional control flow, e.g., the sequences of requests submitted by the service users at a service interface. However, the automaton may become too complex (in terms of the number of required states) in cases where changes in the data structures (e.g., modifications of the values of request attributes) or timing aspects are to be represented. Therefore, different extensions of the basic automaton concept have been proposed in order to be able to adequately represent the data flow and the timing aspects. The concept of finite state machines has been used as a foundation for the user behaviour automata introduced by Wolfinger in [WoK90]. In Sec. 3.2 of this thesis, we present a generalization of the basic concept of user behaviour automata and propose a set of extensions which are required for an adequate representation of data flows and timing aspects during the specification of network workload and user behaviour models.

Timed Transition Automata

The concept of timed transition automata has been used, e.g., in [MJS08] for the specification of possible user interactions in structured interface environments such as Web applications and Web services. The domain of the action types available to a user in such an environment has been described by a state transition diagram extended with time constraints, and each possible type of action is represented by a state transition label in the automaton. A timed transition automaton (TTA) can be defined (cf.
[AlD94]) as a tuple $(\Sigma, S, s_0, C, E, F)$, where

• $\Sigma$ is a finite alphabet,
• S is a finite set of states,
• $s_0$ is an initial state,
• C is a finite set of clocks,
• $F \subseteq S$ is a set of final acceptance states,
• $E \subseteq S \times S \times \Sigma \times 2^C \times \Phi(C)$ defines the transition table of the automaton.

Each transition $e \in E$ is a quintuple $e = (s, s', a, \Lambda, \delta)$ representing a transition from state s into state s' on input symbol a, which can occur at a certain time τ when the clock constraint δ is satisfied by the current values of the clocks. The transition also resets to 0 the clocks from the subset $\Lambda \subseteq C$ of clocks.

A TTA is able to recognize timed words, i.e., finite sequences of pairs $[(a_0, \tau_0), \ldots, (a_k, \tau_k)]$ made of symbols $a_i \in \Sigma$ over a given alphabet Σ and time values $\tau_i \in \mathbb{R}$ for $i \in [0, k]$, with $\tau_i \leq \tau_{i+1}$ for $i \in [0, k-1]$. The pairs in the sequence can be seen as a sequence of log records describing user actions or events annotated with the time at which they occurred. The TTA has been used in [MJS08] to specify timing constraints for user actions which are to be executed only when certain time conditions are met (e.g., submitting a reply from a server search engine within a given interval of time). A domain automaton can then be defined in order to represent the legal sequences of user actions which can occur in the system.
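The clock mechanics can be illustrated by running a timed word through a hypothetical single-clock automaton that accepts a reply only if it arrives within 5 time units of the request (a simplified sketch of the general TTA definition; all names are our own):

```python
def run_tta(word, transitions, s0, final_states):
    """Check whether a timed word [(a0, t0), (a1, t1), ...] is accepted.

    transitions: (state, symbol) -> (next_state, clocks_to_reset, guard),
    where guard is a predicate over the current clock valuation.
    Simplification: a single clock "x" instead of an arbitrary set C.
    """
    state, clocks, last_t = s0, {"x": 0.0}, 0.0
    for symbol, t in word:
        for c in clocks:              # clocks advance with real time
            clocks[c] += t - last_t
        last_t = t
        if (state, symbol) not in transitions:
            return False
        nxt, resets, guard = transitions[(state, symbol)]
        if not guard(clocks):         # clock constraint delta must hold
            return False
        for c in resets:              # reset the clocks in Lambda to 0
            clocks[c] = 0.0
        state = nxt
    return state in final_states

# A "reply" is legal only within 5 time units of the preceding "request"
tta = {
    ("s0", "request"): ("s1", {"x"}, lambda clk: True),
    ("s1", "reply"):   ("s2", set(), lambda clk: clk["x"] <= 5.0),
}
ok = run_tta([("request", 1.0), ("reply", 4.5)], tta, "s0", {"s2"})
late = run_tta([("request", 1.0), ("reply", 9.0)], tta, "s0", {"s2"})
```

The two runs differ only in the timestamps, which is precisely the expressiveness a plain finite state machine lacks.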

2.2. State-of-the-Art in Workload Generation

The generation of realistic synthetic workload and traffic is very often required for experimental activities in networking research. The corresponding workload and traffic generators can be implemented as hardware or software platforms, or include both hardware and software components. Commercial hardware platforms are typically able to reach a high degree of performance and precision and are usually provided with detailed datasheets containing certified specifications of the supported precision and performance characteristics (e.g., packet and data rates). Therefore, due to their reliability, hardware-based platforms can be indispensable, e.g., for performance, capacity, and stress tests of different network hardware appliances and devices (such as switches, routers, firewalls, IDPSs, etc.). For example, the Spirent AX/4000 [AX4000] is a large-scale, feature-rich, high-performance hardware traffic generator with a modular, multi-port architecture capable of testing access, mobile backhaul, routing, multicast, switching, MPLS and other technologies in Asynchronous Transfer Mode (ATM), IP, Frame Relay and Ethernet networks at speeds up to 10 Gbps. The platform provides a set of different conformance test suites for a number of protocols and a set of corresponding traffic models. Hardware appliances (such as the IXIA Optixia series [IXIA], Agilent/HP 1735A LAN protocol test modules, or Napatech devices [Napatech]) can also perform a trace replay,
i.e., inject traffic from a trace captured on real network links. However, as the corresponding models and the trace replay capability are implemented in hardware, introducing new features is rather difficult and the approach may not provide sufficient flexibility for selected test scenarios. In particular, hardware-based traffic generators can hardly be deployed on a large number of nodes (mainly for economic reasons), which may limit their applicability in tests with complex workloads in large-scale networks or testbed scenarios intended to be representative of reality.

In contrast to the limited flexibility of hardware-based generators, software-based generators typically allow a very easy configuration of the traffic stream to be generated (often via graphical interfaces) and can be rapidly modified and extended for a specific research purpose. Thus, new features, statistical models, and support for additional protocol stacks, operating systems, and hardware platforms can be added, and the tools can be deployed more easily onto a large number of network nodes in order to reproduce distributed scenarios. Moreover, when executed on top of real operating systems and network protocol stacks, software-based generators make it possible to perform more realistic experiments and to test concrete implementations of different protocol mechanisms [BDP10]. However, software-based platforms inherently rely on the hardware used (which may be intentionally chosen to be commodity or Commercial off-the-shelf (COTS) hardware for economic reasons), the adopted operating system (which may provide explicit real-time extensions), and the software configuration of the host(s) used for traffic generation. For this reason, the accuracy, precision, and performance characteristics of the traffic generation process can vary strongly among different software-based generators.
In the following, we will concentrate on software-based workload and traffic generators. Generally, software-based workload generators can be classified according to the modelling methodology applied in the underlying workload model and implemented in the generator. Particular differentiating factors may be, e.g.:

Abstraction level: determined by the types of objects and entities considered in the underlying workload model. For example, in the case of an application-level model of Web traffic, the HTTP request/response pairs exchanged between the Web client and the Web server may be such entities. On the flow level, the traffic can be described by means of flows (identified by the IP address and port number of the sender and the receiver and
the number of the transport protocol to be used, e.g., TCP, UDP, or the Stream Control Transmission Protocol (SCTP)) with a specified number of packets, number of bytes, and duration. On the packet level, the traffic may consist of packets characterized by means of stochastic variables for the distributions of the packet inter-departure times and packet sizes.

Generation method: the two major approaches to generating synthetic network workload are trace-based and analytical model-based methods, which can be applied at different abstraction levels (see above). Because of the well-known strengths and weaknesses of these two approaches, workload generators can combine both techniques in order to achieve a higher degree of flexibility. Recall that trace replay provides a simple, straightforward technique to inject traffic with almost arbitrary application payload patterns and may be very useful in situations where the traffic to be generated is not responsive to changing network conditions. However, the experiments may be limited to the concretely available traces and their characteristics. Further, relevant traffic traces may be subject to privacy restrictions imposed by service providers and are hardly available for the purpose of testing (while storing such traces may even be officially forbidden). As opposed to trace-based techniques, stochastic models may provide the required flexibility. However, decisions about which relevant properties of the real workload are to be reproduced must be made, and the correctness and validity of the workload model for the specific scenario must be demonstrated, in order to produce sufficiently realistic workloads.

Open-loop versus closed-loop generation: this feature characterizes the ability of the workload generator (and the underlying workload model) to appropriately respond to changing network conditions, as emphasized in [FlP01].
In the open-loop mode, the generator operates independently of observations of the network conditions during workload generation. In the closed-loop mode, the tool is able to change its behaviour during workload generation according to such observations and to appropriately modify the characteristics of the traffic to be generated (e.g., to adjust the parameters of the statistical distributions of the inter-departure times and sizes of packets, or to change the content of the packet payloads).

Application field: can the traffic generator be used in network simulation (or emulation) environments and/or in real network testbeds? Is the
generated workload more appropriate to analyse the characteristics of the network or the characteristics of the used applications (e.g., a Web application server)?

Further, software-based workload and traffic generators may also differ in their architectural features, e.g.:

Target service interface: a target service interface (or a set of service interfaces) at which the workload generator is able to inject the generated requests or traffic. Strictly considered, the target service interface is not to be confused with the abstraction level of the underlying workload model used in the generator. For example, Web traffic can be generated according to an application-level model of HTTP traffic sources and injected into the network at the application layer HTTP service interface (such as in Surge [BaC98]). Alternatively, Web traffic can be generated according to a model which incorporates a set of application-level, flow-level, and packet-level characteristics of the traffic induced by the HTTP sources and, thereafter, be injected at the network layer (such as in Harpoon [SoB04] or LiTGen [RRB07a, RRB07b]) or transport layer (e.g., Swing [ViV06, ViV09]) service interface.

Software and hardware co-design: does the generator architecture make use of dedicated hardware components, e.g., the Intel IXP2400 Network Processor (NP) [IXP2400], as is the case in BRUNO [APF08a, APF08b] or Pktgen [BBCR06]?

User-space versus kernel-space: the generator architecture may consist of only user-space modules (e.g., MGEN [MGEN] or D-ITG [AEPV05]), only kernel-space modules (e.g., KUTE [ZKA05]), or include both user-space and kernel-space modules (as is the case with the generator proposed in [BPGP12]).

Distributed workload generation: does the generator provide only a centralized workload generation function or is it able to produce and inject workloads from geographically distributed hosts (e.g., D-ITG, or LoadStorm [loadstorm])?

Scalability on multi-core platforms: does the generator architecture make use of parallelism and is it able to appropriately exploit the multi-core processor architectures and multi-queue Network Interface Cards (NICs) [BPGP12]?

Performance characteristics: the performance of a traffic generator can be characterized, e.g., by the maximum achievable packet and data rate for a given packet length. Further, according to the definitions in Paredes-Farrera et al. [PFFG06], the term precision relates to the quality and stability of the system, while the term accuracy relates to the similarity of the generated values to the true ones. Therefore, when one is interested in the timeliness of generated packets, precision can be expressed, e.g., as the standard deviation of the generation times, while accuracy can be described by the average error between the actual and the specified generation times. For example, the experimenter may be interested in the ability of a particular traffic generator to saturate the capacity of a 1 Gigabit or a 10 Gigabit Ethernet link (also with the smallest possible, 64 byte long Ethernet packets).

Conditions and availability: is the workload generator available as a commercial tool or is it freely available, possibly also as an open-source tool?

The following list of workload and traffic generators does not pretend to be complete. Rather, we tried to choose the most representative examples of traffic generators for networking research in order to demonstrate the different possible approaches followed by various solutions. Related surveys of the traffic generation tools available for networking research can be found, e.g., in [AEPV05], [BDP10], or [BDP12]. At this point we should emphasize that the workload modelling and generation approach proposed in this thesis is strongly oriented towards the target service interface for workload generation, which is to be chosen by the experimenter according to the objectives of the particular workload study being carried out.
For this reason, we decided to arrange the list of existing traffic generators according to the target service interface at which the generated traffic is injected. We start with the generation of Web traffic at application-level HTTP service interfaces.
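The precision and accuracy notions adopted above (following [PFFG06]) reduce to simple statistics over the packet generation time errors. A minimal sketch, where the timestamp values are invented example data:

```python
import statistics

# Accuracy and precision of packet generation times, following the
# definitions cited above: accuracy as the mean error between actual and
# specified departure times, precision as the standard deviation of these
# errors. The timestamp values below are invented example data.

specified = [0.000, 0.010, 0.020, 0.030, 0.040]       # intended departure times (s)
actual    = [0.0002, 0.0105, 0.0201, 0.0308, 0.0404]  # measured departure times (s)

errors = [a - s for a, s in zip(actual, specified)]
accuracy = statistics.mean(errors)       # average deviation from the schedule
precision = statistics.stdev(errors)     # spread (stability) of the deviations
print(f"mean error: {accuracy*1e6:.1f} us, std dev: {precision*1e6:.1f} us")
```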

2.2.1. Web Workload and Traffic Generation

Software-based Web workload and traffic generators are based either on traces reflecting real Web user sessions or on workload models that are designed and implemented to generate HTTP requests. Floyd and Paxson demonstrated in their study [FlP01] how difficult it is to generate representative Web requests,
especially when particular characteristics of a dynamic Web site should be modelled, and how these characteristics impact the behaviour of the Web clients. One of the first studies trying to identify common characteristics in Web server workloads is the work by Arlitt and Williamson [ArW97], which used logs of Web server accesses at six different sites (three from university environments, two from scientific research organizations, and one from a commercial Internet service provider). The observed workload characteristics were used to identify possible strategies for the design of a caching system to improve Web server performance.

Web workload generation at the application-level (emulated Web clients, real Web servers)

Barford and Crovella [BaC98] applied a number of observations of Web server usage to create a realistic Web workload generation tool, called SURGE (Scalable URL Reference Generator), which mimics a set of real users accessing a server and generates Uniform Resource Locator (URL) references matching empirical measurements of request and server file size distributions, relative file popularity, embedded file references, temporal locality of reference, and idle periods of individual users. The relevance of these Web workload characteristics as well as their concrete values were identified based on single (non-recurring) measurements, so that a later revisit by Williams et al. [WAW05] became necessary due to emerging Web technologies and a nearly 30-fold increase in overall traffic volume by 2005. The study [ACC02] proposes different benchmarks (partly based on the TPC-W benchmark [TPC-W], which was declared obsolete in 2005) to be used for online book-store applications, auction sites, and bulletin boards with dynamic Web content.
The benchmarks use a real Web server infrastructure (consisting of a Web, an application, and a database server) and specify a predefined set of Web pages and database items which can be requested by the experimenter from the client side. Along with the real Web server application objects, the authors provided a freely available workload generator tool (a Web client emulator) to drive a dynamic-content Web server with the various workloads specified in the benchmarks. Following the TPC-W specifications, the workload generated by the client emulator consists of a specified number of concurrent clients and their interactions with the System Under Test (SUT). Each emulated client opens a session (a persistent connection) with the SUT and repeatedly makes a request, parses the server’s response to the request, and, after emulating the specified amount of time
(“thinking time”) of a real client, follows a (hyper-)link embedded in the response. The tool uses a simple state machine with a transition probability matrix to determine the next link (contained in the server response) to be followed in the automaton. A state in the transition matrix corresponds to a particular interaction with the SUT and the Web page, while a transition corresponds to clicking on a link in the page. Different system utilization statistics can be collected on the machines belonging to the SUT including, e.g., throughput and response time statistics, and the utilization of the Central Processing Unit (CPU), memory, network, and disk for the duration of the experiment. The tool has been used for different research studies on dynamic Web content generation, clustering, caching, and Web application server design.

GUERNICA [OSPG09] is a Web workload generator with the ability to precisely generate the dynamic workload of Web 2.0 sites by implementing the Dweb model introduced in [OSPG05]. The underlying model makes use of the customer behaviour model graphs proposed by Menascé et al. [MeV00] and is based on the following three main concepts: 1) navigation, which defines the behaviour of a single dynamic Web user interacting with the Web server(s) and is specified as a sequence of URLs for HTTP requests where each visited URL depends on the previously visited one, 2) workload test, consisting of the set of navigations launched during the simulation process, which can be executed concurrently, and 3) workload distribution, which refers to a set of workload tests that are concurrently executed by one or more generators in different nodes or on different machines when simulating the Web clients’ behaviour. GUERNICA consists of three main components implemented as Web applications using Web services technology: the workload generators, the performance evaluator, and the performance tests planner.
These components allow the experimenter to carry out the workload test process consisting of the following four steps: 1) defining the client behaviour by using the navigation concept, 2) defining the workload of the target site by using the workload test and workload distribution concepts, 3) executing the workload tests while gathering performance statistics, and 4) analysing the performance of the target site on the basis of the obtained statistics. Further, in order to obtain concrete sequences of user navigations, an external Mozilla plug-in has been integrated into GUERNICA to capture the URL requests issued by the users in the Mozilla browser.

The following two tools are examples of commercial Web workload generators performing an application-level trace replay of Web/HTTP traffic recorded from real browser user sessions (e.g., in the HTTP archive format
[Odv15]). LoadStorm [loadstorm] is a cloud-based platform for load testing of Web applications and (mobile) Web services. The tool is provided with a large supplemental set (a “cloud”) of dedicated load generation machines and allows the generation of Web traffic also from geographically widely distributed Web clients (e.g., hosts located in the USA, Ireland, Singapore, and Tokyo can participate in the same load experiment, provided there are dedicated LoadStorm cloud hosts available in these regions). In order to mimic the behaviour of real users, LoadStorm relies on recordings of user interactions which can be captured using the developer tools of the browser and stored, e.g., in the HTTP Archive (HAR) format (cf. [Odv15]). The recording contains every request made by the browser (including Hypertext Markup Language (HTML), Cascading Style Sheet (CSS), image, Javascript, and Asynchronous JavaScript and XML (AJAX) requests) and can be customized for each individual virtual user to be emulated using the advanced user interface. For example, the experimenter can specify customized test data and think times between subsequent requests, time-outs for different types of object requests, user names and passwords, and custom query strings, and provide different application security identifiers (CSRF tokens, session IDs, hidden input fields, etc.). The tool includes a scenario builder to specify different actions of virtual users, such as opening a new page, clicking a specific link, clicking a random link, or submitting a form. Finally, LoadStorm provides reporting of key performance characteristics (such as the number of active users, throughput, request rate, response time, error rate, etc.) and in-depth request error analysis during the test (of errors captured, e.g., from response status codes, request time-outs, and server connection problems).

WAPT Pro is a tool for workload, stress, and performance testing of Web applications provided by SoftLogica Inc.
(http://www.loadtestingtool.com). The procedure for performing the load tests is similar to LoadStorm’s, i.e., the experimenter constructs the test by navigating through the Web site in the browser to record a user session. Each session is recorded to a virtual user profile as a sequence of HTTP requests. WAPT provides an extended framework for editing the properties of every particular request in the profile (e.g., request headers, page elements, and other options) and can then replay different profiles with a specified number of virtual users (also considering specifications of how the number of virtual users changes during the test). Furthermore, it includes capabilities to perform testing from different geographical locations using a number of load agents. The tool automatically generates cookies and session variables for correct user sessions, supports testing secure HTTPS web sites with different types of
user authentication and client certificates, and provides detailed reports on different performance characteristics and errors after test completion.

Web workload generation at the application-level (emulated Web clients and Web servers)

In the context of a comprehensive study [SCK03], a set of models has been derived from an analysis of the content of six representative news and e-commerce sites. The models capture the characteristics of dynamic Web content both in terms of independent parameters (such as the number of objects, the distribution of object sizes, and object freshness times) as well as derived parameters (such as content reusability across time and linked documents). The authors proposed a Java-based dynamic content emulator (DYCE), which emulates a Web server serving dynamic Web content. The emulator uses the proposed models to generate parameterizable server-side-include-based dynamic content and serves requests for whole documents or separate objects (e.g., from an idealized Web cache simulator provided by the authors for the validation of DYCE). Further, it uses delay models from previous research in order to replicate the delays induced by dynamic content generation [ICDD00]. In comparison, e.g., to SURGE [BaC98], which has been designed to model client access patterns to static Web pages, DYCE focuses on the complementary goal of emulating the behaviour of the Web server, both in terms of its workload properties and the nature of the dynamic content itself.

ParaSynTG [KRL08] is a synthetic trace generator for source-level representation of Web traffic with different characteristics such as document size and type, popularity (in terms of frequency of reference), temporal locality, and the fraction of dynamic requests and of requests made only once (“one-timers”).
ParaSynTG is able to consider the dependency between the size of the documents and their frequency of reference as well as between the type of documents and their size. The tool has been designed for the generation of synthetic Web workload traces only (which can be used, e.g., in simulation experiments) and, in contrast to the design objectives of our Web workload generator UniLoG, provides no facilities to generate and inject real HTTP requests into the network.

Generation of Web-like traffic at the packet-level or flow-level

The Web traffic generators presented above have been designed with the primary goal of generating the workload for a Web service or a Web application (which are hosted at a Web server or a number of Web servers and may involve additional application and database servers). Therefore, these generators attempt to include more application details and follow a “page-based” approach, i.e., they explicitly consider the Web page structure, the location of page components on the server(s), the human actions of thinking and page selection controlling the creation of new HTTP requests, etc. Such page-based methods can also be used when the researcher’s aim is to generate “Web-like” traffic at a transport layer interface (e.g., at the TCP service interface), as has been done, e.g., in [BaC98] or in [LAJ07]. For example, the Web traffic generator used in [LAJ07] to study the effects of Active Queue Management (AQM) and Explicit Congestion Notification (ECN) techniques on Web performance consisted of a program emulating client-side user actions (the “browser”) and a server-side program responding to client-generated requests (the “server”). The client and the server communicate by means of the TCP socket interface using the socket operations connect(), send(), and recv(). For each request, the client generates a message of random size sampled from the request size distribution and sends this message over the network to an instance of the server program. The message specifies the number of bytes the server has to return as a response (determined according to the distribution of response sizes, separately for top-level and embedded requests). The server generates a message of the specified size and transmits it back to the browser. Despite a relatively comprehensive model for the HTTP source used by the client-side “browser”, the resulting test traffic remains HTTP-like TCP traffic, because the tool does not set HTTP request headers in the generated messages.
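The “browser”/“server” exchange described in [LAJ07] can be sketched as follows. The 8-byte length encoding, the loopback transport, and the 4096-byte example request are choices made for this illustration (and a single recv is assumed to deliver the whole 8-byte header, which holds on a loopback connection):

```python
import socket
import threading

# Sketch of the "browser"/"server" pair described above: the client's
# request carries only the number of response bytes the server has to
# return; no HTTP headers are set, so the result is HTTP-like TCP traffic.

def server(sock):
    conn, _ = sock.accept()
    with conn:
        want = int.from_bytes(conn.recv(8), "big")   # requested response size
        conn.sendall(b"x" * want)                    # synthetic response body

srv = socket.socket()
srv.bind(("127.0.0.1", 0))                           # ephemeral loopback port
srv.listen(1)
threading.Thread(target=server, args=(srv,), daemon=True).start()

cli = socket.socket()
cli.connect(srv.getsockname())
cli.sendall((4096).to_bytes(8, "big"))   # "request": ask for 4096 response bytes
body = b""
while len(body) < 4096:
    chunk = cli.recv(65536)
    if not chunk:
        break
    body += chunk
cli.close()
print(len(body))  # -> 4096
```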
Such HTTP-like traffic may be sufficient in [LAJ07] for the performance evaluation of AQM and ECN techniques (which are both QoS mechanisms employed at the network layer), but it will not be suitable for the performance evaluation of, e.g., Web proxies and caches at the application layer. PackMime-HTTP [CCG04] is a tool for generating realistic synthetic Web traffic in network simulations using the source-level models for aggregated HTTP traffic proposed in [CCG04] and implemented as the corresponding objects in the ns-2 network simulator. Aggregated HTTP traffic (as it appears, e.g., on backbone or high-speed access links) is described as a collection of independent TCP connections, each characterized by a set of source variables: arrival time of the connection, round-trip time for the client and for the server, number of request/response exchanges, time gaps between exchanges, sizes of individual requests and responses, and server delays.
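Such a connection-based description of aggregated HTTP traffic can be sketched by sampling the listed source variables per connection. All distributions and parameter values below are illustrative placeholders, not the fitted PackMime models:

```python
import random

# Sketch of a connection-based HTTP traffic description in the spirit of
# PackMime-HTTP: traffic is a collection of TCP connections, each with an
# arrival time, a round-trip time, and a number of request/response
# exchanges with sampled sizes. Distributions and parameters are
# illustrative only, not the fitted PackMime models.

rng = random.Random(7)                               # fixed seed for reproducibility

def sample_connection(t):
    n = 1 + min(int(rng.paretovariate(1.5)), 10)     # request/response exchanges
    return {
        "arrival": t,
        "rtt": rng.uniform(0.01, 0.2),               # client/server round-trip time (s)
        "exchanges": [
            {"request_bytes": int(rng.lognormvariate(6.0, 0.5)),
             "response_bytes": int(rng.lognormvariate(8.5, 1.0)),
             "gap": rng.expovariate(2.0)}            # time gap between exchanges (s)
            for _ in range(n)
        ],
    }

t, conns = 0.0, []
for _ in range(100):
    t += rng.expovariate(10.0)                       # connection inter-arrival times
    conns.append(sample_connection(t))
print(len(conns), sum(len(c["exchanges"]) for c in conns))
```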

The authors argued that such a “connection-based” approach for the modelling of HTTP traffic is able to capture relationships and significant dependencies in the collection of source variables which were not considered in the existing page-based models. Further, the approach is more likely to scale to modelling the traffic generated by other application classes and different application traffic mixes (provided that the applications use TCP for transport, e.g., file transfer, Internet video streaming, instant messaging, peer-to-peer file sharing)². Therefore, the authors recommend using their connection-based approach for the generation of synthetic Web traffic carried by network links, routers, and protocol stacks (i.e., in the “traffic for the network” scenarios). In PackMime-HTTP, a lot of effort has been spent by the authors on the ability to consider different network and protocol characteristics (such as the round-trip times for the client and the server, link capacities and error rates, and the dynamic TCP interactions between Web clients and servers). As a consequence, not only the Web clients and Web servers but also the other components of the network under study have to be modelled. For example, the authors in [CCG04] had to provide the interaction of the proposed PackMime-HTTP model with the TCP layer objects in the ns-2 network simulator. Therefore, the application field of traffic generators following the approach of PackMime-HTTP in [CCG04] or Swing in [ViV09] can be assumed to be rather restricted to scenarios similar to those covered by network simulation experiments.

Finally, Harpoon [SoB04] and LiTGen [RRB07a, RRB07b] are open-loop generators of aggregated network traffic at the flow level which are also able to generate traffic from HTTP sources. We will describe these solutions later in this section because, strictly considered, they are generators of IP traffic at network layer service interfaces.

² The connection-based approach proposed in [CCG04] has later been used in [ViV09] in order to generate realistic and responsive network traffic consisting of mixes from different application classes. The corresponding traffic generator, Swing, developed in [ViV09], will be presented later in this section (because it is, strictly considered, a generator of TCP traffic streams).

2.2.2. Traffic Generation at Transport Layer Service Interfaces

MGEN

The Multi-Generator (MGEN) is an open-source software suite developed by the Naval Research Laboratory (NRL) PROTocol Engineering Advanced Networking (PROTEAN) Research Group which provides (in its current version 5.0) the ability to generate, receive, and log real-time traffic patterns of unicast and/or multicast UDP and TCP applications in order to perform IP network performance tests and measurements [MGEN]. The tool suite currently runs on various Unix-based (including MacOS X) and Win32 platforms, is implemented in user space, and can also be used in network simulation environments like ns-2 and OPNET. Traffic generated by MGEN consists of a series of sequence-numbered messages with different sizes and inter-departure times determined according to a traffic pattern specified by the experimenter. Currently, MGEN supports the pattern types PERIODIC, POISSON, BURST, JITTER, and CLONE (the latter allows to extract the message sizes and/or inter-message times from a trace file in the binary tcpdump format). Further, script files are used in order to control the generated loading patterns over the course of time. Finally, MGEN log data can be used to calculate performance statistics, e.g., on throughput, packet loss rates, and communication delay. However, the performance of MGEN is reported to be rather low [DBP07, BDP10]. For instance, the maximum achievable packet rate for small packets of 64 byte length remains below 80,000 pps. Further, the accuracy of the message inter-departure times may be violated when the precise option is disabled (this option activates polling, if needed, to schedule the message inter-departure times precisely).

RUDE/CRUDE

RUDE/CRUDE (Real-time UDP Data Emitter / Collector for RUDE) is a small and flexible user-space generator of UDP traffic which can be received and logged using the corresponding collector module [RUDE]. The development of RUDE was motivated mainly by the accuracy limitations of the MGEN traffic generator due to the low-resolution system timers used in the Linux kernel on PC platforms (the precise option was not available in MGEN at that time).
Therefore, the operation and configuration of RUDE are very similar to those of MGEN. The tool can generate and measure only UDP traffic, is provided with a non-extensible script language to control the generated traffic patterns over time, and is not suitable for operation at high packet rates, especially with small frame lengths (cf. [BGPS05, BDP10]). The RUDE project seems to be no longer supported (the last release, 0.70, dates from 2002, cf. [RUDE]).

ITG / D-ITG

The Internet Traffic Generator (ITG) has been introduced in [APV04, AEPV04] with the aim of generating (network, transport, and application layer) traffic at the packet level and accurately replicating appropriate stochastic processes for both the inter-departure time (IDT) and packet size (PS) random variables. For this reason, ITG supported a set of different statistical distributions (e.g., exponential, uniform, constant, Pareto, Cauchy, normal, etc.) and has first been used to generate synthetic UDP and TCP traffic according to source-level models for different application-level traffic sources, e.g., Telnet, SMTP, Network News Transfer Protocol (NNTP), FTP, HTTP, Domain Name System (DNS), VoIP, video, etc. In [AEPV05], a distributed platform for traffic generation (called D-ITG) has been developed on the basis of ITG in order to increase the number of supported application and traffic scenarios and to improve the scalability and performance of the original centralized ITG traffic generator. D-ITG provides facilities for the measurement of different traffic parameters at the packet level (like delay, jitter, packet loss, and throughput). In the first variant of the proposed architecture, a log server is in charge of recording the information transmitted by senders and receivers, and the required communication is based either on TCP or UDP. In the second variant, senders and receivers make use of a Message Passing Interface (MPI) library to implement a control channel. As of [AEPV05], D-ITG was provided with a set of classical packet-level models for different traffic sources, e.g.:

• TELNET, NNTP, SMTP, and FTP traffic sources [DJCME92, Pax94, LFJ97],

• WWW traffic [CrB96, ArW97, LFJ97],

• VoIP traffic [LFJ97, Cisco2], and

• MPEG encoded video streams [GaW94, KrH95, Ros95, LFJ97].

However, the aforementioned models could probably be seen as rather outdated already at the time of introduction of D-ITG.
For this reason, a Hidden Markov Model (HMM) for Internet traffic sources at the packet level has been proposed in [DPRPV08], which allows the joint analysis of the Inter Packet Time (IPT) and Packet Size (PS) stochastic processes. The model is able to capture the behaviour of marginal distributions, mutual dependencies, and temporal structures of the traffic generated by a heterogeneous set of sources and can be used for traffic generation in D-ITG. According to
the source-based approach, the model does not focus on the aggregate link traffic but aims at the replication of separate traffic sessions originating from single hosts and related to specific application-level protocols. The proposed approach has been applied to various real traffic traces in order to obtain concrete source models of packet-level traffic generated by SMTP, HTTP, a network game (“Age of Mythology”), and an instant messaging application (MSN Messenger). According to the experimental results presented in [BDP10], D-ITG offers quite moderate performance and accuracy. For instance, the tool can achieve a maximum packet rate of approximately 140,000 pps, meaning that it is not able to saturate the capacity of a Gigabit Ethernet link with the smallest possible (64 byte) Ethernet packets (because the corresponding data rate achievable with such packets in D-ITG cannot exceed 650 Mbit/s). The authors of D-ITG frequently emphasized that they aim to simulate (i.e., to reproduce a traffic profile according to the stochastic models of IDT and PS), and not to emulate the traffic (which is defined by the authors as the reproduction of traffic resulting from a specific protocol, e.g., the reproduction of HTTP messages without using a browser). Thus, whenever the authors speak about the generation of, e.g., VoIP traffic, they actually mean the reproduction of “VoIP-like” traffic induced at the transport service interface (e.g., UDP in this case), meaning that only the UDP header fields (with a randomly chosen payload buffer) and no other VoIP-specific payload fields are set in the generated traffic.
As a consequence, the traffic generated by D-ITG using its analytical modelling functions can be used in performance experiments at the network or transport layer (e.g., to evaluate the performance of an IP router) but not in experiments at the application layer (e.g., to evaluate the performance of a VoIP gateway), because the generated “VoIP-like” traffic will not be recognized as real VoIP traffic by a real VoIP/SIP gateway. The same holds for the traffic of other network applications and services at the application layer. For these reasons, the functionality of D-ITG has been extended in a new version [BDP12] in order to combine the already existing analytical models with trace-based techniques. As of [BDP12], the tool is able to replay Packet Capture (PCAP) traces, which allows it to support arbitrary payload patterns (i.e., also from application layer traffic sources). Further, the authors mention the possibility to improve the performance characteristics of the generator by using novel socket families (e.g., PF RING DNA, cf. [ntop1] and later in this section) in the traffic transmitter component.


Network throughput measurement tools

iPerf [iPerf3] and Netperf [netperf] are typical examples of tools developed primarily as network throughput measurement and benchmark tools. iPerf was originally developed by NLANR/DAST, and the current version, iPerf3, provides facilities for active measurements of the maximum achievable throughput in IP networks. It supports tuning of various parameters related to timing, buffers, and protocols (TCP, UDP, and SCTP with IPv4 and Internet Protocol Version 6 (IPv6)). For each test it reports the throughput, packet loss, delay jitter, and other parameters (e.g., observed buffer sizes). Netperf provides tests for both unidirectional throughput and end-to-end latency for TCP and UDP traffic via BSD Sockets, as well as for SCTP traffic, for both IPv4 and IPv6. Benchmark tools like iPerf3 and Netperf usually generate as much traffic as possible in order to measure the network performance. Strictly speaking, therefore, they are not traffic generators, because they cannot generate specific traffic profiles specified by the experimenter, e.g., in terms of the inter-departure times and sizes of packets.
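The packet-rate limits quoted throughout this section follow directly from Ethernet framing overheads: on the wire, every frame is accompanied by an 8 byte preamble (including the start-of-frame delimiter) and at least a 12 byte inter-frame gap. A small sketch of the arithmetic:

```python
PREAMBLE = 8      # bytes: preamble + start-of-frame delimiter
IFG = 12          # bytes: minimum inter-frame gap
MIN_FRAME = 64    # bytes: minimum Ethernet frame length (incl. FCS)

def max_pps(link_bps, frame_len):
    """Theoretical maximum frame rate for a given frame length on a link."""
    wire_bits = (max(frame_len, MIN_FRAME) + PREAMBLE + IFG) * 8
    return link_bps / wire_bits

# 64 byte frames occupy (64 + 20) * 8 = 672 bit times each:
print(round(max_pps(1_000_000_000, 64)))    # 1488095  (ca. 1.488.000 pps, 1 GbE)
print(round(max_pps(10_000_000_000, 64)))   # 14880952 (ca. 14.880.000 pps, 10 GbE)
```

These are exactly the saturation figures of ca. 1.488.000 pps for Gigabit Ethernet and ca. 14.880.000 pps for 10 Gigabit Ethernet against which the generators in this section are compared.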

2.2.3. Traffic Generation at Network Layer Service Interfaces

Harpoon

Harpoon is a tool developed by Sommers and Barford in [SoB04] for generating representative packet traffic at the IP flow level. A flow is defined as a series of IP packets between a given pair of tuples (IP address, port number) using a specific transport protocol (e.g., TCP or UDP). The tool can be used in a router or emulation testbed environment and generates TCP and UDP packet flows that have the same byte, packet, temporal (in terms of the inter-arrival times of connections) and spatial (in terms of the IP address ranges of sender and receiver) characteristics as measured at routers in live environments. Harpoon is distinguished from other tools that generate statistically representative traffic in that it can self-configure by automatically extracting the parameters of its hierarchical traffic model from standard Netflow [Cisco1] logs or packet traces. The flow-level traffic generation is abstracted into a series of application-independent file transfers that use either the TCP or UDP protocol for transport. Harpoon uses a hierarchical two-level flow-based traffic model which consists of sessions comprising a series of connections separated by durations drawn from the inter-connection time distribution. Source and destination IP address selection is weighted to match the frequency distribution of the original flow data. The number of active sessions determines the overall average load offered by Harpoon. A heavy-tailed empirical file size distribution and an ON/OFF transfer model can generate self-similar packet-level behaviour. In summary, the model used in this tool is made up of a combination of five distributional models for TCP sessions: file size, inter-connection time, source and destination IP address ranges, and number of active sessions. Each of these distributions can be specified manually or extracted from packet traces or Netflow data collected at a live router. It is important that the approach taken in Harpoon uses source-level traffic descriptions that do not make assumptions about the transport layer, rather than packet-level descriptions based on prior network state embedded in low-level timings [FlP01].

Swing

Swing is a closed-loop, network-responsive traffic generator for network emulation test-beds developed by Vishwanath and Vahdat in [ViV06, ViV09]. The tool uses a rather comprehensive structural model of the traffic observed at a single point in the real network and automatically extracts distributions for different characteristics of user, application, and network behaviour in order to generate synthetic traffic at a single target link modelled as a dumb-bell in the network emulation environment ModelNet [VYW02].
In particular, the proposed structural model consists of four levels:

1) Users, characterized by the client IP address, the number of requests, and the think time between individual requests,

2) Sessions, characterized by the number of parallel connections and the time between the start of connections,

3) Connections, characterized by the destination or server IP address, the number of request-response exchanges per connection, the size of the request and of the corresponding response, the think time between exchanges on a connection, the type of the transport protocol (TCP or UDP), and the packet size and packet arrival distributions of individual responses,

4) Network characteristics, including link capacities, loss rates, and latencies (delays) for the paths connecting each host in the original trace to the target link.

The authors claim that their main contributions are 1) the ability to both extract wide-area network conditions from an existing packet trace and to replay these network conditions with sufficient fidelity to reproduce essential characteristics of the original trace, and 2) the understanding of the requirements for matching the burstiness of the packet arrival process of an
original trace (e.g., the well-known Auckland, MAWI, and CAIDA traces) at a variety of time scales, ranging from fine-grained (e.g., 1 ms) to coarse-grained (e.g., multiple minutes). Swing aims at matching burstiness in terms of 1) both the number of bytes and the number of packets, 2) both directions (arriving and departing) of a network interface, 3) a variety of individual applications within a trace (e.g., HTTP, peer-to-peer file sharing, SNMP, NNTP, etc.), and 4) original traces at a range of speeds and taken from a variety of locations. The modelling methodology proposed with Swing also has some known limitations. First, the application behaviour is modelled based on information extracted from publicly available packet traces (Auckland, MAWI, CAIDA) which contain only network and transport layer headers. Second, the accuracy of the tool is limited by the accuracy of the traces used and of the model parameters extracted for user, application, and network behaviour. Further, the focus is on generating traffic for a single network link modelled as a dumb-bell in a network emulation environment, so the distribution of requests and responses among the particular clients and servers in the original trace is not modelled.

LiTGen

LiTGen is an easy to use and tune open-loop traffic generator developed by Rolland, Ridoux, and Baynat in [RRB07a, RRB07b] that statistically models IP traffic on a per-user and per-application basis. Using a packet-level capture originating from the operational wireless access network of Sprint Labs, and taking the example of Web traffic (in [RRB07a]) and of P2P and mail wireless traffic (in [RRB07b]), the authors show that their hierarchical traffic model is sufficient to accurately reproduce the traffic burstiness and scaling properties at small and large time scales.
LiTGen relies on a hierarchical description of traffic entities, which are represented by one or several uncorrelated random variables related either to a time metric (duration or inter-arrival time) or to a size metric. For example, the model used for Web traffic in [RRB07a] consists of four levels with the following corresponding entities: 1) Session level, characterized by the number of downloaded pages and the inter-session durations, 2) Page level, characterized by the page size (defined as the number of objects involved in a page) and the corresponding page reading duration, 3) Object level, with the objects’ inter-arrival times within a page and the number of packets in an object, and 4) Packet level, characterized by the inter-arrival times between packets in an object. Selected entities can
be removed from the model for simplicity, if needed (as has been done, e.g., with the page level in [RRB07b]). The authors emphasize that the proposed model is intentionally kept simple, since client/server interactions are not modelled and network or protocol characteristics (e.g., round-trip times, link capacities, TCP dynamics) are not considered. Thus, the model does not rely on a complex emulator (that would reproduce the link layer or TCP dynamics) and allows fast computation when executed on commodity hardware (while, e.g., Swing relies on the third-party network emulator ModelNet, which requires substantial computing resources). Similar to the methodology in [ViV06, ViV09], the authors used second-order analysis (wavelet-based methods) in order to identify the dependencies across the random variables composing the underlying traffic model and to prove the ability of LiTGen to accurately reproduce the captured traffic and its properties over a wide range of time scales. The analysis showed that introducing a simple dependency between the object sizes and the distribution of the packet inter-arrival times can succeed in reproducing the traffic correlation structure accurately. The authors claimed, therefore, that under certain conditions it is possible to reproduce the second-order traffic characteristics without introducing more complex non-renewal processes and without considering network or protocol peculiarities in the LiTGen model, leading to a much simpler traffic generator than, e.g., Swing. However, in order to use LiTGen in an operational network, one must characterize the dependency of the packet inter-arrival time distribution on the object sizes. The authors propose to model this relation analytically, by finding suitable distributions for different object sizes or by involving simple (e.g., Markovian) TCP and/or network models as an input of the traffic generator.
This, however, is exactly the modelling effort made in Swing in order to provide realistic and responsive traffic generation in network emulation environments. So, in the general case, the critique of open-loop traffic generators stressed in [FlP01] also applies to LiTGen.
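An open-loop hierarchical model of the LiTGen kind can be sketched as nested renewal processes, one per level of the hierarchy; the distributions and parameters below are illustrative placeholders, not the values fitted in [RRB07a]:

```python
import random

rng = random.Random(1)

def generate_user_packets(n_sessions=3):
    """Session -> page -> object -> packet hierarchy (illustrative only).
    Returns the packet arrival times of one emulated user."""
    t, arrivals = 0.0, []
    for _session in range(n_sessions):
        t += rng.expovariate(1 / 30.0)                 # inter-session duration
        for _page in range(rng.randint(1, 5)):         # pages per session
            t += rng.expovariate(1 / 5.0)              # page reading duration
            page_t = t
            for _obj in range(rng.randint(1, 8)):      # objects per page
                page_t += rng.expovariate(1 / 0.2)     # object inter-arrival
                obj_t = page_t
                for _pkt in range(rng.randint(1, 20)): # packets per object
                    obj_t += rng.expovariate(1 / 0.005)  # packet inter-arrival
                    arrivals.append(obj_t)
    return arrivals
```

Each level only draws from its own uncorrelated random variables, which is exactly the simplification criticized above: burstiness emerges from the nesting, not from any feedback of the network on the source.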

2.2.4. Traffic Generation at Data Link Layer Service Interfaces

KUTE

KUTE (a Kernel-based UDP Traffic Engine) is a generator of UDP traffic which is designed to achieve high performance over Gigabit Ethernet [KUTE,
ZKA05]. It is based on two Linux 2.6 kernel modules (the sender and the receiver) that operate directly on the network device driver, bypassing the Linux kernel networking subsystem. The KUTE sender generates packets for a specified duration, computes the inter-packet gaps based on the specified sending rate (in packets per second), and polls the CPU cycle counter in order to wait for the sending time of each packet. The following parameters can be specified: source and destination IP address, source and destination ports, packet rate, packet length, duration of the flow, packet payload, Time To Live (TTL), Type of Service (ToS), and whether the UDP checksum and IP identification field should be used. The sender can create up to four different flows concurrently. The flows may have different packet rates, but must have the same duration. The KUTE receiver creates a packet inter-arrival histogram that can be accessed via the Linux proc file system. Furthermore, when the module is unloaded, it writes the information necessary to compute the mean and standard deviation of the distribution into the kernel log file. It should be noted that KUTE is strictly restricted to the generation of UDP packets and their injection as Ethernet frames (at the data link layer) with specified inter-departure times. The tool achieves a maximum packet rate of 740.000 pps (for a packet length of 64 byte and infinitesimal inter-arrival times) and is able to saturate the capacity of a Gigabit Ethernet link with packets of 256 byte length [BGPS05]. However, KUTE does not provide any further traffic modelling support and cannot compute the traffic statistics directly, because the Linux kernel does not provide floating point arithmetic. Moreover, the sender cannot be controlled from user-space while it is running.
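KUTE's pacing principle, waiting for each packet's departure time by polling a high-resolution clock instead of sleeping, can be sketched in user-space terms (KUTE itself does this in the kernel with the CPU cycle counter; the names below are illustrative):

```python
import time

def paced_send(payloads, gap_s, send):
    """Busy-wait pacing: poll a high-resolution clock until each packet's
    scheduled departure time is reached, then hand the packet to `send`."""
    next_t = time.perf_counter()
    for p in payloads:
        while time.perf_counter() < next_t:
            pass                 # busy-wait: poll instead of sleeping
        send(p)
        next_t += gap_s          # schedule against the plan, not the actual send

sent = []
paced_send([b"a", b"b", b"c"], 0.001, sent.append)
```

Polling trades CPU time for precision: timer interrupts and scheduler wake-ups, which limit sleep-based approaches to millisecond granularity, are avoided entirely, and advancing `next_t` by the nominal gap prevents per-packet errors from accumulating.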
Since the architecture of the tool is strictly tied to the architecture of the kernel, it lacks extensibility and cannot take advantage of extensible kernel-space interfaces.

BRUTE

The Browny and RobUst Traffic Engine (BRUTE) presented in [BGPS05] is a user-space application running on Linux 2.4-2.6 that is able to accurately generate customizable IPv4 and IPv6 Ethernet traffic flows at very high data rates. BRUTE uses a script language in order to control the generated traffic pattern over time and can be extended by means of additional traffic patterns (T-modules) implemented in the C language. A parser is responsible for reading the user commands from the script files and storing them in an internal database. The traffic engine examines the database entries and instantiates the corresponding traffic handlers (micro-engines) defined
in the T-modules. The micro-engines are executed sequentially in order to generate the specified traffic. BRUTE is provided at [BRUTE] with a set of predefined traffic patterns (T-modules): CBR (constant bit rate), CIDT (constant inter-departure time), POISSON (exponential inter-departure time), PAB (Poisson Arrival of Bursts), CBR-EXP/OFF-EXP (VoIP), RTCP SR (send-report message to measure RTT), and TRIMODAL (trimodal Ethernet frame size distribution). BRUTE is reported to achieve a maximum packet rate of 650.000 pps for packets of 64 byte length (which corresponds to a data rate of approximately 400 Mbit/s) [BGPS05]. So, its performance is comparable to that of KUTE (which is a kernel-based solution) while providing a high level of accuracy and precision of the packet inter-arrival times. Further, as reported in [APF08a], BRUTE can achieve higher throughput values (up to 1.090.000 pps) with 64 byte packets only in intermittent bursts.

BRUNO

A possible way to improve the performance and accuracy of the traffic generation process is the use of flexible hardware platforms and cooperative software/hardware design. For example, the Intel IXP2400 Network Processor (NP) is a multi-core processor dedicated to packet processing [IXP2400] which has been used in the traffic generators presented, e.g., in [BBCR06] (Pktgen) and [APF08a, APF08b] (BRUNO). BRUNO (BRUte on Network prOcessor), available at [BRUNO], is based on a modified version of BRUTE which has been designed to run on the PC that hosts the Network Processor card and is responsible for computing the packet lengths and departure times according to the specified traffic model [APF08a]. The host PC writes the computed data into the memory shared with the packet processing units of the Network Processor (so-called micro-engines), which are responsible for generating the real packets and for sending them with the proper timeliness.
In this way, BRUNO retains the high flexibility of BRUTE while improving its performance characteristics in terms of the achievable packet and data rates. Furthermore, a feedback mechanism and a time correction scheme have been introduced in BRUNO in order to improve the system's precision and accuracy in reproducing the packet departure times determined according to the traffic model. The Traffic Generator micro-engines report the actual packet departure times in a feedback ring; these are then used by the Load Balancer micro-engine for an adaptive time modification. The experiments
carried out in [APF08a] prove the effectiveness of this approach in reducing the mean inter-departure time error. Further, experimental tests have shown the ability of BRUNO to generate 64 byte packets at a short-term packet rate (calculated as a mean over intervals of 0.10 s) of up to 1.488.000 pps, which means that the tool can saturate the capacity of a Gigabit Ethernet link even with 64 byte packets.

Tools for packet-trace replay

Packet trace replay can be performed in a flexible manner in software at different network interfaces, and a series of corresponding tools have been proposed in the network research community. The most prominent example is probably Tcpreplay, which was originally designed for classic packet-level trace replay of TCP traffic in order to inject malicious traffic patterns into Intrusion Detection and Prevention Systems (IDPSs). In the meantime, Tcpreplay has evolved considerably and has obtained capabilities to replay traffic patterns to Web servers. Currently, Tcpreplay is a suite of free (GPLv3 licensed) open source utilities for UNIX and Win32 operating systems for editing and replaying previously captured network traffic in libpcap format (PCAP), with the aim of testing a variety of network devices (such as switches, routers, firewalls, and IDPSs). In particular, the Tcpreplay suite includes a set of PCAP file editors and network playback utilities, e.g.:

tcpprep: a multi-pass PCAP file pre-processor that allows the researcher to classify the captured packets as originating from the client or the server and to split them into different output files to be used by tcprewrite and tcpreplay.

tcprewrite: a PCAP file editor which allows to modify and rewrite the Ethernet, IP, and TCP/UDP packet headers.

tcpreplay: the tool to replay PCAP files at arbitrary speeds onto the network, with an option to replay with random IP addresses. tcpreplay supports both single and dual NIC modes for testing both sniffing and in-line devices.
tcpreplay-edit: extends the tcpreplay tool by a large set of functions to modify packets on the fly during the replay.

tcpliveplay: provides a replay function for TCP traffic stored in a PCAP file with the possibility to adapt the rate to the responses of a concrete
remote TCP server. The utility can be used to conduct tests at the application layer.

As of the current version 4.0, Tcpreplay has been enhanced to support the corresponding functions for testing and tuning IP Flow/NetFlow hardware. The accuracy and performance of the playback tool have been significantly improved by introducing support for the modified netmap device drivers for 10 Gigabit Ethernet NICs [netmap]. Similar tools have been proposed for high-performance packet replay, e.g., TCPopera [HoW06] and TCPivo [FGB03]. TCPopera pursues two primary goals: (1) replaying TCP connections in a stateful manner, and (2) supporting traffic models for trace manipulation. To achieve these goals, TCPopera emulates a TCP protocol stack and replays trace records interactively in terms of TCP connection-level and IP flow-level parameters. The second tool, TCPivo, employs novel mechanisms for managing trace files and accurate low-overhead timers in order to achieve high throughput and accuracy. In addition, through the use of low-latency kernel patches and priority scheduling, TCPivo can be made highly resilient to background system load. Using these mechanisms, the tool is able to support packet replay at packet and data rates sufficient, e.g., for OC-3 links. Both TCPopera and TCPivo have been used in test environments for IDPSs. However, these projects do not seem to have been maintained for quite a long time.

High-performance traffic generation for 10 Gigabit Ethernet

With the rapidly increasing capacities of the links deployed in production networks (e.g., 10 Gigabit Ethernet links are becoming common), high-performance traffic generation becomes very important. A possible direction of research to improve the performance of traffic generation is the use of parallelism, which is increasingly provided by modern commodity hardware (for example, the recently released processor family Intel Xeon Processor E7 v2 can have up to 15 cores, providing up to 30 logical processors to applications by means of the hyper-threading technology). One can expect that traffic generators can efficiently generate packets on multi-core systems if they are able to properly exploit such architectures. Further, the design of currently available 10 Gigabit Ethernet NICs (such as those based on the Intel 82599 controller) is already logically partitioned into several independent receive and transmit (RX/TX) hardware queues, so that multiple cores can receive and transmit packets in parallel. From
the operating system point of view, it is possible to simultaneously poll and send packets per queue, thus maximizing the overall throughput. Therefore, it is important that the operating system makes these queues available to applications and does not force multi-threaded applications to serialize their operations when all threads need to access the same Ethernet device [RDC11]. Another crucial factor for the performance of a traffic generator on multi-core systems is the capacity of the socket which is used for sending packets towards the NIC device driver. Most of the software-based generators presented in this section use either the PF PACKET socket family (on Linux distributions) or the AF INET socket family (on Windows systems). However, these socket families have been designed for a single-core architecture and show a number of severe limitations and bottlenecks when used on multi-core systems (cf. [BPGP12]):

• The PF PACKET socket does not allow to select a specific hardware queue for transmission when used on top of multi-queue NICs. This results in thread serialisation when multiple threads send packets on the same device, no matter whether they share the same socket or not.

• It is based on a per-packet send() system call, which represents a remarkable overhead. The system call versions provided for batch transmission (e.g., sendmmsg() on Linux and a version of send() on Windows) are hardly useful for a traffic generator, because the inter-departure times of the packets in a batch cannot be specified in these calls.

• The packet payload must be transferred into the kernel, which induces a higher overhead than a normal memcpy operation performed on a memory-mapped region in the user-space.

• Further, packets are not immediately directed to the NIC device driver but pass through a series of mechanisms (e.g., registered packet filters, traffic control modules, etc.) which induce additional overhead.

• A single socket cannot be used exclusively for packet transmission, so that a severe performance penalty may result in a multi-core scenario when several sockets are used for parallel transmission.

A series of different solutions have been proposed to improve the efficiency of the Linux networking subsystem with respect to the above-mentioned bottlenecks in the PF PACKET socket. For example, netmap [netmap] integrates in the same interface a number of heavily modified device drivers mapping the
NIC transmit and receive buffers directly into the user space. A version of this driver has been integrated into the new PF RING DNA framework [ntop1] and makes it possible to saturate the capacity of a 10 Gigabit Ethernet link with the smallest possible (64 byte long) packets, both in generation and in transmission, when simple test programs are used for packet generation. However, even when the bottlenecks in packet transmission are removed by using such a properly modified driver, a non-multi-core-aware design of the packet generation application itself may strongly limit the performance of the overall system (as has been reported, e.g., in [ntop2] for the Ostinato packet traffic generator, available at [Sri16], used in combination with PF RING DNA). Based on the research in [APF08a, APF08b], a modular architecture of an Ethernet traffic generator using an integrated co-design of kernel-space and user-space components is presented in [BPGP12]. A set of traffic engines (which are pure user-space threads) is responsible for generating a globally ordered stream of packets according to a set of independent traffic models and for dispatching the generated packets across a set of packet transmitters. The parallel packet transmitters are in charge of actually sending the generated packets to the NIC (using polling in order to precisely meet the specified inter-departure times). Each transmitter is implemented using a novel socket type PF DIRECT proposed by the authors (see below) and an active context implemented as a kernel-space thread, which can be assigned to a specific hardware queue on the NIC.
In order to avoid the above-mentioned limitations of the Linux networking subsystem, the authors in [BPGP12] designed a novel socket type PF DIRECT which consists of 1) a memory-mapped single-producer-single-consumer queue for payload and meta-data (which avoids the overhead of a system call to copy data into the kernel-space and provides a wait-free mechanism for data sharing), 2) a pool of pre-allocated socket buffers (which allows to keep using the mandatory sk buff socket structures in order to work with unmodified NIC device drivers), and 3) a direct interface to a hardware queue (in order to avoid the overhead induced by the optional traffic control or packet filter modules). The results of experimental tests (conducted on a machine with a 6-core Intel Xeon X5650 CPU (2.66 GHz clock, 12 MB cache), 12 GB DDR3 RAM, and an Intel E10G42BT NIC with the 82599 controller on board, running Linux with the 3.0.1 kernel and the ixgbe 3.4.24 NIC driver; with hyper-threading enabled, the experiments were effectively carried out on 12 virtual cores) revealed that the proposed traffic generator is able to saturate the capacity of a 10 Gigabit Ethernet link
with 128 byte long packets and can achieve a packet rate of 13.000.000 pps with minimum-size (64 byte) packets, which is very close to the theoretical maximum packet rate of 14.880.000 pps achievable on a 10 Gigabit Ethernet link. The authors in [BPGP12] claim that the proposed modular architecture allows parallelism to be used transparently for generating traffic according to arbitrary traffic models, which must conform to a simple interface and can be added by the user through a factory pattern implemented in C++. However, the presented experimental tests have been conducted with simplistic models for CBR and Poisson Ethernet traffic only. Further, closed-loop traffic generation, required, e.g., to emulate TCP traffic, is not supported by the tool in general. Recently, a flexible high-speed packet generator, MoonGen, has been proposed for the generation of Ethernet packet traffic on 10 Gigabit Ethernet links [EGRWC15]. It can saturate the capacity of a 10 Gigabit Ethernet link with minimum-sized packets while using only a single CPU core, by running on top of the packet processing framework Data Plane Development Kit (DPDK) [DPDK] on commodity hardware. The authors note that MoonGen utilizes several hardware features of commodity NICs that have not been used in Ethernet packet generators previously. In particular, it uses the hardware time-stamping capabilities of the NIC in order to perform latency measurements with sub-microsecond precision and accuracy. Furthermore, the authors proposed a novel method to control the inter-packet gaps (between the Ethernet packets) in software in order to mitigate the timing issues arising with software-based packet generators.
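The wait-free single-producer-single-consumer queue at the heart of designs like PF DIRECT can be illustrated with a minimal ring buffer (an idealized sketch: the real queue lives in memory shared between user-space and the kernel, and the class and slot layout here are purely illustrative):

```python
class SPSCRing:
    """Single-producer-single-consumer ring buffer: the producer only ever
    writes `head`, the consumer only ever writes `tail`, so no locks are
    needed as long as exactly one thread plays each role."""
    def __init__(self, capacity):
        self.buf = [None] * capacity  # one slot stays empty to tell full from empty
        self.head = 0                 # next write slot (producer-owned)
        self.tail = 0                 # next read slot (consumer-owned)

    def push(self, item):
        nxt = (self.head + 1) % len(self.buf)
        if nxt == self.tail:          # ring full: producer must retry later
            return False
        self.buf[self.head] = item
        self.head = nxt
        return True

    def pop(self):
        if self.tail == self.head:    # ring empty
            return None
        item = self.buf[self.tail]
        self.tail = (self.tail + 1) % len(self.buf)
        return item

ring = SPSCRing(4)
for pkt in (b"p1", b"p2", b"p3"):
    ring.push(pkt)
print(ring.pop())  # b'p1'
```

Because producer and consumer each modify only their own index, neither side ever blocks the other, which is what removes the per-packet system call and locking overhead criticized above.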
From the point of view of the author of this thesis, one can legitimately question how realistic the resulting traffic will be with respect to the variety of existing network applications which currently use, and probably will continue to use, the original sockets (and not their versions optimized for performance in the proposed manner) on Linux and Windows platforms. We should also note that a significant drawback of the approach followed in [BPGP12, EGRWC15] is that it allows the traffic generating application to take direct control of the network hardware, removing all software layers and making the application extremely vulnerable in case of a crash or malicious attack. This can hardly be allowed for regular network applications and services, in particular for reasons of network security and stability. Therefore, the architecture proposed, e.g., in [BPGP12] is expected to remain a very specific solution for open-loop Ethernet traffic generation for performance tests in selected 10 Gigabit Ethernet scenarios. Finally, we can conclude that the architecture of the workload generator to
be developed in this thesis should, wherever possible, make use of parallelism in the workload generation process in order to exploit the potential of current state-of-the-art multi-core system platforms equipped with multi-queue NICs, in combination with appropriate multi-core aware device drivers and the corresponding network socket software. We note, however, that improvements to the networking subsystem of the underlying operating system and the development of novel socket types or specialized (e.g., performance-optimized) NIC drivers are definitely outside the scope of this thesis.
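The kind of parallelism meant here can be illustrated conceptually: a traffic engine partitions its packet schedule across worker threads, each standing in for a transmitter bound to its own hardware queue. This is a pure sketch (Python threads do not map to NIC queues, and all names are illustrative):

```python
import queue
import threading

NUM_QUEUES = 4
txq = [queue.Queue() for _ in range(NUM_QUEUES)]   # one queue per "transmitter"
sent = [[] for _ in range(NUM_QUEUES)]

def transmitter(i):
    """Worker standing in for a per-hardware-queue packet transmitter."""
    while True:
        pkt = txq[i].get()
        if pkt is None:           # shutdown marker
            return
        sent[i].append(pkt)       # a real transmitter would hit the NIC here

workers = [threading.Thread(target=transmitter, args=(i,)) for i in range(NUM_QUEUES)]
for w in workers:
    w.start()

for n in range(100):              # traffic engine: round-robin dispatch
    txq[n % NUM_QUEUES].put(("pkt", n))
for q in txq:
    q.put(None)
for w in workers:
    w.join()
```

Per-transmitter queues keep the workers from contending on a single shared structure, mirroring the rationale for mapping one transmitter to one hardware queue in the architectures discussed above.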

2.2.5. Workload Tests in Research and Industry After the presentation of a comprehensive list of different workload and traffic generators in previous sections, we should address a remaining question how these tools can be actually used for workload tests in networking research experiments. Load testing can be generally defined as the simulation of multiple users using the observed service or application at the same time and working with it concurrently. According to this rather general definition the load testing may comprise several different types of testing. According to the testing objectives and procedure, the following types of testing can be identified. Performance testing: In this type of tests, the workload is increased gradually by adding more and more virtual users to the test while the performance parameters of the System Under Test (SUT), e.g., throughput, response time, error rate, etc., are monitored at any test phase. Capacity testing: This type of test is concerned with one of the most common questions in load testing: how many concurrent users the service or application can handle while maintaining an acceptable response time and error rate? Virtual users are added gradually to the test, but in this case the values of the performance parameters are known in advance and the experimenter just needs to check that the expected target values are really achieved. Performance or capacity tests help to reveal potential bottlenecks in the observed service or application. For example, a Web application can consist of several modules used to process requests. If one of them has a technical limitation, it limits the performance of the whole system. Stress testing: When the load goes beyond the capacity limit of the SUT, the observed service or application starts responding very slowly and can even produce errors. The main purposes of stress testing are: a) to find


2. Foundations and Research Field

the capacity limit, b) to check that when the capacity limit is reached, the system handles the stress situation correctly, i.e. it produces graceful overload notifications and does not crash, and c) to verify that when the load is reduced back to the regular level, the system returns to normal operation and retains its performance characteristics.

Volume testing: During a volume test, the experimenter tries to maximize the amount of processed data and/or the complexity of each transaction, operation, or request. For example, for testing the file upload facility of a Web application, the experimenter should use the largest files available. And, in order to test the application's search engine functions, he should try to produce the longest search results possible.

Endurance testing: This type of testing is used to check that the system can withstand the load for a long time or for a large number of transactions or requests. It usually reveals various types of resource allocation problems. If a small memory leak is present, it may not be evident in a quick test, but will degrade the performance after a long time. For endurance testing it is recommended to use a changing periodic load in order to provoke resource reallocation.

Regression testing: Here, load testing is integrated into the regular development process by creating regression load tests and applying them to every new version of the application or service being designed.

Considering the above-mentioned types of load testing with their partially different testing goals, it will be a very interesting and challenging task to elaborate a unified method for load specification and generation which allows such different types of tests to be performed in one single coherent approach.
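The gradual ramp-up common to performance and capacity testing can be sketched as follows (a hedged illustration; `request_fn` and the response-time bound are placeholders chosen for the example, not part of any tool described here): virtual users are added stepwise, and the test reports the largest user count for which the mean response time still meets the acceptable bound.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def probe(request_fn):
    """Issue one request and return its response time in seconds."""
    t0 = time.perf_counter()
    request_fn()
    return time.perf_counter() - t0

def capacity_test(request_fn, max_users=64, step=4, slo_seconds=0.5):
    """Gradually add virtual users (cf. performance/capacity testing) and
    return the largest user count whose mean response time stays within
    the acceptable bound slo_seconds."""
    capacity = 0
    for users in range(step, max_users + 1, step):
        with ThreadPoolExecutor(max_workers=users) as pool:
            times = list(pool.map(lambda _: probe(request_fn), range(users)))
        mean_rt = sum(times) / len(times)
        if mean_rt > slo_seconds:
            break  # capacity limit exceeded; stress territory begins here
        capacity = users
    return capacity
```

A stress test would continue past the break point and additionally check that the SUT degrades gracefully; an endurance test would hold one user level for a long time instead of ramping up.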

2.3. A Unified Approach to Workload Modelling and Generation in Computer Networks

The wide range of complex tasks to be fulfilled in a computer network forces one to structure the communication software and to use layered architectures for the horizontal layering of different network functions. Such architectures define the functionality of the particular layers, the interaction principles between them, and specify the concrete types of services provided by every layer at the corresponding service interface(s) [Kön12].


So, for the task of workload modelling and generation, it is very important to choose the target service interface at which the workload is to be considered. The choice of the service interface, in turn, determines the possible types of requests submitted by the service user(s) (comprising the environment E) to the service providing components (representing the system S) at the target service interface (IF). In this dissertation, we will use the definition of workload for computer and communications systems proposed by Wolfinger in [WoK90].

Definition (Workload). The workload L = L(E, S, IF, T) denotes the total sequence of requests which is offered by an environment E to a service system S at a well-defined interface IF during the time interval T.

It should be emphasized that the workload definition given above is strongly oriented towards the target service interface IF for the modelling and generation of workload. Therefore, an appropriate approach for modelling and generation of workloads based on this interface-oriented definition must provide a means for the characterization of 1) the arrival process of requests at IF during T, and 2) the resource requirements and other relevant attributes of the individual requests. Such a characterization must be as precise and as realistic as is necessary for the specific research purpose and the desired spectrum of use of the workload model. Furthermore, considering the discussion of various workload and traffic generators in networking research along with the different possible types of load testing presented in Sec. 2.2, a unified approach for workload modelling and generation in this thesis should meet at least the following basic requirements:

Support for different levels of abstraction: it should be possible to specify workload models using different levels of abstraction, e.g., application-level, flow-level, or packet-level models (cf. Sec. 2.2).
Support for different levels of detail: it should be easily possible to refine or coarsen the specification of the relevant workload characteristics if required in a concrete modelling study. The choice of the appropriate level of detail refers to both the inter-arrival times and the attributes of requests.

Measurement-based workload modelling: it should be possible to directly use measurements regarding the arrival process as well as the types and attribute values of requests as observed in measurement studies tracing workload and traffic generation from real network applications and services. In particular, the approach should be able to combine both trace-based and analytical model-based techniques (cf. Sec. 2.2) in order to achieve a high degree of flexibility in the specification of different model entities.

Figure 2.1.: Unified approach to workload modelling illustrated for the case of modelling at the IPv4 network service interface (own Fig.). [Figure: Step 1 decomposes the network node with TCP/IP protocol stack into the environment E (users U1–U4 of the network service) and the service system S with the interface IF between them; Step 2 yields the conceptual model with abstract requests and reactions crossing IF; Steps 3 and 4 model the virtual service users SU1–SU4 by means of UBAs as load model components.]

Support for different service interfaces: it should be possible to generate the workload as a sequence of requests at a concrete real service interface according to the specifications in the underlying workload model. For example, Web traffic can be generated as a sequence of HTTP requests and the corresponding HTTP responses at the (application layer) HTTP service interface according to an application-level model of Web traffic.

Consideration of the current system state: wherever required, the approach should also provide support for "closed-loop" workload models which reflect the dependency of the workload generation process on the current network state and are, therefore, responsive to changing network conditions [FlP01]. We note that it can be necessary to consider the current state of the system S in the workload model when the internal behaviour of the service provider (affected, e.g., by changing network conditions) significantly influences the interactions of the user with the system at the service interface.

A generalized approach to workload modelling and description has been introduced by Wolfinger in [WoK90]. According to this approach, the procedure of constructing a workload model can be accomplished systematically based on the four main steps which are illustrated in Fig. 2.1 and described in the following. The first of these steps is motivated by the fact that workload modelling necessarily requires a well-defined interface at which the workload is offered (cf. the definition of workload).

Step 1: Decomposition of system and environment. At the beginning, the modeller (which may be a single researcher, experimenter, test engineer or


a whole quality assurance team) has to decide where to place the boundary line between what he considers as the system S on the one hand and the environment E on the other hand. This decomposition directly provides the interface IF between S and E which can, e.g., consist of one particular local interface IFl or correspond to the union of several (also geographically distributed) interfaces IF1, IF2, ..., IFk in the network. Further, the modeller identifies the set of load generating users which are relevant for the given modelling task and, therefore, belong to the load generating environment E. These users can correspond, e.g., to human end users or to some load generating applications or system processes.

Step 2: Choice of abstraction level in modelling. At this step the modeller has to decide which requests (passed from E to S) and which reactions (produced by S and observable by E) have to be taken into account in the load model. For this reason, the relevant users are observed and analysed with respect to the requests they are generating. Depending on the objectives of the current modelling task the modeller can decide, e.g., to include only the typical requests from users in the load model. These typical requests are further characterized by a unique request type (so that disjoint request classes are to be built by the modeller) and the set of associated type-specific attributes (with predefined domains for attribute values). The possible system reactions are to be handled in the same manner (with respect to reaction types and attributes).

Step 3: Analysis and description of possible interactions. At this step, the modeller is concerned with the specification of the possible sequences of interactions between E and S at the chosen interface IF. This is quite similar to a service specification for a communication service, which also specifies the sequences of service primitives that are possible over time.
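Step 2 above asks the modeller to build disjoint request classes with type-specific attributes and predefined attribute domains. A minimal sketch of such a request class (all names are hypothetical and serve only to illustrate an IPv4-like "send" request; this is not a data structure from the thesis):

```python
from dataclasses import dataclass

@dataclass
class Request:
    """Abstract request passed from E to S at the interface IF."""
    rtype: str  # unique request type; one disjoint request class per type

@dataclass
class IpSendRequest(Request):
    """Hypothetical request class for an IPv4-like 'send datagram' primitive."""
    dst_addr: str        # destination address attribute
    length: int          # payload length in bytes
    tos: int = 0         # type-of-service attribute, predefined domain 0..255

    def __post_init__(self):
        # enforce the predefined domain of the attribute value
        if not 0 <= self.tos <= 255:
            raise ValueError("tos outside its predefined attribute domain")
```

System reactions would be modelled analogously, with one class per reaction type.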
In order to specify the interactions between the service users and the system S, real service users are mapped to virtual users which are represented by corresponding components in the modelling domain. Finally, the behaviour of the virtual users is described by means of an appropriate formal specification method (e.g., by means of the UBAs introduced in Chapter 3). For the choice of an appropriate specification method it is very important that the model description allows its execution without a lot of effort (e.g., by means of a corresponding load generator).

Step 4: Description of actual interactions between E and S. In order to describe the actual interactions between the environment E and the system S, the following two tasks have to be solved:


• For each virtual user U ∈ E in the environment, the sequence of requests L(U, S, IF, T) which U generates during the time interval T and passes to S has to be described. For a given interface IF, a sequence of requests generated by the user U can be represented by a finite vector of (time, request)-tuples L(U, S, IF, T) = ((t1, r1), (t2, r2), ..., (tk, rk)) for some k ∈ N. In each (time, request)-tuple (ti, ri) the value ti ∈ T denotes the generation time of the request ri at IF. The request generation times ti are assumed to be real values (ti ∈ R) and their sequence (t1, t2, ..., tk), characterizing the arrival process of requests in the described request sequence, is assumed to be non-decreasing (thus, ti ≤ tj for all 1 ≤ i < j ≤ k).

• The total workload L(E, S, IF, T) offered from the environment E to the system S is described by means of the superposition of the sequences of requests generated by the virtual users U being part of the environment E. The superposition of request sequences may be specified in different ways, depending on the interface IF chosen for workload modelling. For example, the requests from different users can be arranged according to a chosen service discipline (e.g., First-Come, First-Served (FCFS)) before they are handed over to the system S.

At the end of step 4, the experimenter should be able to generate the total workload L for an experiment over the given observation time interval T. To accomplish this task, each of the users U being part of the environment E can be replaced by an individual load generator creating the specified sequence of requests L(U, S, IF, T). We should emphasize that the target service interface IF for workload modelling must be chosen by the experimenter strictly according to the objectives of the specific experiment or the particular study to be carried out.
For example, the experimenter would presumably choose the IP service interface as the target interface in order to evaluate the performance of the IP forwarding functions in a router, but would rather select the HTTP service interface as the target interface for workload modelling in order to estimate the mean response time of a Web server under different server loads. Further, in the case study presented in Chapter 11 of this thesis, values of different QoS metrics for RTSP video streaming in Wireless Local Area Networks (WLANs) have been obtained while the reliable TCP transport service has been used for the transmission of the RTP video frames. Therefore, it was straightforward to choose the TCP transport service interface in order to generate the background traffic (represented by additional TCP traffic sources) in the experimental WLAN.
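The superposition of per-user request sequences described in step 4 can be sketched as a time-ordered merge of the (time, request)-tuples, with FCFS order for simultaneous requests; the two user sequences below are invented example values:

```python
import heapq

def superpose(*user_sequences):
    """Superpose per-user request sequences L(U, S, IF, T), each a list of
    (time, request) tuples sorted by generation time, into the total
    workload L(E, S, IF, T).  Ties are served FCFS in user order."""
    return list(heapq.merge(*user_sequences, key=lambda tr: tr[0]))

# Two virtual users with their individual request sequences
u1 = [(0.0, "GET /a"), (1.5, "GET /b")]
u2 = [(0.7, "sendto(42B)"), (2.1, "sendto(17B)")]

total = superpose(u1, u2)
# → [(0.0, 'GET /a'), (0.7, 'sendto(42B)'), (1.5, 'GET /b'), (2.1, 'sendto(17B)')]
```

Because the per-user sequences are non-decreasing in time, a k-way merge suffices; a different service discipline at IF would simply replace the merge key.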

Part II.

Workload Specification and Modelling

3. A Formal Workload Description Technique

One of the main objectives of this thesis is to combine the steps of formal description and generation of workloads in a single unified load generation approach (called the UniLoG approach in the following). In order to achieve this goal, the Formal Description Technique (FDT) we are looking for has to fulfil the requirements specified by the four steps of the UniLoG approach (cf. Sec. 2.3). So, our technique for the formal workload description should include capabilities for the description of the different types of requests along with their parameters (attributes) as well as for the description of the interactions ("request/reaction"-pairs) between the users and the system that are possible at the chosen interface. Moreover, the representation of a workload model provided by the technique should be easily executable (e.g., by means of a workload generator to be developed) in order to enable the experimenter to generate the corresponding real workloads for experimental studies.

3.1. The Basic Concept of User Behaviour Automata

The idea of using Finite State Machines (FSMs) to describe the workloads offered to a communication system by users of various communication services (like data, voice, and video communication) was first proposed by Wolfinger and Kim for the case of Integrated Services Digital Networks (ISDNs) (cf. [WoK90]). Based on the assumption that for every stream SR of requests there exists exactly one source of requests in the environment E, namely the user U(SR) generating SR, the authors proposed to describe the behaviour of the user U by means of an extended finite automaton UBA(U) = (Φ, TΦ), where Φ = {φi, φa, φb, φt} denotes the set of macro-states φi, φa, φb, φt with the predefined semantics described below, and TΦ denotes the set of transitions between these macro-states (cf. Fig. 3.1).

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_3


Figure 3.1.: A UBA on the level of macro-states (Fig. from [WoK90]).

Further, in order to describe the possible types of user behaviour characteristic of users of different network and communication services (e.g. digital video, voice or data communication over ISDN networks), the corresponding macro-states with the following predefined semantics have been introduced in the UBA (cf. [WoK90]):

• φi: idle (or initial) macro-state, which is entered when a new user described by the automaton is created in the environment E and some resources to serve it may already have to be allocated. φi is left exactly at the time when the corresponding virtual user starts the generation of requests.

• φa: active macro-state, provided to model the main actions of the service user, i.e. the generation of requests (with their corresponding resource demands) separated by the user-dependent delays between these requests (the so-called request Inter-arrival Times (IATs)).

• φb: blocked macro-state, which can be used to explicitly model the waiting for a certain kind of reaction from the system S. In the original work


[WoK90], system reactions have been assumed to always be consequences of prior user requests. If the system reactions have no impact on the user behaviour or if their impact is of no interest in a particular modelling study, the blocked macro-state φb may be left empty.

• φt: terminated macro-state, which is entered when the user has finished using the service. After the terminated user is removed from the environment, any resources that were in use to serve it can be reclaimed by the system.

The transitions between the macro-states in [WoK90] were allowed to be defined by means of transition probabilities (e.g. the probability p(φa → φb) for changing the macro-state from φa to φb, cf. Fig. 3.1) or according to some already available "knowledge" about the user U (here, the authors mentioned traces resulting from measurements of user behaviour at the considered interface).

We note that the four types of user behaviour represented by the macro-states mentioned above are rather general and can be observed at different application or network service interfaces. For example, let us consider the (inter-)actions between the human user (representing the environment E) and the Web browser (representing the system S, which in this case also includes the corresponding Web server and the communication network) at the browser command interface (cf. Fig. 3.2). Initially, the corresponding virtual user resides in the idle macro-state φi of the UBA. Each time the real user opens a new browser window or a new tab sheet inside of it, the virtual user changes its macro-state from φi to φa and becomes active. While being in the active macro-state φa the virtual user can generate requests which represent the commands from the real user to the Web browser (e.g., to retrieve Web sites by entering the URL directly in the address line of the browser window or by following a link on the current site).
Immediately after the next request has been issued, the user has to wait for the response from the Web server and, therefore, becomes blocked for a certain period of time until the requested page (or at least its base object) is loaded by the browser. Hence, the virtual user switches from φa into the blocked macro-state φb of the UBA and resides in this macro-state until the full page has been loaded. The virtual user may also leave the blocked macro-state φb as soon as the browser can show the first results of loading the page (which is possible when the base object of the page is loaded). From this point in time the user can take further actions (like starting to read or navigating to another page), i.e. it becomes active again and returns into the macro-state φa of the UBA. The virtual user may


Figure 3.2.: Modelling of a user at the Web browser command interface by means of a UBA (own Fig.). [Figure: the browser user forms the environment E, modelled by a UBA cycling through φi, φa, φb, φt; the system S comprises the Web browser, the HTTP service interface IFc(HTTP) and the Web server www.foo.com; user commands Navigate(...) trigger HTTP GET requests for /page1.html and its embedded objects, followed by a Ready("page 1") reaction, and an Error("page 2") reaction after a 404 Not Found response.]

change its state to the terminated macro-state φt, e.g., when the real user closes the current browser window or decides to cancel loading the rest of the page while being blocked in φb.

We may also descend to the next lower service interface in the Web client and consider the (inter-)actions between the Web browser (now representing the environment E) and the HTTP service provider at the HTTP service interface IFc(HTTP) in the Web client (cf. Fig. 3.3). The system S is now represented by the HTTP service provider along with the Web server and the communication network. The UBA no longer represents the human user but now describes the behaviour of the Web browser itself, i.e. how the HTTP requests are generated by the browser over time. While the initial and the terminated macro-states of the UBA turn out to be rather trivial (because the use of the HTTP service at the client's side does not require a lot of initialization effort), the active macro-state φa may become very complex due to the complex structure of modern Web sites and the different technologies used for their design. A typical Web site consists of a base object and multiple embedded objects. With the first HTTP request the browser retrieves the base object of the site, parses it, and issues subsequent HTTP requests to fetch all remaining objects embedded into the site. Here, a blocking situation may occur after the HTTP GET request for the base object of the site was

Figure 3.3.: Modelling of a user at the HTTP service interface by means of a UBA (own Fig.). [Figure: the Web browser now forms the environment E modelled by the UBA; the system S comprises the HTTP service provider at IFc(HTTP) and the Web server www.foo.com; the UBA alternates between φa and φb for each HTTP GET request/response pair of page 1 and ends in φt after the 404 Not Found response for page 2.]

issued. The virtual user remains in the blocked macro-state φb of the UBA until the base object is loaded by the browser (so that it can be parsed to issue subsequent HTTP requests for the embedded objects).
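The macro-state view of the UBA can be sketched as a small probabilistic automaton (the transition probabilities below are invented for illustration and are not values from the thesis):

```python
import random

# Macro-states of the UBA (cf. Fig. 3.1)
IDLE, ACTIVE, BLOCKED, TERMINATED = "phi_i", "phi_a", "phi_b", "phi_t"

# Illustrative transition probabilities for the browser-user example
TRANSITIONS = {
    IDLE:    [(ACTIVE, 1.0)],                      # user starts generating requests
    ACTIVE:  [(BLOCKED, 0.8), (TERMINATED, 0.2)],  # request issued vs. browser closed
    BLOCKED: [(ACTIVE, 0.9), (TERMINATED, 0.1)],   # page loaded vs. loading cancelled
}

def run_uba(seed=None):
    """Walk the macro-state UBA from phi_i until phi_t and return the path."""
    rng = random.Random(seed)
    state, path = IDLE, [IDLE]
    while state != TERMINATED:
        successors, weights = zip(*TRANSITIONS[state])
        state = rng.choices(successors, weights=weights)[0]
        path.append(state)
    return path
```

Each sampled path corresponds to one possible session of the virtual user; as discussed above, trace-based "knowledge" could replace the probabilities.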

Refinement of Macro-States

In the simple scenario presented in the last section, there is only one single type of request present in the active macro-state φa and only one type of system reaction in the blocked macro-state φb. In order to model more comprehensive user behaviour patterns, it may be necessary to distinguish between many different types of requests in φa and/or many different types of system reactions being possible in φb. Furthermore, there should be a facility to model the delays between subsequent requests and/or system reactions. For this reason, Wolfinger proposed to further refine the set Φ = {φi, φa, φb, φt} of macro-states by means of elementary user states (or simply states) of the following three types (cf. [WoK90]):

Request- or R-states: are introduced to model the generation of requests of exactly one predefined request type (e.g., sendto, connect, etc.) being possible in the modelling scenario.


System- or S-states: are used to model the waiting of the user for some type of event. The events may be the initialization of the user in the macro-state φi, the termination of the user in φt, or various system reactions indicated by the service provider to the automaton in φb.

Delay- or D-states: are used to explicitly model the delays between subsequent requests and/or events, as the generation of requests in the automaton is modelled in a non-time-consuming manner. The duration of the delays in D-states may be known a priori to the modeller. In advanced modelling cases (for example, when the behaviour of a user depends on the behaviour of the underlying service provider), the delays in D-states may be interrupted by (unexpected) system reactions indicated to the automaton by the service provider.

However, the steps required for the refinement of the set TΦ of transitions between macro-states using transitions between the introduced elementary states have been presented neither in [WoK90] nor in the later works on UBAs (cf. [Wol99], [Con06]). The UBA was introduced first on the level of macro-states, and the problem of refining macro-states into elementary states had not been adequately addressed until now.

3.2. Generalisation of the Basic Concept of User Behaviour Automata

We note that the user behaviour automaton presented until now allows one to easily describe the sequences of interactions of the user with the service provider at the considered interface. Changes in the data structures, e.g. in a particular request or request attribute, can, however, hardly be described. For example, if we wished to model the use of sequence numbers, it would be necessary to introduce a separate state for each sequence number value. This soon makes the automaton very complex and impractical for the description of complex user behaviour. Therefore, a generalisation of the basic UBA is presented in this section, which allows additional variables to be used to store context information in the automaton. Along with the guard expressions introduced in this thesis for use in conditional state transitions, the proposed generalisation allows the complexity of the automaton to be significantly reduced when using it for the specification of complex user behaviour models.
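The effect of context variables and guards can be illustrated with a small sketch (hypothetical names, not code from this thesis): a single guarded transition updates a sequence-number variable on a temporary copy of the context, instead of requiring one state per sequence-number value.

```python
import copy

def execute_transition(state, context, transition, event):
    """Attempt one UBA transition t = (q, i, exp, guard, q'): apply the
    context updates exp to a temporary copy of the context, evaluate the
    guard on that copy, and commit the copy only if the transition is
    activated.  Returns (new_state, new_context, fired)."""
    q, expected_event, exp, guard, q_next = transition
    if state != q or event != expected_event:  # wrong state or event type
        return state, context, False
    ct = copy.deepcopy(context)                # temporary context copy
    exp(ct)                                    # context update statements
    if not guard(ct):                          # guard not fulfilled:
        return state, context, False           # t is not activated
    return q_next, ct, True                    # ct becomes the current context

# Hypothetical example: leave R-state "send" after at most 3 generated requests
t = ("send", "generation_complete",
     lambda c: c.update(seq=c["seq"] + 1),     # exp: advance sequence number
     lambda c: c["seq"] <= 3,                  # guard on the updated copy
     "delay")

state, ctx = "send", {"seq": 0}
state, ctx, fired = execute_transition(state, ctx, t, "generation_complete")
# → state == "delay", ctx == {"seq": 1}, fired is True
```

With a basic UBA, the same behaviour would need one state per value of seq; here a single variable plus a guard suffices.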


3.2.1. Definition of the Generalised UBA

In contrast to the previous works on UBAs (cf. [WoK90, Wol99, Con06]), we introduce the automaton directly on the level of elementary user states. Moreover, we extend the facilities for the description of user behaviour by introducing additional variables comprising the context of the automaton. Changes in user behaviour can henceforth be described not only by changing the state in the automaton but also by altering the values of the corresponding variables from its context.

Definition (User Behaviour Automaton). A User Behaviour Automaton (UBA) is an extended finite state machine given by the tuple U = (Q, C, I, O, T, q0, c0, F), whereby

• Q is a finite, non-empty set of elementary user states (or simply states), each modelling a particular user action. According to the possible types of user actions, there are three different types of states (Request- or R-, System- or S-, and Delay- or D-states) possible in Q, which are described later.

• C ⊆ domain(v1) × domain(v2) × · · · × domain(vn) is a non-empty set of contexts with vi ∈ V, where V denotes a finite, non-empty set of variables and domain(vi) a non-empty, countable set of values (the range of vi), i ∈ N, n ∈ N.

• I is a finite, non-empty set of inputs to the automaton, represented by the events signalled to the automaton from the environment E (e.g. initialization or termination of the user) or from the service provider S (e.g. the different types of reactions being possible).

• O is a finite, non-empty set of outputs resulting from the different user actions modelled in the states of the automaton (e.g. requests are generated in R-states, and timers are set in D-states to model the delays between subsequent requests).

• T ⊆ Q × C × I × O × Q × C is a set of transitions. The transition and the output functions are integrated in a joint state transition function: δ : Q × C × I → O × Q × C.
The tuple representing the state transitions may be extended by additional components to implement more sophisticated cases. For example, guard expressions may be associated with the transition to allow its triggering only when a particular condition is fulfilled. Further, transition probabilities can be used to implement probabilistic state transitions when they are included into the transition tuple.


• q0 ∈ Q is the initial state, which is always an S-state, as the waiting for the initialisation event of the user is modelled in this state.

• c0 ∈ C is the initial context of the automaton.

• F is the set of final states, represented by the S-state(s) modelling the waiting after the user termination event.

In order to distinguish between the different types of user behaviour in a more comprehensive manner, three types of states are further introduced in Q. The set of states Q is subdivided into three disjoint subsets of Request- (R), Delay- (D), and System- (S) states (so that Q = R ∪ S ∪ D, and R ∩ S = ∅, S ∩ D = ∅, D ∩ R = ∅). The subsets have the following semantics:

R is the subset of Request- or R-states for short. By entering an R-state, the generation of a new request of the predefined request type associated with the state is initiated. A corresponding request type has to be assigned to every R-state and determines the set of attributes associated with the request as well (see Sec. 3.4). The generated requests along with their request attributes represent the output of the automaton in the R-state. Input to the automaton in an R-state is represented by the events indicating the completion of request generation (to be strictly differentiated from the request completion, which is system-dependent in general). According to the proposal in [Wol99], the generation of requests should be modelled in a non-time-consuming manner. Therefore, an R-state can be left immediately after the new request is generated by executing one of the transitions associated with it.

S is the subset of System- or S-states, which are introduced to model the situations where the user cannot generate requests because he has to wait for some reaction from the service provider (e.g. when the user is blocked) or from the environment (e.g. user initialisation events). On entering an S-state, a timer may be set to limit the waiting time for a specified reaction.
Formally, setting the timer represents the output of the automaton when an S-state is entered. The user sojourns in the S-state until either a specified reaction from the service provider is signalled or the timer has elapsed. Both are indicated to the automaton by means of corresponding events which represent the input to the automaton in S-states. The reactions from the service provider may be induced by local (internal) events in the service provider itself (the "local behaviour" part of the workload description) or be triggered by the service primitives from service users

at remote Service Access Points (SAPs) (the "global behaviour" part of the workload description). For example, a send operation call may block for an indefinite time on a blocking socket if the socket buffer becomes full. The event signalled to the caller of the send operation when buffer space is available again is an example of a local event generated by the service provider. In case the reactions from the service provider are of no interest for the particular modelling task, the subset S includes only the initial state q0 and the final states q ∈ F of the automaton (which are considered as termination S-states).

D is the subset of Delay- or simply D-states, which are used to model the periods of user inactivity between subsequent requests and/or system reactions. Each time a D-state is entered, a corresponding output is generated by setting a timer associated with the state to model the duration of the inactivity period. The user resides in the D-state as long as the timer has not elapsed, i.e. transitions out of the D-state may be triggered only after the corresponding timer expiration event (representing the input to the D-state) is signalled to the automaton.

A context c ∈ C is given by the current values of the variables v1, ..., vn, vi ∈ V, whereby n denotes the number and V the set of context variables in the automaton. The variables are created in the global scope (memory) of the automaton in order to be accessible from any of its states. They may be used to represent different information about the behaviour of the user, e.g.:

• the reached phase of request generation (described, for example, by the number of generated requests of a particular request type and/or in a particular state)

• values of different model parameters like, e.g., sequence numbers, addresses, length fields, etc.
It is easy to see that such information can hardly be represented by means of states in the automaton alone. While the potential successor states are described by the possible state transitions, additional specifications have to be made in order to transform the context c of the automaton before the execution of a transition into the context c′ after the transition is executed. For this purpose, an expression exp (which is in general allowed to contain many statements to update the context variables) can be specified

72

3. A Formal Workload Description Technique

and associated with a transition t in order to alter the values of context variables during the execution of t. Thus, a transition t ∈ T between the elementary states of the UBA can be defined by the tuple t = (q, c, i, exp, o, q′, c′), where q ∈ Q denotes the current state, c ∈ C the context before executing the transition, i ∈ I an input event (different types of events are possible depending on the type of the state), o ∈ O the output generated by executing the user action(s) associated with the successor state q′ ∈ Q, and c′ ∈ C the new context after the transition execution. The new context c′ is obtained by applying the context update statements specified by exp to the old context c of the automaton before the transition.

The procedure of executing a transition t = (q, c, i, exp, o, q′, c′) from the state q into the successor state q′ comprises the following steps:

1. An event i (of the event type being explicitly waited for in the current state q) is signalled to the automaton.

2. The context update statements specified by exp are executed on a temporary copy ct of the current context c of the automaton. In case t is a conditional transition, the current context c of the automaton can thus be kept unaltered until the corresponding guard expression is evaluated (so that the condition required to trigger t is proved to be fulfilled or not).

3. In case t is a conditional transition, the corresponding guard expression exp is evaluated using the variables from the temporary context ct. If the guard expression evaluates to false, the condition required to trigger t is not fulfilled and t cannot be executed (we also say it is not activated). If the guard expression evaluates to true, the transition is activated and we can proceed with the next step of its execution.

4. The temporary context ct (which has been used to check the guard expression if t is conditional) becomes the current context c′ of the automaton.

5.
The output associated with the successor state q′ is generated (e.g. a new request is issued in an R-state, or a new timer is set in a D-state).

6. q′ becomes the current state of the automaton. Thereafter, the automaton remains in the state q′ until the next event explicitly waited for in q′ is signalled to the automaton.

The extensions proposed in this thesis (the automaton's context consisting of context variables, context update statements, and conditional transitions
implemented by means of guard expressions) make the UBA concept considerably more flexible and turn it into an expressive and powerful workload description technique. Changes of user behaviour can now be described not only by changes of the automaton's state in the modelling domain, but also by means of context variables whose values are altered accordingly.
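The six-step procedure above can be sketched in a few lines of code. The following is a minimal, hypothetical illustration only: class and state names are invented, not taken from the UniLoG implementation, and the outputs produced in R- and D-states are omitted for brevity.

```python
import copy

class Transition:
    def __init__(self, source, target, event, update=None, guard=None):
        self.source = source    # current state q
        self.target = target    # successor state q'
        self.event = event      # awaited input event i
        self.update = update    # context update statements (exp)
        self.guard = guard      # guard expression (conditional transitions)

class UBA:
    def __init__(self, initial_state, context):
        self.state = initial_state
        self.context = context      # context c: values of v1, ..., vn
        self.transitions = []

    def signal(self, event):
        """Steps 1-6 of the transition-execution procedure."""
        for t in self.transitions:
            if t.source != self.state or t.event != event:
                continue
            # Step 2: run the context updates on a temporary copy ct,
            # leaving the current context c untouched for now
            ct = copy.deepcopy(self.context)
            if t.update:
                t.update(ct)
            # Step 3: evaluate the guard (if any) on ct
            if t.guard and not t.guard(ct):
                continue            # t is not activated
            # Step 4: ct becomes the new current context c'
            self.context = ct
            # Steps 5-6: produce the output of q' and enter q'
            self.state = t.target
            return True
        return False

# usage: generate at most three requests, then terminate; the first
# matching activated transition wins
uba = UBA("R_send", {"sent": 0})
uba.transitions.append(Transition("R_send", "R_send", "ack",
        update=lambda c: c.update(sent=c["sent"] + 1),
        guard=lambda c: c["sent"] <= 3))
uba.transitions.append(Transition("R_send", "S_done", "ack",
        guard=lambda c: c["sent"] >= 3))
```

Note how the guard in the example refers to a context variable (`sent`) rather than to a dedicated state per request count, which is exactly the point of the extension.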

3.2.2. Specification of Transitions between Elementary States

We first recall that in state-oriented descriptions (finite state machines and UBAs belong to this category of description techniques) waiting for some input event is implicitly expressed through the state. The possible reactions to the awaited events are described by means of transitions from the current to the potential successor states in the automaton. When the specified event occurs and all additional conditions are fulfilled, the transition is executed instantaneously, i.e. the transition execution does not consume any time in the modelling domain.

Deterministic State Transitions

In the simplest case, the current state q has only one successor state q′ in the automaton. This situation is given when the next user action (representing the reaction of the user to the event i signalled to the automaton in its current state q) can be determined in an unambiguous manner. The corresponding transition t = (q, c, i, o, q′, c′) from state q to the successor state q′ does not contain any constraints and is called deterministic in this case. A UBA is further called deterministic when the transitions t ∈ T (where T is the set of transitions) between all its states q ∈ Q (where Q is the set of elementary user states) are deterministic.

Probabilistic State Transitions

If more than one alternative user action is possible as a consequence of an event signalled to the current state, a non-deterministic situation arises, i.e. the current state q of the UBA does not have only one successor state but a whole set Q′ of successor states Q′ = {q′1, q′2, ..., q′k}, k ∈ N. During the construction of the UBA, the relative frequencies fqq′ of transitions from q to each of its successor states q′ ∈ Q′ can be calculated (e.g. by means of traces taken as a basis for model parameterisation). Thereafter, the calculated relative transition frequencies fqq′ can be taken as an approximation for
the corresponding probabilities pqq′ of a transition from q to each of its successor states q′ ∈ Q′. Therefore, the tuple representing a transition t = (q, c, i, o, q′, c′) from state q into the state q′ in the UBA is extended by the associated transition probability pqq′, 0.0 ≤ pqq′ ≤ 1.0, and takes the form t = (q, c, i, pqq′, o, q′, c′), whereby all the other components of the transition tuple retain their semantics.

Formally, the introduction of the transition probabilities pqq′ in the transition function δ of the UBA can be explained in the following way. Recall that the transition function of a deterministic finite automaton is defined as δ : Q × I → Q and can be represented as a membership function δ : Q × I × Q → {0, 1} so that δ(q, i, q′) = 1 if q′ ∈ δ(q, i) and δ(q, i, q′) = 0 if q′ ∉ δ(q, i). In case of a non-deterministic automaton, the definition of the transition function is extended to δ : Q × I → P(Q), where P(Q) is the power set of Q, due to the fact that many states (and not only one state) may act as a successor of the given state. Further, the transition function can be understood as a square matrix Pi with matrix entries [Pi]qq′ = δ(q, i, q′) which are zero or one, indicating whether a transition from q into q′ is allowed by the automaton when an event i is triggered. Finally, in order to introduce probabilistic state transitions in our UBA, the state transition function can be understood as a stochastic matrix Pi, so that the probability of a transition from state q into the state q′ after consuming the input i is given by [Pi]qq′. A state change from some state to any state must occur with probability one, of course, and so one must have ∑q′ [Pi]qq′ = 1 for all input events i ∈ I and all user states q ∈ Q.
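Operationally, executing a probabilistic transition means drawing the successor state q′ according to one row [Pi]qq′ of the stochastic matrix. The following sketch illustrates this; the states and probabilities are invented for the example.

```python
import random

def choose_successor(row, rng=random):
    """row maps each successor state q' to the probability [P_i]_{qq'};
    the entries of one row must sum to one."""
    assert abs(sum(row.values()) - 1.0) < 1e-9
    r = rng.random()
    acc = 0.0
    for state, prob in row.items():
        acc += prob                 # inverse-transform draw over the row
        if r < acc:
            return state
    return state                    # guard against floating-point round-off

# row of P_i for some state q, e.g. obtained from relative transition
# frequencies estimated from a trace
row_q = {"q1": 0.7, "q2": 0.2, "q3": 0.1}
```

Over many draws the empirical successor frequencies approximate the probabilities pqq′, which is exactly the property exploited when the pqq′ are parameterised from measured relative frequencies.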
It should be noted that a deterministic state transition from state q into the state q′ can be seen as a special case of a probabilistic state transition between the same states with [Pi]qq′ = 1 (and [Pi]qz = 0 for all z ≠ q′).

Conditional State Transitions

In workload modelling, situations occur very often in which one or more additional conditions must be fulfilled to trigger the next user action. Such conditions may refer, for example, to the context dependencies which persist among the user actions and can hardly be modelled by using only the elementary user states. For example, if we wish to model the dependency between the next user state and the current value of a particular parameter (e.g. a packet sequence number), it would be necessary to introduce a separate state for each sequence number value. This soon makes the automaton very complex and almost impracticable as a description of the user behaviour.

For this reason, we propose to specify such advanced context dependencies by means of guard expressions using the set V of variables from the context c of the automaton. Once an additional condition has to be fulfilled to trigger the next user action, a new guard expression exp is created in the set EXP of context expressions and is associated with the corresponding state transition t, which is called conditional in this case. Note that statements to update the automaton's context c may also already be included in the set of context expressions EXP associated with t. A conditional transition t = (q, c, i, exp, o, q′, c′) from the state q into the state q′ can be triggered after an event i is signalled in q only if the associated guard condition exp ∈ EXP evaluates to true for the current context c of the automaton. In the latter case the transition t is also called activated. By the execution of t, the current context c of the automaton is updated to the new context c′, then the output o is produced, and, finally, q′ becomes the current state of the automaton.

In order to complete the proposed extensions, we will provide the detailed specification of the syntax for the context expressions and their following components later in this chapter (cf. Sec. 3.6):

• declarations of context variables and constants,

• supported internal operations and functions,

• syntax rules to build valid context expressions (including context update statements and guard expressions associated with conditional state transitions).

Trace-Driven State Transitions

As an alternative, the visiting order of states in the automaton may be specified according to some arbitrary sequence predefined by the experimenter. For example, a trace containing a specific sequence of requests can be used for this purpose.
Traces of application or service user behaviour can be obtained by means of traffic captures at the interface(s) chosen according to the objectives of the particular modelling study. The raw measurement data obtained in this way can hardly be used directly, i.e. without any modifications; preprocessing and mining techniques have to be applied to it in order to specify the transitions in the automaton. This additional procedure has to be accomplished by the experimenter in order to extract the required request sequence from the raw trace data.
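A hedged sketch of this preprocessing step: extracting an ordered request sequence from raw capture data. The two-column input format (timestamp, request type) is invented for illustration; real captures usually require considerably more filtering and reassembly.

```python
# raw capture lines, possibly out of order (invented example data)
raw_trace = """\
0.450 POST
0.000 GET
0.120 GET
1.010 GET
"""

def extract_request_sequence(text):
    seq = []
    for line in text.splitlines():
        if not line.strip():
            continue                     # skip empty lines
        ts, req = line.split()
        seq.append((float(ts), req))
    seq.sort(key=lambda e: e[0])         # restore chronological order
    return seq

sequence = extract_request_sequence(raw_trace)
# the visiting order of the R-states is then fixed by `sequence`
```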

76

3. A Formal Workload Description Technique

3.2.3. Aggregation of User States into Macro-States

As already mentioned in Sec. 3.1, the problem of the refinement of macro-states into the elementary user states has not been adequately addressed until now. Furthermore, the classification of possible user actions into only four categories (initialisation, active, blocked, terminated), represented by the corresponding four macro-states (φi, φa, φb, and φt), may limit the modeller's scope too strongly. A survey of different workload modelling studies prepared in this thesis (cf. Chapter 4) revealed that in many cases other criteria played an important role in the decision to group the user states into more abstract model entities, especially in the part of user behaviour represented by the existing active macro-state φa. We give just a few examples reconfirming this point:

• In video traffic workload models (cf. [SRS03]), states representing the generation of video frames are very often grouped according to the motion intensity in the scene, resulting in different average frame lengths generated by the video coder.

• In modelling the behaviour of IPTV users (cf. [QGL09, RSR09, GJR11, AWL12]), a number of states describing a particular type of user activity (e.g., viewing, target switching, zapping, forwarding or rewinding) are often put together in more abstract model units.

• Different states of a TCP connection (among others, congestion avoidance, slow start, fast recovery, time-out) may be assigned to more abstract macro-states, e.g. when the observed TCP source is considered to be application-limited or network-limited (cf. [BBM10]).

It should be noted that the user actions exemplarily listed above would be modelled by elementary states all belonging to one and the same active macro-state φa, so that this macro-state would become extremely oversized and impractical.
Therefore, in this thesis we propose to look at the macro-states from a slightly different perspective, paying more attention to the context dependencies between the user actions modelled by the corresponding elementary states. According to this strategy, states modelling user actions with stronger context dependencies are the likely candidates to be grouped together and included in the same macro-state of the UBA. In order to provide for even more flexible workload descriptions, we extend the set Φ of macro-states and allow the modeller to define k, k ∈ N, macro-states φ1, φ2, ..., φk representing the groups of elementary states with strong
context dependencies within each group in the given modelling scenario. Each of the macro-states φ1, ..., φk may bear a specific semantics (e.g., generation of video frames with a particular motion intensity, phases of a TCP connection, etc.). For each elementary state q ∈ Q the corresponding macro-state of the UBA can be specified by means of the aggregation function a : Q → Φ. Note that a is a function, so that every state q ∈ Q can be associated with at most one macro-state φ ∈ Φ.

Finally, we emphasize that the proposed extension of the set of macro-states is completely compatible with the original definition of the UBA comprising the set of only four predefined macro-states for modelling the initialisation (φi), active (φa), blocked (φb), and terminated (φt) phases of the user, first introduced by Wolfinger in [WoK90]. The previous macro-states φi and φt have been rather trivial, since φi contained only the initialisation S-state Si and φt contained only the termination S-state St. In our extended version of the UBA in this thesis, the initialisation state is given by the elementary state q0, where a(q0) = φa, and there are potentially many termination states possible, given by the set F of final states, where a(q) = φt, ∀q ∈ F (cf. UBA definition in Section 3.2.1).
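Since a : Q → Φ is a function, the aggregation can be sketched as a plain mapping. The elementary states and macro-states below are invented, loosely following the TCP example given above.

```python
# set of macro-states Phi (illustrative names)
PHI = {"phi_net_limited", "phi_app_limited", "phi_t"}

# aggregation function a: Q -> Phi as a dictionary; every elementary
# state q is associated with at most one macro-state
a = {
    "q_slow_start":           "phi_net_limited",
    "q_congestion_avoidance": "phi_net_limited",
    "q_fast_recovery":        "phi_net_limited",
    "q_idle":                 "phi_app_limited",
    "q_terminated":           "phi_t",
}

def macro_state(q):
    """Return the macro-state phi = a(q) of an elementary state q."""
    return a[q]
```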

3.3. XML Schema Definition for the UBA Components

In the previous two sections an algebraic notation has been used to define the UBA and its components. If the workload model represented by UBAs is to be used for load generation, a corresponding description of the UBA suitable for execution is required. To provide such an executable representation of the UBA, in this thesis we use XML techniques to describe the components of the UBA. XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is defined in the XML 1.0 Specification [XML08] produced by the World Wide Web Consortium (W3C), and in several other related specifications (e.g. the XML Schema language [XSD12], also referred to as XML Schema Definition (XSD)), which all are open and available for free. XML is by now widely used for the representation of arbitrary data structures, for example in Web services.

In particular, the following complex components of the UBA have to be described and included in the XML document representing the UBA:

• the target service interface IF at which the UBA is defined
• possible types of abstract requests with their associated request attributes

• possible types of system reactions (if taken into account by the workload model) along with their corresponding attributes

• states of the automaton (differentiated into R-, S-, and D-states)

• transitions (conditional and/or stochastic) between the states

• possible aggregations of states into macro-states (if specified by the experimenter).

Furthermore, the XML description must include specifications for the following simple components of the UBA:

• names, descriptions, and values of request attributes in R-states

• delays between subsequent requests and/or reactions in D-states

• names, descriptions, and values of attributes for system reactions in S-states (if present in the UBA)

• probabilities for the stochastic state transitions

• context expressions for the conditional state transitions

• context update statements (which can be specified in every type of state in the automaton).

An XML document containing the description of the UBA is called well-formed if it does not violate the general syntax rules given by the XML specification [XML08], e.g.:

• the document must begin with the XML declaration

• it must have exactly one root element

• start-tags must have matching end-tags

• element names are case-sensitive

• all elements must be closed (by the corresponding end-tag)

• all elements must be properly nested

• all attribute values must be quoted
• entities must be used for special characters.

However, even if an XML document representing the UBA is well-formed, it can still contain errors which may have serious consequences for the subsequent generation of load. For example, IP packets of length 1500 KByte instead of the intended 1500 Byte can erroneously be produced due to a missing specification of packet length units in the UBA document. For this reason, the XML Schema language specified in [XSD12] is used in this thesis to describe the valid structure of the UBA and the valid content of its simple and complex components. XML Schema provides a very comprehensive datatyping system which allows

• to describe the structure of the UBA, e.g. by means of detailed constraints on the set of components that may be used in the UBA document, the attributes which may be applied to them, the order in which they may appear, and the allowable parent/child relationships, and

• to validate the correctness of data, e.g. by specifications of different data facets (restrictions on data), data patterns (data formats), as well as default and fixed values for UBA parameters.

Further, XSD defines the concepts of simple and complex elements which are used in this thesis to describe the simple and complex components of the UBA, respectively.
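The well-formedness rules listed above can be checked mechanically by any XML parser; as a quick illustration in Python, with two invented mini-documents standing in for UBA model files (note that a well-formed file may still violate the UBA schema, which is exactly why the schema validation discussed here is needed in addition):

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc):
    """True if `doc` parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

ok  = "<uba><requestTypes></requestTypes></uba>"
bad = "<uba><requestTypes></uba>"    # improperly nested elements
```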

3.3.1. XSD Simple Elements

The simple components of the UBA can be described using the concept of simple elements in XSD. A simple element is defined in XSD as an XML element that contains only text and is not allowed to contain any other (child) elements or attributes. "Text" means in this context the textual representation of a specific value, which may be one of the data types included in XSD (e.g. numeric, boolean, string, date, cf. Fig. 3.4), or a custom type defined by the user (user-defined derived data type). The set of built-in data types provided by XSD is by now quite comprehensive and sufficient to describe the simple components of the UBA, e.g. names and values of request attributes, delays between requests, or probabilities for the stochastic state transitions. It should be noted that the UBA may also contain binary data, e.g. the values of request attributes referencing some predefined data blocks or bit patterns to be transmitted. Therefore, it is very important that
Figure 3.4.: The classification of data types supported by XML Schema as defined by [XSD12] (Fig. from [XSD12]).

the XML Schema provides the corresponding data types to express binary-formatted data, e.g. hexBinary (for hexadecimal-encoded binary data) and base64Binary (for Base64-encoded binary data, cf. Fig. 3.4).

The set of XSD built-in data types can be extended by means of user-defined derived data types. A derived data type can be defined, among others, by using constraining facets which restrict the value space of the derived data type to a subset of the value space of its base type (e.g. it is possible to limit the range or the length of values or to require the data to match a specific pattern). For example, the data types needed to represent IPv4 and Medium Access Control (MAC) addresses as numeric values are available in XML Schema by default as the built-in data types unsignedInt and unsignedLong, respectively. But there are no built-in data types available to directly represent IPv4 addresses in the common "dotted decimal notation", IPv6 addresses in the convenient "colon-hexadecimal" format, or
MAC addresses in the canonical format. In such cases, the corresponding derived data types have been prepared and included in the UBA schema document to be used in the workload model files (UBA files).
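As an illustration of the kind of pattern facet such a derived type might use (the exact pattern in the UBA schema may differ), the following exercises a dotted-decimal IPv4 regular expression in Python:

```python
import re

# one octet: 0-255 without leading zeros (illustrative pattern)
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
IPV4 = re.compile(r"^" + OCTET + r"(\." + OCTET + r"){3}$")

def is_dotted_decimal_ipv4(value):
    """True if `value` is a valid IPv4 address in dotted decimal notation."""
    return IPV4.match(value) is not None
```

In an XSD derived type, the same expression would appear as the value of an xs:pattern facet restricting xs:string.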

3.3.2. XSD Complex Elements

As already stated in Sec. 3.3.1, simple components of the UBA can be described by means of XSD simple elements, which may be one of the built-in or user-defined derived data types. In order to describe complex components of the UBA (e.g. a request type containing a set of request attributes with their corresponding names, descriptions, and value specifications), simple elements cannot be used as they are not allowed to contain other (child) XML elements or attributes. XSD complex elements provide a solution to this issue. A complex element is defined in XSD as an XML element that contains other elements and/or attributes. In particular, it may also contain other complex elements as child elements. For example, a request attribute can be described by means of a complex element requestAttribute consisting of two simple child elements name and description along with the complex element valueSpec describing the specification of the attribute's value (e.g. according to a statistical distribution or to a trace). Further, different indicators can be used to specify the valid order, number of occurrences, and grouping of child elements within the complex element. Using these techniques, request attributes can be inserted into the list of request attributes and associated with the abstract request type they belong to. Each abstract request type is described by means of a complex type and inserted into the list of request types associated with the UBA. In this way, a UBA can be completely described by means of one single complex element.
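The nesting just described can be illustrated by constructing such a requestAttribute element programmatically. The tag names follow the text; the concrete attribute (replySize with an exponential valueSpec) is an invented example.

```python
import xml.etree.ElementTree as ET

# complex element requestAttribute with two simple children (name,
# description) and one complex child (valueSpec)
attr = ET.Element("requestAttribute")
ET.SubElement(attr, "name").text = "replySize"
ET.SubElement(attr, "description").text = "size of the HTTP reply in bytes"
value_spec = ET.SubElement(attr, "valueSpec")
ET.SubElement(value_spec, "distribution",
              {"type": "exponential", "mean": "12000"})

xml_text = ET.tostring(attr, encoding="unicode")
```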

3.3.3. Introduction to the UBA Schema

Let us now introduce a first example of a UBA schema and its use in a UBA model file. The UBA schema starts with the <xs:schema> element, which is the mandatory root element of every XML Schema (cf. Fig. 3.5, lines 2–5) and may contain some additional attributes. The attribute xmlns:xs="http://www.w3.org/2001/XMLSchema" indicates that the elements and data types used in the schema to define UBA components come from the "http://www.w3.org/2001/XMLSchema" namespace (where all basic XSD data types are defined). It also specifies that the elements and data types that come from
 1  <?xml version="1.0" encoding="UTF-8"?>
 2  <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
 3             targetNamespace="http://www.informatik.uni-hamburg.de/TKRN/UniLoG"
 4             xmlns="http://www.informatik.uni-hamburg.de/TKRN/UniLoG"
 5             elementFormDefault="qualified">
 6    <xs:complexType name="uba">
 7      <!-- lists of the UBA components -->
 8      <xs:sequence>
 9        <xs:element name="requestTypes"
10                    type="ListOfRequestTypes" maxOccurs="1"/>
11        <xs:element name="reactionTypes"
12                    type="ListOfReactionTypes" maxOccurs="1"/>
13        <xs:element name="ubaStates" type="ListOfUBAStates"
14                    maxOccurs="1" minOccurs="1"/>
15      </xs:sequence>
16    </xs:complexType>
17    <xs:element name="uba" type="uba"/>
18  </xs:schema>

Figure 3.5.: UBA schema definition file (own Fig.).

the "http://www.w3.org/2001/XMLSchema" namespace should be prefixed with xs: (this is the reason for the root element to be prefixed). The attribute targetNamespace="http://www.informatik.uni-hamburg.de/TKRN/UniLoG" specifies that the elements defined by the UBA schema belong to the "http://www.informatik.uni-hamburg.de/TKRN/UniLoG" namespace. Further, the xmlns attribute indicates that the namespace "http://www.informatik.uni-hamburg.de/TKRN/UniLoG" is the default namespace, which will be used to resolve unprefixed element references. Finally, the attribute elementFormDefault="qualified" specifies that references from UBA instance documents to any elements from this UBA schema must be prefixed by a namespace qualifier (in order to avoid potential ambiguities with identically named data types from other XML schemas).

The complete UBA is defined by means of a uba complex type (cf. line 6) which contains the list of possible types of abstract requests (in the child element requestTypes described by the user-defined complex type ListOfRequestTypes), the list of relevant system reactions (in the element reactionTypes described by the user-defined complex type ListOfReactionTypes), and the list of UBA states (in the ubaStates element described by the user-defined complex type ListOfUBAStates, cf. lines 7–16). The xs:sequence indicator (cf. line 8) specifies the strict order of
the child elements requestTypes, reactionTypes, and ubaStates within a uba element. The maxOccurs attribute (lines 9–14) specifies the maximum number of occurrences of the corresponding element (e.g. the list ubaStates of the UBA states) within its parent element (i.e., within the UBA being described). The attribute minOccurs (line 14) specifies the minimum number of element occurrences and can therefore be used, e.g., to declare the list of user states in a UBA to be mandatory (minOccurs="1"). The user-defined complex types used for the description of possible types of abstract requests and system reactions (including their corresponding attributes), user states and macro-states, state transitions, and specifications of values for different UBA parameters will be described in detail later in this thesis.

The main reason for the development of the UBA schema is the ability to check the conformity of a workload model description, elaborated by an experimenter and represented by a UBA file, to the syntactical and grammatical rules specified by the UBA schema for valid UBA components. In the context of the XML specification, the term "validation" is used for the procedure of these conformity checks. However, "validation" means here only that the elements and attributes used in the given UBA model file are checked to be declared and to follow the grammatical and syntactical rules specified for them in the referenced UBA schema (in particular, it does not mean that the workload model itself is being validated). The mentioned schema validation procedure can be accomplished by any XML parser application implementing the schema validation facility, e.g. Xerces [Xer13].

The following example illustrates the use of the UBA schema in a UBA model file (cf. Fig. 3.6). The attribute xmlns of the root uba element specifies the default namespace (cf.
line 3), which tells the schema validator that all the elements used in the model file are declared in the "http://www.informatik.uni-hamburg.de/TKRN/UniLoG" namespace. In lines 4 and 5 the attribute schemaLocation, declared in the namespace "http://www.w3.org/2001/XMLSchema-instance", is used to inform the schema validator that the definitions of the complex elements declared in the namespace "http://www.informatik.uni-hamburg.de/TKRN/UniLoG" to describe the valid UBA components can be found in the schema file located at "http://www.informatik.uni-hamburg.de/TKRN/UBA.xsd". In a UBA model file, the UBA is represented by the root element <uba>, which contains the list of possible types of abstract requests in the element <requestTypes>, the list of types of relevant system reactions in the element <reactionTypes>, and the list of user states in the element <ubaStates> (cf. lines 7–15). The corresponding complex types are introduced in the next section.

 1  <?xml version="1.0" encoding="UTF-8"?>
 2  <uba
 3      xmlns="http://www.informatik.uni-hamburg.de/TKRN/UniLoG"
 4      xsi:schemaLocation="http://www.informatik.uni-hamburg.de/TKRN/UniLoG
 5                          http://www.informatik.uni-hamburg.de/TKRN/UBA.xsd"
 6      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 7    <requestTypes>
 8      ...
 9    </requestTypes>
10    <reactionTypes>
11      ...
12    </reactionTypes>
13    <ubaStates>
14      ...
15    </ubaStates>
16  </uba>

Figure 3.6.: An example of a UBA model file including a reference to the UBA schema (own Fig.).

3.4. Description of Abstract Requests and System Reactions

As already stated in Sec. 2.3, the workload offered by the user(s) to the service system S at the service interface IF is defined as a sequence of request arrival events (tx, rx), where tx denotes the arrival time of the request rx at IF (x ∈ N). Each request rx is characterised by the corresponding abstract request type from the set RT = {RT1, RT2, ..., RTn}, n ∈ N, of abstract request types being possible in the modelling domain. Each of the abstract request types RTi ∈ RT, i ∈ N, 1 ≤ i ≤ n, is itself represented by a tuple RTi = (name(RTi), idRef(RTi), Ai), where name(RTi) is the unique name of the abstract request type in the modelling domain, idRef(RTi) is the reference to the corresponding real request type at the target service interface (hence, it denotes the "semantics" of the abstract request type at the real interface), and Ai = {ai,1, ai,2, ..., ai,j}, j ∈ N, j = j(i), is the set of associated request attributes characterising the request type RTi and, therefore, regarded by the experimenter to be relevant for this particular UBA. In general, every abstract request type RTi can be described by a different set Ai of request attributes, as each of the real service primitives may require different parameters for the corresponding service and/or Application Programming Interface (API) calls.
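The tuples RTi = (name(RTi), idRef(RTi), Ai) and the workload as a sequence of arrival events (tx, rx) can be sketched as plain data structures. All concrete names below (TCP.SEND, socket.send, blockSize) are illustrative assumptions, not part of the formal definition.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class RequestAttribute:                    # a_{i,j}
    name: str                              # unique within the request type
    id_ref: str                            # parameter of the real call
    datatype: str
    value_spec: Callable[[], Any]          # how values are obtained

@dataclass
class RequestType:                         # RT_i
    name: str                              # unique name in the modelling domain
    id_ref: str                            # real request type at the interface
    attributes: List[RequestAttribute]     # A_i

rt_send = RequestType(
    name="TCP.SEND",
    id_ref="socket.send",
    attributes=[RequestAttribute("blockSize", "len", "unsignedInt",
                                 lambda: 1460)],
)

# workload offered at IF: sequence of request arrival events (t_x, r_x)
workload = [(0.00, rt_send), (0.02, rt_send), (0.05, rt_send)]
```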

Each of the request attributes ai,j is represented by a tuple ai,j = (name(ai,j), idRef(ai,j), datatype(ai,j), valueSpec(ai,j)), where name(ai,j) specifies the attribute's name, which must be unique within the request type RTi, idRef(ai,j) is the reference to the corresponding parameter of the service or API call at the chosen service interface (the "semantics" of ai,j at the real service interface), datatype(ai,j) is the data type used to represent the attribute's values, and valueSpec(ai,j) is the object used to specify how the values of the attribute are obtained when requests of type RTi are to be generated (e.g. as a constant value, according to a trace, or according to a statistical distribution).

The decisions regarding the relevant types of requests and their attributes (as well as the types of relevant system reactions and their attributes) are made by the experimenter when the workload model is being elaborated. Considering the behaviour of service users at a specific well-defined service interface (e.g. an HTTP service interface), the experimenter can take the underlying service specification as an orientation to determine the relevant types of requests, system reactions, and their corresponding attributes for a given modelling task. Service primitives declared by the service specification represent the possible candidates for the abstract types of requests and system reactions in the UBA model. Apparently, a completely different level of detail and abstraction may be required for the description of user behaviour in every particular modelling study, depending primarily on its specific objectives (cf. Sec. 2.2.5 for different possible types of load experiments). So, the types of abstract requests, system reactions, and their attributes are expected to differ strongly among various workload models, even for users of the same service.
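The three ways a valueSpec may obtain attribute values at generation time can be sketched as follows; the function names are illustrative.

```python
import itertools
import random

def constant(v):
    """valueSpec: always return the constant v."""
    return lambda: v

def exponential(mean, rng=random):
    """valueSpec: draw from an exponential distribution with given mean."""
    return lambda: rng.expovariate(1.0 / mean)

def from_trace(values):
    """valueSpec: replay the recorded values of a trace."""
    it = itertools.cycle(values)
    return lambda: next(it)

# e.g. block sizes replayed from a (tiny, invented) trace and
# inter-request delays drawn from a distribution
block_size = from_trace([512, 1460, 1460, 64])
delay = exponential(0.5)
```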
Moreover, based on the abstract specification in the UBA, real requests are to be generated at the target service interface by means of system (or API) calls to the corresponding service (or application) primitives. So, the description technique should provide support in cases where abstract requests are to be transformed from their abstract representation in the modelling domain (given by the UBA) into calls to real service primitives, and in cases where real system events which occurred at the target interface are to be transformed into abstract system reactions supported by the UBA model.

3.4.1. Relevant Abstract Request Types

In this section we illustrate how the relevant types of abstract requests can be identified, e.g. for the modelling of Web workloads as presented in [KoW11].

The HTTP/1.1 protocol specification (cf. [RFC2616]) defines nearly a dozen different request methods for the interaction with the Web server. These request methods follow the general request/reply schema, with GET and POST being the most frequently used request methods in current Web applications. Thus, the experimenter can decide to provide a separate abstract request type for each HTTP request method, or he/she can prefer to define only one single abstract request type (e.g., HttpRequest) addressing HTTP requests in general and to concentrate rather on the identification of the request parameters which significantly affect the workload induced in the Web server and/or network (thus characterising the resource demands to serve the offered requests). On the one hand, the experimenter can omit some parameters of real HTTP request methods (e.g. URL parameters or HTTP header fields) which he/she may consider as not necessarily relevant for the modelling domain. On the other hand, the experimenter can introduce some additional workload parameters (e.g. user think time, page popularity, or temporal locality of requests) which are frequently used in Web workload modelling studies (cf. [BaC98, CCG04, WAW05, KRL08, OSPG09]).

For example, in order to analyse the utilization level of some known Web server, the experimenter can decide to include only one single attribute inducedServerLoad in the abstract request type HttpRequest. In case the experimenter plans to test a specific function of a Web service or application deployed to that server, the attribute serverName containing the full URL of the referenced object is most likely to be added to HttpRequest. In order to produce various background loads for the network in terms of Web traffic of different structure and intensity, further attributes like numberOfEmbeddedObjects or replySize are likely to be included in HttpRequest.
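An instance of such a single abstract request type might look as follows. The attribute selection and all values (including the URL) are invented examples of the kinds of parameters discussed above, not fixed parts of the technique.

```python
# illustrative abstract request type HttpRequest with a small,
# experimenter-chosen attribute set
http_request_type = {
    "name": "HttpRequest",
    "idRef": "HTTP GET/POST request",
    "attributes": {
        "serverName": "http://www.example.org/index.html",  # placeholder URL
        "numberOfEmbeddedObjects": 8,
        "replySize": 12000,        # bytes
        "userThinkTime": 2.5,      # seconds between subsequent page requests
    },
}
```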

3.4.2. Semantics of Abstract Request Types

As already stated at the beginning of Sec. 3.4, the workload description technique to be elaborated in this thesis should support the transformation of requests from their abstract representation (by means of complex XML types in the UBA) into the corresponding system or API calls at the specified target service interface. Care must be taken in situations where service primitives and/or parameters considered by the experimenter as not relevant for the modelling domain are nevertheless required for the generation of requests at the target service interface. For example, while modelling a user of a TCP service, the experimenter

3.4. Description of Abstract Requests and System Reactions


may decide to omit the TCP.CONNECT service primitive from the set RT of abstract request types in the UBA (e.g. because the workload induced in the network by requests of this type is expected to be insignificant), being rather interested in the size of the data blocks transmitted by subsequent TCP.SEND requests. In the real service domain, however, data transmission cannot be initiated before an explicit TCP connection has been successfully established. The same issue arises in the experimenter's decisions on the relevant request attributes. For example, in a particular abstract request type (e.g. TCP.SEND), the experimenter may decide to omit an attribute (e.g. TCP.DESTPORT) which is unnecessary in the UBA from the experimenter's point of view but will be required by the TCP socket send() call to address the corresponding receiver's port number. Therefore, a mechanism to map the abstract request types and their attributes from the modelling domain onto the corresponding service or API calls and their parameters is indispensable. In particular, service primitives and parameters which are declared as mandatory in the service specification but have been omitted by the experimenter in the modelling domain are to be detected, and appropriate realistic values have to be provided for them (e.g. default values specified by the experimenter). For the implementation of the load-generating software components developed later in this thesis, it will be essential to recognize the meaning (semantics) of every particular abstract request and system reaction as well as of each of their associated attributes. Moreover, the experimenter may require the ability to check whether all obligatory types of requests and system reactions, as well as all of their obligatory attributes, have been included in the UBA.
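A minimal sketch of this completion step might look as follows (illustrative Python, not the UniLoG implementation; the primitive registry, the mandatory parameter sets, and the default values are assumptions for this example):

```python
# Mandatory parameters of the service primitives at the target interface
# (illustrative registry; the parameter sets are assumptions for this sketch).
SERVICE_PRIMITIVES = {
    "TCP.CONNECT": ["TCP.DESTADDR", "TCP.DESTPORT"],
    "TCP.SEND":    ["TCP.SEND.LEN", "TCP.DESTPORT"],
}

# Default values specified by the experimenter for omitted parameters.
DEFAULTS = {"TCP.DESTADDR": "127.0.0.1", "TCP.DESTPORT": 5000}

def complete_request(id_ref, attributes):
    """Detect mandatory parameters omitted in the UBA request and fill
    them with the experimenter-specified default values."""
    completed = dict(attributes)
    for param in SERVICE_PRIMITIVES[id_ref]:
        if param not in completed:
            if param not in DEFAULTS:
                raise ValueError(f"no default for mandatory parameter {param}")
            completed[param] = DEFAULTS[param]
    return completed

# A TCP.SEND request whose abstract type only carries the data length:
req = complete_request("TCP.SEND", {"TCP.SEND.LEN": 1460})
```

Here the abstract request specifies only the data length; the destination port required by the real send() call is supplied from the defaults.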
To solve the issues described above, a unique identifier (ID) is generated and assigned to each service primitive and each of its parameters supported at the chosen target service interface (e.g. at the TCP service interface). The unique identifier can be represented e.g. by an integer constant with a self-describing symbolic name (cf. Fig. 3.7, right part). For example, the TCP socket connect call required to establish a new TCP connection to a specified receiver is given the identifier TCP.CONNECT = 201, and the TCP socket send call to transmit data on a specified, already connected socket is given the identifier TCP.SEND = 202 (where TCP.CONNECT and TCP.SEND are the unique symbolic names and 201 and 202 the corresponding unique identifier values). Henceforth, every time a new abstract request type is defined in the UBA, the experimenter can specify its semantics by means of a reference idRef to the corresponding service primitive (e.g. idRef = TCP.SEND) from the list of service primitives supported at the chosen target interface (i.e. the TCP


[Figure 3.7 shows the modelling domain (UBA) with the abstract request types SendLargeDataBlock (idRef := TCP.SEND), SendSmallDataBlock (idRef := TCP.SEND), ... on the left, mapped onto the TCP socket interface with its list of service primitives TCP.CONNECT = 201, TCP.SEND = 202, TCP.CLOSE = 203, ... on the right.]

Figure 3.7.: Mapping of the abstract request types from the modelling domain onto the system calls at the TCP socket interface (own Fig.).

service interface in the example). It should be noted that many conceptually different abstract request types (i.e. request types with different semantics in the modelling domain, e.g. SendLargeDataBlock and SendSmallDataBlock, cf. Fig. 3.7) may map onto calls to the same service primitive at the target service interface (so that their idRef points to the same service primitive, e.g. TCP.SEND, cf. Fig. 3.7, left part). Without loss of generality, we assume that each particular abstract request type in the UBA is associated with at most one service primitive of the target service interface. More complex abstract request types which induce calls to more than one service primitive at the target interface can be implemented e.g. by means of a superposition of several R-states, each of which is responsible for modelling the calls to one of the service primitives combined in the complex request type. The mechanism presented above can also be applied to implement

• the mapping between the attributes of abstract requests and the parameters of the real service primitives associated with the particular abstract request type the attributes belong to,

• the transformation of the different real service events signalled at the target service interface into the abstract system reactions supported in the UBA model.

In case the target service interface is not yet implemented at the time of load specification (e.g. when a designed but not yet implemented service architecture is to be tested under various workload scenarios), the mapping of the abstract request types and their associated attributes onto the (not


yet available) types of service primitives can be specified at a later point in time (once the specification of the service primitives and their parameters becomes available). Finally, we emphasize that the semantics of every abstract request type, every abstract system reaction, and each of their associated attributes must be specified at the latest by the time requests are generated (in terms of calls to the corresponding service primitives) at the target service interface.
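The identifier mechanism of Fig. 3.7 can be sketched in a few lines (illustrative Python only; the integer values follow Fig. 3.7, while the request type OpenConnection is a hypothetical addition):

```python
# Unique identifiers of the service primitives at the TCP socket interface
# (symbolic names and integer values as in Fig. 3.7).
TCP_CONNECT, TCP_SEND, TCP_CLOSE = 201, 202, 203

# Abstract request types defined in the UBA reference exactly one service
# primitive via idRef (OpenConnection is a hypothetical further type).
ABSTRACT_REQUEST_TYPES = {
    "SendLargeDataBlock": TCP_SEND,
    "SendSmallDataBlock": TCP_SEND,
    "OpenConnection":     TCP_CONNECT,
}

def resolve(request_type: str) -> int:
    """Return the ID of the service primitive the abstract request type
    maps onto; conceptually different types may share one primitive."""
    return ABSTRACT_REQUEST_TYPES[request_type]
```

Note how two conceptually different request types resolve to the same primitive ID, exactly as described for Fig. 3.7.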

3.4.3. Definition of Abstract Request Types in the UBA Schema

In this subsection we present the fragment of the UBA schema containing the specification of the list of possible abstract request types in a UBA (cf. Fig. 3.8). The list of possible request types is defined by means of a complex type ListOfRequestTypes, a sequence of an unlimited number (maxOccurs="unbounded") of UbaRequestType elements, each describing a particular abstract request type. Note that the list of request types in a UBA may be empty (minOccurs="0"). Further, each abstract request type is identified by its name (specified in the name element), which must be unique within the UBA. The semantics of the request type at the target service interface (e.g. the identifier of the corresponding service primitive or API call) can be specified in the idRef element at the time of UBA specification. Note that by the time requests are generated at the real service interface, the semantics field must be specified for each abstract request type. A short description of the request type can be supplied in the description element. The request type definition is completed by the list requestAttributes of associated attributes, which are represented by the complex type RequestAttribute and characterise the requests of this particular type. Within the complex type RequestAttribute, the name element specifies the attribute's name, which must be unique within the request type. Similarly to the specification of the semantics of a request type, a reference to the corresponding parameter of the real service or API call (which is referenced by the idRef of the abstract request type the attribute belongs to) can be specified in the idRef element of the complex type RequestAttribute. The data type required for the representation of the attribute's values can be chosen from the list of supported data types presented in Fig. 3.4 and is specified by the type element.
Further, the valueSpec element of complex type ValueSpec (which will be defined in Sec. 3.5) is used to specify how the values of every particular attribute are




  <UbaRequestType>
    <name>SendDataBlock</name>
    <idRef>TCP.SEND</idRef>
    <description>Send data on a connected TCP socket</description>
    <requestAttributes>
      <requestAttribute>
        <name>Data Length</name>
        <idRef>TCP.SEND.LEN</idRef>
        <type>unsignedInt</type>
        <valueSpec>......</valueSpec>
        <description>Length of the data to be transmitted (in Bytes)</description>
      </requestAttribute>
    </requestAttributes>
  </UbaRequestType>
  ...







• "Expression" – the value of the parameter is calculated according to a context expression specified in the expression element. The specified expressions use the context variables defined in the UBA and adhere to the syntax described in detail in Sec. 3.6.

3.6. Syntax Rules for Context Expressions

Context expressions may be specified by the experimenter in different parts of a UBA, e.g.:

• in all types of UBA states, in order to initialise, update, or change the values of context variables (the corresponding expressions are also called context update statements),

• in guards (i.e. in conditions formulated by means of context expressions) for conditional state transitions,

• in value specifications for different UBA parameters (e.g. attributes of abstract requests or system reactions, delays between requests and/or system reactions, and probabilities for statistical state transitions).

Therefore, a well-defined syntax for context expressions is required and must be supported by the software components responsible for generating requests at the target service interface according to the specifications in the UBA. In this thesis we use an expression syntax similar to the well-known syntax of mathematical expressions, with a few special points to be considered:

• In variable definitions, the name of a variable must begin with a letter (digits may be used in the rest of the variable's name). If a variable is used in an expression but does not exist, it is considered zero. If it does exist, its value is used instead.

  packetsPerFrame = 12;
  numberOfFrames = 0;
  ...
  numberOfFrames = numberOfFrames + 1;
  numberOfPackets = packetsPerFrame * numberOfFrames;


• Each expression must end with a semicolon. If multiple expressions are included in the same expression string, the end of each expression must be marked by its own semicolon.

• The asterisk '*' must be used as the multiplication operator and cannot be omitted.

• Expressions may contain whitespace characters and comments. Whitespace characters such as newlines, linefeeds, carriage returns, spaces, and tabs are ignored. Comments begin with the less-than sign < and end with the greater-than sign >; comments may be nested as well.

• Functions from a set of predefined functions (see Appendix A) may be used in expressions. Some functions may also take reference parameters; these represent references to other variables, and the function call may alter their values.

• Expressions may be nested using parentheses.

In the following fragment (excerpted from a UBA model for an H.264 video source), the variables sizeOfGOP and numberOfFrames are defined in the initialisation state of the UBA and denote the number of frames in a GOP and the overall number of generated frames, respectively. After the generation of the next request in one of the R-states of the UBA, the number of generated frames is incremented (in line 5). Finally, in line 8 the functions mod(v,d) (defined as the remainder of v/d) and equal(a,b) (defined to return 1.0 if a is equal to b, and 0.0 otherwise) are used to determine whether the end of the GOP has been reached.

  1  numberOfFrames = 0;
  2  sizeOfGOP = 6;
  3  ...
  4
  5  numberOfFrames = numberOfFrames + 1;
  6  ...
  7
  8  bEndOfGOP = equal(mod(numberOfFrames, sizeOfGOP), 0);
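The rules above can be exercised with a minimal evaluator sketch (an illustrative re-implementation in Python, not the UniLoG parser; comment stripping and reference parameters are omitted, and only two of the predefined functions are provided):

```python
import math
import re

# Two of the predefined functions from Appendix A (illustrative subset).
PREDEFINED = {
    "mod":   lambda v, d: math.fmod(v, d),         # remainder of v/d
    "equal": lambda a, b: 1.0 if a == b else 0.0,  # 1.0 iff a equals b
}

def evaluate(expressions, context=None):
    """Evaluate a string of semicolon-terminated context update
    statements; variables that do not exist yet are considered zero."""
    ctx = dict(context or {})
    for stmt in expressions.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue
        name, rhs = (s.strip() for s in stmt.split("=", 1))
        # Known variables keep their values, unknown identifiers default
        # to zero, and function names come from the predefined set.
        env = dict(PREDEFINED)
        for ident in re.findall(r"[A-Za-z][A-Za-z0-9]*", rhs):
            if ident not in PREDEFINED:
                env.setdefault(ident, ctx.get(ident, 0))
        ctx[name] = eval(rhs, {"__builtins__": {}}, env)
    return ctx

# The GOP example from above, entering with 5 frames already generated:
ctx = evaluate("numberOfFrames = numberOfFrames + 1;"
               "bEndOfGOP = equal(mod(numberOfFrames, sizeOfGOP), 0);",
               {"sizeOfGOP": 6, "numberOfFrames": 5})
```

With numberOfFrames incremented to 6, mod(6, 6) is 0 and bEndOfGOP evaluates to 1.0, i.e. the end of the GOP has been reached.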

3.7. Specification of Complex User Environments

As indicated in Sec. 2.3, the environment E need not consist of only one single virtual user described by means of a UBA. In the general case, the environment can consist of many (and also different types of) virtual


users, each described by a corresponding UBA and modelling the behaviour of one or many real users at the target service interface IF. In the latter case we call the environment E complex. The exact decision which and how many real service users are to be aggregated into the same UBAs while developing a model of the environment E must be made by the experimenter. Let UBA(E) = {UBA_1, UBA_2, ..., UBA_n}, n ∈ N, be the set of UBAs, each representing a different type of virtual user at the target service interface (e.g. different types of Web service users, such as users reading pages, users placing orders, users sending messages, etc.). The complex environment E can then be represented as a set of virtual users E = {VU_1, VU_2, ..., VU_k}, k ∈ N, where each virtual user VU_i = VU_i(t_init, t_term, uba) is initialised (i.e. created in the environment) at the time t_init, and the model uba ∈ UBA(E) which represents its behaviour is executed at least until VU_i is terminated by the time t_term (a virtual user may terminate earlier than t_term in case a termination state S_t exists in uba and the user termination time specified in S_t precedes t_term). Using the presented model of the complex environment, many different comprehensive load testing scenarios and profiles (cf. Sec. 2.2.5) can be specified in a very flexible manner, for example:

• Virtual users of the same type (i.e. represented by the same uba ∈ UBA(E)) are used in the experiment (starting at t_start and terminating at t_finish) and the number k of virtual users is kept constant throughout the test: E = {VU_1(t_start, t_finish, uba), VU_2(t_start, t_finish, uba), ..., VU_k(t_start, t_finish, uba)}.

• A gradually growing number of users of the same type (e.g. from 100 to 350 users in steps of 10 users every 25 seconds, so that 25 steps are needed to reach 350 users): E = {VU_0(t_start, t_finish, uba(100)), VU_1(t_start + 25·1, t_finish, uba(10)), VU_2(t_start + 25·2, t_finish, uba(10)), ..., VU_25(t_start + 25·25, t_finish, uba(10))}, where uba(100) and uba(10) represent the initial number of 100 and the incremental number of 10 users, correspondingly.

• Periodic loads composed e.g. of low (100 users represented by uba(100)), middle (300 users represented by uba(300)), and high (1000 users represented by uba(1000)) load periods alternating every 60 s throughout a test of 540 s duration: E = {VU_0(t_start, t_start + 60, uba(100)), VU_1(t_start + 60, t_start + 120, uba(300)), VU_2(t_start + 120, t_start + 180, uba(1000)), ..., VU_8(t_start + 480, t_start + 540, uba(1000))}. Such alternating or periodic loads are recommended for endurance testing, cf. Sec. 2.2.5.

The description of the environment E presented above assumes that each of the constituent UBAs (UBA_1, UBA_2, ..., UBA_n) is defined at the same target service interface IF (e.g. at the TCP service interface). An extension of the environment model which allows one or many of the constituent UBAs to represent users at some other interface IF' (different from IF and located above IF in the service hierarchy, e.g. an HTTP service interface) is possible in the following ways:

• By generalising the definition of the workload offered at the interface IF to the workload offered at many different service interfaces IF_1, IF_2, ..., IF_k, represented by the superposition of k corresponding UBAs, each describing the behaviour of service users at the service interface IF_i, i = 1, 2, ..., k, correspondingly.

• In case the process of workload transformation (cf. [Hec11]) is well understood by the experimenter, the development of analytical and/or simulative components may be contemplated to model the transformation of the service requests submitted at the interface IF (referred to as primary load) into requests at some lower interface IF' in the service hierarchy (referred to as secondary load, correspondingly, cf. [Hec11]). Such components can be involved to model the aggregate workload at the interface IF' (where, in terms of load transformation, secondary load is offered) also in the case when one of the participating UBAs is defined for the interface IF (where primary load is offered). For a description of the different types of load transformations and their applications, we refer the interested reader to the recent dissertation on this topic [Hec11].
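As an illustration, the second (ramp-up) scenario can be constructed programmatically; this is a hypothetical sketch (class and function names are invented here, and the aggregation of real users into a UBA is abbreviated to a count field):

```python
from dataclasses import dataclass

@dataclass
class VirtualUser:
    t_init: float   # time at which the virtual user is created
    t_term: float   # time by which the virtual user terminates (at the latest)
    uba: str        # name of the UBA model executed by this user
    count: int      # number of aggregated real users represented by the UBA

def ramp_up_environment(t_start, t_finish, initial=100, step=10, target=350,
                        interval=25.0, uba="webUserUBA"):
    """Complex environment E for a gradually growing load: `initial`
    users at t_start, then `step` further users every `interval`
    seconds until `target` users are active."""
    E = [VirtualUser(t_start, t_finish, uba, initial)]
    for i in range(1, (target - initial) // step + 1):   # 25 increments
        E.append(VirtualUser(t_start + interval * i, t_finish, uba, step))
    return E

env = ramp_up_environment(0.0, 900.0)
```

The resulting environment contains 26 virtual-user entries (the initial block of 100 users plus 25 increments of 10), reaching 350 users at t_start + 625 s.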
In the following Chapter 4 we will present the application of the proposed UniLoG approach to the development of concrete UBA models for workload and traffic from different types of applications and services in IP-based networks. We remark that all UBA models presented in Chapter 4 were constructed using the LoadSpec tool for workload modelling and specification developed by the author of this thesis. Further details and the concrete use of the workload specification technique introduced in this chapter will be presented during the elaboration of the concrete models for voice, video, and Web traffic sources in Chapter 4.

4. Examples of Load Models for Different Traffic Sources

Models for different sources of network traffic are very often needed by researchers and telecommunications engineers, e.g. for performance analysis studies of different network components. For simulation studies, original data traces (e.g. of voice, video, or data traffic) may indeed provide a very realistic source of loads, but their application is often restricted, e.g. in long simulation runs where repetition of the limited trace should be avoided, or in analytical studies which require a mathematical description of the traffic. In such cases, artificial models for the generation of the required network traffic are preferred. In order to describe models for workloads offered at different network and communication service interfaces, the LoadSpec tool has been developed by the author of this thesis. Its application to the specification of workloads from different voice, video, and Web traffic sources will be presented in this chapter.

4.1. Models for Speech Traffic Sources

A survey of the current literature on speech traffic modelling reveals that a large number of simulative and analytical studies (cf. [SrW86, Den95, MSS05, PEA05, HGB06]) use an ON/OFF model with exponentially distributed lengths of the ON and OFF phases. These studies also provide the parameters of the corresponding distributions, obtained for different types of human communication (e.g., classical human speech, reading, conversation) and different languages. Well-known network simulation tools like OPNET or ns-2 also use ON and OFF phases to model the output of the G.711 codec. In order to take into account the important characteristics of compressed digitized voice as delivered by the different types of voice codecs currently in use (e.g. G.711, G.729.1, G.723.1, the Internet Low Bit Rate Codec (iLBC), GSM Adaptive Multi-Rate (AMR), and iSAC; cf. [HHCW10] for a comparison of different codecs), a set of more sophisticated models has been proposed in later works [HGB06, MBM09].

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_4

  Codec      Type                  Packet size   Period
  G.711      Control               4 bytes       30 s
             Speech                172 bytes     20 ms
             Speech and Control    176 bytes     3 s
  G.729.1    Control               5 bytes       1 s
             Speech                38 bytes      20 ms

Table 4.1.: Packet Types in G.711 and G.729.1 Codecs (Tab. from [MBM09]).

The proposed source models allow one to capture the autocorrelation function (ACF) of consecutive packet sizes and the queueing behaviour of the original packet traces when many voice streams are fed into a single server queue. In the following sections, we present descriptions of models for voice traffic produced by different types of voice codecs commonly used in telecommunication systems.

4.1.1. Voice Codecs with Constant Bit Rate

CBR codecs send a bit stream of constant rate which is independent of the voice input; the coder's output is, therefore, highly regular. For example, the International Telecommunication Union (ITU) G.711 codec is mainly applied in digital telephony and uses Pulse Code Modulation (PCM) with a sampling rate of 8 kHz and 8 bit per sample, so that a data rate of 64 kbit/s is required to transmit the raw voice data stream [G.711]. The codec defined in the ITU G.729.1 standard [G.729.1] was also designed for voice communication and extends the G.729 standard by offering different constant bit rates from 8 to 32 kbit/s in steps of 2 kbit/s. The additional header overhead during the transmission of the codec's output depends on the types of packets and protocols used in the particular telecommunication service or VoIP application (cf. [Sie09]). Among others, RTP, entailing a header overhead of at least 12 bytes, can be used for real-time transmission of voice packets in connection with the transport protocols UDP, Datagram Congestion Control Protocol (DCCP), or also TCP. In order to analyse the behaviour of the G.711 and G.729.1 codecs, the raw output data streams have been analysed e.g. in [MBM09] using the corresponding codec implementations in CounterPath's X-Lite [XLite] and Skype's [Skype] softphone applications. A detailed description of the packet types, including their sizes and the periods at which they are sent, is given in


Table 4.1. Both codecs send periodic control information in addition to the main voice stream. From the perspective of the induced workload, however, the transmitted control information is of minor significance (considering its size). Therefore, during the development of the UBA representing a model of the voice traffic source, we concentrate on modelling the main voice data stream and at first ignore the control information. A first simple UBA model for the voice data stream produced by the G.711 codec is illustrated in Fig. 4.1. In this model, the voice user is initialized in the state Si, and thereafter requests of type Speech are generated in the R-state Rspeech, modelling the generation of RTP packets, each combining 20 ms of voice data sampled at 8 kHz and digitized using 8 bit per sample. The resulting RTP packets transport 160 bytes of voice data (160 samples of 8 bit each) plus an RTP header of at least 12 bytes, so the value of the packet size attribute of Speech requests is specified as 160 + 12 = 172 bytes. The periods at which voice packets are generated are modelled in the delay state Dperiod (delay time of 20 ms in Dperiod, cf. Fig. 4.1). The transmission of control information can simply be added to the G.711 UBA model by means of additional R-states and corresponding conditional state transitions (cf. Fig. 4.2). Without loss of generality, we exploit the fact that the periods of Control (30 s) and Speech+Control (3 s) requests are multiples of the period of the Speech requests sent every 20 ms (cf. Table 4.1). The variable T, representing the current time in the model, is introduced in the S-state Si and initialized with zero.
After the next Speech request has been generated (in the R-state Rspeech), the current time is advanced (in the D-state Dperiod) by the G.711 packetization period of 20 ms, i.e. the current time variable T (measured in seconds) is updated as T = T + 0.02 (cf. D-state Dperiod in Fig. 4.2). Thereafter, the next state of the UBA is determined (by means of guards on the corresponding state transitions) according to the following conditions:

equal(mod(T,30.0),0.0): the current time T is a multiple of 30.0 s and the next packet with control information is to be generated (modelled by requests of type Control in the R-state Rcontrol).

equal(mod(T,3.0),0.0): the current time T is a multiple of 3.0 s and the next packet combining speech data and control information is to be generated (modelled by requests of type Speech+Control in the R-state Rspeech+control).


Figure 4.1.: A UBA model of the G.711 output voice stream, control information ignored (own Fig.).

not(or(equal(mod(T,30.0),0.0),equal(mod(T,3.0),0.0))): in case neither of the two conditions above is fulfilled, the next packet to be generated contains only speech data and is modelled by a request of type Speech; the next state is therefore the R-state Rspeech.
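Since the guards apply mod to the current time, the resulting packet schedule can be replayed with integer millisecond arithmetic (a sketch for checking the guard logic, not part of the thesis's tooling; note that the 30 s case must be tested before the 3 s case, since a multiple of 30 s is also a multiple of 3 s):

```python
def g711_packet_type(t_ms):
    """Guard logic of the extended G.711 UBA, with the current time in
    integer milliseconds: Control every 30 s, Speech+Control every 3 s,
    plain Speech otherwise."""
    if t_ms % 30_000 == 0:
        return "Control"
    if t_ms % 3_000 == 0:
        return "Speech+Control"
    return "Speech"

# Replay one minute of the model: one request every 20 ms.
counts = {"Speech": 0, "Speech+Control": 0, "Control": 0}
for t_ms in range(20, 60_001, 20):
    counts[g711_packet_type(t_ms)] += 1
```

Over 60 s (3000 packetization intervals) this yields 2 Control packets, 18 Speech+Control packets, and 2980 plain Speech packets, matching the periods of Table 4.1.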

4.1.2. Voice Codecs with Silence Detection

Many of the existing voice encoding schemes (e.g. the G.723.1 codec [G.723.1] or the iLBC vocoder [RFC3951]) employ a Voice Activity Detection (VAD) facility to recognize periods with no speaker activity and to suppress the generation of voice packets during these times. The output of such codecs therefore consists of silence intervals and main talkspurts, the latter interrupted by short breaks which arise from the short pauses a speaker makes while talking. As a result, actual talkspurts are separated by breaks which are relatively short compared to the silence intervals separating consecutive talkspurts. In Fig. 4.3 we present the 6.4 kbit/s mode of operation of the G.723.1 codec, in which 24 byte voice packets are generated every 30 ms during the talkspurts. Therefore, source models for this category of codecs traditionally use "speech on" (or ON) phases to represent periods of human voice activity and "speech off" (or OFF) phases to represent inactive speaker periods (cf.,


Figure 4.2.: A UBA model of the G.711 output voice stream, control information included (own Fig.).

e.g., [HGB06]). During the ON phase, the source sends packets of constant length L at time periods of length Tp determined by the packetization time of the codec. The distributions of the lengths of the talkspurts and the silence intervals in packets (i.e. the actual length of a talkspurt measured in time units divided by the packetization time Tp of the codec) can then be approximated e.g. by means of a geometric distribution, i.e. the probability P(X = k) of exactly k failures before the first success event is defined as P(X = k) = p · (1 − p)^k with expectation E[X] = (1 − p)/p, where p is the success probability of each single trial (in a sequence of independent Bernoulli trials). While the source is in its ON phase, k subsequent failure events model the generation of k subsequent voice packets before the source changes into the OFF phase (the (k + 1)-th event being a success event). While the source is in the OFF phase, the failure events represent the packet periods at which no voice packets are generated before the source finally changes back into the ON phase. The model for an ON/OFF voice traffic stream thus has the following parameters, which can be specified by the experimenter:

Tp [sec]: the length of the periods at which voice packets are generated while the source is in the ON phase. This parameter is determined


[Figure 4.3 sketches the codec output over time: talkspurts (talk phases interrupted by short breaks) separated by silence intervals.]

Figure 4.3.: Output stream of the G.723.1 codec with silence intervals and talkspurts interrupted by short breaks (Fig. from [MBM09]).

by the packetization time of the voice codec being modelled.

L [byte]: the constant size (in bytes) of the voice packets generated in the ON phase. If needed, this parameter may include the overhead resulting from additional headers (e.g., RTP, UDP, and IPv4 or IPv6 headers).

D̄on [sec]: the (target) mean duration of the ON phases in time units. Counted in packets (or packetization intervals), k = D̄on / Tp voice packets (or packetization intervals) are to be generated before the source can change into the OFF phase. This can be modelled by means of a sequence of independent Bernoulli trials (each with success probability pon), so that the probability of k failures (packet generation events) before success (changing into the OFF phase) is geometrically distributed. The length Don of the ON phases (measured in packets and modelled by means of a geometric distribution) has the expectation E[Don] = (1 − pon) / pon and, hence, pon can be estimated such that D̄on / Tp ≈ (1 − pon) / pon.

D̄off [sec]: the (target) mean duration of the OFF phases in time units. Measured in the number of packetization intervals, the source should remain n = D̄off / Tp subsequent intervals in the OFF phase and only thereafter change into the ON phase. Again, this process is modelled by means of a sequence of independent Bernoulli trials, each with


Figure 4.4.: A UBA describing an ON/OFF model for the voice stream from G.723.1 codec (own Fig.).

the success probability poff. The probability of n failures (packetization intervals at which no packets are generated) before success (changing back into the ON phase) is geometrically distributed, and the length Doff of the OFF phases (measured in the number of packetization intervals and modelled by means of a geometric distribution) has the expectation E[Doff] = (1 − poff) / poff, so that the parameter poff of the geometric distribution can be estimated such that D̄off / Tp ≈ (1 − poff) / poff.

The construction of the corresponding UBA describing the ON/OFF model for the G.723.1 codec is illustrated in Fig. 4.4 using the LoadSpec tool. The semantics of the particular states and of the transitions between the states in the UBA are described in the following.

• In the S-state Si the voice source is initialised and, without loss of generality, it first switches into the ON phase (and not into the OFF phase).

• The generation of voice packets during the ON phase is modelled in the R-state Ron by generating requests of the abstract request type sendVoicePacket (cf. Fig. 4.4). The attribute packet size of sendVoicePacket requests represents the length L [byte] of the voice packets, which depends on the codec being used. For example, for the G.723.1 codec


in the 6.4 kbit/s mode of operation, the value of packet size is set to 24 bytes.

• In the D-state Dt,on the time intervals between subsequent voice packets (inter-arrival times) in the ON phase are modelled. For the G.723.1 codec these intervals have the same constant length of Tp = 30 ms in both modes of operation (because the size of the voice packets is increased to improve the voice quality while the packetization time Tp remains constant). Further, a stochastic variable uon, uniformly distributed on (0, 1), is used to determine the next state of the UBA. In case uon ≤ D̄on / (D̄on + Tp), further packets are to be generated in order to achieve the specified mean length D̄on of the ON phase, so the source remains in the ON phase and the next state of the UBA becomes the R-state Ron. Otherwise, if uon > D̄on / (D̄on + Tp), the last packet needed to achieve the specified mean duration of the ON phase has already been generated, the source switches into the OFF phase, and the D-state Dt,off becomes the next state of the UBA.

• In the D-state Dt,off the OFF phase of the voice stream is modelled, i.e. a number of packetization intervals of length Tp in which the generation of packets is suppressed by the coder. After the next packetization interval has elapsed (modelled by adding the packetization time Tp to the current time in the UBA), a stochastic variable uoff, uniformly distributed on (0, 1), is used to determine the next state of the UBA. In case uoff ≤ D̄off / (D̄off + Tp), not enough packetization intervals have yet been suppressed by the source to achieve the specified mean length D̄off of the OFF phase, the source remains in the OFF phase, and the next state of the UBA is unchanged (i.e. it remains Dt,off). Otherwise, if uoff > D̄off / (D̄off + Tp), the source can leave the OFF phase and change back into the ON phase, so that Ron becomes the next state of the UBA.
• The voice source terminates when the actual simulation time t exceeds the specified maximum duration of the VoIP call.

Finally, the values for the parameters T_p, L, D̄_on, and D̄_off of the source model are to be specified. As already mentioned, the values of T_p and L are strongly dependent on the type of the codec being used and its mode of operation. For example, the iLBC codec [RFC3951] may produce

                                 G.723.1                     G.723.1
Codec                            (contiguous ON/OFF)         (long ON/OFF with breaks)
Phase (D: duration in [s])       ON            OFF           ON ¹           OFF ²
D̄, measured                      1.304 [s]     1.480 [s]     11.54 [s]      11.98 [s]
Var(D), measured                 1.7938        2.9858        0.61003        0.60261
p (p_on or p_off), estimated     2.24859·10⁻²  1.98671·10⁻²  2.59291·10⁻³   2.49792·10⁻³

¹ ON phase starting with at least 15 consecutively generated packets.
² OFF phase starting with at least 50 consecutively suppressed packets.

Table 4.2.: Measurements of ON/OFF phase durations (classical approach and approach from [MBM09]) using the set of typical telephone conversations available in [BAS96] (Tab. from [MBM09]).

L = 62 bytes every Tp = 20 ms or every Tp = 30 ms, and the G.723.1 codec [G.723.1] may produce L = 24 bytes or L = 20 bytes every Tp = 30 ms. Considering the measurement of the mean lengths of ON and OFF phases from voice packet data, the most critical question is how to treat the short breaks in the main talkspurts resulting from the short pauses a speaker makes while talking. The classical approach to this issue is to build contiguous ON and OFF phases, such that the main talkspurts are cut into pieces and the breaks in the speaker's talking are treated as independent OFF phases [HGB06, PEA05]. So, the mean durations of ON and OFF phases estimated according to this approach will be rather short (cf. Table 4.2, column “G.723.1 (contiguous ON/OFF)”). In [MBM09] the authors propose to measure the length of whole talkspurts instead of strictly contiguous (or uninterrupted) ON and OFF phases. This approach does not consider the breaks in talkspurts as independent OFF phases and takes into account the autocorrelation at the packet level, which results from the grouping of talk phases into main talkspurts and can be observed in real voice packet data. The estimated mean durations of ON and OFF phases are, therefore, about a factor of 10 longer than in the classical approach, leading to a “heavy tail” in the distribution of the lengths of the ON phases (cf. Table 4.2, column “G.723.1 (long ON/OFF with breaks)”). Moreover, when a superposition of a number of voice streams (each of which is modelled in the proposed way) is fed to a single server queue, the resulting


queueing behaviour will be considerably more realistic than in the case of the classical modelling approach with contiguous ON and OFF phases (cf. [MBM09] for details). Examples of similar estimation methods for the mean durations of ON phases (D̄_on) and OFF phases (D̄_off) for other codecs with silence detection (e.g. iLBC) and speech patterns (e.g. using different languages) can be found e.g. in [HGB06, PEA05]. Furthermore, source models for VBR codecs (producing voice packets of different sizes at constant time periods) like the GSM-AMR codec [GSMAMR] or the internet Speech Audio Codec (iSAC) have been proposed by M. Menth et al., with the corresponding parameter sets available at [MMC-TR].
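As an illustration, the ON/OFF generation mechanism described in this section can be sketched as a short simulation. This is a minimal sketch under the geometric phase-length approximation introduced above; the function and parameter names are ours and not part of the UniLoG tool:

```python
import random

def voip_source(t_p=0.030, size=24, d_on=11.54, d_off=11.98,
                max_time=60.0, seed=1):
    """ON/OFF G.723.1-like voice source: returns (time, size) packet tuples.

    After each packetization interval T_p the source stays in its current
    phase with probability D_bar / (D_bar + T_p), as in the UBA above.
    """
    rng = random.Random(seed)
    stay_on = d_on / (d_on + t_p)     # threshold for u_on in D-state D_t,on
    stay_off = d_off / (d_off + t_p)  # threshold for u_off in D-state D_t,off
    t, on, packets = 0.0, True, []
    while t < max_time:
        if on:
            packets.append((t, size))   # R-state R_on: emit a voice packet
            t += t_p                    # advance one packetization interval
            if rng.random() > stay_on:  # leave the ON phase
                on = False
        else:
            t += t_p                    # suppressed interval in the OFF phase
            if rng.random() > stay_off: # change back into the ON phase
                on = True
    return packets
```

With the default parameters (long ON/OFF variant of Table 4.2), long simulation runs drive the mean ON phase length towards D̄_on = 11.54 s.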

4.2. Modelling of Video Traffic Sources

Models for VBR video streams generated by coders using e.g. an MPEG or H.264 video compression scheme are very important for analyses of video transmission quality in IP-based networks and, in particular, for understanding the impact of packet delays and/or losses during the transmission on the quality of the video service (QoS). Using a packet trace of a real video stream may often not be feasible in such studies, e.g. due to the complexity of the packetization mechanisms used to transform video frames into the corresponding transport stream (cf., e.g., [RFC3984, RFC6184]), the different orders of frames used for encoding and transmission, or the amount of memory needed to store the video file. Therefore, a series of models for artificial sources of video traffic have been proposed, which can broadly be classified into data-rate models and frame-size models [SRS03]. In data-rate models, only the rate at which video data arrive at a link is generated, e.g. in order to make some performance estimations. Such models are suitable for objective QoS studies to predict e.g. the average packet-loss probability or the probability of buffer overflows in an IP router. However, they do not allow one to identify exactly which frames are affected, which is important because losses of different types of video frames (I-, P-, and B-frames) have strongly different impacts on the perceptual quality of the received video. In frame-size models, the types of individual frames and their sizes are captured, and data-rate information can be obtained from the frame information. Moreover, the inherent frame-by-frame nature defined by the GOP and used by the coder is preserved, and the location of losses can be precisely identified to understand


the impact on the video quality at the receiver's end in various performance and QoS studies. However, the construction of a frame-by-frame model of a VBR video may become a very difficult task, especially at the transport or network service interfaces, because the effects of the different packetization techniques involved at the application layer can hardly be captured there. Consider that advanced interleaving and Forward Error Correction (FEC) techniques as well as complex packetization mechanisms [RFC3984, RFC6184] may often be involved in the generation of the RTP packet stream from the raw video codec data. So, the frame-by-frame modelling of video streams at the RTP or the IP layer may become very tedious when these techniques are employed. In particular, it is very difficult to capture the distribution of video frames onto various packets and to track exactly which parts (slices) of a video frame belong to which RTP/UDP packets and vice versa. In the case of adaptive HTTP streaming, the effects of HTTP chunking should be taken into account. Other types of applications, like Internet video streaming with interleaved transmission or high bit-rate video-on-demand, may use different sophisticated packetization profiles specified e.g. in [RFC6184]. The three different compression techniques applied to I-, P-, and B-frames result in different compression ratios, which should be preserved by the corresponding UBA model (presented in Sec. 4.2.1). Moreover, the composition or content of pictures as well as the temporal similarity of adjacent pictures may vary in different fragments of the same film, leading to frames of different sizes. Therefore, a universal VBR video model should have enough parameters to capture all classes of video fragments, and all types of frames (I-, P-, and B-frames) in each class of video fragments (a corresponding UBA model is presented in Sec. 4.2.2).

4.2.1. Modelling of the GOP Structure

In this section we aim at constructing a UBA model able to capture the type and size of the different frames (I-, P-, and B-frames) as well as the specific GOP structure used by the video coder to generate the video stream. Let F = F_1, F_2, F_3, ..., F_nf be the sequence of nf ∈ N frames obtained from an H.264-encoded video. Each frame F_i can be represented as a tuple F_i = (serialNr, type, size), where the type parameter specifies the type of the frame encoding, type ∈ {I, B, P}. Further, each GOP within the video is a sequence of N frames which follows an (N, M) cyclic format, i.e. the first frame is an I-frame, every M-th frame is a P-frame (assuming that the GOP contains P-frames), and the (M − 1) frames between every I-P, P-P, or P-I


Figure 4.5.: A simple UBA model using a chain of six R-states to represent the IBBP BB... sequence ((6,3)-GOP structure) of H.264-coded video frames (own Fig.).

pair of frames are B-frames. For simplicity, we further assume that the entire film has been coded using the same GOP structure. Therefore, the film can be represented as the sequence G = G_1, G_2, ..., G_ng, ng ∈ N, of ng successive GOPs, with nf = N · ng, without loss of generality. We start constructing the UBA model by defining the possible request types. As already stated, different types of compression are used to generate I-, P-, and B-frames and lead to different compression ratios and sizes of the resulting video frames. So, we expect that different distributions will be needed to approximate the lengths of I-, P-, and B-frames and, therefore, model them separately by means of three distinct abstract request types Send-I-Frame, Send-P-Frame, and Send-B-Frame, correspondingly. For example, if the (N = 6, M = 3) GOP structure has been used by the H.264 encoder, the corresponding sequence of video frames would be IBBPBB.... The simplest way to model this sequence of video frames is to build a chain of six R-states (each responsible for the generation of the next frame within the GOP), separated by D-states to model the inter-frame intervals determined by the coder (cf. Fig. 4.5). In this simple case, the transitions between the R- and D-states of the UBA are all deterministic.


Figure 4.6.: A universal UBA model using three R-states and conditional state transitions to model different possible GOP structures, which include all types of frames, i.e. I-, P-, and B-frames (own Fig.).

Apparently, the number of R- and D-states in the presented simple UBA will grow linearly with the length of the original GOP structure to be captured. In the more sophisticated approach presented in Fig. 4.6, only three R-states I-frame, B-frame, and P-frame are used to model the generation of I-, B-, and P-frames, correspondingly, and conditional transitions instead of deterministic transitions are used to determine the next state of the UBA. First, the video source is initialised in the S-state Si and the context variables NoF (denoting the total number of generated frames) and NoB (denoting the number of B-frames) are set to zero. In the R-state I-frame an abstract Send-I-Frame request is generated to model the first frame of the GOP (which is always an I-frame) and the total number of generated frames is incremented (NoF = NoF + 1). The time interval between the I- and the subsequent B-frame is modelled in the D-state DIB and is determined by the frame rate being used by the video coder. In this example, we assume that a constant frame rate of 24 frames per second has been used to produce the video stream (which is typical for current H.264 and Moving Picture Experts Group (MPEG) coders). Therefore, the video frames are produced at constant time periods of 41.67 ms and the corresponding inter-frame times modelled in the D-states of the UBA are set to this value.


After the inter-frame time in D_IB, the current state of the UBA becomes the R-state B-frame, a new abstract Send-B-Frame request is generated, and the total number NoF of generated frames as well as the number NoB of generated B-frames is incremented (NoF = NoF + 1 and NoB = NoB + 1, correspondingly). Further, the next state of the UBA is determined depending on the current total number NoF of generated frames and the number NoB of generated B-frames according to the following guards:

• equal(mod(NoF, N), 0): in this case, the remainder of the division NoF/N is equal to zero, i.e. the current total number of generated frames NoF is a multiple of the GOP size N. So, the end of the current GOP is reached and the next video frame to be generated is an I-frame (starting a new GOP). Therefore, the D-state D_BI first becomes the next state of the UBA to model the inter-frame time between the last B- and the next I-frame, which will be generated in the R-state I-frame when it becomes the next state of the automaton.

• and(not(equal(mod(NoF, N), 0)), equal(mod(NoB, M-1), 0)): here, the end of the current GOP has not been reached (because of the condition not(equal(mod(NoF, N), 0))). Further, the current number NoB of generated B-frames is a multiple of M − 1, which means that the last M − 1 generated frames were B-frames and the next (M-th) frame to be generated must, therefore, be a P-frame. Thus, the next state of the UBA is the D-state D_BP to model the inter-frame time between the last B-frame and the next P-frame, which will be generated in the R-state P-frame when it becomes the next state of the UBA.

• and(not(equal(mod(NoF, N), 0)), above(mod(NoB, M-1), 0)): in this case, neither the end of the current GOP has been reached (because of the condition not(equal(mod(NoF, N), 0))), nor have M − 1 subsequent B-frames been generated (because of the condition above(mod(NoB, M-1), 0)).
Therefore, another B-frame is to be generated as the next video frame, and the current state of the automaton is first switched into the D-state D_BB to model the inter-frame time between the last and the next B-frame. Thereafter, the R-state B-frame becomes the next state of the UBA again. We note that the parameters N and M can be specified in the initialisation state Si to model different variants (N, M) of the GOP structure. Therefore, the UBA presented in Fig. 4.6 is considerably more flexible and universal with respect to supported GOPs compared to the simple UBA from Fig. 4.5. Moreover, the description of the model is much more compact and the number of R- and D-states does not grow with increasing GOP size.
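The conditional transitions above can be condensed into a few lines of code. The following sketch (names ours) reproduces only the frame-type sequence of the UBA from Fig. 4.6, omitting the timing modelled by the D-states:

```python
def gop_sequence(n, m, n_frames):
    """Frame-type sequence of the conditional-transition UBA for an
    (N, M) GOP structure (illustrative sketch, timing omitted)."""
    frames = []
    no_f = 0          # context variable NoF: total generated frames
    no_b = 0          # context variable NoB: generated B-frames
    state = 'I'       # a GOP always starts with an I-frame
    while no_f < n_frames:
        frames.append(state)
        no_f += 1
        if state == 'B':
            no_b += 1
            if no_f % n == 0:
                state = 'I'            # guard: end of the current GOP
            elif no_b % (m - 1) == 0:
                state = 'P'            # guard: M-1 B-frames in a row
            else:
                state = 'B'            # guard: another B-frame follows
        else:
            state = 'B'  # I- and P-frames are always followed by a B-frame
    return ''.join(frames)
```

For the (6, 3) structure this reproduces the IBBPBB pattern of Fig. 4.5, and changing n and m yields other GOP variants without adding states.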


4.2.2. Statistical Characterization and Modelling of Frame Lengths

In the UBA models presented in the previous section, the length of the video frames to be generated is modelled separately for each type of frame (I-, P-, and B-frames) by means of the size attribute of the abstract request types Send-I-Frame, Send-P-Frame, and Send-B-Frame, correspondingly. In this section we address the specification of values for the size attribute, i.e. a means for modelling the length of each individual I-, P-, and B-frame. The lengths of individual I-, P-, and B-frames may vary strongly for different types of video films. In particular, such factors as the composition and character of the video, the frequency of occurrence of camera pans, and, last but not least, the wide set of possible H.264 encoding parameters (which are mainly determined by the H.264 profile being used) may considerably influence the distribution of the lengths of I-, P-, and B-frames produced by the coder. In this section we present the results of a statistical characterization of the frame lengths for the video film “Big Buck Bunny” (BBB), freely available at [BBB]. The version of the video film to be analysed has a resolution of 1280x720 pixels (High Definition Television (HDTV) 720p quality), a duration of 9 min 56 sec, and is available in the QuickTime container format (.mov). Fig. 4.7 demonstrates that the data rate required to transmit the Big Buck Bunny (BBB) video over a network fluctuates strongly. The estimated mean required throughput at the IP layer (which was measured at the client side in intervals of 1 sec) does not exceed 5.7 Mbit/s. However, film fragments with high motion intensity (e.g. camera pans at around 40, 300, and 420 sec as well as in the final trailer between 520 and 580 sec) lead to higher amounts of data to be transmitted and result in a much higher required throughput (up to 18.7 Mbit/s) in these time intervals.
Further, we used the mp4videoinfo tool [MPEG4IP] to decode the frame structure of the H.264 video from the QuickTime container file. The mp4videoinfo decoding revealed that a frame rate of 24 frames/sec and a (24,2) GOP structure were used to encode the video (i.e. the inter-frame time is 41.67 ms, the GOP is 24 frames long, and every second frame is a P-frame). The total number of GOPs is 597 and the total number of frames is 14315, of which 597 are I-frames, 7157 are P-frames, and 6561 are B-frames. Each row of the mp4videoinfo output contains the information on the type of the next video frame and the Network Abstraction Layer Units (NALUs)

Figure 4.7.: IP throughput of the BBB video stream at the client side, using RTP/UDP transport in a 100 Mbit/s Fast Ethernet, (own Fig.).

used for its packetization. Thus, the total length of each video frame can simply be calculated as the sum of the lengths of the corresponding NALUs. Further, we handle the I-, P-, and B-frames separately in order to find a suitable statistical distribution for their lengths. First we compute the Empirical Probability Mass Function (EPMF) for the lengths of the I-frames taken from the video file¹. In order to do so, we first arrange the observed lengths x_i of I-frames in increasing order x_1, x_2, ..., x_i, ..., x_n with x_1 as the smallest and x_n as the largest value. We truncate 1% of the values (0.01 · n) from the beginning and 1% of the values (0.01 · n) from the end of the frame length sequence and obtain a new sequence of frame lengths with “truncated tails” and the new minimum value x_min and maximum value x_max.

¹ Here, we consider the probability mass and not the probability density function due to the fact that the lengths of frames are represented by a discrete (and not a continuous) random variable X.
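The tail truncation just described, together with the 50-bin histogram used below to obtain the EPMF, can be sketched as follows (the function name and interface are ours):

```python
def epmf(lengths, n_bins=50, trunc=0.01):
    """Relative frequencies of `lengths` in `n_bins` equally-sized bins,
    after truncating a fraction `trunc` of the values at each tail.

    Returns (upper_bound_of_bin, relative_frequency) pairs.
    """
    xs = sorted(lengths)
    cut = int(trunc * len(xs))          # 0.01 * n values per tail
    if cut > 0:
        xs = xs[cut:-cut]
    x_min, x_max = xs[0], xs[-1]
    width = (x_max - x_min) / n_bins
    counts = [0] * n_bins
    for x in xs:
        # the last interval is closed, so clamp x_max into the last bin
        i = min(int((x - x_min) / width), n_bins - 1)
        counts[i] += 1
    return [(x_min + (i + 1) * width, c / len(xs))
            for i, c in enumerate(counts)]
```

The returned relative frequencies sum to one and are attached to the upper bounds of the histogram intervals, matching the EPMF construction used for Fig. 4.8.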


Figure 4.8.: I-frame lengths (in bytes) from BBB video: EPMF, fit by means of the Log-normal and Gamma PDFs (own Fig.).

Thereafter, the region of frame length values [x_min; x_max) is subdivided into 50 equally-sized histogram intervals (also called bins): [x_min; x_min + (x_max − x_min)/50), [x_min + (x_max − x_min)/50; x_min + 2·(x_max − x_min)/50), ..., [x_min + 49·(x_max − x_min)/50; x_max]. For each interval we count the number of frames whose lengths fall into the interval and normalize it by the total number of I-frames. The obtained relative frequencies of occurrence represent the values of the EPMF for the lengths of I-frames at the upper bounds of the histogram intervals (cf. Fig. 4.8).

Further, we determined the following statistical characteristics of the lengths of I-frames: the minimum length x_min (34942 Byte), the maximum length x_max (346439 Byte), the empirical expected value E[X] (148715.74 Byte), the standard deviation (71173.03 Byte), the variance Var[X] (5065600905 Byte²), the 25% quartile (at 92038 Byte), and the 75% quartile (at 197091 Byte).

When we assume that the values of the variate X representing the lengths of I-frames originate from a Log-normal distribution Log-normal(μ, σ) (i.e. the values of the logarithm of the frame length adhere to a Normal distribution), the shape parameter σ and the log-scale parameter μ can be obtained, amongst others, according to the Maximum Likelihood Estimator (MLE) method for the Log-normal distribution (cf. [NIST13]). In this case, the MLE method leads to the values μ̂ = (1/n)·Σ_k ln(x_k) = 11.78714 and σ̂ = √((1/n)·Σ_k (ln(x_k) − μ̂)²) = 0.51289, where x_k are the n outcomes of X (i.e. the lengths of I-frames).

When we assume that the outcomes of X adhere to a Gamma distribution Gamma(α, β), the scale parameter β and the shape parameter α of the distribution can be calculated as β = Var[X]/E[X] = 34062.30503 and α = E[X]/β = 4.36599 (cf. [AnT06], Section 3.2.8, p. 122). The corresponding Probability Density Functions (PDFs) of the Log-normal(μ = 11.78714, σ = 0.51289) and the Gamma(α = 4.36599, β = 34062.30503) distributions are included in Fig. 4.8. The points in the plots of these PDFs represent the Log-normal or Gamma probability density cumulated in each histogram interval in order to allow for a comparison with the EPMF of the lengths of I-frames. We note that it is hardly possible to decide which distribution delivers the better fit by means of visual inspection of the Log-normal and Gamma PDFs against the histogram of the lengths of I-frames (EPMF) alone.

Figure 4.9.: I-frame lengths (in bytes) from BBB video: ECDF, fit by means of the Log-normal and Gamma CDFs (own Fig.).

Further, we calculated the Empirical Cumulative Distribution Function (ECDF) for the lengths of I-frames using the same histogram intervals as for the computation of the EPMF. The resulting ECDF is represented in Fig. 4.9 and is superimposed with the corresponding Cumulative Distribution Functions (CDFs) of the Log-normal(μ = 11.78714, σ = 0.51289) and the Gamma(α = 4.36599, β = 34062.30503) distributions used to fit the empirical distribution of the lengths of I-frames in the BBB video. Visually, the Gamma CDF appears to deliver a slightly better fit of the ECDF than the Log-normal CDF. However, an exact decision can hardly be made by means of visual inspection only. In order to estimate the goodness-of-fit of the proposed distributions, we apply the following well-known and widely used goodness-of-fit tests at the


significance level of α = 0.05:

Chi-Square (or χ²) test: compares the observed frequencies n_1, n_2, ..., n_k of k values (or in k intervals) of the variate with the corresponding theoretical frequencies e_1, e_2, ..., e_k calculated from the assumed theoretical distribution model. The basis for appraising the goodness of this comparison is the distribution of the quantity (test statistic) Σ_{i=1..k} (n_i − e_i)²/e_i, which approaches the chi-square (χ²_f) distribution with f = k − 1 degrees of freedom (d.o.f.) as n → ∞ (cf. [AnT06], Chapter 7). As the parameters of the Log-normal and the Gamma distribution are unknown in our case and also need to be estimated from the measured frame lengths, the d.o.f. f must be reduced by one for every unknown parameter that must be estimated. If the assumed distribution yields Σ_{i=1..k} (n_i − e_i)²/e_i < c_{1−α,f}, in which c_{1−α,f} is the critical value of the χ²_f distribution at the cumulative probability (1 − α), the assumed theoretical distribution is an acceptable model at the significance level α. Otherwise, the assumed distribution model is not substantiated by the observed data at the α significance level. In our case f = 50 − 1 − 2 (reduction by 2 parameters in the case of the Log-normal and also in the case of the Gamma distribution), and the calculated test statistic yields 0.1576 for the Log-normal distribution and 0.0059 for the Gamma distribution. Both values are far below the critical value c_{1−α,f} = c_{1−0.05,47} ≈ 67.50 (cf. [AnT06], Appendix A). Therefore, both distributions are suitable for modelling the lengths of I-frames at the α = 5% significance level, and the Gamma distribution represents a slightly better fit (due to the lower value of the test statistic).
Kolmogorov-Smirnov (or K-S) test: for a significance level α, the K-S test compares the observed maximum difference D_n = max_x |F_X(x) − S_n(x)| between the CDF of an assumed theoretical distribution F_X(x) and the empirical cumulative frequency function S_n(x) over the entire range of X with the critical value D_n^α, which is defined for significance level α by P(D_n ≤ D_n^α) = 1 − α. If the observed D_n is less than the critical value D_n^α, the proposed theoretical distribution is acceptable at the specified significance level α. Otherwise, the assumed theoretical distribution is to be rejected. In the case of the I-frame lengths the sample size is n = 50 because 50 intervals have been used for the computation of the ECDF. Therefore, at the 5% significance level, we obtain the critical value D_n^α = D_50^0.05 = 0.1923. The observed maximum discrepancy D_n for the Log-normal distribution is 0.0475 and for the Gamma distribution D_n is 0.0379. So, the proposed Log-normal and the Gamma distribution are


acceptable models for the lengths of I-frames at the 5% significance level. Our presumption that the Gamma distribution delivers a slightly better fit than the Log-normal distribution is confirmed by the smaller value of the test statistic for the Gamma distribution.

Cramér-von Mises (or C-M) test: was developed by Cramér and von Mises between 1928 and 1930 and is additionally applied here to estimate the goodness-of-fit because the first two tests (Chi-Square and K-S) are often criticized for their limited performance, especially at the tails of the distribution. In order to apply the C-M test, the observed lengths of I-frames must first be arranged in increasing order x_1, x_2, ..., x_i, ..., x_n with x_n as the largest value. Next, the CDFs of the proposed Log-normal and Gamma distributions F_X(x_i) are evaluated at x_i, for i = 1, 2, ..., n. Thereafter, the C-M statistic determined by the quantity T = 1/(12n) + Σ_{i=1..n} ((2i − 1)/(2n) − F(x_i))² is calculated. In our case, the C-M statistic amounts to 0.32221 for the Log-normal and to 0.21179 for the Gamma distribution. The critical values c_n^α for the C-M test depend on the sample size n and can be found in tabulated form for different significance levels α (cf. [AnT06]). At the significance level of α = 0.05 and for a sample size n greater than 50², the corresponding critical value is c_n^α = c_50^0.05 = 0.22, so that the calculated C-M test statistic for the Gamma distribution is well below this critical value. Therefore, according to the C-M test the proposed Gamma distribution is an acceptable model for the lengths of I-frames at the α = 0.05 significance level. The calculated C-M statistic for the Log-normal distribution is above the critical value for α = 0.05 but below the critical value c_50^0.01 = 0.33 for α = 0.01. Thus, the proposed Log-normal distribution cannot hold as a valid model for the lengths of I-frames at the α = 0.05 significance level, but it can at the significance level of α = 0.01.
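The parameter estimators and the K-S and C-M test statistics used above are straightforward to compute. The following sketch (function names are ours) implements the formulas from the text; `cdf` is the fitted theoretical CDF, evaluated on the sorted sample:

```python
import math

def lognormal_mle(xs):
    # mu_hat = (1/n) * sum(ln x_k); sigma_hat per the MLE formulas above
    n = len(xs)
    logs = [math.log(x) for x in xs]
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    return mu, sigma

def gamma_moment_fit(xs):
    # beta = Var[X] / E[X] (scale), alpha = E[X] / beta (shape)
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    beta = var / mean
    return mean / beta, beta            # (alpha, beta)

def ks_statistic(xs, cdf):
    # D_n = max |F_X(x) - S_n(x)|, checking the ECDF just before and at x_i
    xs = sorted(xs)
    n = len(xs)
    return max(max(abs(cdf(x) - i / n), abs(cdf(x) - (i - 1) / n))
               for i, x in enumerate(xs, start=1))

def cvm_statistic(xs, cdf):
    # T = 1/(12n) + sum_{i=1..n} ((2i-1)/(2n) - F(x_i))^2
    xs = sorted(xs)
    n = len(xs)
    return 1.0 / (12 * n) + sum(((2 * i - 1) / (2 * n) - cdf(x)) ** 2
                                for i, x in enumerate(xs, start=1))
```

Applied to the measured frame lengths with the fitted Log-normal or Gamma CDF, these quantities are then compared against the tabulated critical values as described above.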
Anderson-Darling (or A-D) test: this test was introduced by Anderson and Darling (1954) as a generalization of the already presented Cramér-von Mises test to place more weight, or discriminating power, at the tails of the distribution ([AnT06]). This is important for our task of modelling the lengths of video frames because the tails of the proposed Log-normal and Gamma distributions are of practical significance. In order to apply the A-D test, the observed lengths of I-frames must also first be arranged in increasing order x_1, x_2, ..., x_i, ..., x_n with x_n as the largest value. Next, the CDFs of the proposed Log-normal and Gamma distributions F_X(x_i) are evaluated at x_i, for i = 1, 2, ..., n. Thereafter, the A-D statistic A² = −Σ_{i=1..n} [(2i − 1)/n · (ln(F_X(x_i)) + ln(1 − F_X(x_{n+1−i})))] − n is calculated. In our case, the value of the A-D test statistic amounts to A² = 2.3229 for the Log-normal and to A² = 1.46219 for the Gamma distribution. Further, to account for the effect of the sample size n, the adjusted statistic A* is computed depending on the form of the selected theoretical distribution. For the Log-normal distribution the adjusted A-D statistic amounts to A* = A²·(1.0 + 0.75/n + 2.25/n²) = 2.32587, and for the Gamma distribution with α ≥ 2 to A* = A² + (0.2 + 0.3/α)/n = 1.46265. Finally, the value of the adjusted A-D test statistic A* is to be compared with the critical value c_α for the appropriate distribution type at the chosen significance level α. At the significance level of α = 0.05, the A-D test statistic computed for the Log-normal and the Gamma distribution is slightly higher than the corresponding critical values, c_α = 0.75038 for the Log-normal and c_α = 0.759 for the Gamma distribution. Thus, we can conclude that according to the A-D test neither the Log-normal nor the Gamma distribution delivers an acceptable model for the lengths of I-frames at the significance level of α = 0.05. Further calculations reveal that the A-D test is not fulfilled at the significance level of α = 0.01 either. It should be noted that the A-D test statistic is expressed in terms of the logarithm of the probabilities and, therefore, receives more contributions from the tails of the Log-normal and Gamma distributions.

² For n > 50 the critical values for n = 50 can be used.

The procedure described above for I-frames has also been applied for the analysis of the lengths of P-frames from the BBB video. The following statistical characteristics have been obtained for the lengths of P-frames: the minimum length x_min (625 Byte), the maximum length x_max (130187 Byte), the empirical expected value E[X] (29387.34 Byte), the standard deviation (23199.78 Byte), the empirical variance Var[X] (538229771.2 Byte²), the 25% quartile (at 13487 Byte), and the 75% quartile (at 37023 Byte).
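The A-D statistic used in the preceding test can be sketched analogously (the function name is ours, and `cdf` must return values strictly between 0 and 1 so that the logarithms exist):

```python
import math

def ad_statistic(xs, cdf):
    """Anderson-Darling statistic per the formula above:
    A^2 = -sum_{i=1..n} (2i-1)/n * (ln F(x_i) + ln(1 - F(x_{n+1-i}))) - n."""
    xs = sorted(xs)
    n = len(xs)
    s = sum((2 * i - 1) / n * (math.log(cdf(xs[i - 1]))
                               + math.log(1.0 - cdf(xs[n - i])))
            for i in range(1, n + 1))
    return -s - n
```

The logarithms give the extreme order statistics a large influence, which is exactly why the A-D test discriminates at the distribution tails.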
The plot of the frequency of occurrence (EPMF) for the different lengths of P-frames from the BBB video is illustrated in Fig. 4.10. As in the case of the I-frames, after a visual inspection of the EPMF plot we assume that the lengths of P-frames can be modelled by means of a Log-normal(μ, σ) and/or a Gamma(α, β) distribution. For the Log-normal distribution we used the MLE method to obtain the parameter values μ̂ = (1/n)·Σ_k ln(x_k) = 9.9687 and σ̂ = √((1/n)·Σ_k (ln(x_k) − μ̂)²) = 0.8649, where x_k are the n observed lengths of P-frames. For the Gamma(α, β) distribution, we obtained the values of the scale parameter β and the shape parameter α from the known empirical mean

Figure 4.10.: P-frame lengths (in bytes) from BBB video: EPMF, fit by means of the Log-normal and Gamma PDFs (own Fig.).

E[X] and the empirical variance Var[X] according to β = Var[X]/E[X] = 18315.0195 and α = E[X]/β = 1.6045 (cf. [AnT06], Section 3.2.8, p. 122). The corresponding PDFs of the Log-normal(μ = 9.9687, σ = 0.8649) and the Gamma(α = 1.6045, β = 18315.0195) distributions are superimposed with the EPMF plot for the lengths of P-frames in Fig. 4.10. Visual inspection of the curves shows that the Gamma distribution is likely to deliver a slightly better fit, although it does not exactly match the peak of the EPMF. The ECDF for the lengths of P-frames was obtained using the same histogram intervals as for the calculation of the EPMF and is presented in Fig. 4.11. The corresponding CDFs of the Log-normal(μ = 9.9687, σ = 0.8649) and the Gamma(α = 1.6045, β = 18315.0195) distributions are superimposed with the ECDF in order to be able to inspect the goodness of fit visually. The Gamma CDF slightly underestimates the ECDF for P-frame lengths in the interval from ca. 22000 to ca. 47000 Bytes and overestimates the ECDF in the interval from ca. 58000 Bytes to ca. 85000 Bytes. The Log-normal CDF overestimates the ECDF in the interval from ca. 11000 to ca. 27000 Bytes and slightly underestimates the ECDF for frame lengths higher than 90000 Bytes. So, we can hardly discriminate one of the distributions by means of visual inspection alone. In order to estimate the goodness-of-fit of the proposed Log-normal and Gamma distributions, the χ², K-S, C-M, and A-D tests have been applied at the significance level of α = 0.05:

Chi-Square (or χ²) test: the χ² test statistic for f = 50 − 1 − 2 degrees of freedom (again, reduced by 2 to take into account the two parameters of the Log-normal or the Gamma distribution) yields 4.7475 for the Log-normal


Figure 4.11.: P-frame lengths (in bytes) from BBB video: ECDF, fit by means of the Log-normal and Gamma CDFs (own Fig.).

and 0.1048 for the Gamma distribution. Both values are much smaller than the χ² test critical value c_{1−α,f} = c_{1−0.05,47} ≈ 67.50 (cf. [AnT06], Appendix A). Therefore, according to the χ² test, both distributions are suitable for modelling the lengths of P-frames at the α = 5% significance level. The Gamma distribution can be discriminated to represent the better fit due to the significantly lower value of its test statistic compared to that of the Log-normal distribution.

Kolmogorov-Smirnov (or K-S) test: the observed maximum discrepancy D_n between the value of the CDF of the Log-normal distribution and the ECDF to be approximated is 0.0502. The observed maximum discrepancy between the value of the CDF of the Gamma distribution and the ECDF is 0.0379. The test statistics for both distributions are below the critical value D_n^α = D_50^0.05 = 0.1923 for the sample size of 50 and the significance level of α = 0.05. So, the proposed Log-normal and the Gamma distribution are acceptable models for the lengths of P-frames at the 5% significance level. We note that the K-S test does not allow one to clearly discriminate the Gamma distribution as the better fit, due to the fact that only the observed maximum discrepancy between the values of the theoretical CDF and the ECDF is considered in the test.

Cramér-von Mises (or C-M) test: the value of the C-M test statistic determined by the quantity T = 1/(12n) + Σ_{i=1..n} ((2i − 1)/(2n) − F(x_i))² amounts to 5.4404 for the Log-normal and to 3.5705 for the Gamma distribution. Both


values exceed the critical value c_n^α = c_50^0.05 = 0.22 of the C-M test at the significance level of α = 0.05 for the sample size n > 50. Therefore, neither the Gamma nor the Log-normal distribution can be substantiated as an acceptable model for the lengths of P-frames in the C-M test at the α = 0.05 significance level. We note, however, that the value of the C-M test statistic for the Gamma distribution is significantly lower than the value for the Log-normal distribution, so that the Gamma distribution can be discriminated to deliver a better fit than the Log-normal distribution according to the C-M test.
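The C-M statistic can be computed directly from the ordered sample; the following is a minimal sketch (the function name and interface are our own illustration, not part of the UniLoG tools):

```python
import math

def cramer_von_mises(sample, cdf):
    """C-M statistic T = 1/(12n) + sum_{i=1}^{n} ((2i-1)/(2n) - F(x_(i)))^2,
    where x_(1) <= ... <= x_(n) is the ordered sample and F the fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    t = 1.0 / (12.0 * n)
    for i, x in enumerate(xs, start=1):
        t += ((2.0 * i - 1.0) / (2.0 * n) - cdf(x)) ** 2
    return t
```

The resulting value of T would then be compared against the critical value (0.22 at the 5% level for the sample sizes considered here) to accept or reject the fitted distribution.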

Anderson-Darling (or A-D) test: in case of P-frames the value of the adjusted A-D test statistic amounts to A* = A² · (1.0 + 0.75/n + 2.25/n²) = 34.6523 for the Log-normal distribution, and for the proposed Gamma distribution with parameter α = 1.6045 < 2.0 to A* = A² · (1.0 + 0.6/n) = 22.4392. At the significance level of α = 0.05 the adjusted A-D test statistic computed for the Log-normal and the Gamma distribution is much higher than the corresponding critical values c^α = 0.7513 for the Log-normal and c^α = 0.786 for the Gamma distribution (cf. [AnT06], Appendix A). Thus, according to the A-D test, neither the Log-normal nor the Gamma distribution can represent an acceptable model for the lengths of P-frames at the significance level of α = 0.05.

Finally, we present the results of the statistical analysis for the lengths of B-frames from the BBB video. First, the values of the following statistical characteristics have been obtained: the minimum length xmin (238 Byte), the maximum length xmax (58357 Byte), the empirical expected value E[X] (10830.22 Byte), the standard deviation (11248.48 Byte), the empirical variance Var[X] (126528356.7 Byte²), the 25% quartile (at 3463 Byte), and the 75% quartile (at 13639 Byte). The frequency of occurrence (EPMF) for different lengths of B-frames from the BBB video is illustrated in Fig. 4.12. Next, the parameters of the assumed Log-normal(μ, σ) and Gamma(α, β) distributions have been determined using the same methods as for I- and P-frames. The corresponding PDFs for the Log-normal(μ = 8.7850, σ = 1.1029) and the Gamma(α = 0.9270, β = 11682.8967) distributions are superimposed with the EPMF plot for the lengths of B-frames in Fig. 4.12. Visual inspection of the curves shows that the Gamma distribution is very likely to deliver a better fit of the EPMF also for the lengths of B-frames.
The corresponding CDFs of the Log-normal(μ = 8.7850, σ = 1.1029) and the Gamma(α = 0.9270, β = 11682.8967) distributions are superimposed with the ECDF of the lengths of B-frames in Fig. 4.13. The Log-normal CDF overestimates the ECDF in the interval from ca. 3000 to ca. 13000 Bytes and


Figure 4.12.: B-frame lengths (in bytes) from BBB video: EPMF, fit by means of the Log-normal and Gamma PDFs (own Fig.).

then slightly underestimates the ECDF for frame lengths in the interval from ca. 16000 to ca. 30000 Bytes. The Gamma CDF overestimates the ECDF only insignificantly for the lengths of B-frames below ca. 9000 Bytes and slightly underestimates the ECDF for the lengths of B-frames in the interval from ca. 10000 to ca. 20000 Bytes. According to the visual inspection, the Gamma distribution appears to deliver a better fit of the ECDF for the lengths of B-frames.

In the last step, the goodness-of-fit of the proposed Log-normal and Gamma distributions has been estimated using the χ², K-S, C-M, and A-D tests at the significance level of α = 0.05:

Chi-Square (or χ²) test: the values of the χ² test statistic obtained for the Log-normal distribution (0.1854) and for the Gamma distribution (0.1595) are much smaller than the critical value of the test (≈ 67.50) at the given significance level. Thus, both distributions are valid models for the lengths of B-frames at the α = 0.05 significance level.

Kolmogorov-Smirnov (or K-S) test: the observed maximum discrepancy Dn between the value of the CDF of the Log-normal distribution and the ECDF to be approximated is 0.0621. The observed maximum discrepancy between the value of the CDF of the Gamma distribution and the ECDF is 0.0344. So, for both distributions the K-S test statistics are below the critical value D_n^α = D_50^0.05 = 0.1923 for the sample size of 50 and the significance level of α = 0.05. Thus, the proposed Log-normal and Gamma distributions are acceptable models for the lengths of B-frames at the significance level of α = 0.05.


Figure 4.13.: B-frame lengths (in bytes) from BBB video: ECDF, fit by means of the Log-normal and Gamma CDFs (own Fig.).

Cramér-von Mises (or C-M) test: the estimated values of the C-M test statistic for the Log-normal distribution (7.8687) and for the Gamma distribution (3.8668) significantly exceed the critical value c_n^α = c_50^0.05 = 0.22 of the C-M test at the significance level of α = 0.05 for the sample size n > 50. Thus, according to the C-M test, neither the Gamma nor the Log-normal distribution can be substantiated as an acceptable model for the lengths of B-frames at the given significance level.

Anderson-Darling (or A-D) test: the obtained values of the adjusted A-D test statistic A* for the Log-normal distribution (51.9570) and for the Gamma distribution (31.8749) are much higher than the corresponding critical values c^α = 0.7513 for the Log-normal and c^α = 0.786 for the Gamma distribution (cf. [AnT06], Appendix A). Thus, according to the A-D test, too, neither the Log-normal nor the Gamma distribution can represent an acceptable model for the lengths of B-frames at the significance level of α = 0.05. We note, however, that the values of the C-M and A-D test statistics for the Gamma distribution are roughly half of those for the Log-normal distribution, so that we can discriminate the Gamma distribution to deliver a better fit of the ECDF for the lengths of B-frames.

Finally, we summarize the results of statistical modelling for the lengths of I-, P-, and B-frames:


I-frames: the proposed Gamma(α = 4.36599, β = 34062.30503) distribution represents a valid model for the lengths of I-frames at the significance level of 0.05 (the χ², K-S, and C-M tests have been passed, and the A-D test has been failed only narrowly at the given significance level). The Log-normal(μ = 11.78714, σ = 0.51289) distribution delivers a fit which is still valid but slightly worse than that of the Gamma distribution (the χ² and K-S tests have been passed, the C-M and A-D tests have been failed only narrowly, and the value of the test statistic for the Log-normal distribution is significantly higher than the corresponding value for the Gamma distribution in each test).

P-frames: the proposed Gamma(α = 1.6045, β = 18315.0195) and Log-normal(μ = 9.9687, σ = 0.8649) distributions are acceptable models according to the χ² and K-S tests at the significance level of 0.05, but the C-M and A-D tests, which put more weight on the tails of the EPMF, have been failed by both distributions. In all four tests, the values of the test statistic for the Gamma distribution are significantly lower than the values for the Log-normal distribution, so that the Gamma distribution can be discriminated as the better fit.

B-frames: the results for the B-frames are similar to those for the P-frames: the proposed Gamma(α = 0.9270, β = 11682.8967) and Log-normal(μ = 8.7850, σ = 1.1029) distributions deliver acceptable models in the χ² and K-S tests at the given significance level, but, as in the case of P-frames, the C-M and A-D tests have been failed by both distributions. In all four tests, the value of the test statistic for the Gamma distribution is lower than that for the Log-normal distribution by at least a factor of 2, so that we can again discriminate the Gamma distribution as the better fit.
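The fitting-and-testing workflow summarized above can be illustrated for the Log-normal case with a small, self-contained Python sketch. Here, moment estimates on the log-scale stand in for the parameter estimation method used in the thesis, and all function names are our own:

```python
import math

def fit_lognormal(sample):
    """Estimate mu and sigma of a Log-normal model from the log-transformed sample."""
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / len(logs)
    var = sum((l - mu) ** 2 for l in logs) / (len(logs) - 1)
    return mu, math.sqrt(var)

def lognormal_cdf(x, mu, sigma):
    """CDF of the Log-normal(mu, sigma) distribution."""
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf):
    """Observed maximum discrepancy D_n between the fitted CDF and the ECDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, abs(i / n - f), abs(f - (i - 1) / n))
    return d
```

The observed D_n would then be compared with the K-S critical value, approximately 1.36/√n at the 5% significance level for large n.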
We remark that both the C-M and the A-D tests put more weight, or discriminating power, on the tails of the EPMF and are passed by the Gamma distribution for the lengths of I-frames but not for the lengths of P- and B-frames. A closer examination of the lengths of P- and B-frames may reveal the reason for the lack of fit according to the C-M and A-D tests. Possible causes of rejection may include the occurrence of a substantial quantity of frames with the same length (peaks of the EPMF) or the occurrence of a discontinuity in the EPMF of the frame lengths. In the cases mentioned above, an appropriate procedure may be to group the video frame lengths into clusters or classes, as proposed in the next section.


4.2.3. Partitioning into Shot Classes

The results of statistical modelling of frame lengths in the last section showed that the assumed distributions for the lengths of P- and B-frames have been rejected by the C-M and A-D tests, which put more weight on the tails of the distribution. Reasons for the lack of fit in these cases may include, among others:

• Occurrence of a substantial quantity of frames with the same length, represented by peaks in the plot of the corresponding EPMF. This can be induced mainly by the character of the video and the video coding standard being used.

• Occurrence of a discontinuity (i.e. when no frames are generated in particular intervals of frame lengths) or the presence of a number of local maxima and/or minima in the EPMF of the frame lengths (e.g. as is the case for the lengths of P-frames, with local maxima at ca. 8550, 19000, 61000, and 87000 Bytes and local minima at ca. 48200, 56000, and 74600 Bytes). This can be induced by the existence of fragments with different motion intensity in the analysed video.

• Occurrence of a considerable number of long frames (represented by a so-called “heavy tail” of the ECDF). Although we set aside 1% of the data points from the end of the ordered sequence of frame lengths, this measure may still not be enough to allow for successful statistical modelling by means of one single distribution.

In order to deal with the above-mentioned issues, one can try to partition the full range of GOP sizes into different shot classes S1, S2, . . . , Sn according to the GOP size. The size of a GOP is defined as the sum of the lengths of the video frames which belong to the GOP. A shot class Si, 1 ≤ i ≤ n, of length k, k ≥ 1, is a union of k not necessarily consecutive GOPs from the original video. Every GOP belongs to one and only one shot class.
The strength of the preprocessing step (including the partitioning and the formation of the shot classes) stems from the fact that the lengths of I-, P-, and B-frames can be modelled more accurately if the techniques presented in Sec. 4.2.2 are applied to each of the shot classes Si separately, and not to the full range of GOP and/or frame sizes. Therefore, the number and the boundaries of the shot classes should be chosen such that each resulting partition contains a statistically significant number of GOPs. The partitioning method can be constrained by further requirements, e.g. it may be essential to establish n equal-sized shot classes (i.e. an equal number of GOPs is to be assigned to


1:  a ← 1-percentile point of size(Gi), 1 ≤ i ≤ ng       ▷ the range of 1% small GOPs
2:  b ← 99-percentile point of size(Gi), 1 ≤ i ≤ ng      ▷ the range of 1% large GOPs
3:  r ← e^((ln(b) − ln(a))/n)                            ▷ ratio of the geometric progression (a · r^n = b)
4:  for i = 1 to ng do                                   ▷ traverse all ng GOPs
5:      for j = 1 to n do                                ▷ establish n shot classes
6:          if size(Gi) ∈ [a · r^(j−1), a · r^j] then    ▷ does the size of Gi fall in the interval of Sj?
7:              Sj ← Sj ∪ Gi                             ▷ put GOP Gi into shot class Sj
8:          end if
9:      end for
10: end for
11: P ← computeTransitionMatrix(S)                       ▷ calculate the inter-shot class transition matrix

Figure 4.14.: Partitioning of the GOPs into shot classes using geometric boundaries (own Fig.).
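The listing in Fig. 4.14 translates almost line by line into Python. The following sketch is our own illustration (with the first and last subintervals extended so that the 1% of GOPs set aside are also assigned); it places each GOP size into exactly one shot class:

```python
import math

def geometric_partition(gop_sizes, n):
    """Partition GOP sizes into n shot classes S_1..S_n with geometrically
    increasing class boundaries a, a*r, a*r^2, ..., a*r^n = b."""
    ordered = sorted(gop_sizes)
    ng = len(ordered)
    a = ordered[int(0.01 * ng)]              # 1-percentile point
    b = ordered[int(0.99 * ng) - 1]          # 99-percentile point
    r = math.exp((math.log(b) - math.log(a)) / n)  # common ratio, a * r**n == b
    classes = [[] for _ in range(n)]
    for size in gop_sizes:                   # traverse all ng GOPs
        for j in range(1, n + 1):            # establish n shot classes
            lo = a * r ** (j - 1) if j > 1 else float("-inf")  # first interval extended left
            hi = a * r ** j if j < n else float("inf")         # last interval extended right
            if lo <= size <= hi:
                classes[j - 1].append(size)  # put the GOP into shot class S_j
                break                        # every GOP belongs to exactly one class
    return classes
```

The `break` after the first matching interval realizes the requirement that every GOP belongs to one and only one shot class, even for sizes that fall exactly on a class boundary.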

each shot class). In this case, the shot class boundaries must be set exactly at the consecutive n-quantile points obtained for the CDF of the GOP sizes. Recall that the UBA model elaborated in Sec. 4.2.2 already supports arbitrary GOP structures. In order to introduce support for different shot classes, a flexible partitioning algorithm has been implemented and included in the UBA model. The basic input parameters of the partitioning algorithm are the number n of shot classes to be created and the type partitioningType of the distribution of the class boundaries (e.g. uniform, geometric, n-quantiles, etc.). In the following, we explain the algorithm using the example of geometric partitioning (cf. Fig. 4.14). Similar to the procedure presented in Sec. 4.2.2, the smallest and the largest 1-percentile GOPs are initially set aside as being too extreme. The GOP sizes corresponding to these 1 and 99 percentile points are referred to by the variables a and b, respectively. Thereafter, the interval [a, b] of GOP sizes is partitioned into n subintervals [a, ar], [ar, ar²], . . . , [ar^(n−1), ar^n] such that the successive partitioning boundaries of the intervals increase in a geometric progression with a as the first term, b = ar^n as the (n + 1)th term, and r = e^((ln(b)−ln(a))/n) as the common ratio of the progression. The first subinterval [a, ar] is extended to [min{size(Gi), 1 ≤ i ≤ ng}, ar] to include the range of the 1% small GOP sizes initially set aside. Similarly, the last subinterval [ar^(n−1), ar^n] is extended to the right to [ar^(n−1), max{size(Gi), 1 ≤ i ≤ ng}] to include the range of the 1% large GOP sizes set aside. Finally, all ng GOPs are traversed and each GOP Gi is assigned to the


No. of shot classes n:   n = 2        n = 3        n = 4        n = 5        n = 6        n = 7
S1, lb [Byte]            55475        55475        55475        55475        55475        55475
S1, ub [Byte]        580756.35    391883.34    321912.78    286079.72    264435.42    249985.22
S1, # GOPs                 350          144           66           47           36           27
S1, # frames              8400         3456         1584         1128          864          648
S2, lb [Byte]        580756.35    391883.34    321912.78    286079.72    264435.42    249985.22
S2, ub [Byte]          2011244    860659.03    580756.35    458660.85    391883.34    350224.22
S2, # GOPs                 247          340          284          183          108           61
S2, # frames              5928         8160         6816         4392         2592         1464
S3, lb [Byte]                —    860659.03    580756.35    458660.85    391883.34    350224.22
S3, ub [Byte]                —      2011244   1047730.81    735353.67    580756.35    490657.04
S3, # GOPs                   —          113          171          229          206          172
S3, # frames                 —         2712         4104         5496         4944         4128
S4, lb [Byte]                —            —   1047730.81    735353.67    580756.35    490657.04
S4, ub [Byte]                —            —      2011244   1178964.87    860659.03    687400.58
S4, # GOPs                   —            —           76           74          134          173
S4, # frames                 —            —         1824         1776         3216         4152
S5, lb [Byte]                —            —            —   1178964.87    860659.03    687400.58
S5, ub [Byte]                —            —            —      2011244   1275464.27    963034.29
S5, # GOPs                   —            —            —           64           58           68
S5, # frames                 —            —            —         1536         1392         1632
S6, lb [Byte]                —            —            —            —   1275464.27    963034.29
S6, ub [Byte]                —            —            —            —      2011244   1349191.53
S6, # GOPs                   —            —            —            —           55           47
S6, # frames                 —            —            —            —         1320         1128
S7, lb [Byte]                —            —            —            —            —   1349191.53
S7, ub [Byte]                —            —            —            —            —      2011244
S7, # GOPs                   —            —            —            —            —           49
S7, # frames                 —            —            —            —            —         1176

Table 4.3.: Geometric partitioning of the BBB video into n shot classes for n = 2, 3, . . . , 7, lb: “lower bound”, ub: “upper bound”, # GOPs: “number of GOPs”, # frames: “number of frames” (own Tab.).

shot class Sj if the size size(Gi) of the GOP falls in the interval of the shot class Sj (cf. Fig. 4.14). Results obtained from the geometric partitioning of the BBB video into n shot classes for n = 2, 3, . . . , 7 are presented in Table 4.3. The table shows that geometric partitioning produced shot classes containing a significant number of GOPs and video frames in each class and thus allows for meaningful statistical analysis and modelling. In the next step after the partitioning, the stochastic modelling techniques presented in Sec. 4.2.2 can be applied to each of the shot classes separately. In order to model the lengths of I-, P-, and B-frames from the GOPs in each particular shot class, we can reuse the main part of the universal UBA model presented in Sec. 4.2.2, Fig. 4.6, without its initialization state S^i (here not to be confused with the shot class Si). In case n shot classes are to be constructed by the partitioning algorithm, exactly n such UBA parts (UBA_1, UBA_2, . . . , UBA_n) will be needed, each modelling a particular shot class Si, i = 1, 2, . . . , n.


Figure 4.15.: A universal UBA model for H.264-coded video sources with two shot classes (own Fig.).

In order to combine the UBA parts UBA_1, UBA_2, . . . , UBA_n into the joint model for n shot classes, we must specify the initial shot class, which can be chosen randomly, e.g. according to the uniform distribution. For this reason, we have first introduced a new initialization state S^init in the joint model (cf. Fig. 4.15 for the example of two shot classes). Thereafter, we added the corresponding stochastic state transitions from S^init to the R-states of the UBA parts UBA_1, UBA_2, . . . , UBA_n responsible for the generation of I-frames (which are the first frames in the GOP and also in the shot class). Further, in each UBA part UBA_i, the values of the size parameter for Send-I-Frame, Send-P-Frame, and Send-B-Frame requests are specified according to the Gamma distributions which were obtained for the shot class Si. Finally, additional state transitions are needed between the UBA parts to implement inter-shot class transitions (cf. Fig. 4.15 for the example of two shot classes). Each time a full GOP in the shot class Si has been generated in UBA_i (which is the case when N subsequent frames have been generated, where N is the GOP length), the decision has to be made either to stay in the same shot class Si or to change into one of the other shot classes. For this reason, the inter-shot class transition probability matrix P is computed by the

from \ to         S1          S2          S3          S4          S5          S6          S7
S1          0.192308    0.346154    0.307692    0.038462    0.000000    0.038462    0.076923
S2          0.114754    0.344262    0.426230    0.065574    0.032787    0.000000    0.016393
S3          0.052326    0.139535    0.511628    0.244186    0.029070    0.011628    0.011628
S4          0.028902    0.023121    0.225434    0.554913    0.138728    0.028902    0.000000
S5          0.014706    0.029412    0.117647    0.352941    0.352941    0.102941    0.029412
S6          0.000000    0.000000    0.063830    0.063830    0.234043    0.574468    0.063830
S7          0.000000    0.020408    0.000000    0.040816    0.040816    0.102041    0.795918

Table 4.4.: Inter-shot class transition probability matrix P for n = 7 shot classes S1 –S7 (own Tab.).

partitioning algorithm, and transitions between shot classes are implemented by means of additional stochastic transitions from the D-state immediately preceding the generation of the I-frame (and so the generation of the next GOP) in each of the UBA parts UBA_1–UBA_n. In the case of 2 shot classes, the corresponding stochastic state transitions are added to the D-states D_BI−S1 and D_BI−S2 (cf. Fig. 4.15). The partitioning algorithm computes the |S| × |S| inter-shot class transition probability matrix P, where p_ij gives the probability of a transition from the shot class Si to the shot class Sj. The matrix P has the stochastic property that Σ_{j=1}^{|S|} p_ij = 1 for i = 1, 2, . . . , |S|. The transition probabilities p_ij are calculated from the normalized relative frequencies of transitions among shot classes as the algorithm sequentially traverses all ng GOPs G1, G2, . . . , Gng in the original video. The algorithm computes p_ij = f_ij / f_i, where f_ij is the total number of transitions from Si to Sj, and f_i is the total number of transitions out of Si. An example of the inter-shot class transition probability matrix P calculated for the case of 7 shot classes is shown in Table 4.4. The principal diagonal elements of P are allowed to be nonzero because the joint model allows self-transitions among shot classes, i.e. it can remain in the same shot class after the generation of each subsequent GOP. An alternative approach may track the sequences of consecutive GOPs from the original video belonging to the same shot class (we also refer to such sequences as segments) and try to find an appropriate statistical distribution for the lengths of the segments. Provided the distribution for the lengths of segments is known, the number of GOPs to be generated in a shot class Si can be determined according to that distribution, and the decision to stay in the shot class Si or not does not have to be made after the generation of each subsequent GOP.
Instead, the shot class has to be changed after the number of GOPs specified by the chosen distribution has been generated. For this


alternative approach, the principal diagonal elements of the inter-shot class transition matrix P would all be zeros (i.e. all self-transitions among shot classes will be ignored in the calculation of P ). The presented model may also be used in cases when different GOP structures have been applied to encode the original video sequence. Shot classes may be used to represent video fragments with identical GOP structure being used by the coder. Further, more comprehensive partitioning methods like the k-means clustering algorithm (cf. [Aal11, Wu12]) can be used to obtain the shot classes as clusters of frame lengths from a given sequence of GOPs. We leave this topic for further study.
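The estimation of P and the per-GOP shot-class decision described above can be sketched as follows. This is a stand-alone illustration with our own function names, not the actual UniLoG implementation; shot classes are 0-indexed here:

```python
import random

def transition_matrix(class_sequence, n):
    """Estimate p_ij = f_ij / f_i from the sequence of shot-class indices
    observed for the consecutive GOPs G_1, ..., G_ng of the original video."""
    counts = [[0] * n for _ in range(n)]
    for cur, nxt in zip(class_sequence, class_sequence[1:]):
        counts[cur][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)                   # f_i: total transitions out of S_i
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def next_shot_class(matrix, current, rng=random):
    """After a full GOP has been generated, draw the next shot class
    according to row `current` of the transition matrix."""
    u = rng.random()
    acc = 0.0
    for j, p in enumerate(matrix[current]):
        acc += p
        if u < acc:
            return j
    return current                         # guard against rounding residue
```

Self-transitions (the nonzero diagonal of P) fall out of the counting naturally; for the segment-based alternative discussed above, one would simply zero the diagonal before normalizing.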

4.3. Modelling of Web Workloads with UniLoG

In this section we present the application of the UniLoG approach to the modelling and generation of Web workloads. In particular, we discuss the prevalent characteristics of Web workloads and give their exact definitions as they will be used in this thesis. The corresponding methods for the measurement and estimation of concrete values for the specified workload characteristics will be described in Sec. 9.3 in Chapter 9. Because of the pre-eminent importance of the World Wide Web (WWW), we set the goal to embed the capability of generating realistic Web traffic and Web server loads into the UniLoG approach, too. In general, the following strongly different modelling approaches (MA) are possible and may be useful for various kinds of studies when characterizing loads at the corresponding HTTP application service interface and the resulting Web traffic induced by these loads:

(MA1): Realistic description of sequences of requests as they are passed from the virtual users VU1–VUn to the HTTP interface IFc(HTTP) within the Web client (cf. Fig. 4.16). Generating a realistic sequence of requests at the interface IFc(HTTP) allows one, with only little expenditure, on the one hand to generate a specific load for a Web server WS1–WSk (in terms of client requests of varying complexity) and on the other hand to generate a specific load for the network (in terms of Web traffic of different structure and intensity) [BaC98, ACC02, OSPG09].

(MA2): Realistic description of sequences of requests as they are passed, both, in the Web client (as client requests) and in the Web server(s) WS1–WSk (as server responses) to the corresponding HTTP interfaces IFc(HTTP) and IFs(HTTP) (cf. Fig. 4.17). As server responses are


Figure 4.16.: Modelling approach MA1 (own Fig.).

generated as a consequence of client requests, precise modelling here requires the coordination of load generation at both interfaces – and this is a tedious task. The concurrent generation of sequences of requests at the interfaces IFc(HTTP) and IFs(HTTP) replaces the Web server used in MA1 by load generating components [SCK03, KRL08]. Therefore, the generation of load here gets much more complicated, because a sufficiently realistic model of the Web server is required and load generation in the client and the server has to be coordinated. Moreover, this approach no longer allows the dedicated loading of a Web server with client requests.


Figure 4.17.: Modelling approach MA2 (own Fig.).

(MA3): Realistic description of the sequences of requests at the TCP service interface (IFs(TCP) in the server) or of IP packets (as observed at the IP interface IFs(IP) in the Web server) as they are induced by single client requests or by an overlay of client requests to the Web server(s) WS1–WSk (cf. Fig. 4.18). The TCP requests or IP packets result from the transmission of the client requests and the corresponding server responses. This modelling approach fundamentally differs from MA1 and MA2 as it assumes a completely


different interface for the description and injection of load (TCP or IP instead of the HTTP interface). The approach is useful in cases when, e.g., streams of TCP requests have to be injected into a network in the manner in which they would have been induced by single or an overlay of Web server accesses. Most of the currently existing literature on the characterization and generation of HTTP loads [SoB04, CCG04, LAJ07, RRB07a, ViV09] follows this modelling approach, though, in order to be precise, one would have to consider it as a characterization and generation of TCP packet streams. Here also, the dedicated loading of a Web server with client requests is impossible using this approach.


Figure 4.18.: Modelling approach MA3 (own Fig.).

MA1 is the approach followed in this thesis, because we want to generate load also for the Web server, and not only for the network (which is impossible when using the approaches MA2 or MA3). Moreover, we want to inject the load directly at the HTTP interface and, therefore, MA3 is again not an alternative to the MA1 approach. Recall that a survey of the tools for Web workload generation has been given in Sec. 2.2. In the following Sec. 4.3.1 we present the application of our UniLoG approach to the modelling and generation of Web workloads. For the architecture of the corresponding UniLoG.HTTP adapter we refer the reader to Sec. 9.1 in Chapter 9 of this thesis. Among others, the algorithm for the allocation of real requests from a pool of Web sites is explained in Sec. 9.1. Algorithms for the estimation of concrete values for the important characteristics of Web workloads and for the construction of a sufficiently large, representative, and stable pool of Web sites for load generation are presented in Sec. 9.3.1 and Sec. 9.3.2, respectively.


4.3.1. UniLoG Approach for Web Workload Modelling and Generation

The typical behaviour of a user navigating Web sites by means of a Web browser is presented in Fig. 4.19 (left). A typical Web page consists of a base page and further objects linked into the base page from different origins (e.g., from the same or other servers of the Web site's provider or from third-party or non-origin servers). With the first initial HTTP request the browser retrieves the base page, parses it, and issues the additional HTTP requests required to retrieve the objects linked into the page (we call this process “rendering” of a Web page).


Figure 4.19.: Retrieval of multiple pages from the Web server www.foo.com and the corresponding UBA model (own Fig.).

However, the seemingly simple task of loading a Web page today requires the client-side browser to execute a considerable number of actions, among others:

1. The domain name of the server or host to be contacted is extracted from the host part of the URL submitted to the browser. Then, a DNS request to the preconfigured DNS server is made in order to determine the IP address of the server/host to be connected to (in case the corresponding IP address is not already known and stored in the local DNS cache).


2. The client browser establishes a new TCP connection to the Web server using the IP address determined in the previous step and the port number specified in the URL. If no port number has been specified in the URL, the default port 80 is used for HTTP and port 443 for HTTPS connections.

3. The browser issues the first (initial) HTTP request for the base Web page submitted in the URL. The content of the base page is rendered and additional HTTP requests required to download the objects linked into the page (e.g., CSS, JavaScript, image, and Flash objects) are issued. Depending on the origin of the linked objects, the browser may be required to establish TCP connections to the corresponding non-origin servers before the objects can be downloaded. Depending on how the objects are linked into the page, additional DNS requests may be required when the corresponding IP address of a non-origin server is not known.

In order to optimize the complex process of downloading a Web page, a series of optimizations have been proposed since the initial version of the HTTP/1.0 protocol. For instance, HTTP/1.1 introduced a keep-alive mechanism which allows the browser to reuse a TCP connection for more than one HTTP request. Such persistent connections reduce request latency perceptibly, because the client does not need to repeat the TCP 3-way handshake after the first HTTP request/response pair. Further, the browser was allowed to establish several TCP connections to the server in order to download different objects from the same server concurrently. Finally, the HTTP pipelining technique further reduces the overall rendering time of the page by allowing clients to send multiple requests without waiting for the corresponding responses.

When the unified workload generation approach is to be applied to the modelling of Web workloads using the alternative MA1 (described in the beginning of Sec. 4.3), we first have to identify the real HTTP service users at the interface IF = IFc(HTTP) in the Web client which are relevant for the given modelling scenario. In RFC 2616, HTTP service users are called user agents and may be represented, among others, by different Web browsers or by automated Web caching, search, and indexing engines (so-called “Web crawlers”). We model the relevant HTTP service users by means of a set of virtual HTTP users which are considered to be part of the environment E in the UniLoG approach. Further, each of the virtual users is represented by a corresponding UBA describing its behaviour at the HTTP service interface in the Web client. Finally, the components of the protocol stack below the HTTP interface in the Web client, the Web server(s), and the


communication network are considered to be part of the service system S (cf. Fig. 4.19).

Possible types of abstract HTTP requests

In the next step, each type of virtual HTTP user has to be described by means of a corresponding UBA model. To this end, we have to: 1) identify the possible types of abstract HTTP requests from the virtual users, and 2) describe the possible sequences of requests by means of states and transitions in the UBA. For the definition of the possible abstract request types at the HTTP service interface, the experimenter can take the HTTP/1.1 protocol specification (cf. RFC 2616) as orientation, which provides nearly a dozen different request methods for the interaction with the Web server. These request methods follow the general request/reply schema, but the most frequently used method in current Web applications is the GET method. The PUT method is used much less frequently, to transmit large amounts of data to the Web server (e.g., picture files or the parameters of an HTML form to be submitted).

The UniLoG approach allows a very flexible definition of abstract HTTP request types, e.g., of different complexity and abstraction levels. For example, a relatively simple abstract request type GetObject(ObjectURL) can be defined to issue a request for the resource identified by the URL submitted in the ObjectURL attribute from the host specified in the host part of this URL (note that the "Host" field is mandatory in the HTTP/1.1 request header and distinguishes between various DNS names sharing a single IP address, allowing name-based virtual hosting on the same server). This request type is relatively simple, as it induces only a single real HTTP request to the resource specified in the ObjectURL attribute (which may be given, e.g., by an HTML, graphic, or JavaScript file). A request type with more complex and abstract semantics, GetPage(ObjectURL), may be defined to model not only a single HTTP request to the resource located at ObjectURL on the server/host identified by the host part of the specified URL but also the additional HTTP requests required to download all objects linked into the page. In this case, the UniLoG HTTP adapter is responsible for rendering the base object obtained by the first request to the resource ObjectURL and issuing the corresponding HTTP requests required to download all embedded objects of the page. The objects may be given by further HTML, CSS, graphic, audio and video files, etc.,

4.3. Modelling of Web Workloads with UniLoG


and may be linked from the same ("origin") server or from other ("non-origin") servers. The execution of requests of type GetPage(ObjectURL) is, therefore, significantly more complex for the UniLoG adapter (compared, e.g., to GetObject(ObjectURL) as defined above). Remember that the semantics of each abstract request type at the real HTTP service interface must be specified by the experimenter in the corresponding element of the XML complex type describing the abstract request type. In this way, the UniLoG adapter can resolve complex HTTP request types like GetPage(ObjectURL) and recognize the need to issue additional HTTP requests to the linked objects.

The abstract request types GetObject(ObjectURL) and GetPage(ObjectURL) may be very useful, e.g., to test under load a set of specific functions (accessible via the corresponding URLs) of a Web application or a Web service deployed on the host or server specified by the Host attribute (we call this type of scenario "Workload for the Web server"). In order to test each particular function effectively, it is important that the exact location (i.e., the corresponding URL) of the function (resource) to be requested is specified in the objectURL attribute of the GetObject(ObjectURL) requests.

The request types may also use more abstractly defined request attributes which address various characteristics of the workload to be induced in the Web server or in the network. For example, it may be necessary to generate workload for the network in terms of Web traffic of different structure, intensity, and traffic matrix (we call this scenario "Workload for the network").
An abstract request type GetPageWithMetrics(hostname, numberOfObjects, replySize) can be used to model requests for Web pages from the server hostname which require exactly numberOfObjects HTTP requests to be loaded and induce a server reply with the total size specified in the replySize attribute (including the size of the base object and all embedded objects). In this scenario, it is only important that the objects/pages are requested from the server(s) specified in the hostname attribute (e.g., in order to generate traffic with a predefined traffic matrix) and that the complexity of these pages matches the specifications in the numberOfObjects and replySize attributes. Note that the page to be requested from the server hostname is not explicitly specified in the request type because it is not relevant in this scenario; therefore, the corresponding attribute ObjectURL (used to explicitly specify the exact location of the page on the server) has been omitted from the request type. However, at the time of load generation, the UniLoG HTTP adapter will have to choose one particular page from the server specified in the hostname attribute which matches the specified complexity characteristics as well as possible in order to issue real HTTP requests to the corresponding URL.
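The adapter's choice of a "best matching" page could, for instance, be sketched as a nearest-neighbour search over a catalogue of known pages. The following Python sketch is purely illustrative; the function name, the catalogue, and its field names are assumptions, not UniLoG internals:

```python
def choose_page(candidates, number_of_objects, reply_size):
    """Pick the catalogue page closest to the requested characteristics."""
    def distance(page):
        # normalised deviation from the specified complexity characteristics
        d_obj = abs(page["objects"] - number_of_objects) / max(number_of_objects, 1)
        d_size = abs(page["size"] - reply_size) / max(reply_size, 1)
        return d_obj + d_size
    return min(candidates, key=distance)

catalogue = [                                     # invented example catalogue
    {"url": "/small.html",  "objects": 3,  "size": 40_000},
    {"url": "/medium.html", "objects": 12, "size": 300_000},
    {"url": "/large.html",  "objects": 40, "size": 1_200_000},
]
best = choose_page(catalogue, number_of_objects=10, reply_size=250_000)
print(best["url"])   # -> /medium.html
```

A real adapter would additionally have to keep such a catalogue up to date, e.g., by crawling the target server before load generation starts.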

In the "Workload for the Web server" scenario (e.g., in order to analyse the utilization level of some known Web server), the delay time induced on the server to prepare and deliver a Web page can be a server workload characteristic of particular interest. In this case, we can include the abstract inducedServerDelayTime attribute in the definition of the corresponding abstract request type GetPageWithDelay(hostname, inducedServerDelayTime). Again, it is not relevant which unique pages (identified by the corresponding unique URLs) are loaded from the server hostname, but the overall delay time induced on the server hostname for the delivery of the page should match the specification in the inducedServerDelayTime attribute.

Supported abstract request attributes

From the explanations above it becomes evident (and we have already emphasized it in Sec. 2.3) that the choice of the abstract request types and of their attributes is strongly dependent on the particular Web workload scenario and on the target and purpose of the workloads to be generated (e.g., the type of Web application or service to be tested under load, etc.). Accordingly, the attributes of abstract HTTP requests can be related to concrete parameters of HTTP request methods (the URL identifying the resource to be requested is the most important parameter) or to the HTTP header fields declared in RFC 2616 (e.g., Host, User-Agent, Accept, etc.). RFC 2616 specifies a total of 50 different HTTP header fields, of which 32 fields may be used in the request header (the Host header field is even mandatory since HTTP/1.1 in order to facilitate shared virtual hosting of Web sites) and 38 fields may be used in the header of the server response.
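To make the relation between abstract request attributes and concrete HTTP header fields tangible, the following sketch assembles an HTTP/1.1 GET request message with the mandatory Host field and two common optional request header fields. The function name and the User-Agent string are invented for illustration:

```python
from urllib.parse import urlparse

def build_get(object_url, user_agent="UniLoG-sketch/0.1"):
    """Assemble an HTTP/1.1 GET request message for the given URL."""
    u = urlparse(object_url)
    path = u.path or "/"
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {u.netloc}\r\n"            # mandatory since HTTP/1.1
            f"User-Agent: {user_agent}\r\n"    # influences the request size [WeX06]
            f"Accept: */*\r\n"
            f"\r\n")

msg = build_get("http://www.foo.com/index.html")
print(msg.splitlines()[0])   # -> GET /index.html HTTP/1.1
```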
As already explained above, the exact specification of the resource to be requested is indispensable in the abstract HTTP requests (e.g., in the objectURL attribute) in order to test a set of specific functions of a Web application or a Web service (which are usually accessible via specific URLs). In cases where the exact location of the resources on the Web server is not important, the full URL specification is not needed, and an attribute Hostname containing only the domain or host name of the target Web server (e.g., www.foo.com) can be used in the abstract HTTP request types. The attributes may also be defined in a more abstract manner, e.g., as general characteristics of the induced workload which need not have a corresponding field in the HTTP header and need not correspond to exactly one HTTP request (but may induce a number of requests resulting in a workload with the specified characteristic). For example,

the abstract request attribute numberOfObjects may be declared to model the total number of embedded objects in the page, or the attribute replySize may denote the overall size of the server's reply (the size of the base page object including the sizes of all embedded objects).

A thorough review of recent studies on Web workload modelling and characterization [BaC98, CCG04, WAW05, OSPG09, BMS11, IhP11] revealed that there exist a number of abstract Web workload characteristics which do not have corresponding fields in the HTTP/1.1 header but may have a significant impact on the workload induced in the Web server and the network. Thus, we decided to support the following abstract Web workload characteristics as attributes of abstract HTTP requests in the UniLoG load generator:

numberOfObjects: specifies the number of objects fetched and thus the total number of HTTP GET requests to be issued to load a Web page. Note that the objects may be linked from the same ("origin") server accommodating the base page of the site or from other ("non-origin" or third-party) servers. The characteristic can be further broken down into the particular Multi-Purpose Internet Mail Extensions (MIME) types (e.g., image, CSS, text/html, JavaScript, or Flash) across which the objects are spread.

requestSize: the total amount of data (in bytes) transferred from the client to the Web server(s) involved in the delivery of a Web page. The initial HTTP GET request to the base page and the subsequent requests to the objects linked into the page from its origin and non-origin server(s) contribute to the requestSize. Some of the HTTP request header fields (e.g., Accept, User-Agent) may significantly influence the size of HTTP GET request messages [WeX06].

replySize: the total size of response data (in bytes) retrieved from the server(s) involved in the delivery of a Web page, including the sizes of all embedded objects. The characteristic may also be specified to exclude the amount of data loaded from non-origin server(s). It can be further broken down into different content types in order to specify their contribution to the total number of bytes downloaded. Along with the numberOfObjects attribute, this characteristic describes the complexity of the page content.

complexity: is provided to characterize the structural complexity of the page in general and can be specified by means of a complexity class derived from the values of the content complexity characteristics numberOfObjects and replySize (which do not consider the induced server load directly).
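The derivation of such a complexity class could be sketched as follows; the class names and boundaries are invented examples of the experimenter-specified values discussed in the text:

```python
def complexity_class(number_of_objects, reply_size,
                     obj_bounds=(5, 20), size_bounds=(100_000, 1_000_000)):
    """Map content complexity characteristics onto a complexity class."""
    def level(value, bounds):
        # number of class boundaries exceeded: 0, 1, or 2
        return sum(value > b for b in bounds)
    score = max(level(number_of_objects, obj_bounds),
                level(reply_size, size_bounds))
    return ("Simple", "Medium", "Complex")[score]

print(complexity_class(3, 50_000))        # -> Simple
print(complexity_class(12, 300_000))      # -> Medium
print(complexity_class(50, 2_000_000))    # -> Complex
```

Taking the maximum of the two levels means that either a large number of objects or a large reply size alone suffices to raise the class; other combination rules are equally conceivable.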

The class boundaries used to separate different complexity classes can be specified by the experimenter in the UBA.

numberOfServers: keeps a record of the distinct servers involved in the delivery of a Web page. For example, loading www.foo.com may require retrieving content not only from other internal servers such as scripts.foo.com or images.foo.com, but may also involve third-party services such as CDNs (e.g., Akamai, Limelight), analytics providers (e.g., Google Analytics) to track user activity, social network plugins (e.g., Facebook, LinkedIn), and advertisement services (e.g., DoubleClick) to monetize user visits. Generally, an increasing number and contribution of servers and administrative origins involved in loading a Web page indicates a higher complexity of the Web page service.

inducedServerLoad: characterizes the amount of time required for the server to respond to the client requests and can be specified by means of a server delay class derived from the values of the delays (d1-d5, cf. Fig. 4.19, left) induced on the server(s) involved in the delivery of a Web page by the requests to the page itself and to all its embedded objects linked from origin or non-origin servers. The class boundaries used to separate different classes of inducedServerLoad (e.g., {Immediately, Fast, Slow, Annoyingly slow}) can be specified by the experimenter in the UBA. Note that client requests may induce further search, authentication, or database retrieval activities on the server side and, therefore, usually differ significantly in this factor. Along with numberOfServers, this characteristic describes the service complexity of a Web page. Note that we do not intend to capture the complexity of the server-side infrastructure of Web sites exactly.

The methods to estimate the values of the workload characteristics defined above will be presented in Sec. 9.3.1.
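The numberOfServers characteristic could, for instance, be computed from the URLs of the base object and all embedded objects of a page; the helper function and the example URLs below are illustrative assumptions:

```python
from urllib.parse import urlparse

def number_of_servers(object_urls):
    """Count the distinct hosts involved in delivering a page."""
    return len({urlparse(u).netloc for u in object_urls})

urls = [
    "http://www.foo.com/index.html",          # base object (origin server)
    "http://images.foo.com/logo.png",         # internal image server
    "http://images.foo.com/banner.png",
    "http://cdn.example-cdn.net/lib.js",      # third-party server (invented name)
]
print(number_of_servers(urls))   # -> 3
```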
Here, we only note that the workload characteristics have been chosen to be largely independent of the geographic point of their measurement (the so-called "vantage point", cf. [BMS11]). For the same reason, [BMS11] presents measurement results for only one vantage point, as the results from the other vantage points are similar for the chosen workload characteristics. It is generally possible to specify further advanced Web workload characteristics (which are not explicitly provided in the set of abstract request attributes presented above) using the UBA concept introduced in Chapter 3, for example:

User think time: an important property of Web traffic used to capture its bursty nature [BaC98]. It can be specified in D-states of the UBA as the inter-arrival time between subsequent requests.

Page popularity: can be defined, e.g., as the relative number of requests made to individual pages on the same server. It can be specified, e.g., by means of a frequency or Zipf-like distribution for the objectURL attribute (which contains the full URL of the resource/page to be requested).

Temporal locality: of page requests (referring to the likelihood that, once a page has been requested, it will be requested again in the near future) can be characterized by means of a distribution of stack distances [KRL08] for the objectURL attribute of abstract HTTP requests in the request sequence to be generated.

Specification of possible sequences of abstract requests

In order to describe the possible sequences of abstract HTTP requests we use the UBA concept introduced in Chapter 3. It is recommended to define an individual UBA for each type of user behaviour (or type of virtual HTTP user) in the load scenario. The user actions which may be described in the UBA are, e.g., opening a new page, clicking a specific link, clicking a random link, or submitting a form with specific parameters in the URL. More application-specific types of user interactions for online book stores, auction sites, bulletin boards, and online news forums can be found, e.g., in [ACC02].
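The Zipf-like page popularity distribution mentioned above could, for instance, be realized by weighted sampling of the objectURL attribute. The URL list, the exponent, and the seed in this sketch are illustrative assumptions:

```python
import random

def zipf_weights(n, s=1.0):
    # weight of rank k is proportional to 1 / k**s
    return [1.0 / (rank ** s) for rank in range(1, n + 1)]

pages = ["/index.html", "/products.html", "/cart.html", "/about.html"]
rng = random.Random(42)
sample = rng.choices(pages, weights=zipf_weights(len(pages)), k=1000)

# The highest-ranked page dominates the generated request sequence.
print(sample.count("/index.html") > sample.count("/about.html"))  # -> True
```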
For example, when load is to be generated for an e-commerce site (which also belongs to our "Workload for the Web server" scenario category), the following types of virtual users can be implemented by means of corresponding individual UBAs:

• Only browsing items (by following the links specified, e.g., in the objectURL attribute of the subsequent GetPage(objectURL) requests),

• Searching for products (by typing in the keywords and submitting the search form),

• Browsing and viewing product detail pages,

• Adding items to a shopping cart and abandoning it,

• Comparing selected products in a grid,

• Logging in to an account (using the specified user name and password),

• Completing the checkout / logout process.

In each of these cases we can use the generalized UBA template presented in Fig. 4.19, right. The virtual user starts in the state Si and leaves this state after the corresponding browser instance is completely initialized. The abstract request type GetPage(objectURL) is used in the state RGetPage in order to model HTTP GET requests to the different resources/pages located on the server specified in the host name part of the URL submitted in the objectURL attribute. Note that we have to specify the exact location of the resources to be requested on the server because we intend to test specific functions of the e-commerce site. So, in order to implement the types of virtual users listed above, the corresponding URLs can be supplied, e.g., by means of a trace file and assigned to the objectURL attribute of the request type GetPage(objectURL) in the state RGetPage.

After the generation of each GetPage(objectURL) request, the user changes its state in the UBA to the blocking system state SLoading and resides in this state until the page is loaded completely (i.e., until all its embedded objects are successfully retrieved or an error/status code is returned). There are, in general, other possibilities to model this behaviour (e.g., to leave the blocking situation in SLoading immediately after the base object of the page has been delivered). It is also possible to take different actions or to change into different UBA states depending on the result of loading the page (indicated by the system reaction messages corresponding to the HTTP status codes in the system state SLoading). The user-dependent delay time (the so-called "think time") between subsequent page requests is modelled in the delay state DView. After the specified user think time, the UBA state is changed again to RGetPage and the next page request is generated to model the next user activity (e.g., following a link on the current page, submitting a user name and a password, clicking a button, or entering a new URL in the address line of the browser). The UBA can be executed in this manner until the current load generation time (held in the context variable Time) exceeds the upper limit specified by the experimenter in the context variable T.

Finally, in order to complete the specification of the load scenario, the experimenter has to determine the number of virtual users of each type or, alternatively, the fraction of each virtual user type in the total number of virtual users in the scenario.
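A strongly simplified, purely illustrative rendition of this UBA template as a Python loop (with the page load and think times stubbed out as constants, and the URLs cycled as if read from a trace file) might look as follows:

```python
import itertools

def run_uba(urls, T, think_time=2.0, load_time=0.5):
    """Generate (time, request type, URL) tuples until logical time T."""
    t, requests = 0.0, []
    url_cycle = itertools.cycle(urls)                     # trace-file stand-in
    while t < T:
        requests.append((t, "GetPage", next(url_cycle)))  # state R_GetPage
        t += load_time                                    # state S_Loading (blocking)
        t += think_time                                   # state D_View (think time)
    return requests

reqs = run_uba(["/a.html", "/b.html"], T=10.0)
print(len(reqs))   # -> 4
```

In the real UBA the think times would be drawn from a distribution in the D-state and the residence time in SLoading would depend on the actual system reaction, not on a constant.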

Part III.

Workload Generation

5. Architecture of the Unified Load Generator

In this chapter we describe the architecture of the UniLoG workload generator, which has been designed according to the unified load generation approach presented in Sec. 2.3. The basic principle underlying the design and elaboration of the UniLoG architecture is to start with an abstract workload model which is described by means of a corresponding User Behavior Automaton (UBA) as introduced in Chapter 3. The execution of the UBA in the modelling domain results in a sequence of initially abstract requests which are generated according to the specifications in the UBA. Subsequently, a set of interface-specific adapters is used to generate concrete workloads and/or traffic at a chosen target service interface IF by executing the system or API calls induced by the abstract requests.

5.1. Basic Requirements

The requirements for the UniLoG architecture to be designed in this thesis result mainly from the issues which must be solved when the unified load generation approach presented in Sec. 2.3 is applied to IP-based networks. According to the UniLoG approach, the resulting tool should unify the complete procedure of load generation at different interfaces in IP-based networks. From this, a comprehensive set of requirements for the projected architecture emerges. The requirements can be classified into functional requirements (which address the concrete features and tasks to be supported by the architecture) and non-functional requirements (addressing such important aspects as correctness, real-time behaviour, extensibility, scalability, etc.).

© Springer Fachmedien Wiesbaden GmbH 2017
A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_5

5.1.1. Functional Requirements

In order to be compliant with the UniLoG approach, the architecture designed in this thesis should support the following set of basic functions:

1. Specification of different workload models for various application and service users in IP-based networks (e.g., at the IP, TCP, UDP, and HTTP service interfaces) in the form of UBAs. The formal workload description technique presented in Chapter 3 must be integrated into the UniLoG architecture in order to accomplish this task. The architecture should provide a corresponding Graphical User Interface (GUI) to facilitate the construction of new workload models for the experimenter.

2. Specification of values for the different workload parameters in UBA models. This feature can be provided by means of corresponding functions in the GUI or by direct modification of the UBA XML file. The integration of the formal workload description technique in UniLoG is an important prerequisite for this feature, too.

3. Execution of the UBA model and generation of a sequence of initially abstract requests (ti, ri), i = 1, 2, ..., n, n ∈ N, where ti denotes the arrival time of the abstract request ri at the target service interface IF. Furthermore, the evaluation of different statistics for the generated sequence of abstract requests should be possible.

4. For each abstract request (ti, ri), preparation of the corresponding real request (t*i, r*i) and execution of the service and API calls induced by this request at IF. The evaluation of different statistics should also be possible for the generated sequence of real requests.
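Requirement 3 can be illustrated by a minimal sketch: a trivial model emits abstract requests (ti, ri) with exponentially distributed inter-arrival times, and a simple statistic is evaluated over the generated sequence. The arrival model and its parameters are arbitrary examples, not part of UniLoG:

```python
import random, statistics

rng = random.Random(1)
t, seq = 0.0, []
for i in range(1, 1001):
    t += rng.expovariate(10.0)          # mean inter-arrival time: 0.1 s
    seq.append((t, f"r{i}"))            # abstract request (t_i, r_i)

# Evaluate a statistic over the generated abstract request sequence.
gaps = [b[0] - a[0] for a, b in zip(seq, seq[1:])]
print(f"requests: {len(seq)}, mean gap: {statistics.mean(gaps):.4f} s")
```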

5.1.2. Non-functional Requirements

In the following we discuss the non-functional requirements to be met by the UniLoG architecture.

Modularity: The overall architecture is to be subdivided into functional components with strict responsibilities and well-defined interfaces (thus making the components potentially interchangeable). Modular design is an important prerequisite for the extensibility of the UniLoG architecture. For example, the components responsible for the specification of load models (UBAs), the generation of abstract requests, and the generation of real workloads as a sequence of requests at a chosen target service interface should be clearly differentiated.

Correctness: The implementation of the UniLoG architecture must be correct in the sense that the generated loads must exactly correspond to the

specifications in the UBA model. As a further consequence, components responsible for the generation of real requests at IF must ensure that

• the requests injected at IF conform to the service specification at IF,

• the values specified by the experimenter for the attributes of abstract requests in the UBA are used correctly for the corresponding parameters of real requests,

• the real requests are injected accurately in time, i.e., the execution of the system and API calls induced by the abstract requests at IF is initiated exactly at the time instants specified in the UBA.

Real-time requirements: Recall that, in general, a system is said to be real-time if the total correctness of an operation depends not only upon its logical correctness but also upon the time in which it is performed [KiF08]. The UniLoG architecture in this thesis has to be designed and implemented on top of an operating system without explicit real-time extensions (such as various Linux derivatives or Windows desktop versions). On such systems, many other application or system processes can be executed concurrently and compete with the UniLoG process for processing time. While application processes and non-critical system processes can usually be deactivated for the duration of the UniLoG execution, more critical parts of the system running in user space cannot be switched off or assigned a lower priority because of the increased risk of a critical operation failing. Furthermore, the time- and security-critical system processes running in kernel space do not participate in normal task scheduling at all (and therefore have prioritized access to the system resources). For these reasons, we cannot impose hard real-time requirements on the UniLoG architecture.

Recall that hard and soft real-time systems, as well as their deadlines, are classified by the consequence of missing a deadline. In hard real-time systems, missing a deadline leads to a total system failure. In soft real-time systems, the usefulness of a result degrades after its deadline, thereby degrading the system's quality of service [KiF08]. Thus, as we cannot require that all requests meet their deadlines (which would satisfy hard real-time requirements), our goal is that the number of requests which meet their deadlines is maximized and the lateness of requests is minimized in the implementation of the UniLoG architecture (i.e., we claim soft real-time requirements). In time intervals in which a certain number of requests cannot meet their deadlines, the traffic generator should not

produce a total system failure; only its accuracy and performance characteristics (determined, e.g., by measures such as the mean deviation from the specified request injection times and the achieved packet/data rate, respectively) are allowed to degrade. Soft real-time requirements may be satisfied by over-provisioning of system resources on the one hand and by a proper design of each component of the UniLoG architecture on the other. As a consequence, the implementation choices which have to be made for the UniLoG architecture represent the most critical issue in constructing a high-performance traffic generator and may determine the range of its applicability.

Extensibility: It is essential that the architecture of UniLoG is designed to be extensible in the following ways:

• The creation of additional workload models should be possible without additional programming effort. The formal workload description technique presented in Chapter 3 of this thesis should be integrated in the UniLoG architecture to support and facilitate the construction of new workload models.

• The set of application and service interfaces supported by UniLoG (e.g., the IP, UDP, TCP, and HTTP service interfaces) should be extensible. Therefore, the interfaces of the components responsible for the generation of abstract and real requests should be clearly specified (so that support for further service interfaces can be added later).

Flexibility: This relates to both the operation and the use of the load generator:

• It should be possible to modify the characteristics of the workloads to be generated by changing the values of the corresponding parameters in the UBA without additional programming effort, so that the new parameter values are available for load generation immediately after the UBA has been reloaded.

• Support for complex user environments (cf. Sec. 3.7) is a very useful and flexible feature to generate workload mixes at different service interfaces in IP-based networks. This feature is supported by the formal workload description technique presented in Chapter 3 and should be provided in the implementation of the corresponding architecture, too.

We remark that, from the perspective of the extensibility and flexibility requirements, an implementation of the UniLoG architecture as a user-space application has a significant advantage over an implementation as a system module in the kernel space.

Scalability: This refers to the ability of the load generator to increase the number of emulated/virtual users without much effort. Methods for the specification of complex user environments (containing many and also different types of virtual users, each described by a corresponding UBA) have been elaborated for the formal workload description technique (cf. Sec. 3.7), and the corresponding functionality must be provided by the UniLoG architecture.

Portability: Portability (e.g., between different Linux derivatives and Windows versions) may be an important aspect for the implementation of the UniLoG architecture. However, it can hardly be satisfied for all components of the architecture, especially for the interface-specific modules responsible for the injection of real requests at IF. For the components responsible for the specification, parsing, and execution of UBA models this requirement is not critical, because they are not required to execute interface-specific system calls at IF. We note that, also from the portability perspective, the implementation of the UniLoG architecture as a user-space process has a clear advantage over an implementation in the kernel space.

Further requirements: Accuracy can be understood as a part of correctness with respect to the injection times of requests (see the topic "correctness" above). Once the accuracy of a particular request-injecting component has been measured, its supported precision can be specified as an important system characteristic. Robustness has already been (partly) addressed in the real-time requirements. For the injection of real requests, robustness means that the traffic generator should not produce a total system failure when a certain number of requests cannot meet their deadlines. Furthermore, we require of all components of the architecture that they be able to handle system errors and exceptions (e.g., during the construction and parsing of the UBA as well as during its execution and the generation of real requests at IF).

At this point we intentionally refrain from imposing any requirements regarding the performance characteristics of the load generator (expressed, e.g., by means of the maximum packet or data rate achievable with requests of predefined length), because these characteristics are specific to the concrete target service interface.
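The accuracy and robustness measures discussed above could be computed from the specified and the observed injection times as in the following sketch; the tolerance value and the sample data are invented:

```python
def accuracy_report(specified, observed, tolerance=0.001):
    """Mean absolute deviation of injection times and deadline-miss rate."""
    lateness = [o - s for s, o in zip(specified, observed)]
    mean_dev = sum(abs(l) for l in lateness) / len(lateness)
    miss_rate = sum(l > tolerance for l in lateness) / len(lateness)
    return mean_dev, miss_rate

spec = [0.0, 0.1, 0.2, 0.3]                    # specified injection times (s)
obs  = [0.0, 0.1004, 0.2024, 0.3]              # observed injection times (s)
mean_dev, miss_rate = accuracy_report(spec, obs)
print(round(mean_dev, 4), miss_rate)           # -> 0.0007 0.25
```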

[Figure 5.1 shows the basic UniLoG architecture: the LoadSpec module holds the UBA specification (an example UBA with the states Si, RGet, SBlocked, DOFF, and St and transition probabilities 0.99998/0.00002 is depicted). The UBA parser transforms the specification for the Generator, which emits abstract requests (ti, ri) into the request queue RQ. The Adapter (e.g., for IP) comprises a Request and Event Mapper, a Request Injector issuing real requests (t*i, r*i) at the IP service interface (e.g., via socket.send()), and an Event Capturer which maps real system reactions (t*j, e*j), such as status codes and ICMP messages, onto abstract system reaction messages (tj, ej) in the event queue EQ. A Scheduler coordinates the Generator (processing time TG) and the Adapter (processing time TA).]

Figure 5.1.: Overview of the basic UniLoG architecture (own Fig.).

5.2. Overview of the UniLoG Architecture

According to the requirements specified in Sec. 5.1, the overall architecture of the UniLoG traffic generator has been subdivided into the following basic functional components (cf. Fig. 5.1):

LoadSpec: This module assists the experimenter with the specification of workload models in the form of a UBA by means of the graphical user interface. A set of predefined UBAs for VoIP and video traffic sources, along with a set of predefined types of abstract requests and system reactions for a number of common service interfaces in IP-based networks (e.g., the IP, UDP, TCP, and HTTP service interfaces), is provided with this module. Furthermore, the experimenter can use LoadSpec to provide values for different UBA parameters (such as request attributes in R-states, delays between subsequent requests in D-states, types of system reactions in S-states, context expressions in conditional state transitions, etc.). LoadSpec stores a new UBA in the UBA file format according to the UBA XML schema definitions presented in Sec. 3.3 of this thesis.

UBA Parser: This component is responsible for the transformation of the UBA description provided in the form of a UBA XML file into the internal structures (objects) used by the Generator component (see below) for the execution of the UBA model. Among other things, the Parser is responsible for the lexical analysis of the UBA file and for validity checks of the UBA description against the UBA XML schema definition specified in Sec. 3.3.

Generator: This component is responsible for the execution of the UBA in the modelling domain using logical time t. The result of the UBA execution is a sequence of initially abstract requests (ti, ri), i = 1, 2, ..., n, n ∈ N, where ti denotes

the arrival time of abstract request ri at IF in the modelling domain. The time instants ti are initially logical and relative to the beginning of the UBA execution. The Generator starts the execution of the UBA always at logical time t0 = 0.0. The actions executed in the Generator in each state are dependent on the type of the state (R-, D-, or S-state). The current logical time t in the UBA is advanced in D-states according to the specified delays between subsequent abstract requests (generated in R-states) or abstract request and the corresponding abstract system reactions (generated in S-states). In R-states the Generator produces a new abstract request (ti , ri ) of the abstract request type assigned to the particular R-state. The actions taken in S-states depend on the specified type of system reactions to be interpreted. The corresponding actions are described in Sec. 5.3 along with further details of the Generator’s functionality. Adapter: is responsible for the generation of concrete workloads and/or traffic by means of execution of the corresponding system or API calls induced by the abstract requests (ti , ri ) at a chosen target service interface IF (e.g. at the IP service interface). In order to do so, the Adapter observes the request queue Request Queue (RQ) for new abstract requests (ti , ri ) inserted by the Generator and prepares the corresponding system calls (t∗i , ri∗ ) which are also referred to as real requests at IF . Consider that the time instants t∗i denote the physical injection times of real requests ri∗ and have to be calculated by the Adapter from the logical times ti . Further, in order to obtain ri∗ from ri , the Adapter does always contain an interface-specific part which is responsible for the mapping between abstract and real requests and request attributes (cf. the following Sec. 5.4). 
For example, in order to inject conformant IP packets at the IP service interface, the UniLoG.IP adapter contains a special mechanism for mapping the abstract IP requests from the UBA onto the corresponding methods and parameters of the raw IP socket used for packet injection. In case the given UBA contains S-states, the Adapter has additional responsibilities. First, the system events occurring at IF have to be captured, and the events relevant for the current UBA have to be identified by the Adapter. Next, on the basis of the captured system events, abstract system reaction messages are created and inserted into the event queue EQ, which is observed by the Generator in the S-states of the UBA. The functionality of the Adapter is described in more detail in Sec. 5.4.
Scheduler: implements a cooperative scheduling mechanism used to manage the processing time during the concurrent execution of the Generator and the Adapter. In particular, the real-time requirements of requests are to be taken into account (see Sec. 5.5).

It should be noted that the formation of separate Generator and Adapter modules in the architecture is motivated in the first place by the basic principle of the UniLoG approach: to separate the abstract specification of workloads from their physical generation in the form of real requests at a target service interface. In this way, the total processing time needed per request is split into two separate parts:

1. the processing time TG required in the Generator to generate the next abstract request (ti, ri) (including the time required for further related actions in the UBA, e.g. updating the context variables, evaluating the context expressions, determining the next state, etc.), and

2. the processing time TA required in the Adapter to prepare the corresponding real request (t∗i, r∗i) and to inject it at the target service interface IF.

If the Generator and the Adapter could always be executed in parallel, the total processing time needed per request would theoretically be determined by the maximum of the times TG and TA, and no longer by the total time (TG + TA) required in the case of their serial execution. Considering the actions taken in the Generator and Adapter, we remark that when one of these components is blocked, the other can in many cases be activated to execute its upcoming actions, and vice versa. For example, the Generator can produce a number of abstract requests in advance while the Adapter is waiting for the specified injection time of the next real request or is blocked in a system call induced by one of the preceding requests.
On the other hand, the Adapter can continue to prepare and/or inject outstanding real requests even when the Generator is blocked in an S-state of the UBA waiting for an abstract system reaction message in EQ. Further, the Generator may need no feedback from the Adapter to produce the next abstract request in the UBA, e.g. when:

• the given UBA does not contain S-states at all,

• no feedback from the Adapter is required in S-states (because of the chosen type of system reactions to be interpreted), or
• a long sequence of R-states is to be processed before the next occurrence of an S-state with required feedback.

In these situations the Generator cannot block and may produce abstract requests in advance (filling the request queue RQ) far ahead of the Adapter transforming them into real requests. So, the Generator and the Adapter can be simultaneously active in these situations. In order to achieve a high degree of performance and scalability of the UniLoG architecture on different system platforms, the Generator and Adapter modules have been implemented as two separate execution threads which are spawned from the UniLoG process and run concurrently. On a single-core system an internal cooperative scheduling mechanism (Scheduler) is used to manage the processing time, taking into account the real-time requirements of requests (see Sec. 5.5). On systems with two or more processor cores the Generator and Adapter can be executed either concurrently on the same core (then using the internal cooperative scheduler) or on different cores. In the latter case, a significant performance gain may be achieved in situations when the Generator does not need feedback from the Adapter to produce the next abstract request (as explained above), as the corresponding Generator and Adapter threads can be executed on different cores in parallel.

The relation between the Generator and the Adapter threads follows a typical producer-consumer scenario. The Generator generates (or "produces") abstract requests (ti, ri) in the R-states of the UBA and enqueues them into the queue for abstract requests RQ. The request queue RQ is observed by the Adapter, which extracts (or "consumes") the abstract requests (ti, ri) one by one from the head of RQ and prepares the corresponding real requests (t∗i, r∗i) at the target service interface IF.
Both the Generator and the Adapter change the contents of the queues RQ and EQ by means of the operations push_back (to enqueue an element) and pop (to dequeue an element). Therefore, the Generator and the Adapter need mutually exclusive access to these queues. The queues are located in a memory region shared between the Generator and Adapter and have limited capacity. So, access to RQ from the Generator has to be synchronized with access to RQ from the Adapter in order to prevent reading an inconsistent state of the queue. The same considerations are valid for EQ, so that analogous precautions have to be taken with respect to accesses to it. We note that the need to synchronize access to RQ may become a limiting factor for the performance of the load generator.
Therefore, the critical sections in the corresponding code implementing the access to RQ must be kept as short as possible.
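The producer-consumer interplay between Generator and Adapter described above can be illustrated with a minimal bounded queue protected by a lock. This is an illustrative sketch in Python, not the actual (C++) UniLoG implementation; all names are invented for the example. Note how the critical sections contain only the queue manipulation itself:

```python
import threading
from collections import deque

class BoundedRequestQueue:
    """Illustrative stand-in for RQ: a bounded FIFO with mutually
    exclusive access and deliberately short critical sections."""
    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        lock = threading.Lock()
        self._not_full = threading.Condition(lock)
        self._not_empty = threading.Condition(lock)

    def push_back(self, request):
        with self._not_full:
            while len(self._items) >= self._capacity:
                self._not_full.wait()      # producer blocks while RQ is full
            self._items.append(request)
            self._not_empty.notify()

    def pop(self):
        with self._not_empty:
            while not self._items:
                self._not_empty.wait()     # consumer blocks while RQ is empty
            request = self._items.popleft()
            self._not_full.notify()
            return request

# The Generator thread "produces" abstract requests (t_i, r_i) ...
rq = BoundedRequestQueue(capacity=1000)
producer = threading.Thread(
    target=lambda: [rq.push_back((i * 0.01, f"r{i}")) for i in range(5)])
# ... and the Adapter thread "consumes" them from the head of RQ.
consumed = []
consumer = threading.Thread(
    target=lambda: [consumed.append(rq.pop()) for _ in range(5)])
producer.start(); consumer.start()
producer.join(); consumer.join()
print(consumed[0])  # (0.0, 'r0') -- requests leave RQ in time order
```

Both condition variables share one lock, so push_back and pop are mutually exclusive, while the bounded capacity naturally throttles a Generator that runs far ahead of the Adapter.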

5.3. Generator Functionality

The Generator always starts to traverse the states of the UBA at logical time t0 = 0.0 and executes the corresponding actions in each state depending on the state type (R-, D-, or S-state). The current logical time t in the UBA is advanced in D-states according to the specified delays between subsequent abstract requests (generated in R-states) or between an abstract request and the corresponding abstract system reactions (generated in S-states). When generating a new abstract request (ti, ri) in an R-state, the Generator sets the arrival time ti of the request ri at IF to the current value of the logical time t in the UBA. Further, the attributes of the request ri and their corresponding new values are determined by the Generator according to the specifications in the abstract request type assigned to the R-state. Finally, the new abstract request (ti, ri) is inserted into the queue for abstract requests RQ, which is observed by the Adapter. The resulting sequence of abstract requests in RQ is time-ordered, i.e. the next request to be handed over to the Adapter is always located at the head of RQ. The activities in the S-states of the UBA are determined by the chosen type of system reactions to be interpreted (see the following Sec. 5.4 for a detailed description of the three possible types of system reactions):

• Abstract reactions (tj, ej) can be generated directly in the Generator, as their arrival times tj and the values of the reaction attributes ej are specified in the S-states of the UBA (similarly to the way abstract requests (ti, ri) are generated according to the specifications in R-states). The Adapter is therefore not involved in the modelling of abstract reactions.

• When local or distributed system reactions have to be interpreted, the Generator blocks immediately after entering the S-state and starts monitoring (polling) the EQ for abstract system reaction messages generated by the Adapter, which has to be involved in this case (cf. Sec. 5.4).
The Generator regains control after the abstract system reaction message (tj, ej) becomes available in EQ. Once the abstract system reaction (tj, ej) is available (either generated by the Generator directly or obtained from a corresponding abstract system reaction message in the EQ as explained above), the Generator takes the following actions:
1. the current logical time t used in the UBA model is advanced to the logical arrival time tj of the system reaction (tj, ej),

2. the values of the reaction attributes (which may contain e.g. status and error codes from the system calls executed at IF) are extracted from the system reaction ej into the predefined context variables vi ∈ V,

3. the context variables can then be used, e.g. in the guard expressions at the conditional transitions out of the S-state, in order to determine the next state in the UBA depending on the type of system reaction or the value of one of its attributes.
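The traversal of R- and D-states described in this section can be condensed into a toy interpreter. The following Python sketch is purely illustrative (S-states and conditional transitions are omitted for brevity, and the state representation is invented; it does not reflect the actual UniLoG data structures):

```python
# Toy UBA executor: advances logical time in D-states and emits
# abstract requests (t_i, r_i) in R-states.
def execute_uba(states, transitions, start="s0"):
    t = 0.0                    # logical time; execution starts at t0 = 0.0
    requests = []              # resulting sequence of abstract requests
    state = start
    while state is not None:
        kind, payload = states[state]
        if kind == "D":        # delay state: advance logical time
            t += payload
        elif kind == "R":      # request state: emit abstract request
            requests.append((t, payload))
        state = transitions.get(state)   # unconditional transitions only
    return requests

states = {
    "s0": ("R", {"type": "TCP.SEND", "length": 512}),
    "s1": ("D", 0.02),         # 20 ms delay between subsequent requests
    "s2": ("R", {"type": "TCP.SEND", "length": 512}),
}
transitions = {"s0": "s1", "s1": "s2", "s2": None}
seq = execute_uba(states, transitions)
print(seq)  # two abstract requests at logical times 0.0 and 0.02
```

The resulting sequence is time-ordered by construction, which is exactly the property required of the contents of RQ.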

5.4. Adapter Functionality

In this section we describe generic adapter tasks which are common to all UniLoG adapters. The corresponding functions must be implemented and invoked in the main processing (dispatch) loop of every Adapter, regardless of the target service interface IF it is being developed for.

Initialisation of real requests: each abstract request (ti, ri) from RQ is characterized by the corresponding abstract request type, which is itself described by means of an XML complex type (cf. Sec. 3.4.3). For each abstract request type, the element of the complex type points to the unique identifier of the corresponding system call from the list of supported system calls at IF. Recall from Sec. 3.4.2 that the Adapter assigns a unique identifier to each of the system calls supported at IF and to each of their corresponding parameters, storing them in a list of supported system calls, in order to facilitate the mapping between the abstract requests and the concrete system calls at IF. Provided that the list of supported system calls at IF is available in the Adapter and that the references are specified for all abstract request types and their attributes in the UBA (the UBA Parser is responsible for checking the latter condition at loading time, before the UBA model is executed), the mapping mechanism consists of the following steps (for the example of the TCP socket interface):

1. The element of the complex type is evaluated to determine the identifier of the corresponding system call at the TCP socket interface (e.g. the identifier TCP.SEND = 202 denotes the send operation to transmit data on a connected TCP socket).
2. The dependencies (prerequisites) of the current system call are checked, e.g. the current call may require that some other preliminary system calls have already been made at IF. For example, a TCP socket must be associated with a local address using the bind() method and connected to a peer TCP instance using the connect() method before any calls to the send() method can be made. The required system calls are immediately scheduled for execution by the Adapter.

3. The Adapter iterates through the list of request attributes of the current request type (described by means of the complex type, cf. Sec. 3.4.3). For each request attribute, the element of the complex type contains a reference to the corresponding parameter of the system call. So, the Adapter can determine the data type of the parameter (using the list of supported system calls at IF). For numerical parameters, the Adapter simply copies the specified value from the corresponding abstract request attribute. For parameters using complex data types (especially character or binary data arrays) the Adapter copies only the reference to the corresponding data blocks, where appropriate.

4. The Adapter iterates through the parameter list of the system call. In case a parameter is obligatory in the system call and has not yet been assigned a value (e.g. because this parameter was omitted in the definition of the abstract request), a predefined default value is set for the parameter.

Injection of real requests: recall that the time instants ti specified for abstract requests (ti, ri) in RQ denote their logical injection times at IF, i.e. these times are specified relative to the physical beginning of the UBA execution. Thus, the Adapter must first calculate the corresponding physical injection times t∗i before the real requests (t∗i, r∗i) can be executed at IF.
Assuming that the execution of the UBA in the Generator was started at the physical time t∗start, the physical injection times t∗i are calculated by the Adapter as t∗i = t∗start + ti. For each abstract request (ti, ri) the Adapter must execute the corresponding real request r∗i exactly at the calculated physical injection time t∗i. Therefore, the method used for the measurement of time intervals is crucial for the Adapter; it is discussed in detail in Sec. 5.5.

Modelling of system reactions: in case the UBA contains S-states, the type of system reactions to be interpreted in a particular S-state determines the required actions in the Adapter.
• "Abstract reactions": in case the arrival (indication) times tj and the values of the attributes of the system reactions (tj, ej) are specified directly in the S-state of the UBA (e.g. by means of constant values, or according to a trace or a statistical distribution), the corresponding abstract system reactions can be generated and interpreted directly in the Generator. So, there is no need to create abstract reaction messages in EQ, and the Adapter is therefore not involved.

• "Local system reactions": represent the feedback from the target service interface IF as a consequence of the execution of the system calls induced by the real requests. Usually, after the execution of each system call the corresponding status and error codes can be obtained either from the return value or by means of additional function calls (e.g. WSAGetLastError() on Windows Sockets). For example, the integer return value of the send() call used to transmit data on a connected TCP socket is equal to the total number of bytes sent if no error occurs. Otherwise, a value of SOCKET_ERROR is returned, and a specific error code can be retrieved by calling the WSAGetLastError() function. In this case the Adapter is involved and is responsible for obtaining the corresponding status and error codes after the execution of each system call. Provided that the last system call (t∗i, r∗i) at IF was finished at physical time t∗j and its return value contained the status code e∗j, the corresponding abstract system reaction message (tj, ej) is prepared by the Adapter and inserted into the queue EQ. Here, tj is the logical arrival (or indication) time of the abstract system reaction ej at IF, which is calculated as tj = t∗j − t∗start, where t∗start again denotes the physical beginning of the UBA execution.

• "Distributed system reactions": system reactions may also be represented by messages from the communication partner instance observable at IF.
For example, in case IP requests are injected at the IP service interface, the system reactions may be represented by the corresponding Internet Control Message Protocol (ICMP) messages like SourceQuench, DestinationUnreachable, etc., generated by the peer IP protocol instance(s). In this case the Adapter must first capture the system events indicated at IF and filter the events which are relevant for the UBA before the corresponding abstract system reaction messages can be created in EQ. It should be noted that this approach induces a considerable overhead in the Adapter, because the network packets which represent system reactions have to be captured at the target service interface and interpreted in real-time (e.g. using packet capture and analysis libraries like LibPcap or WinPcap [WinPcap]).
Statistics: the Adapter must always provide functionality to collect statistical data and to evaluate different statistical characteristics of the injected real requests.

Finally, we should emphasize that the Generator thread does not block in the case when the UBA contains S-states with abstract system reactions only, because reactions of this type can be generated directly in the Generator without involving the Adapter. Therefore, the Generator and the Adapter threads can be executed in parallel in this case, too.
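The generic Adapter tasks above (compute t∗i = t∗start + ti, inject at that instant, report a local system reaction into EQ) can be summarised in a drastically simplified, single-threaded dispatch loop. The sketch below is illustrative only: inject stands in for the interface-specific system call, and plain Python lists replace the shared queues RQ and EQ:

```python
import time

def adapter_loop(rq, eq, inject, t_start):
    """Illustrative Adapter dispatch loop: consume abstract requests,
    inject them at their physical injection times, report reactions."""
    while rq:
        t_i, r_i = rq.pop(0)                  # take request from head of RQ
        t_star_i = t_start + t_i              # physical injection time t*_i
        while time.perf_counter() < t_star_i:
            pass                              # busy-wait until the deadline
        status = inject(r_i)                  # execute the real request r*_i
        t_j = time.perf_counter() - t_start   # logical indication time t_j
        eq.append((t_j, status))              # abstract reaction message

rq = [(0.0, b"x" * 64), (0.005, b"x" * 64)]   # two requests, 5 ms apart
eq = []
# inject() here merely reports the payload length as a "status code".
adapter_loop(rq, eq, inject=lambda r: len(r), t_start=time.perf_counter())
print(len(eq))  # 2 -- one abstract reaction message per injected request
```

In the real architecture the loop additionally performs the attribute mapping and prerequisite checks described above, and the busy-wait is governed by the cooperative scheduler discussed in Sec. 5.5.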

5.5. Real-time Requirements of Requests

As already stated in Sec. 5.1, special attention must be paid to the proper design of the Generator and the Adapter components when the outlined architecture is to be implemented on the basis of an operating system without explicit real-time extensions and without using any specialized hardware. In the following we discuss the important factors for the design of these two main components of the UniLoG architecture and present the corresponding solutions. First we note that the real requests (t∗i, r∗i) are time-critical because their injection at IF must be made exactly at the specified time instants t∗i, neither later nor earlier. Further, due to the fact that the Adapter needs a finite, non-negligible time to prepare the real request (t∗i, r∗i) corresponding to the abstract request (ti, ri), the abstract requests must be handed over to the Adapter early enough and, therefore, they become time-critical as well. So, taking into account the real-time requirements of abstract and real requests, the implementation of the UniLoG traffic generator in general, and of its Generator and Adapter components in particular, must cope with the set of problems described in the following.

5.5.1. Impact of Multitasking

First, we consider the effects arising from the concurrent execution of multiple processes on the same CPU (multitasking). Remember from Sec. 5.1 that the UniLoG architecture is to be implemented as a user-space process, so that it competes for processing time with other application and system processes running in user space. Here, a set of different measures can be considered
to increase the amount of processing time available to the UniLoG process. First, non-critical application and system services executed in user space, e.g. a local packet filter, software update services, virus scanners, etc., can simply be deactivated for the duration of load generation. However, more critical user-mode system processes (which may be, e.g., parts of the networking, client/server, or I/O subsystem) or kernel-mode system processes cannot generally be deactivated or assigned a lower priority because of the increased risk of a critical operation failure. Therefore, the next appropriate measure may be to adequately increase the priority of the UniLoG process.

Remember that Windows platforms use a pre-emptive, priority-based round-robin scheduler with six process priority classes combined with seven thread priority levels within each class to form the base priority of each thread in the system (note that threads are scheduled, not processes: each process has its main execution thread). The scheduler assigns time slices in a round-robin fashion to all threads with the highest priority. If none of these threads are ready to run, the system assigns time slices in a round-robin fashion to all threads with the next highest priority. If a higher-priority thread becomes available to run, the system ceases to execute the lower-priority thread (without allowing it to finish using its time slice) and assigns a full time slice to the higher-priority thread [MSDN2]. Thus, in order to minimize the number of interruptions, we can assign the UniLoG process to the REALTIME_PRIORITY_CLASS, which is the highest priority class available on Windows systems. The next lower HIGH_PRIORITY_CLASS may also be a good alternative in order to prevent competition of UniLoG with the time-critical system processes running in the real-time priority class.
If we could arrange for UniLoG always to be alone in the REALTIME_PRIORITY_CLASS, the round-robin strategy of the scheduler would effectively become equivalent to an FCFS strategy in this class. Further, in order to manage the processing time for the Generator and Adapter threads spawned from the UniLoG process, we cannot simply rely on the operating system scheduler, as it does not have any information on the injection times (deadlines) of requests at its disposal. The scheduler could interrupt the Adapter thread at a very inappropriate point in time, e.g. when the Adapter is preparing a time-critical request to be injected as soon as possible. Therefore, UniLoG uses an internal cooperative scheduling mechanism to manage the processing time between the Generator and the Adapter more efficiently, taking into account the urgency of requests (cf. Sec. 5.5.3).
5.5.2. Latency Introduced by System Calls

As already stated, the UniLoG architecture is to be implemented as a user-space application. Therefore, the latency introduced by system calls cannot be avoided or neglected during the design of the UniLoG components. System calls must be used in UniLoG in the first place to perform network I/O operations when real requests are to be injected at the target service interface IF, for example at the TCP socket interface by means of send() or recv() operations. Further, system calls may be necessary, e.g., to obtain the current system time (in the polling loops of the Generator and the Adapter) and to create and synchronize the threads required in the system.

The system calls used to perform network I/O operations are mainly determined by the choice of the socket family, which represents a trade-off between functional criteria and performance. For example, in case the specified UBA represents a user (or users) of a UDP service interface, the Adapter should use datagram sockets from the PF_INET protocol family, which are typically used by network applications requesting UDP services. Implementing the Adapter in this way increases the probability that the overhead introduced during the injection of real UDP requests in the Adapter will be equal to the overhead introduced in a network application using regular UDP datagram sockets from the PF_INET family for its network I/O operations. Alternatively, the Adapter could use sockets from the PF_PACKET family (also called "raw" sockets) to inject UDP requests in the form of Ethernet frames whose UDP, IP and MAC headers must be prepared by the caller itself and not by the socket. In the case when the majority of the header fields keep the same values from packet to packet, a substantial performance increase may be achieved for packet generation. For example, when UDP packets of constant length are to be generated from the same sender to the same receiver (i.e.
the source address and port as well as the destination address and port fields remain identical from packet to packet), only the value of the IP identification field has to be increased, and the UDP and IP checksums must be recalculated to generate a new packet. Further, the control path of PF_PACKET sockets does not include a large set of additional operations which are usually involved on the standard control path of a regular PF_INET datagram socket and which lead to a significant overhead. In particular, a PF_INET socket is in charge of checking whether the socket is connected, querying the routing table for the associated destination, allocating the kernel structure encapsulating the frame (the sk_buff structure) and building the frame with the information passed by the calling function [BGPS05].
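The checksum recalculation mentioned above uses the standard 16-bit one's-complement Internet checksum (RFC 1071), which covers the IP header and, with a pseudo-header, the UDP datagram. A straightforward sketch (illustrative, not the UniLoG code):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum over 16-bit words (RFC 1071),
    as used for IP header and UDP checksums."""
    if len(data) % 2:
        data += b"\x00"                  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                   # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# A 20-byte IPv4 header with its checksum field (bytes 10-11) zeroed:
hdr = bytes.fromhex("4500003c1c4640004006" + "0000" + "ac100a63ac100a0c")
print(hex(internet_checksum(hdr)))  # -> 0xb1e6
```

Verifying a received header is equally simple: summing the header including its checksum field must yield 0. Because the sum is linear, a raw-socket generator can even update it incrementally when only the IP identification field changes, instead of recomputing it from scratch.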

Moreover, a PF_INET socket encompasses security components which perform packet filtering, mangling and other operations related to address/port translation (NAT/NPT). Thus, if the caller decides to omit these operations by using a PF_PACKET socket, a significant performance increase may be achieved for packet generation compared to the use of regular PF_INET sockets (cf. e.g. the results presented by the authors of the traffic generator BRUTE in [BGPS05]). However, according to the UniLoG approach, the procedure outlined above cannot be regarded as load generation at the regular UDP service interface, because a set of operations which are usually performed on the control path of the regular PF_INET socket is simply bypassed when raw PF_PACKET socket operations are used.

Finally, it should be noted that the performance of the same socket operation (e.g. send) may vary between different implementations of the socket interface (e.g. different Linux versions or different Windows versions). Further, different default values of socket initialisation parameters (such as the socket send (SO_SNDBUF) and receive (SO_RCVBUF) buffer sizes) may be a reason for different system behaviour.

In the case of timers, special attention must be paid to their resolution (which must be sufficient to provide a reference clock for using UniLoG in high-speed environments) and to the introduced overhead (which delays the starting time of the next statement in the polling loops of the Generator and Adapter, especially in case an additional context switch is triggered by the timer function). For example, the primary API to acquire high-resolution time stamps or to measure time intervals for native code on Windows platforms is the QueryPerformanceCounter (QPC) system call (there are alternative system calls for kernel-mode device drivers and managed code). If the system can use the Time Stamp Counter (TSC) register as the basis for QPC, the kernel transition (system call) is not required.
If the system must use a different time base, such as the High Precision Event Timer (HPET) or the Advanced Configuration and Power Interface (ACPI) timer (the latter also known as the Power Management (PM) clock), a system call is required [MSDN1]. TSCs are high-resolution per-processor hardware counters that can be read through the RDTSC and RDTSCP machine instructions, providing very low access time (latency) and computational cost (overhead) on the order of tens or hundreds of machine cycles, depending on the processor type. Windows platforms from Windows 7 and Windows Server 2008 R2 onwards, installed on systems with processors having constant-rate ("invariant") TSCs, use these counters as the basis for QPC. Further, current Windows platforms are able to perform the necessary frequency calibration and to tightly synchronize the individual TSCs across all processors/cores
during the initialization of the multi-processor/multi-core system.
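The access latency of such a high-resolution clock can be estimated empirically. The following sketch uses Python's time.perf_counter (which is backed by QPC on Windows) merely to illustrate the measurement idea; the measured values obviously depend on platform and load:

```python
import time

def estimate_clock_read_latency(samples: int = 100_000) -> float:
    """Estimate the average cost of a single clock read by timing
    a tight loop of consecutive perf_counter() calls."""
    start = time.perf_counter()
    for _ in range(samples):
        time.perf_counter()
    elapsed = time.perf_counter() - start
    return elapsed / samples            # seconds per clock read

latency = estimate_clock_read_latency()
resolution = time.get_clock_info("perf_counter").resolution
print(f"~{latency * 1e9:.0f} ns per read, resolution {resolution} s")
```

A per-read cost in the range of tens to a few hundred nanoseconds is typical when no kernel transition is needed; substantially larger values hint at an HPET/ACPI time base behind a system call.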

5.5.3. Latency Introduced by UniLoG Components

The processing times of the UniLoG components cannot be eliminated completely, but we try to reduce them as far as possible by means of the proper design of each component and several optimizations in the execution of the UBA. Where possible, resources needed for the generation of abstract requests can be pre-allocated, i.e. they can be allocated all at once before the start of the execution of the time-critical components responsible for the generation of abstract and real requests. For example, traces with the specifications of the values of request attributes and the request injection times can be loaded before the beginning of the execution of the Generator and the Adapter.

Further considerations have to be made for the case when the Generator and Adapter are executed concurrently on the same processor core (especially in case only one processor core is available in the system). In order to account for the time needed to prepare the next real request in the Adapter, we propose to monitor the urgency of the abstract requests generated by the Generator. An abstract request (ti, ri) is called urgent for the Adapter at the instant of physical time tnow when the remaining physical time t∗i − tnow = (t∗start + ti) − tnow until the injection of its corresponding real request (t∗i, r∗i) is less than or equal to some predefined time value Δt, i.e. (t∗start + ti) − tnow ≤ Δt. Recall that the physical injection time of the real request (t∗i, r∗i) corresponding to the abstract request (ti, ri) is calculated as t∗i = t∗start + ti, where t∗start is the physical beginning of the UBA execution in the Generator. Monitoring the urgency of abstract requests is the responsibility of the internal cooperative scheduler in UniLoG.
The scheduler checks the urgency of the next abstract request stored at the head of RQ each time an abstract request has been generated by the Generator or a real request has been generated (i.e. prepared and injected at the target service interface) by the Adapter. In case the abstract request at the head of RQ becomes urgent, the internal scheduler will activate the Adapter again to ensure that the urgent request is handed over to the Adapter no later than Δt time units before its physical injection time t∗i = t∗start + ti. Assuming that the time TA required in the Adapter to prepare each single real request can be reliably estimated (see the explanations below), the value of the parameter Δt can be chosen to correspond exactly to the estimated value of TA. In this way, the internal scheduler can be appropriately parameterised to guarantee
the timely preparation of requests in the Adapter and to make the injection of real requests at the target service interface possible just in time. We note that a reliable estimate of Δt can be obtained relatively easily in the case when the request preparation time in the Adapter is nearly constant and does not fluctuate with the request length. The implementation of the Adapter can support this behaviour by avoiding unnecessary copy operations for large memory blocks (e.g. copying of packet data arrays, etc.).
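The urgency criterion from above can be written down directly. The helper below is an illustrative sketch (the names and the example value of TA are invented, not taken from UniLoG):

```python
def is_urgent(t_start: float, t_i: float, t_now: float, delta_t: float) -> bool:
    """An abstract request (t_i, r_i) is urgent at physical time t_now
    when the remaining time until its physical injection time
    t*_i = t*_start + t_i is at most delta_t."""
    return (t_start + t_i) - t_now <= delta_t

# Choosing delta_t as the estimated Adapter preparation time T_A
# (here an assumed 2 ms per request, purely for illustration):
T_A = 0.002
print(is_urgent(t_start=100.0, t_i=0.050, t_now=100.049, delta_t=T_A))  # True
print(is_urgent(t_start=100.0, t_i=0.050, t_now=100.040, delta_t=T_A))  # False
```

With delta_t = TA, the check becomes true exactly when the Adapter must start preparing the request in order to still meet its injection time.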

5.5.4. Intrinsic Model Factors

Special properties of the request sequence to be generated may cause problems when a component of the traffic generator reaches its performance or precision limit. For example, when bursts of requests are to be generated and the inter-arrival times between requests within the burst are smaller than the time TA required by the Adapter to prepare the injection of each real request, the Adapter will not be able to meet the injection times of the requests at the chosen service interface. In such cases the experimenter should ensure that all the hardware and software components involved in the load generation process are properly chosen and dimensioned to meet the requirements of the load scenario to be established and to guarantee the accurate and reliable generation of the specified request sequence. Further, we should keep in mind that the UniLoG load generator is not predestined to generate only "kind" loads consisting of periodic requests with constant inter-arrival times which could easily be predicted by the Adapter. Instead, the inter-arrival times specified by the experimenter in the D-states of the UBA may lead to arbitrary (and thus, in the worst case, arbitrarily small) delays between the subsequent requests to be injected.
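Whether a given request sequence is feasible for a given per-request preparation time TA can be checked in advance, e.g. when the injection times come from a trace. The following simple sketch (an illustrative feasibility model assuming a serially working Adapter, not part of UniLoG) counts the injection deadlines that cannot be met:

```python
def count_deadline_misses(injection_times, t_a):
    """Count requests whose preparation cannot finish by their injection
    time if the Adapter needs t_a time units per request and prepares
    requests one after another, starting as early as possible."""
    misses = 0
    ready = 0                 # time at which the Adapter becomes free
    for t in sorted(injection_times):
        finish = ready + t_a  # earliest completion of this preparation
        if finish > t:
            misses += 1       # injection time t cannot be met
        ready = finish
    return misses

# Times in integer microseconds to avoid floating-point accumulation.
burst = [1000, 1500, 2000, 2500]                 # 0.5 ms inter-arrivals
print(count_deadline_misses(burst, t_a=1000))    # 3 -- inter-arrivals < T_A
print(count_deadline_misses(burst, t_a=500))     # 0 -- sequence is feasible
```

As soon as the inter-arrival times within a burst drop below TA, the backlog in the Adapter grows and every subsequent deadline in the burst is missed, which matches the qualitative observation above.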

5.6. UniLoG Design Extensions for Multi-Core Platforms According to the Intel product information site [Int14], current commodity desktop processors (4th Generation Intel Core i7 processors are prevailing at the time of writing) are mostly equipped with four cores and support hyper-threading thus providing eight logical processors to user applications. In the segment of high-end desktop systems, processors with 6 cores are available since fourth quarter of 2011 (Intel Core i7-3930K, 12 M Cache, up to 3.80 GHz) and a first processor with 8 cores has been released in the third quarter of 2014 (Intel Core i7-5960X Extreme Edition, 20 M Cache, up


to 3.50 GHz). Further, in the server segment, processors with 6, 8, and 10 cores have been available since the second quarter of 2011 (Intel Xeon Processor E7 family), and the currently released processor family Intel Xeon Processors E7 v2 can have up to 15 cores, providing up to 30 logical processors to applications when hyper-threading is activated. So, in order to provide a high degree of scalability for the UniLoG architecture, it is indispensable to take the aspect of multiprocessing on current multi-core systems into account (the so-called multi-core awareness property of the designed load generator). Recall that the scalability property characterizes the ability of the load generator to modify (mostly to increase) the number of emulated virtual users without great effort. In the case of UniLoG, the number of virtual users can be increased by means of one of the following methods:

Aggregated UBA: integrating a number of virtual users into the same single ("aggregated") UBA model. In this case, the Adapter is responsible for the creation of the corresponding sockets, initiation of connections, etc., required for each additional virtual user during the execution of the aggregated UBA.

Concurrent UBA agents: in order to execute a number of virtual users (represented by their corresponding UBAs) concurrently, a new functional component called UBA agent is introduced. Each UBA agent contains a single Generator and the respective Adapter module in order to execute the specified UBA and generate real requests. Each UBA agent is implemented as a new thread spawned from the UniLoG main process. The scalability of the UniLoG architecture is achieved by increasing the number of concurrently active UBA agents.
As long as the number of concurrently active UBA agents does not exceed the number of physically existing cores in the system (or the number of logical processors in case hyper-threading is activated), the agents can be executed in parallel and do not affect each other. When the number of UBA agents exceeds the number of physical cores in the system, some of them will be executed concurrently on the same core. This may lead to an increasing number of interruptions for each of the UBA agents, thus increasing the risk of missing request deadlines. Multiple UBA agents in different configurations can be used in the UniLoG load generator in order to exploit multi-core system architectures, also in combination with NICs implementing several independent RX/TX queues, provided that the underlying networking subsystem and the protocol software also provide explicit support for multi-core system architectures and multi-queue NICs (see, e.g., [RDC11, BPGP12] for further details).

System for distributed load generation: the virtual users required in the experiment are identified and mapped onto different, geographically distant (distributed) load agents. The corresponding extension of the existing UniLoG architecture to a system for distributed load generation is presented in Chapter 6 of this thesis.
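The concurrent UBA agent approach can be sketched with standard threads. This is an illustrative model, not the thesis code: the per-request work inside run() is a placeholder for executing one UBA step and injecting the resulting request.

```python
import os
import threading

class UBAAgent(threading.Thread):
    """Sketch: one UBA agent couples a Generator and an Adapter and
    runs as a thread spawned from the load generator's main process."""
    def __init__(self, agent_id, n_requests):
        super().__init__()
        self.agent_id = agent_id
        self.n_requests = n_requests
        self.generated = 0

    def run(self):
        for _ in range(self.n_requests):
            # Placeholder for: execute one UBA step, prepare and inject request.
            self.generated += 1

n_cores = os.cpu_count() or 1
agents = [UBAAgent(i, 1000) for i in range(2 * n_cores)]
if len(agents) > n_cores:
    # Oversubscribed: agents share cores, injection deadlines may be missed.
    print(f"{len(agents)} agents on {n_cores} cores: expect interference")
for a in agents:
    a.start()
for a in agents:
    a.join()
print(sum(a.generated for a in agents))
```

The explicit comparison against os.cpu_count() mirrors the rule stated above: agents beyond the core count must share cores and thereby risk missed deadlines.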

6. Distributed UniLoG Architecture

In this chapter we first motivate the development of systems for distributed load generation and identify the typical expectations and requirements that experimenters place on such systems. We then present the design of a system for distributed load generation based on the architecture of the UniLoG load generator presented in Chapter 5 and describe the key details of its implementation.

6.1. Prerequisites and Requirements

The choice of the target service interface used by a load generator to inject traffic may determine (and thus also restrict) the type of traffic matrix which can be established during the load experiment. Therefore, the decision about the target service interface to be used should be made strictly according to the objectives of the particular load experiment. For example, when the UniLoG load generator is used at the transport service interface (i.e. in combination with the UniLoG.TCP or UniLoG.UDP adapter(s)) to generate a number of TCP or UDP packet streams originating from different network nodes, the source IP address in the header of the generated packets will always correspond to the local IP address assigned to the network interface controller used by the corresponding adapter. This is because the UniLoG adapters use UDP and TCP sockets which must be bound to a local transport address, consisting of a local IP address and port, before they can be used for packet generation. Thus it is not possible to spoof the source IP address of generated packet streams using the UniLoG.TCP or UniLoG.UDP adapters. As a consequence, load generators (as well as the corresponding load sinks, if needed, e.g., in the case of TCP) must be installed in each network node involved in the experiment, and the requirement to coordinate and remotely control the execution of the load generators becomes apparent.

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_6


Further, the destination IP address is a free parameter and can be specified by the experimenter. So, a single UniLoG.UDP or UniLoG.TCP adapter supports the generation of a 1:n traffic matrix if the matrix is observed at the IP layer (1:n denotes here a relation "from one sender to n receivers" at the observed protocol layer). In order to generate m:n traffic matrices at the IP layer, more than one adapter is required and, as explained above, a coordination of the involved adapters is needed during the load experiment. When we observe the traffic matrix at the transport layer, an m:n traffic matrix can already be generated by a single UniLoG.TCP or UniLoG.UDP adapter using different source port numbers in combination with different destination IP addresses and/or port numbers. In contrast, the UniLoG.IP adapter allows the generation of IP packets whose source IP address differs from the local IP address used by the adapter (spoofing of the source IP address is allowed). Thus, a single UniLoG.IP adapter can generate a number of IP packet streams with different source IP addresses and inject these streams into the network.¹ So, observed at the IP layer, an m:n traffic matrix can already be generated using a single UniLoG.IP adapter, and the installation of the adapter in each network node involved in the experiment is not necessarily required. However, many load generators with the respective IP adapter may be necessary when traffic has to be generated at a specific link or in a specific node in the network. In these cases the coordination and remote control of the load generators installed in the network is required.

¹ However, IP packets with a spoofed source IP address may be blocked by a firewall/packet filter in the next IP router/gateway, depending on the concrete configuration of the experimental network.

Moreover, the use of remotely controlled and coordinated traffic generators may be indispensable in order to execute the following tasks:

Performance evaluation of network software and hardware components: experiments for load and stress testing as well as compliance tests of routing algorithms, load balancing techniques, routing devices, etc., are possible scenarios. Further, a system for distributed load generation may be useful to support the design and implementation of network security tools and security devices for networked systems, e.g. traffic anomaly and intrusion detection systems, firewalls, Network Address Translation (NAT) / Network Port Translation (NPT) boxes, etc. Here, the generation of IP test traffic with very high packet and/or data rates (in the order of several Gbit/s) at a specified link (e.g. at the Digital Subscriber Line (DSL) upload link connecting the Local Area Network (LAN) of the customer with the
Internet Service Provider (ISP) network) or in a specified network node (e.g. in a particular IP router in the provider network) may be required. When the required test traffic cannot be generated by means of only one stand-alone traffic generator (e.g. because of its internal performance limits), a system incorporating many traffic generators coordinated by the experimenter may be useful.

Performance evaluation of distributed applications and services: in order to produce traffic for experiments with distributed applications and services, the UniLoG traffic generator must be used in combination with the respective UniLoG adapters for application service interfaces (e.g., HTTP, Common Internet File System (CIFS), Server Message Block (SMB), various peer-to-peer services, etc.). Not only the headers but also the payload fields of the generated packets must be filled in a valid manner in order to be recognized by the service or application under test.

Generation of traffic mixes (e.g. in networks with mesh topology): in case the required traffic mixes cannot be provided by means of a single stand-alone traffic generator, e.g., because the traffic mixes should contain different types and numbers of traffic sources (for example, Web traffic combined with a number of file transfers, video, and voice traffic streams) originating from different users and/or nodes in the network under study. In this case, a corresponding number of coordinated and remotely controlled UniLoG traffic generators can be used in combination with the respective adapters required to generate HTTP, FTP, video, and voice traffic, respectively. The remote control of traffic generators plays an important role especially when remote parts of a geographically distributed network are involved in the experiment.
In this thesis we build on the preliminary conceptual work [Con06] and the corresponding prototype implementations ([Had06, Sch07]) carried out at the TKRN working group at the University of Hamburg in the area of distributed load generation. One possible classification of the different types of load generator control was proposed in [Con06]. In this classification, the autonomous model of load generator control assumes a mutually independent execution of the load generators. We emphasize that, taking into account the above-mentioned application scenarios of the system for distributed load generation aimed at in this thesis, the autonomous model is definitely not sufficient to provide predefined and precisely synchronised aggregated traffic loads and to adequately report their influence on the tested component to the experimenter.


Therefore, in this thesis we concentrate on the development of a system for distributed load generation which enables the experimenter to control, coordinate, and monitor the participating load generators from one central point in the experimental network. In [Con06] this type of load generator control was referred to as a master-slave model for distributed load generation. Further, the first prototypes of the remote invocation of load generators implemented by students at the TKRN working group ([Had06, Sch07]) had a series of limitations, restrictions, and security drawbacks. The implementation in [Sch07] used the Java Remote Method Invocation (RMI) technology to invoke the UniLoG adapters. This required the installation of the Java runtime, the arrangement of the RMI registry, and the appropriate configuration of the firewalls in the network and/or load generating machines to admit the RMI traffic. The solution was restricted to the submission and execution of a single supported command to invoke the UniLoG UDP adapter, which was then able to generate traffic only according to a local trace of abstract requests. The trace had to be deployed manually to the load generating stations, and there was no possibility to automatically deploy the components needed for the load generation (e.g. UBA models, required traces, etc.) onto the different load generating machines. Further, the solution lacked measures to implement authentication, authorisation, and communication security (e.g. by means of encrypted control channels) between the central station and the load generators. In [Had06] a lot of effort was spent on the implementation of an algorithm for the initial synchronisation of the internal clocks used by the load generators.
In particular, that thesis argued that it is indispensable to take into account the maximum propagation delay of commands from the central control point to the load generators in order to ensure that the start command reaches all load generators at exactly the same time, so that the load generators can start immediately after they have received the start command (e.g. startLoadGeneratorNow()). In our opinion, this issue can be solved in a much more reliable manner by using appropriate protocols to perform the initial clock synchronisation before load generation (e.g. the Network Time Protocol (NTP) or the Precision Time Protocol (PTP), which are provided by the operating system and can be used by the load generators). Given that the physical clocks in the involved load generating stations are synchronised sufficiently precisely, the start of the load generator process in each station can simply be scheduled for a specified point of time by means of a corresponding command (e.g. startLoadGeneratorAt(12:00 AM, Central European Time (CET))) submitted by the experimenter at a


central control point and transferred to the load generators. We refer to [KWK09] for further details. On the basis of the preceding discussion, the key requirements for the system for distributed load generation in this thesis can be specified as follows:

Remote control: the experimenter must be able to control the process of load generation and monitor its progress from one central station in the network. In particular, it should be possible to start selected load generators immediately or at a specified instant of physical time (e.g. at 12:00 AM, CET). Further, all required components (e.g. load generators, involved adapters, load sinks (if needed), load models to be executed, traces with values of load parameters, etc.) must be disseminated from one central point in the network to the network nodes involved in the experiment before its start.

Clock synchronisation: required to enable the start of the involved load generators exactly at the physical point of time specified from the central station. Thus, it is absolutely necessary that the system time of the load generators is synchronized with the system time at the central station. Further, if required in the particular experiment, the system time in the central station can additionally be synchronized with an external timing source. There exist different ways to synchronise distributed clocks through a network, e.g., NTP and its derivative, the Simple Network Time Protocol (SNTP) [RFC5905], or the newer PTP, which was approved as the IEEE 1588 standard for Ethernet networks [IEEE1588].

Scalability: understood, in terms of systems for load generation, as the ability of the load generator to modify (mostly to increase) the number of emulated virtual users without great effort.
This feature is very important for load experiments because the superimposition of a number of different load (or traffic) sources enables the generation of aggregated load (or traffic) mixes in the network. Methods to increase the scalability of the centralized UniLoG load generator have already been discussed in Sec. 5.6. The additional ability to use many geographically distributed load generators provides an even higher degree of flexibility during the generation of different load mixes in IP-based networks.

Security issues: measures are necessary to prevent fraudulent use of the system for distributed load generation being designed. In particular, the experimenter and the central station must be authenticated and authorized


by the load generators in order to become eligible to control their behaviour. Furthermore, the security (confidentiality) of the communication between the central station and the involved load generators must be ensured (e.g. communication channels can be established by means of Transport Layer Security (TLS) [RFC5246, RFC6176]).
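A minimal sketch of the agent-side TLS configuration with Python's standard ssl module follows. The certificate file names are hypothetical, and the loading calls are commented out so that the fragment runs without the files being present; the essential point is requiring a client certificate for mutual authentication.

```python
import ssl

# Hypothetical certificate files issued by the experiment's own CA:
CA_CERT = "ca.pem"
AGENT_CERT = "agent.pem"
AGENT_KEY = "agent.key"

def make_agent_context():
    """TLS context for a load agent: accept only management stations
    presenting a certificate signed by the experiment's CA."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.verify_mode = ssl.CERT_REQUIRED          # mutual authentication
    # ctx.load_cert_chain(AGENT_CERT, AGENT_KEY)     # agent's own identity
    # ctx.load_verify_locations(CA_CERT)             # trust anchor: own CA
    return ctx

ctx = make_agent_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # → True
```

With CERT_REQUIRED on the agent side, an unauthenticated station cannot even complete the TLS handshake, let alone submit control commands.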

6.2. System Architecture for Distributed Load Generation

As already stated in Sec. 5.2, the separation of the overall architecture of the UniLoG traffic generator into the Generator and Adapter components and the execution of these components on different processor cores may provide a performance gain for the load generation. A significant performance gain can be expected especially when the execution of the given UBA model requires a substantial amount of processing time. However, the distribution of the model execution and request generation/injection components onto different (geographically distant) nodes or stations in the network (as proposed, e.g., in [Con06]) appears to be much less efficient because of the increasing communication and synchronisation overhead between the remote components of the traffic generator. Therefore, in this thesis we concentrate on the design of a system for distributed load generation comprising a set of stand-alone UniLoG load generators (and do not further consider the distribution of the components of a stand-alone load generator onto different nodes and stations in the experimental network). According to the requirements discussed in the previous section, the system for distributed load generation in this thesis consists of the following functional components (cf. Fig. 6.1):

Load generators: are responsible for the generation of the workloads specified by the experimenter. Load generators consist of the Generator and Adapter components defined in Chapter 5, which are responsible for the execution of the UBA model and the generation of real requests at the specified target service interface (e.g. the HTTP, UDP, TCP, or IP service interface), respectively. The Adapter may require that a corresponding load sink is established at the communication partner's side (as is the case for the UniLoG.TCP adapter).
Load agents: are able to control and monitor the load generators and provide a load generation service for authorized experimenters. Along with


Figure 6.1.: Architecture of the system for distributed load generation on the basis of the UniLoG load generator (own Fig.).

the load generators, load agents belong to the load-generating software which must be installed in each network node involved in the current load experiment in order to execute it. With respect to their functionality, the load agents can be compared to the agents defined in the Simple Network Management Protocol (SNMP) (cf. RFC 3410-3418).

Management station: is charged with the remote control, remote configuration, and monitoring of the load generators installed in the experimental network by communicating with the respective load agents. In order to avoid a possible influence of the control traffic on the traffic generated for the experimental network (and vice versa), the corresponding control channels (cf. Fig. 6.1) from the management station towards the load agents can be established using separate physical ("dedicated") network links. The use of dedicated links is recommended especially when the system for distributed load generation is used in experiments for performance evaluation and stress tests of network software and hardware components. The required load generating software can be installed by the system administrator either directly in each of the involved network nodes or remotely using TLS channels from the management station in the experimental


Command                  Description
uploadUBA                Uploading a UBA model to a load agent
uploadTrace              Uploading a trace file with values of UBA parameters to a load agent
startLoadGeneratorNow    Starting the load generator immediately (after this command is received)
startLoadGeneratorAt     Starting the load generator at the specified point of time
stopLoadGenerator        Immediately terminate the load generator
getState                 Query the load generator's state (e.g. initialising, active, terminated)
getReport                Request a load generation report from the load agent

Table 6.1.: Control commands supported by the load agents (own Tab.).

network. The load agent is started during the boot process, as a daemon on Linux platforms or as a system service on Windows platforms. In order to execute a load experiment, the required workload models must first be prepared in the form of a UBA using the LoadSpec tool presented in Chapter 4. Next, the UBA model itself and the corresponding trace files (if needed for the specification of the values of different UBA parameters) are deployed from the management station onto the load agents involved in the experiment. At this time, the authentication and authorization procedure between the management station and the load agents must already be completed. The authorization of the management station by the load agents is of particular importance in order to prevent the fraudulent use of load agents by other, non-authorized stations. After the authentication and authorization procedure is completed, the load agents become associated with the management station. Further, an encrypted TLS connection is established between the management station and each of the associated load agents. The corresponding TLS sessions are used by the management station to communicate with the load agents in order to execute the required actions before, during, and after each load experiment. The experimenter can use the load generation service offered by the load agents by means of different control commands submitted from the management station. The supported commands are summarized in Tab. 6.1. The control commands are encapsulated into HTTP messages and are transmitted to the load agents by means of the HTTP GET or HTTP POST request methods (so that a component to serve HTTP requests is required in the load agents).
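The encapsulation of control commands into HTTP request bodies can be illustrated as follows. Only the command names come from Tab. 6.1; the field names and the form encoding are assumptions made for illustration, not the thesis's wire format.

```python
from urllib.parse import parse_qs, urlencode

def encode_command(command, **params):
    """Management-station side: encode a control command (names as in
    Tab. 6.1) as an HTTP POST body in form encoding."""
    return urlencode({"command": command, **params})

def decode_command(body):
    """Load-agent side: recover the command name and its parameters."""
    fields = {k: v[0] for k, v in parse_qs(body).items()}
    return fields.pop("command"), fields

body = encode_command("startLoadGeneratorAt",
                      start="2014.10.14-19:04:40:345", tz="CET")
cmd, params = decode_command(body)
print(cmd, params["tz"])  # → startLoadGeneratorAt CET
```

Because the commands travel as ordinary HTTP requests, the miniaturized HTTP server in the load agent can dispatch on the decoded command name.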


A particular load agent can initiate the start of the corresponding load generator either immediately (e.g. after it has received the startLoadGeneratorNow command from the management station) or at a point of time specified by the management station in the startLoadGeneratorAt command (e.g. startLoadGeneratorAt 2014.10.14-19:04:40:345, CET). The duration of the load experiment is determined by each load generator from the UBA model referenced by a parameter of the submitted start commands. All load generators involved in the experiment must start the execution of their corresponding UBA at exactly the same time. Therefore, the load agents use a corresponding NTP client to synchronize their system time with the system time in the management station. The system time in the management station can additionally be synchronised against an external NTP server in case the time used in the distributed load generation system should correspond to real clock time. According to RFC 1305, NTP can support clock accuracy in the sub-microsecond range. In practice, the primary factor affecting accuracy is jitter due to network and operating system latencies. For example, the typical accuracy expectation for NTP in the public Internet is in the order of 1 to 10 ms, and in 100 Mbit/s Ethernet LANs in the order of 100 μs [Mil12]. For comparison, the accuracy expectation for PTP-synchronized clocks in Ethernet LANs is in the order of 100 ns, determined primarily by the resolution and stability of a dedicated clock oscillator and counter. The getState command can be used to determine the current state of a load generator (i.e. whether it is initialising, still active, or already terminated). Important load generation statistics (number of generated packets, total generation time elapsed, packet and data rate, etc.) can be obtained using the getReport command.
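Assuming NTP-synchronized clocks, the scheduled start triggered by startLoadGeneratorAt can be sketched as follows; the function names and the UTC example time are illustrative, not taken from the implementation:

```python
import datetime
import time

def seconds_until(start_at, now=None):
    """Waiting time until the instant requested via startLoadGeneratorAt,
    assuming the local system clock is already NTP-synchronized."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return max(0.0, (start_at - now).total_seconds())

def start_load_generator_at(start_at, run):
    time.sleep(seconds_until(start_at))  # coarse wait; the last few
    run()                                # milliseconds could be busy-waited

utc = datetime.timezone.utc
t0 = datetime.datetime.now(utc)
start_load_generator_at(t0 + datetime.timedelta(milliseconds=20),
                        lambda: print("load generation started"))
```

Each agent computes its own waiting time locally, so the command propagation delay from the management station no longer matters, as argued above.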

6.3. Implementation Aspects

In the following we describe a set of key implementation details for each particular component of the system for distributed load generation in this thesis.

Load generators: the already existing components of the UniLoG load generator (Generator and Adapter) have been extended to be able to start the load generator at a specified physical point of time (e.g. 2014.10.14 – 19:04:40:345, CET).


Load agents: have been implemented in the Python programming language (freely available at https://www.python.org/), mainly due to its outstanding scripting capabilities, which are used to implement the remote control and configuration functions for the load generators in the load agents. Further, an event-driven networking engine for Python called Twisted² has been used in the load agents in order to implement a miniaturized HTTP server for the processing of control commands encapsulated into HTTP requests.

Authentication and encryption: the authentication of the management station and the load agents is implemented by means of X.509 certificates. We established our own Certification Authority (CA) using the OpenSSL library³, generated the corresponding public and private key pairs for the management station and the load agents and, finally, signed them using the established CA. After the authentication procedure is completed, encrypted TLS communication channels are established between the management station and the load agents using the Python wrapper module around the OpenSSL library (pyOpenSSL 0.14). These channels are used by the management station to forward control commands from the experimenter to the load agents and to receive status reports from the load agents.

Management station: has been implemented by us as a separate Python module which allows, among other things, forwarding the control commands submitted by the experimenter to the load agents involved in the experiment via the established TLS control channels. The management station maintains a list of the load agents associated with it and stores the location (IP address and port number) of each load agent.

² Twisted for Python is available on the Python site https://pypi.python.org/pypi/Twisted
³ OpenSSL is available at https://www.openssl.org/

7. Load Generation at Network Layer Service Interfaces

At the moment of writing this thesis, UniLoG adapters for the IPv4, UDP, and TCP as well as HTTP service interfaces have been implemented, and an adapter for injecting VoIP calls at the Session Initiation Protocol (SIP) service interface is under development. In this chapter we present the design of the UniLoG adapter for the IPv4 network service interface (further referred to as UniLoG.IPv4) and discuss the most important details of its implementation. The performance characteristics of the implemented adapter are of particular interest for the experimenter, and we present the results of the corresponding performance evaluation tests carried out by us. Details of an earlier adapter version implemented on top of the FreeBSD operating system can be found in the publications [KoK09, KoK10].

7.1. Application Scenarios and Requirements

As we have already stated in Sec. 6.1, the choice of the target service interface for load generation needs very careful consideration before performance evaluation tests are carried out. Assuming that the experimenter has decided to generate loads at the IPv4 network service interface, the UniLoG load generator must be used in combination with the corresponding UniLoG.IPv4 adapter. In particular, the UniLoG.IPv4 adapter can be used for testing different types of network hardware devices and network applications (which may include both software and hardware components), for example:

Network hardware devices: switches, routers, firewalls, etc., implemented in hardware. Here, we should remark that the UniLoG load generator is a software-based solution which can reasonably be used for performance testing of network hardware devices when the performance characteristics of the UniLoG.IPv4 adapter are sufficient for the particular test scenario.

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_7


Packet-based applications: network bridges and firewalls, NAT/NPT boxes, NetFlow [RFC3951] probes, IPFIX and sFlow probes (RFC 5470-RFC 5473), VoIP probes, packet-to-disk applications.

Flow-based applications: intrusion detection systems, intrusion protection and prevention systems, OpenFlow switching [KAB08], network performance analysers, HTTP(S) traffic analysers, network latency monitors.

According to the UniLoG approach to load modelling and generation, an adapter for the IPv4 service interface is responsible for the execution of the following tasks:

1. Initialization of the corresponding real IP request for each abstract IP request generated according to the specifications in the UBA. The initialization step includes, among others, the determination of the required system calls and their parameters needed to inject real IPv4 requests at the IP service interface (using, e.g., the raw IP socket interface).

2. Injection of the pre-initialized real IPv4 requests (represented by the corresponding IPv4 packets) at the IP service interface exactly at the points of time specified in the UBA.

3. In order to provide support for distributed system reactions in the UBA models, there must be a facility to capture the incoming IP traffic at the local IP interface being used by the adapter. The amount of data to be captured can be restricted by means of user-defined filters.

4. From the traffic captured at the local IP interface, the adapter identifies ICMP packets with relevant information for the UBA model and transforms them into abstract system reaction messages which can be evaluated in the S-states or conditional expressions of the UBA.

Furthermore, the implementation of the adapter should be extensible. For example, it may become necessary to include support for IPv6 in the adapter without great effort. It should be remarked that the rate of IPv6 adoption is increasing rapidly in provider networks. For example, Google continuously measures the availability of IPv6 connectivity among its users. The percentage of users that accessed Google over IPv6 on January 1, 2014 was around 2.85%; in 2014 it grew steadily and reached approximately 5.9% on January 1, 2015, see [IPv6S].
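Task 4 above (identifying ICMP packets in the captured traffic and turning them into abstract system reactions) can be illustrated with a minimal parser. The packet bytes below are hand-crafted for the example; real capturing would go through libpcap, as described later in this chapter.

```python
import struct

def parse_icmp(packet):
    """If the captured IPv4 packet carries ICMP, return (type, code);
    otherwise None. Minimal sketch of the Event Capturer/Mapper step."""
    if len(packet) < 20:
        return None
    ihl = (packet[0] & 0x0F) * 4              # IPv4 header length in bytes
    proto = packet[9]                         # protocol field
    if proto != 1 or len(packet) < ihl + 2:   # 1 = ICMP
        return None
    return packet[ihl], packet[ihl + 1]       # ICMP type and code

# Crafted example: a minimal IPv4 header (IHL=5, protocol=ICMP) followed by
# an ICMP "destination unreachable / port unreachable" (type 3, code 3).
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 28, 0, 0, 64, 1, 0,
                        bytes([192, 168, 0, 1]), bytes([192, 168, 0, 2]))
icmp = bytes([3, 3, 0, 0])
print(parse_icmp(ip_header + icmp))  # → (3, 3)
```

The resulting (type, code) pair is exactly the kind of information the Event Mapper would wrap into an abstract system reaction message for the UBA.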

7.2. Design of the UniLoG.IPv4 Adapter

The main responsibilities of the IPv4 adapter addressed in the previous section motivate the partitioning of its architecture into three main functional components: Request and Event Mapper, Request Injector, and Event Capturer (cf. Fig. 7.1).

Figure 7.1.: Architecture of the UniLoG.IPv4 adapter (own Fig.).

For each abstract IPv4 request (t_i, r_i) from the RQ the Request Mapper determines the corresponding system call(s) at the raw socket interface needed to inject the real IP request (usually it will be the socket send() system call with different parameters). For each system call, the mapper iterates through the list of its parameters and determines the parameter values according to the specifications in the corresponding attributes of the abstract request (t_i, r_i). Recall that the mapper uses the semantics field (cf. Sec. 3.4.2) of the abstract request type and of each of its associated request attributes in order to determine the required system call(s) and their corresponding parameters, respectively. Finally, the preinitialized real IPv4 request (t_i^*, r_i^*) is handed over to the Request Injector component. The Request Injector is responsible for the timely injection of the preinitialized request (t_i^*, r_i^*) exactly at the specified time instant t_i^*. Request
Injector polls a high-precision clock (provided, e.g., by the QPC clock on Windows systems, see Sec. 5.5) in order to meet the specified injection time as accurately as possible on the given system. Because the Request Injector communicates directly with the IP service provider and executes the system calls at the raw socket interface, it can be charged with additional tasks of gathering statistical data (e.g., for the calculation of the packet rate and the data rate of the generated streams, or for the estimation of possible differences between the actual and the specified injection times). Further, in case local system reactions are needed in the given UBA, the Request Injector is in charge of receiving the return values and/or status codes from the execution of each system call and of saving and forwarding them back to the Event Mapper. Moreover, in case the UBA also considers distributed system reactions, the Event Capturer is activated during the execution of the adapter and captures the ICMP messages (t_j^*, e_j^*) indicated at the local IP service interface. The Event Mapper transforms the ICMP messages into abstract system reaction messages (t_j, e_j) which are inserted into the Event Queue and can be processed in the S-states and context expressions of the UBA. For the implementation of the basic modules of the adapter we used a number of additional C/C++ libraries available for Linux and Windows operating systems. For example, the injection of IPv4 packets in UniLoG.IPv4 has been implemented using the open source library libnet (the project moved from www.packetfactory.net to http://sourceforge.net/projects/libnet-dev/, last updated January 30, 2014). In order to capture system events at the IPv4 interface, the libpcap library has been used, which is freely available at http://www.tcpdump.org/ for Linux and at http://www.winpcap.org/ for Windows operating systems (the official name of the Windows library version is WinPcap).
In order to inject IPv4 packets into the network, the Request Injector can either use the raw socket interface directly (provided, e.g., by the socket libraries on Windows and Linux systems) or use the more abstract methods of the libnet library, which internally uses the same raw socket interface. In both cases, the automatic addition of an IPv4 header is a configurable option of the raw socket (by means of the IP_HDRINCL socket option). That means that in the default case (IP_HDRINCL set to false) the operating system does most of the work:
1. the source and destination IPv4 address fields are set to the values provided to the send() call by the Request Injector,
2. other fields of the header (e.g., ttl, tos, and identification) are set according to the socket options in effect and/or the IPv4 stack configuration settings,
3. the pointer to a data buffer provided to the socket send() call is used to populate the payload part of the packet, and, finally,
4. the IP checksum is computed.
The other option for the Request Injector is to prepare the IPv4 header completely on its own (i.e., to fill every field of the IPv4 packet structure with correct values and to conduct the required checksum computations). In order to be able to provide such a user-prepared IPv4 header for each packet, the IP_HDRINCL option of the raw socket must be set to a non-zero value. It should be remarked that when the generated IPv4 packets are to contain payloads of protocols other than the test traffic protocols (specified by means of the protocol numbers 253 or 254 in the protocol field of the IPv4 header), the injection of such packets at the raw socket may be restricted by the operating system for security reasons. For example, on Microsoft's client operating systems (Windows 7, Windows Vista, Windows XP with Service Pack 2 (SP2), and Windows XP with Service Pack 3 (SP3)) the ability to send traffic over raw sockets has been restricted in several ways [MSDN3]. First, TCP data cannot be sent over raw sockets. Packets containing TCP payloads (protocol number 6) are discarded by the raw socket in order to prevent different types of TCP attacks (e.g., TCP SYN flood, connection reset attacks, etc.). Second, UDP datagrams with an invalid source IP address cannot be sent over raw sockets: the IP source address of any outgoing UDP datagram must exist on a network interface, otherwise the datagram is dropped. This change was made to limit the ability of malicious code to create distributed denial-of-service attacks and to send spoofed packets (UDP/IP packets with a forged source IP address).
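When the IP_HDRINCL option is enabled and the Request Injector fills in the IPv4 header itself, it must also perform step 4 above, i.e., compute the header checksum. A minimal sketch of the standard Internet checksum algorithm (RFC 1071) follows; this is our own illustrative code, not part of the adapter:

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement Internet checksum (RFC 1071) over a byte buffer.
   For an IPv4 header, the checksum field must be zero on input. */
static uint16_t internet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;
    for (i = 0; i + 1 < len; i += 2)           /* 16-bit big-endian words */
        sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
    if (len & 1)                               /* pad a trailing odd byte */
        sum += (uint32_t)buf[len - 1] << 8;
    while (sum >> 16)                          /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

A correctly checksummed header sums to zero when the algorithm is run over it again, which makes the routine easy to self-test.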
Consider that the above restrictions do not apply, however, to Microsoft's server series of operating systems (Windows Server 2008 R2, Windows Server 2008, Windows Server 2003), or to versions of the client operating systems earlier than Windows XP with SP2. At the time of writing, the author was not aware of any comparable restrictions on the use of raw IP sockets on Linux platforms (PF_PACKET socket family). This does not mean, however, that the security issues related
to the use of raw sockets should not be considered in the implementation of UniLoG.IPv4 on Linux systems, too. The issue discussed above just strengthens our argument against the injection of TCP requests at the raw IP socket when the real intention of the experimenter is in fact to generate realistic TCP traffic loads. When realistic TCP traffic loads are to be generated, it cannot generally be recommended to produce plain TCP Protocol Data Units (PDUs), include them in the payload part of IPv4 packets, and inject the resulting IPv4 packets at the raw IPv4 socket. Such an approach would completely bypass the additional TCP protocol techniques (e.g., slow start, congestion avoidance, fast recovery, and fast retransmit) employed in different versions and variants of TCP. Instead, it is recommended to model the load directly at the TCP service interface and to inject the generated requests using the corresponding system calls available, for example, in the TCP transport sockets (i.e., in the way the UniLoG.TCP adapter accomplishes this task). Finally, we note that the importance of the correct choice of the target service interface for request injection has always been strongly emphasized during the elaboration of the UniLoG approach (cf. Sec. 2.3). In the following sections we describe the abstract request types, request attributes, and types of system reactions which are supported by the UniLoG.IPv4 adapter.

7.2.1. Abstract IP Requests

An example of an abstract IPv4 request type InjectIPPacket is shown in Fig. 7.2. In this particular case, the abstract request type contains three request attributes payloadLength, destination, and protocol which represent the length of the user data to be transmitted with the packet, the IPv4 address of the receiving host, and the protocol whose PDUs are encapsulated in the IPv4 packet, respectively. The complete set of attributes available for the definition of abstract IPv4 requests supported by the UniLoG.IPv4 adapter is described in the following. We should remark that the abstraction level currently used in the UniLoG.IPv4 adapter is rather low and the semantics of the supported abstract request attributes is mainly oriented towards the semantics of the corresponding fields of the IPv4 header. More complex abstract request types and/or request attributes (e.g., an IPv4 flow as a single request type) can be specified using the techniques already provided by the UBA concept (e.g., by means of additional states and transitions).
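The mapping from such an abstract request to the concrete system-call parameters, as performed by the Request Mapper, can be illustrated by the following sketch. The structure layouts and all identifiers except the attribute names and the semantics value IPv4.SENDTO.DESTADDRESS (which appear in the text and in Fig. 7.2) are our own assumptions:

```c
#include <string.h>

/* Simplified in-memory form of an abstract IPv4 request (hypothetical). */
struct abstract_ip_request {
    double      t;               /* scheduled injection time [s]          */
    unsigned    payload_length;  /* attribute payloadLength               */
    const char *destination;     /* semantics: IPv4.SENDTO.DESTADDRESS    */
    unsigned    protocol;        /* attribute protocol (253 = test traffic) */
};

/* Parameters of the raw-socket send call derived from the abstract request. */
struct real_ip_request {
    double      t;
    const char *dest_addr;
    unsigned    proto;
    unsigned    len;
};

static struct real_ip_request map_request(const struct abstract_ip_request *a)
{
    struct real_ip_request r;
    r.t = a->t;                   /* injection instant is kept unchanged   */
    r.dest_addr = a->destination; /* pass a reference, not a copy, to keep
                                     the mapping overhead low              */
    r.proto = a->protocol;
    r.len = a->payload_length;
    return r;
}
```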

[Figure 7.2 shows the XML definition of the request type InjectIPPacket; of its attribute definitions, only the attribute destination is fully legible here, with semantics IPv4.SENDTO.DESTADDRESS, type ipv4addr, and the description "32-bit destination IPv4 address".]

[XML listing of Figure 8.2: the request type TCPOpenRequest with semantics TCP.OPEN and the description "Generate a new (active) TCP connection open request"; its attributes include localTCPPort (semantics TCP.OPEN.SOURCEPORT, type unsignedShort, description "Port number of the local TCP instance/socket", default 7777) and remoteTCPPort (semantics TCP.OPEN.DESTPORT, type unsignedShort, description "Port number of the remote TCP instance/socket", default 8888).]

Figure 8.2.: Definition of the abstract TCP request type TCPOpenRequest (localTCPPort, remoteIPAddress, remoteTCPPort) to model the generation of active TCP connection requests (own Fig.).

This is because the connected local and remote sockets are uniquely determined by the attributes of the connection establishment request TCPOpenRequest(localTCPPort, remoteIPAddress, remoteTCPPort) generated prior to all data transmission requests. An example of the definition of an abstract request type TCPSendRequest(payloadBuffer, payloadLength) to model the transmission of payloadLength bytes of user data stored in the payloadBuffer array is shown in Fig. 8.3. Consider that the definition of TCPSendRequest(payloadBuffer, payloadLength) in Fig. 8.3 contains only two abstract request attributes to specify the length payloadLength of the user data portion from the

[XML listing of Figure 8.3: the attribute payloadLength of the request type TCPSendRequest, with semantics TCP.SEND.LEN, type unsignedInt, description "Length of the data portion to be sent (in byte)", and default value 1240.]

Figure 8.3.: Definition of the abstract TCP request type TCPSendRequest (payloadBuffer, payloadLength) to model the transmission of payloadLength byte of user data from the payloadBuffer array using TCP (own Fig.).

payloadBuffer array to be transmitted with the particular send request. In case the connection open request has been omitted in the UBA, or the virtual user(s) contained in the UBA require more than one TCP connection, the definition of TCPSendRequest() must be extended by the attributes localTCPPort, remoteIPAddress, and remoteTCPPort to specify the local and the remote TCP sockets to be used for the corresponding socket send() calls. The resulting abstract request type (e.g., TCPSendRequest(localTCPPort, remoteIPAddress, remoteTCPPort, payloadBuffer, payloadLength)) helps to resolve the ambiguity in the UBA with respect to the TCP connection to be used for the send() calls in case many connections are modelled in the UBA. In case modelling of the connection establishment request has
been omitted in the UBA, the adapter must check the corresponding TCP connection before any call to the send() socket method can be made. In particular, this means that the adapter must determine, from the abstract request of type TCPSendRequest(localTCPPort, remoteIPAddress, remoteTCPPort, payloadBuffer, payloadLength), the local and the remote TCP sockets to be used for the transmission and check whether these two sockets are already initialized and connected to each other. If needed, the local TCP socket is created (by means of the socket() call), the required socket options such as the TCP send buffer size are set (by means of the setsockopt() call), and, finally, the local socket is connected to the foreign (remote) TCP socket specified by the remoteIPAddress and remoteTCPPort attributes of the abstract TCP send request. In order to provide a means to close a particular TCP connection (e.g., when the UBA contains a set of virtual users which require many TCP connections and some of them must be closed and reopened), the adapter supports an abstract request type TCPCloseRequest(localTCPPort, remoteIPAddress, remoteTCPPort) (cf. Fig. 8.4). As can be seen from Fig. 8.4, the abstract request type TCPCloseRequest(localTCPPort, remoteIPAddress, remoteTCPPort) contains the same abstract request attributes as the connection open request in order to uniquely specify the TCP connection to be closed. The virtual user in the UBA may generate a connection close request at any time on its own initiative, or in response to various prompts (system reactions) from the TCP (e.g., remote close executed, transmission time-out exceeded, destination inaccessible, etc.). Consider that closing a connection is intended to be a graceful operation in the sense that outstanding send requests will be transmitted (and retransmitted), as flow control permits, until all of them have been serviced [RFC793].
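The TCPOpenRequest / TCPSendRequest / TCPCloseRequest life cycle described above can be sketched with plain POSIX sockets. This is an illustrative loopback-capable fragment under our own naming; the adapter itself uses the Winsock API:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP connection to addr:port, send len bytes, close gracefully.
   Mirrors the TCPOpenRequest / TCPSendRequest / TCPCloseRequest sequence. */
static int open_send_close(const char *addr, unsigned short port,
                           const char *buf, size_t len)
{
    int s = socket(AF_INET, SOCK_STREAM, 0);          /* TCPOpenRequest */
    if (s < 0) return -1;

    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, addr, &sa.sin_addr);
    if (connect(s, (struct sockaddr *)&sa, sizeof sa) < 0) {
        close(s);
        return -1;
    }

    size_t sent = 0;                   /* a blocking send() may be partial */
    while (sent < len) {               /* TCPSendRequest */
        ssize_t n = send(s, buf + sent, len - sent, 0);
        if (n < 0) { close(s); return -1; }
        sent += (size_t)n;
    }
    close(s);                          /* graceful close, cf. [RFC793] */
    return 0;
}
```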
In case the connection close request type is omitted in the UBA, the adapter will try to gracefully close each of the connections established during the UBA execution by means of the closesocket() call for the corresponding local TCP sockets. At this point we should mention some aspects of the implementation of the UniLoG.TCP adapter which may also have an impact on its performance characteristics. For example, the Request and Event Mapper component requires a certain amount of time, referred to as generation overhead, to convert each abstract TCP request into the corresponding real request at the TCP service interface (i.e., into the corresponding Winsock API call). In case the UBA model considers TCP system reactions, there is also a certain
[XML listing of Figure 8.4: the attribute remoteIPAddress of the request type TCPCloseRequest, with semantics TCP.CLOSE.DESTADDRESS, type ipv4addr, description "IPv4 address of the remote TCP instance/socket", and default value 10.0.0.8.]

Figure 8.4.: Definition of the abstract TCP request type TCPCloseRequest (localTCPPort, remoteIPAddress, remoteTCPPort) to model the connection close requests (own Fig.).

amount of time needed for the generation of the abstract system reaction messages from the real system reactions indicated to the adapter. It should be noted that the generation overhead induced in the Request and Event Mapper per abstract request may be unintentionally increased by improper implementation decisions such as copying large blocks of data between the objects used to represent the abstract and the real requests. In such situations the adapter should maintain only one single instance of the request data and use pointers (references) to access it from both the abstract and the real requests. Further, the use of statically preallocated buffers instead of dynamically allocated buffers for send and
receive operations can help to avoid performance drawbacks in the adapter. During the implementation of the UniLoG.TCP adapter in this thesis we paid special attention to ensuring that the generation overhead in the Request and Event Mapper does not depend on the length of the abstract requests or of the payload data buffer specified in the abstract requests.

8.1.2. TCP Load Receivers

As already noted in the beginning of Sec. 8.1, the load receiver component is provided on the TCP server side to listen for (and accept) the incoming TCP connection requests and to read out the TCP receive buffer ("window") by means of recv() socket calls. In particular, the UniLoG.TCP load receiver provides the following functions:

Connection handling: After the initialization, the load receiver listens for incoming TCP connection requests (generated by means of the connect() socket call from the TCP clients) on the TCP server port specified by the experimenter. For every new incoming TCP connection request, the load receiver issues a corresponding accept() socket call which selects a new TCP socket on the server and connects it with the remote TCP socket indicated in the connection request. A new separate worker thread is then created and used to receive the TCP data transmitted over the new connection. This measure improves the concurrency in the load receiver: in particular, it is possible to 1) receive data from a number of different TCP clients concurrently, and 2) receive data and accept a new connection request concurrently (while accepting many connection requests in parallel is not possible in general).

Receiving TCP data: The main responsibility of the TCP load receiver is to receive and retrieve the binary data block(s) encapsulated in the TCP requests (which are generated by the UniLoG.TCP adapter part on the TCP client side). Recall that the received data is first stored in the TCP receive buffer provided by the receiving TCP protocol instance and can be read out from it by means of subsequent recv() socket calls into the receive buffer provided by the receiving application (i.e., by the UniLoG.TCP load receiver).
In order to reduce latency and retrieve the application data as soon as possible, the UniLoG.TCP load receiver should use a rather small application buffer for the recv() socket call of, e.g., 8192 byte (which is also the default size of the receive window used by the receiving TCP protocol instance internally). Recall that most available TCP implementations (including Windows, FreeBSD, and Linux) set the PSH bit in the header of

the TCP segment which transports the last part of the application data block (buffer) submitted to the send() socket call. In this way, the sending TCP instance is instructed to send out the corresponding TCP segment(s) immediately and not to wait until the TCP send buffer becomes full. Second, the receiving TCP instance is instructed to pass the received data from the TCP receive buffer to the application (i.e., to the data buffer specified in the recv() socket calls of the UniLoG.TCP load receiver) as soon as possible [StW98].

Statistics: The TCP load receiver collects transmission statistics on each accepted connection. In particular, parameters such as the total number of bytes received, the number of completed recv() calls, the total connection duration, the number of concurrent connections served, etc., are recorded in the corresponding counters during the execution of the load receiver.
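The three receiver elements above (per-connection worker thread, small 8192-byte application buffer, statistics counters) can be combined in a sketch like the following. This is our own illustrative fragment; read() stands in for recv() so the worker can be exercised with any file descriptor:

```c
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Per-connection statistics kept by a load receiver (simplified). */
struct conn_stats {
    int      fd;              /* connected socket (or any readable fd) */
    uint64_t bytes_received;
    uint64_t recv_calls;
};

/* Worker thread: drain one connection with a small 8192-byte
   application buffer and record statistics until EOF. */
static void *recv_worker(void *arg)
{
    struct conn_stats *st = arg;
    char buf[8192];
    ssize_t n;
    while ((n = read(st->fd, buf, sizeof buf)) > 0) {
        st->bytes_received += (uint64_t)n;
        st->recv_calls++;
    }
    close(st->fd);
    return NULL;
}
```

An accept loop would create one `struct conn_stats` per accepted socket and start `recv_worker` on it with `pthread_create()`, which yields exactly the concurrency pattern described above.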

8.1.3. Supported Types of Abstract System Reactions

The specification of the TCP service in [RFC793] postulates that in providing interprocess communication facilities, the TCP must not only accept commands (requests), but is also required to return information to the processes it serves. The latter may consist of:
• General information about a connection (e.g., interrupts, remote close, binding of unspecified foreign socket).
• Replies to specific user commands indicating success or various types of failure.
In the UniLoG concept, the delivery of such information may be seen either as a distributed or as a local system reaction, depending mainly upon whether the foreign host was involved in the decision to generate the reaction indicated to the local host or not. Further, the user commands submitted at the TCP service interface receive an immediate return and possibly a delayed response via an event or pseudo interrupt. The immediate delivery of a return or status code represents rather a local system reaction in the UniLoG concept, while the receipt of a delayed response corresponds rather to a distributed system reaction (because the remote host is mostly involved in its generation). It should be noted that the concrete types of system reactions indicated to the TCP user depend not only upon the type of the last issued abstract request; in many cases the processing required is also dependent on

the current state of the TCP protocol instance (e.g., CLOSED, SYN-SENT, SYN-RECEIVED, ESTABLISHED, etc., cf. the TCP state diagram in [RFC793]). In the following we describe the types of system reactions possible for selected combinations of the request type issued and the current state of the TCP instance:

After an open() call: Consider that the UniLoG.TCP adapter can issue only active open() calls while the UniLoG.TCP load receiver always issues passive open() calls. In the CLOSED state of the sending TCP instance (i.e., when the transmission control block (TCB) to hold connection state information does not yet exist) the following types of system reactions are possible:
• if the foreign socket is unspecified, return ‘‘error: foreign socket unspecified’’,
• if the caller does not have access to the local socket specified, return ‘‘error: illegal connection for this process’’,
• if there is no room to create a new connection, return ‘‘error: insufficient resources’’.
In all other states of the sending TCP instance excluding the LISTEN state (e.g., SYN-SENT, SYN-RECEIVED, ESTABLISHED, etc.), when the connection request has already been sent, return ‘‘error: connection already exists’’.

After a send() call: The possible types of system reactions are, again, dependent on the current state of the TCP instance.
• In the CLOSED state of the sending TCP instance: if the user does not have access to a connection, then return ‘‘error: connection illegal for this process’’. Otherwise, return ‘‘error: connection does not exist’’.
• In the SYN-SENT, ESTABLISHED, and CLOSE-WAIT states: if there is no room to queue the send() request, respond with ‘‘error: insufficient resources’’. If the foreign socket was not specified, then return ‘‘error: foreign socket unspecified’’.
• In the FIN-WAIT-1, FIN-WAIT-2, and CLOSING states: the connection close request has already been issued, so that no further send requests can be served. A system reaction of type ‘‘error: connection closing’’ is returned.

After a close() call: In case the TCP instance is in the CLOSED or CLOSING state, return ‘‘error: connection does not exist’’.

8.1.4. Types of Supported Traffic Matrices

Depending upon the implementation of the TCP, the local network and TCP identifiers for the source address will either be supplied by the TCP or by the lower level protocol (e.g., IP). These considerations are the result of concern about security, to the extent that no TCP instance should be able to masquerade as another one [RFC793]. In particular, this means that it should not be possible (at least not without the collusion of the TCP implementation) to generate TCP segments with a source IPv4 address different from the IPv4 address of the local IPv4 protocol instance. As a direct consequence, the UniLoG.TCP adapter must be installed in the experimental network on every host which should generate TCP load streams (of course only in case the source of the TCP streams is important for the particular experiment). Here, the injection of TCP streams on one “central” host A such that the generated streams appear to originate from hosts B, C, etc., is not possible for the security reasons explained above. Therefore, a stand-alone UniLoG.TCP adapter can support the generation of traffic with a 1:n traffic matrix (1:n denotes here “one sender to many receivers”) when the generated traffic is observed at the IP layer. When we observe the generated traffic at the TCP layer and take into account the different TCP port numbers used on the sender and receiver side, the possible traffic matrix extends to m:n (“many senders to many receivers”). Further, multiple NICs can be used in the same load generating host to support more than one IPv4 protocol instance (so-called “multi-homing”). In such a multi-homed IPv4 host many TCP adapters can coexist, each of them assigned to a different IPv4 protocol instance.

8.2. Performance Evaluation

The performance of the UniLoG.TCP adapter can be adequately characterized by means of the request rate and the corresponding data rate achievable when TCP requests of particular lengths are generated on a specified network link, e.g., on a Gigabit Ethernet link. Especially in the case of TCP we should consider that the achievable request rate is determined not only by the application rate (i.e., by the number of send() requests produced by our load generator per second) but also by the available network rate, which can fluctuate strongly. As already mentioned, TCP uses an internal send buffer to store the outgoing TCP data and to react to changes in the available network rate by means of a congestion control mechanism. In most cases, the completion of a blocking

send() request (i.e., the return from the blocking send call) in the application only indicates that the data buffer passed to the application send call has been copied to the Winsock kernel buffer; it does not indicate that the data has hit the network medium. Hence, the time required to inject a real TCP request into the network by means of the blocking send() socket call can depend on the current network conditions. For example, the local TCP send buffer may become full due to a sporadic congestion situation on the bottleneck link of the network. In this case, the TCP stack will not report the next send() request as completed to the adapter application immediately after its data buffer has been copied into a Winsock kernel buffer. We will explain this point in more detail later in this section. Here we should emphasize that in the UniLoG workload modelling concept, the time required in the TCP stack to execute further processing after the request injection at the TCP service interface clearly belongs to the request service time in the (transport) sub-system (transport layer). Thus, it is an aspect of the system behaviour (represented by the system domain S in the UniLoG approach) and does not belong to the environment E emulated by our UniLoG load generator and the UniLoG.TCP adapter. For the reasons explained above it should be clear that the time required to inject the TCP requests at the TCP socket interface is a characteristic of the TCP transport system and cannot be controlled by our UniLoG load generator (at least not without a lot of additional emulation effort on the network hosts and links participating in the particular workload experiment). This fundamental fact should be kept in mind during the discussion of the performance characteristics of the UniLoG.TCP adapter.
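The send buffer size discussed here is adjusted and inspected through the SO_SNDBUF socket option (the same option exists in Winsock; the sketch below uses POSIX sockets and our own helper name). Note that the value granted by the kernel may differ from the requested one, e.g., Linux rounds it up or doubles it:

```c
#include <sys/socket.h>

/* Set the TCP send buffer to req_size bytes and return the size the
   kernel actually granted, or -1 on error. */
static int set_send_buffer(int sock, int req_size)
{
    int granted = 0;
    socklen_t len = sizeof granted;
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF,
                   &req_size, sizeof req_size) < 0)
        return -1;
    if (getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```

Reading the option back with getsockopt() is exactly the technique used later in this section to determine the default SO_SNDBUF value on the Windows hosts.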
In order to analyse the influence of the TCP send buffer size during the execution of the blocking send() socket calls, we conducted a series of experiments with the UniLoG load generator in combination with the UniLoG.TCP adapter and describe their results in the following.
• The sending and the receiving host have been connected directly to each other (i.e., without any routing or switching hardware in between) in order to minimize the TCP round-trip time between the hosts and to ensure that the TCP sending rate is not limited by the network. The sending host was a Windows 7 Pro SP1 64-bit machine with an Intel(R) Core(TM) i7-2600 CPU @ 3.40 GHz, 8.00 GB RAM, and a Broadcom NetLink Gigabit Ethernet network adapter. The receiving host was a Windows 7 Ultimate SP1 32-bit machine with an Intel(R) Core(TM) 2 Quad CPU Q8200 @ 2.33 GHz, 4.00 GB RAM, and a Realtek PCIe Gigabit Ethernet Family Controller.


• The UniLoG load generator used a UBA model for a TCP sender which generates TCP send requests over a single TCP connection with a constant payload length of 1460 byte and infinitesimal inter-departure times between the requests (a so-called “full queue” model for the single TCP connection). The sender terminates after generating a total of 10 · 10^6 send requests.
• The UniLoG.TCP load receiver has been configured to remove the received requests from the TCP receive buffer as fast as possible. The size of the application buffer provided to the recv() socket calls has been set to 8192 byte in order to correspond to the default size of 8192 byte of the receive buffer used by the receiving TCP instance internally. Further, the generation of Ethernet PAUSE frames according to the IEEE 802.3x Flow Control specification has been deactivated on the receiving host. Recall that an overwhelmed Ethernet network node can send a PAUSE frame, which halts the transmission of the sender for a specified period of time [IEEE802.3x].
• In each experiment the UniLoG.TCP adapter established a single TCP connection and injected 10 · 10^6 TCP send() requests into the network. The size of the TCP send buffer has been modified from experiment to experiment (send buffer sizes of 8 KB, 16 KB, 32 KB, 64 KB, and 128 KB have been used). In every run we measured the total experiment duration (the time to inject 10 · 10^6 requests at the TCP socket interface), the request rate (the number of send() requests generated per second), and the corresponding data rate (i.e., the utilization of the Gigabit Ethernet link between the sender and the receiver host). Each experiment with a particular send buffer size has been repeated ten times in order to obtain statistically significant results. The results of the measurements are presented in Tab. 8.1 (consider the 95% confidence intervals calculated for the measured values of the experiment duration, the request rate, and the data rate).
The measurement results presented in Tab. 8.1 confirm that the performance of the sending TCP protocol instance can be controlled to a large extent by adjusting the size of the TCP send buffer. Increasing the size of the send buffer in the experiments from 8 to 16 KB and further up to 128 KB makes it possible to store more send() requests in the local TCP send buffer, so that more requests per second are reported to the adapter as already injected upon completion of the corresponding blocking TCP socket send() calls. This is confirmed by the decrease in the total duration of the experiment (which is the time required to generate a

Send buffer | Payload | Total duration [s]      | Request rate [rps]            | Data rate [Mbit/s]
size [KB]   | [byte]  | (mean, 95% CI)          | (mean, 95% CI)                | (mean, 95% CI)
8           | 1460    | 284.54 [277.79; 291.30] | 35155.06 [34302.28; 36007.84] | 425.79 [415.47; 436.12]
16          | 1460    | 215.06 [208.88; 221.23] | 46519.59 [45141.68; 47897.49] | 563.44 [546.76; 580.13]
32          | 1460    | 146.46 [144.26; 148.65] | 68286.93 [67245.55; 69328.32] | 827.09 [814.48; 839.70]
64          | 1460    | 129.11 [127.58; 130.65] | 77457.85 [76526.76; 78388.94] | 938.17 [926.89; 949.44]
128         | 1460    | 124.28 [124.23; 124.34] | 80462.17 [80424.85; 80499.49] | 974.56 [974.11; 975.01]

Table 8.1.: TCP send() request rate and the corresponding utilization of the Gigabit Ethernet link measured during the generation of 10 · 10^6 requests using different sizes of the TCP send buffer, 95% confidence intervals (own Tab.).
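The request-rate and data-rate columns of Tab. 8.1 are mutually consistent: with 1460 payload bytes per send() request, each request leaves the host as one 1514-byte Ethernet frame (assuming 14 byte Ethernet, 20 byte IPv4, and 20 byte TCP header, with FCS and preamble not counted). A quick cross-check of this relation:

```c
/* Data rate [Mbit/s] resulting from a given send() request rate when
   every request leaves the host as one Ethernet frame of frame_bytes. */
static double data_rate_mbit(double request_rate_rps, double frame_bytes)
{
    return request_rate_rps * frame_bytes * 8.0 / 1e6;
}
```

For the 128 KB row this yields about 974.56 Mbit/s from 80462.17 rps, matching the measured data-rate column.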

total of 10 · 10^6 requests) and the corresponding increase in the request rate with larger send buffer sizes (see Tab. 8.1). Note that increasing the send buffer size to 64 KB, or further to 128 KB, leads to a link utilization of 93.8% and 97.4%, respectively (using only a single TCP connection). Furthermore, we investigated the typical size of the TCP send buffer on different versions of the Windows operating system and found that the default size of the TCP send buffer is 8192 byte (8 KB) on Windows XP and Windows Server 2008 as well as on the more recent Windows 7, Windows 8.1, and Windows 10 desktop operating system versions.¹ Having obtained this important system information, we used the default send buffer size of 8 KB in all further experiments in order to obtain the performance characteristics of the UniLoG.TCP adapter in realistic TCP scenarios.

In the second series of experiments to evaluate the performance characteristics of the UniLoG.TCP adapter, we used a fixed size of 8192 byte (8 KB) for the internal TCP send buffer. In each experiment within the series, a different size of the TCP payload has been used, chosen in the range from 10 byte to 1460 byte (so that the resulting total Ethernet packet length amounted to 64 byte or 1514 byte,

¹ The Winsock getsockopt() system call has been used to determine the initial value of the SO_SNDBUF socket option, which determines the size of the TCP send buffer.
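The probe described in the footnote can be reproduced with standard socket calls. A minimal sketch using Python's socket module instead of the Winsock C API (the reported value depends on the operating system; 8192 byte was observed on the Windows versions examined above):

```python
import socket

# Query the default size of the TCP send buffer (SO_SNDBUF) for a fresh
# stream socket, analogous to the Winsock getsockopt() probe in the text.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_sndbuf = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(f"default SO_SNDBUF: {default_sndbuf} byte")  # 8192 on the Windows versions examined
s.close()
```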

8.2. Performance Evaluation


[Plot omitted: time to generate 10 · 10^6 packets [s] (y-axis, 0–350) over the TCP payload size in byte (x-axis, 10–1400); two curves: experiment duration with Nagle enabled and with Nagle disabled.]

Figure 8.5.: Experiment duration (the time required to generate a total of 10 · 10^6 TCP send() requests) for different TCP payload sizes, TCP send buffer size set to 8192 byte (own Fig.).

respectively). On the receiver side, the UniLoG.TCP load receiver used the same size for the application receive buffer as the UniLoG.TCP adapter used on the sender side for the application send buffer. Each experiment was terminated after a total of 10 · 10^6 TCP send() requests had been generated by the UniLoG.TCP adapter. We measured the duration of the experiment (i.e. the time to generate a total of 10 · 10^6 send() requests), the achieved TCP send() request rate, and the corresponding data rate (observed at the Gigabit Ethernet link between the sender and the receiver). The measurement results are presented in Fig. 8.5, Fig. 8.6, and Fig. 8.7, respectively. As can be seen from Fig. 8.5, the time to generate the total number of 10 · 10^6 send() requests increases almost linearly with the TCP payload length specified in each experiment. However, special attention must be paid to the experiments with payload lengths smaller than ca. 128 byte and larger than or equal to 1400 byte. From Fig. 8.5 we see that for payload lengths smaller than ca. 128 byte the experiment duration decreases more strongly as the payload length decreases. On the other hand, for payload lengths of 1400 byte and 1460 byte the experiment duration unexpectedly falls below the value measured for a payload length of 1300 byte. A closer look at Figures 8.6 and 8.7 reveals what exactly happens with the TCP send requests for these particular payload lengths.


8. Load Generation at Transport Layer Service Interfaces

[Plot omitted: TCP send() requests per second [rps] (y-axis, 0–400000) over the TCP payload size in byte (x-axis, 10–1400); two curves: request rate with Nagle enabled and with Nagle disabled.]

Figure 8.6.: The rate of blocking TCP send() requests achievable for different TCP payload sizes, TCP send buffer size set to 8192 byte (own Fig.).

[Plot omitted: Ethernet data rate [Mbit/s] (y-axis, 0–400) over the TCP payload size in byte (x-axis, 10–1400); two curves: Nagle enabled and Nagle disabled.]

Figure 8.7.: The data rate of the TCP stream achievable on a Gigabit Ethernet link for different TCP payload sizes, TCP send buffer size set to 8192 byte (own Fig.).

Notably, for payload lengths below 300 byte the request rate increases much more strongly as the payload length decreases, and it escalates for specified payload lengths below 128 byte (cf. Fig. 8.6). This escalation of the request rate led to a surge of the data rate, observable in Fig. 8.7 for the


payload lengths below 128 byte. Thus, while the data rate achievable for requests with 128 byte of payload amounts to 257.14 Mbit/s, it unexpectedly grows to 347.04 Mbit/s for requests with 10 byte of payload. We surmised that this steep increase of the request rate and the unexpected increase of the data rate for smaller payload lengths are most probably due to the Nagle algorithm used in the TCP stack. Recall that the TCP stack (and also its Winsock implementation) enables the Nagle algorithm by default in order to prevent small data packets from congesting the network. The algorithm coalesces small data buffers from multiple send() calls and delays sending them until an ACK for the previously sent data packet is received from the remote host.² If the stack has coalesced a data buffer larger than the maximum segment size (MSS) (which is 1460 byte in the Ethernet network in our experiments), a full-sized packet (segment) of 1460 byte is sent immediately, without waiting for the ACK from the remote host. This means that the Nagle algorithm came into play in every one of our experiments, as we specified payload lengths below the MSS of 1460 byte. To optimize performance at the application layer, Winsock copied data buffers from the adapter (application) send() calls to a Winsock kernel buffer (whose maximum length in our experiments was set to 8 Kbyte using the SO_SNDBUF socket option). The stack then used the following rules to indicate send completion to the adapter application (because the adapter invoked the blocking send operation, the completion notification was represented by the function returning from the blocking call):

• If the socket is still within its SO_SNDBUF quota, Winsock copies the data from the adapter application's send() and indicates send completion to the adapter.
• If the socket has exceeded its SO_SNDBUF quota and there is only one previously buffered send still in the stack kernel buffer, Winsock copies the data from the adapter's send and indicates send completion to the adapter.

• If the socket has exceeded its SO_SNDBUF quota and there is more than one previously buffered send in the stack kernel buffer, Winsock copies the data from the adapter's send, but does not indicate send completion to the adapter until the stack completes enough sends to bring the socket back within its SO_SNDBUF quota, or down to the only-one-outstanding-send condition.

² For more detailed information, see the Microsoft knowledge base article KB214397, "Design issues – Sending small data segments over TCP with Winsock", available at https://support.microsoft.com/en-us/kb/214397
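Both socket options discussed in this section can be inspected and set through the standard setsockopt()/getsockopt() interface. A minimal sketch in Python (the Winsock C API exposes the same SO_SNDBUF and TCP_NODELAY options):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Fix the TCP send buffer to 8 KB, as in the experiments above. Note that
# some stacks (e.g. Linux) report back an internally adjusted value.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8192)

# Disable the Nagle algorithm (TCP_NODELAY = true), as in the second
# experiment series: small segments are then sent without coalescing.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

nodelay_on = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print("TCP_NODELAY:", nodelay_on)
s.close()
```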


According to the Nagle algorithm used in the TCP stack in our experiments, data buffers from subsequent send requests were coalesced and sent out in a single full-sized TCP segment. As a result, a higher (than expected) number of send requests per second may have been reported to the adapter as already completed by the TCP stack. This induced an unexpectedly strong decrease of the experiment duration and a corresponding strong increase of the packet rate, especially for smaller payload lengths (below 128 byte). In order to study the behaviour of the request and data rates more thoroughly, we conducted a second series of experiments using the same specified payload lengths for the TCP send requests but disabling the Nagle algorithm (by setting the TCP_NODELAY socket option to true). Again, we measured the experiment duration (the time to generate a total of 10 · 10^6 requests) as well as the achieved request rate and the data rate in the Gigabit Ethernet network. The resulting plots have been embedded into Figs. 8.5–8.7 as dashed lines. Comparing the plots of the experiment duration, packet rate, and data rate with the Nagle algorithm enabled to the corresponding plots obtained from the experiments with the Nagle algorithm disabled, we can make the following observations:

• When the specified payload length decreases (especially in the region from 128 byte down to 10 byte) and the Nagle algorithm is disabled, the experiment duration does not drop as strongly as when Nagle is enabled. This is most probably due to the fact that each of the small packets is delivered to the remote host immediately when Nagle is disabled (i.e., without coalescing its data buffer with the buffers of other requests and sending them out in a single full-sized TCP segment). For example, to generate the same total of 10 · 10^6 requests, 70.67 s are required for 128-byte requests and 47.25 s for 10-byte requests when the Nagle algorithm is disabled.
For comparison, when the Nagle algorithm is enabled, 56.87 s are required to generate 128-byte requests and only 14.81 s to generate 10-byte requests. Correspondingly, when Nagle is disabled the packet rate does not escalate sharply at the 128-byte mark; instead, it increases rather moderately up to the maximum request rate of 211732.68 rps, achievable for requests with a payload length of 10 byte. Compare this to the maximum request rate of 677808.74 rps achievable with 10-byte requests when Nagle is enabled (this data point is not visible in Fig. 8.6). Obviously, the effect of the performance optimization achieved by means of the Nagle algorithm for small payload lengths can best be observed from the plots of the data rate in Fig. 8.7. When Nagle is enabled, the


achievable data rate increases from 257.14 Mbit/s (for 128-byte requests) to 323.18 Mbit/s (for 64-byte requests) and further up to 347.04 Mbit/s (for 10-byte requests). After we disabled the Nagle algorithm, the data rate decreased from 206.1 Mbit/s to 190.68 Mbit/s and finally down to 108.41 Mbit/s for payload lengths of 128, 64, and 10 byte, respectively.

• When the payload length is specified as 1460 byte (which is exactly the MTU of 1500 byte in our experimental Gigabit Ethernet minus the TCP header length of 20 byte and the IPv4 header length of 20 byte), the experiment duration is identical regardless of whether Nagle is enabled or disabled (cf. Fig. 8.5). The same holds for the request rate and the data rate (cf. Fig. 8.6 and Fig. 8.7). Moreover, the performance characteristics obtained in the experiments with the Nagle algorithm disabled appear to be identical to those measured with Nagle enabled for payload lengths decreasing from the MSS of 1460 byte down to 1400 byte (the corresponding data points are not shown in the figures). Based on this observation, we can strongly presume that the Winsock implementation of the TCP stack sent out the next "full-sized" TCP segment as soon as 1400 byte (and not the full MSS of 1460 byte) of data had been accumulated (coalesced) in the TCP send buffer. A plausible reason for this behaviour of the TCP stack may be to protect full-sized segments from fragmentation when IP tunnelling [RFC2003] is used in the transit network (e.g., in order to provide Virtual Private Network (VPN) services or to deliver datagrams to a mobile node using Mobile IP).
Populating 1400 (instead of the full 1460) byte of payload into the IPv4 packet would leave enough space for three additional IPv4 headers (each 20 byte long) and would thus allow the IPv4-in-IPv4 encapsulation to be applied at least three times to the same packet.

• An interesting observation can be made when we consider the experiment duration, the request rate, and the data rate as the payload length is increased from 128 byte up to 1300 byte, and compare the values of these characteristics achieved with the Nagle algorithm enabled to the corresponding values obtained with Nagle disabled. For example, when we look at the plots of the data rate achievable in the Gigabit Ethernet in Fig. 8.7, we see that in both cases (Nagle on or off) the data rate increases almost linearly when the specified payload length is


increased from 128 byte up to ca. 1300 byte. However, for payload lengths up to 700 byte a higher data rate is achieved when the Nagle algorithm is used. In contrast, for payload lengths larger than 700 byte a higher data rate can be achieved when the Nagle algorithm is disabled. So the Nagle algorithm seems to provide a performance gain with respect to the maximum achievable data rate only for payload lengths of up to ca. 700 byte. Similar observations can be made in Fig. 8.5 and Fig. 8.6. Thus, the time required to generate a total of 10 · 10^6 requests with Nagle active is smaller than the time required with Nagle disabled for payload lengths up to 700 byte; once the payload length exceeds the 700-byte mark, less time is required to generate the requests when Nagle is disabled. Consistently, for "smaller" payload lengths below 700 byte the request rate achievable with Nagle enabled is larger than the request rate achievable with Nagle disabled, while for payload lengths beyond the 700-byte mark a higher request rate is achieved when the Nagle algorithm is disabled. We suspect that the observed behaviour of all three metrics is most probably due to effects induced by the interaction of the Nagle algorithm and Winsock buffering on the TCP sender side with the delayed acknowledgement algorithm employed on the TCP receiver side. We will discuss this topic in the following. Recall that when the TCP stack receives a data segment, a 200-ms delay timer is started. When an ACK is eventually sent, the delay timer is reset and will start another 200-ms delay when the next data packet is received. To increase efficiency in both Internet and intranet applications, the Microsoft TCP stack uses the following criteria to decide when to send a single ACK on received data packets:³

• If the second data packet is received before the delay timer expires, the ACK is sent.
• If there are data to be sent in the same direction as the ACK before the second data packet is received and before the delay timer expires, the ACK is piggybacked onto the data segment and sent immediately.

• When the delay timer expires, the ACK is sent.

³ Also here, see the Microsoft knowledge base article KB214397, available at https://support.microsoft.com/en-us/kb/214397, for more details.


Back to the discussion of the data rate plots in Fig. 8.7: consider that the 700-byte mark is exactly half of the 1400 byte which we identified as the limit for the payload length of "full-sized" TCP segments in our experiments (see the explanations above). Thus, at least two subsequent send requests can be coalesced (and sent out in one segment) when their specified payload length is at most 700 byte. Due to the delayed acknowledgement algorithm described above, the TCP receiver on the other side generates a single ACK only after two such coalesced packets (each carrying at least two send request buffers) have been received. The performance optimization in this case results from the reduced number of ACKs to be generated and, therefore, the reduced number of round-trip periods required to transmit the specified amount of data. Now consider the case when the specified buffer size of the send requests exceeds half of the 1400-byte segment limit (e.g., when it is specified as 800 byte). In this case, the 800 byte from the first request and the first 600 byte from the subsequent second request can be coalesced and sent out in one TCP segment. The remaining 200 byte from the second send request can only be coalesced with the buffers of the subsequent third and fourth send requests, so that 200 + 800 + 800 − 1400 = 400 byte from the buffer of the last (fourth) send request remain (because they do not fit into the second full-sized TCP segment). Additionally, every odd-numbered TCP segment is subject to the delayed acknowledgement mechanism, and the sender does not immediately obtain more SO_SNDBUF quota (but only after an ACK generated for the next even-numbered segment has been received by the sender).
Therefore, when the Nagle algorithm is enabled, the number of round-trip times required to transmit the specified amount of data is likely to increase for payload lengths beyond the 700-byte mark, and the corresponding data rates are likely to be lower than when Nagle is disabled.
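The leftover arithmetic above (200 + 800 + 800 − 1400 = 400 byte) generalizes to a simple toy model of the observed coalescing behaviour, packing fixed-size send() buffers into 1400-byte segments (our own illustration, not the actual stack implementation):

```python
# Toy model of Nagle-style coalescing: send() buffers of a fixed payload
# size are packed into "full-sized" segments of 1400 byte (the limit
# observed in the experiments). Returns the number of complete segments
# and the leftover bytes remaining after the last request.
SEGMENT_LIMIT = 1400  # byte

def coalesce(payload, n_requests):
    total = payload * n_requests
    return total // SEGMENT_LIMIT, total % SEGMENT_LIMIT

# 700-byte requests fill segments exactly two at a time: no leftover.
print(coalesce(700, 4))   # -> (2, 0)
# 800-byte requests: after four requests, 400 byte remain unsent,
# matching the 200 + 800 + 800 - 1400 = 400 byte example in the text.
print(coalesce(800, 4))   # -> (2, 400)
```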

8.3. Aspects of the UniLoG.UDP Adapter Implementation

As already explained at the beginning of this chapter, not much effort is needed to extend the existing UniLoG.IPv4 adapter with the capability of generating requests at the UDP service interface. However, in order to be compliant with the UniLoG approach to workload generation, a UDP adapter for the UniLoG load generator should use an existing UDP service at the local UDP interface (and not the IPv4 service at the IPv4 interface) in the load-generating host.


The architecture of the UDP adapter for UniLoG is very similar to that of the UniLoG.TCP adapter presented in Sec. 8.1. An important difference is that another type of socket (SOCK_DGRAM instead of SOCK_STREAM) is used for the generation of send() requests at the UDP socket interface. Further differences exist in the set of supported abstract request types. For instance, the abstract request types modelling connection establishment and connection close requests are not supported in the UniLoG.UDP adapter, as there are no corresponding service primitives at the UDP interface (and, hence, no corresponding system calls in the UDP socket API). In the context of this thesis, the first version of the UDP adapter for UniLoG has been implemented for the Windows operating system using the Windows Sockets (Winsock) API. Parts of the UniLoG load generator, including the UDP adapter, have been ported to the Linux operating system in order to conduct the experiments presented in [Kol12]. Further, in an experimental study presented in [KoB13], the UniLoG.UDP adapter has been prototypically implemented on top of the real-time operating system RTOS-32 [RTOS]. Our main motivation behind this prototypical implementation was to evaluate the high timing accuracy expected to be achievable when the UniLoG.UDP adapter is executed on top of a real-time operating system like RTOS-32. We refer the interested reader to the results presented in [KoB13] and [Bei13].
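The essential difference, a SOCK_DGRAM socket and pure send requests without any connection management, can be illustrated with a minimal, self-contained sketch (Python sockets on the loopback interface stand in here for the Winsock-based adapter and the UniLoG load receiver):

```python
import socket

# Receiver side (stands in for the UniLoG.UDP load receiver).
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))          # ephemeral port on loopback
addr = rx.getsockname()

# Sender side: no connect()/close() handshake exists at the UDP
# interface, so the adapter only issues send (here: sendto) requests.
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payload = b"x" * 1460              # payload of one abstract send request
for _ in range(3):
    tx.sendto(payload, addr)

received = [rx.recvfrom(2048)[0] for _ in range(3)]
print([len(d) for d in received])  # -> [1460, 1460, 1460]
tx.close()
rx.close()
```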

9. Generation of Web Workloads

In this chapter we present the application of the UniLoG approach to the generation of Web traffic and Web server loads. The basic modelling concept, along with the main abstract Web workload characteristics, has been introduced in Sec. 4.3. In Sec. 9.1 we describe the architecture of the UniLoG.HTTP adapter, which is responsible for the generation of real HTTP workloads based on the experimenter's specifications in the UBA. Two alternative approaches to developing such an adapter (imitating the behaviour of a real Web browser, or integrating a real Web browser directly into the adapter) are presented in Sec. 9.2. The construction of a comprehensive, representative, and stable pool of Web sites, required to take UniLoG.HTTP into operation, is presented in Sec. 9.3. In particular, algorithms for the estimation of abstract Web workload characteristics (such as replySize, numberOfObjects, or inducedServerLoad) are described in Sec. 9.3.1. Finally, measurement results obtained for the abstract workload characteristics, using the proposed algorithms and the browser integration approach for the top 1000 most popular Web sites from the Alexa ranking list, are presented in Sec. 9.3.2.

9.1. Architecture of the UniLoG.HTTP Adapter

As already described in Chapter 5, the UniLoG load generator incorporates a formal automata-based workload specification technique and makes use of a distributed architecture to provide a high degree of flexibility in generating traffic mixes with different structures and intensities for various scenarios. The key components of the UniLoG.HTTP adapter are presented in Fig. 9.1. The main execution thread in the Adapter component consumes abstract HTTP requests (ti, ri) from the request queue RQ. Recall that the abstract HTTP requests are generated by the Generator component in the R-states of the UBA model during its execution. For each abstract HTTP request (ti, ri), the Adapter invokes the Request Mapper component, which is responsible for the allocation of the corresponding real HTTP request from the request pool.

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_9

[Diagram omitted: the Generator executes the UBA model and places abstract HTTP requests (ti, ri) into the request queue RQ; the Adapter thread consumes them, invokes the Request Mapper to allocate matching real HTTP requests from the pool, and dispatches request threads 1..n, which issue the real requests (t*i, r*i) to the Web servers WS1..WSn over the Internet via the local HTTP service provider (e.g. the WinInet library); HTTP status codes and data chunks return as real system reactions (t*j, e*j) and are inserted as abstract system reaction messages (tj, ej) into the event queue EQ; a Timer provides the current logical time tNOW.]
Figure 9.1.: Architecture and basic components of the UniLoG.HTTP adapter (own Fig.).

The request pool contains HTTP requests represented by N tuples ei = (vi,1, vi,2, ..., vi,p, v*i,1, v*i,2, ..., v*i,r), i ∈ N, 1 ≤ i ≤ N, where N is the size of the pool. The elements v*i,1, ..., v*i,r denote the values of the r parameters required to build an HTTP/1.1-conformant request to the Web page corresponding to ei. Initially, the HTTP request method requestMethod, the server name serverName and port serverPort, the object name objectName, and the optional data optData of length optLength sent with POST requests are used as parameters of the HTTP requests in the pool. Further, vi,1, vi,2, ..., vi,p denote the values of the p abstract request parameters which describe the workload characteristics of the request ei as specified in Sec. 4.3.1. Concrete values for the abstract request parameters vi,1, ..., vi,p can be estimated in the adapter using the captured HTTP traffic induced by the request ei. The corresponding methods for the analysis of the captured traffic and the calculation of the workload characteristics are presented in Sec. 9.3.1. Each time a corresponding real HTTP request r*i is to be determined for a given abstract request ri = (a1, a2, ..., ap), the attributes a1, a2, ..., ap specified in the abstract request ri are used as a search key to match the abstract parameters vi,1, vi,2, ..., vi,p and to extract the real HTTP


request which matches the given abstract request ri as well as possible. The corresponding algorithm is presented in Fig. 9.2.

r = (a1, ..., ap): current abstract HTTP request with attributes a1, ..., ap
E = {e1, e2, ..., eN}: the pool of N page requests
ei = (vi,1, vi,2, ..., vi,p, v*i,1, v*i,2, ..., v*i,r): the i-th entry in the pool
v*i,1, ..., v*i,r: values of real HTTP/1.1 request parameters
vi,1, ..., vi,p: values of p abstract HTTP request parameters

C ← E                              ▷ Put all entries into the initial candidates list C
for j = 1 to p do                  ▷ Inspect the attributes a1, ..., ap
    Cj ← ∅                         ▷ Initialize the candidates list Cj for step j
    for i = 1 to size(C) do        ▷ Inspect each entry from the candidates list
        if distance(aj, vi,j) = min over 1 ≤ i ≤ size(C) of distance(aj, vi,j) then
            Cj ← Cj ∪ {vi,j}       ▷ Put element vi,j into the candidates list Cj
        end if
    end for
    C ← Cj                         ▷ Transfer the list of candidates after step j into C
    if size(C) = 1 then            ▷ Is only one entry left in the candidates list?
        exit                       ▷ Terminate
    end if
end for
if size(C) > 1 then                ▷ Still more than one entry in the candidates list C?
    c ← rand(C)                    ▷ Choose one entry from C randomly
end if

Figure 9.2.: Algorithm used to find the best matching HTTP request from the pool (own Fig.).

The mapping algorithm is designed to find a real HTTP request from the pool which matches the given abstract HTTP request as well as possible. Such a "best possible match" is given by the real HTTP request r*i whose key parameters vi,1, vi,2, ..., vi,p exhibit the minimum possible distance to the values of the corresponding attributes a1, a2, ..., ap of the given abstract HTTP request ri = (a1, a2, ..., ap). The experimenter can specify different priorities for the parameters vi,1, vi,2, ..., vi,p to determine the order in which they are examined; these priorities are specified in the UBA. In order to extract the "best possible match", the request mapper starts with the abstract parameter a1 of the highest priority. The adapter creates a candidates list C, which consists of the requests from the pool with the minimal distance between their parameter vi,1, 1 ≤ i ≤ N, and the attribute a1 of the given abstract request (see Fig. 9.2). In the next step, the abstract parameter a2 with the second highest


priority is examined. Requests with the minimal distance between their parameter vi,2, 1 ≤ i ≤ N, and a2 are kept on the candidates list C. In the case of string parameters, especially when the serverName is examined, an exact match instead of the minimal-distance criterion is required in order to determine the candidates. The described procedure is repeated until only one candidate is left or all of the p abstract request attributes a1, a2, ..., ap have been examined. If more than one real HTTP request is left in the candidates list C after the last step, one of the requests is selected randomly. The request allocation algorithm specified above is flexible in the number and types of abstract request parameters used for the pool query. Its complexity is determined by the number v of required comparisons, which yields v = N · p in the worst case, when all N real requests from the pool match each of the p parameters of the current abstract request. In order to prepare the allocated real HTTP request (ti, r*i) for execution, the adapter creates a new request thread (cf. Fig. 9.1) in suspended mode and makes the parameter settings required to inject a new HTTP request. The adapter polls the timing resource and resumes the request thread when the specified physical injection time t*i of the request is reached. HTTP status codes (e.g. HTTP/1.1 200 "OK", HTTP/1.1 304 "Not Modified", HTTP/1.1 404 "Not Found", etc.) returned by the request thread represent the local system reaction messages, which are inserted by the adapter thread into the event queue EQ.
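For illustration, the candidate-filtering loop of Fig. 9.2 can be sketched as follows (a simplified Python rendering with a hypothetical pool keyed on two numeric abstract parameters; in the adapter, string parameters such as serverName are compared by exact match rather than by numeric distance):

```python
import random

def match_request(abstract_attrs, pool, distance=lambda a, v: abs(a - v)):
    """Sketch of the mapping algorithm in Fig. 9.2: iteratively narrow the
    candidates list by minimal distance, one abstract parameter at a time.

    abstract_attrs: attributes (a1, ..., ap), ordered by priority.
    pool: entries with a 'params' tuple (v1, ..., vp) plus real-request fields.
    """
    candidates = list(pool)                       # C <- E
    for j, a_j in enumerate(abstract_attrs):      # inspect a1, ..., ap
        best = min(distance(a_j, e["params"][j]) for e in candidates)
        candidates = [e for e in candidates
                      if distance(a_j, e["params"][j]) == best]
        if len(candidates) == 1:                  # unique best match found
            break
    return random.choice(candidates)              # random tie-break

# Hypothetical pool keyed on (replySize, numberOfObjects):
pool = [
    {"url": "http://a.example/", "params": (10_000, 5)},
    {"url": "http://b.example/", "params": (50_000, 5)},
    {"url": "http://c.example/", "params": (52_000, 40)},
]
print(match_request((51_000, 30), pool)["url"])   # -> http://c.example/
```

A random choice is made only if several candidates remain after all p parameters have been examined, mirroring the last step of the algorithm.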

9.2. Implementation Aspects

For the implementation of the UniLoG.HTTP adapter, the decision regarding the component used to prepare and inject the HTTP requests is of particular importance. Usually, not much effort is required to issue an initial HTTP request to the landing (base) page of a Web site. First, the IP address of the server where the Web page is located (the so-called "origin" of the site) is determined by means of a DNS request to the DNS server(s), using the host/domain name part of the specified page URL. Second, a new TCP connection is established to the server with the IP address obtained from the DNS reply. Thereafter, the initial HTTP request to the landing page can be made. Important for the implementation of the adapter is the phase of rendering the main page, when its content has to be parsed for objects embedded in the page or linked from other servers (the so-called "non-origins" of the site).
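The three steps above (DNS resolution, TCP connection establishment, initial HTTP request to the landing page) can be sketched as follows; to keep the example self-contained and runnable offline, a minimal local HTTP server stands in for the Web site's origin server (server, page content, and names are, of course, hypothetical):

```python
import http.client
import http.server
import socket
import threading

# Minimal local stand-in for a Web site's origin server.
class Page(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"<html><body>landing page</body></html>"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):   # silence request logging
        pass

srv = http.server.HTTPServer(("127.0.0.1", 0), Page)
threading.Thread(target=srv.serve_forever, daemon=True).start()
host, port = srv.server_address

# Step 1: resolve the host name part of the page URL (trivial here).
ip = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)[0][4][0]

# Steps 2+3: establish a TCP connection and issue the initial HTTP
# request to the landing (base) page.
conn = http.client.HTTPConnection(ip, port)
conn.request("GET", "/")
resp = conn.getresponse()
status, reply = resp.status, resp.read()
print(status, len(reply))
conn.close()
srv.shutdown()
```

The real adapter issues such requests through the local HTTP service provider (e.g. the WinInet library) rather than through Python's http.client.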


Style sheets (CSS), scripts (e.g. JavaScript), images, Flash objects, etc., referenced by the corresponding HTML tags are typical examples of such embedded objects. At this point, the ability of the UniLoG.HTTP adapter to recognize different types of embedded objects may have a significant impact on the estimated workload characteristics of a given Web page. For each recognized object, the adapter must issue an additional HTTP request and load the corresponding object file. Certain types of objects (e.g., text/html) may contain further links to CSS, script, and image objects, which must be downloaded by the adapter in the same manner, and so forth. For objects linked from the non-origin servers of the page, the adapter must first establish a TCP connection to the corresponding server. Prior to the connection establishment, an additional DNS request may be required to determine the IP address of that server (in case it is not already stored in the local DNS cache). Generally, the UniLoG.HTTP adapter can retrieve the files containing script objects (such as JavaScript or ActionScript, used for Dynamic HTML (DHTML) and Flash technologies, respectively) from the server without much effort. However, it is hardly possible to support the interpretation of the scripts in the adapter in order to execute them on the client side, as this would require the implementation of a corresponding parser, interpreter, and virtual machine for each of the scripting languages. Modern dynamic Web pages do, however, very often use client-side scripting techniques to modify the content of the page while it is being loaded or shown, for example:

• JavaScript programs can change the HTML Document Object Model (DOM) tree of the page, used to represent its content internally in the browser.
Depending on the modification made (e.g., changing the source of an image file linked into the page), the browser may have to (re)load the affected objects from the server and induce additional HTTP request/response pairs.

• Client-side scripting languages (e.g. ActionScript used in Adobe Flash objects) can be used to load and control different types of media (sound, video, animations, etc.) in the page presentation, which often induces substantial traffic with the server.

• JavaScript objects can make use of the XMLHttpRequest object [KASS14], supported by all modern browsers, to exchange data with a server after loading a Web page in order to improve the page's interactivity and responsiveness. The data can be transferred not only by means of HTTP but


also by means of other protocols (e.g., FTP), and can be encoded using any common content format (e.g., JavaScript Object Notation (JSON), HTML, or plain text), not only XML as the name of the object suggests.

• The AJAX concept uses the XMLHttpRequest object as a foundation for asynchronous communication between client-side and server-side scripts in the development of Web applications. A client-side script can update portions of a page based upon user events without reloading the whole page, request and receive data from a server after the page has been loaded, and send data to a server in the background while the page is viewed. Thus, the AJAX concept is quite far removed from the classical HTTP request/response scheme originally used to retrieve static Web pages.

Other types of Internet media embedded in a page (such as Flash objects, Java applets, sound or video files, etc.), to be handled by various browser extensions and plug-ins, represent a similar issue for the UniLoG.HTTP adapter as that induced by script objects, because the corresponding components required for the interpretation of these media types are not available or can hardly be provided in the adapter implementation. This is one of the main arguments for the use of a real Web browser to generate page requests in the UniLoG.HTTP adapter. For the reasons explained above, it becomes apparent that the exact imitation of browser behaviour in the UniLoG.HTTP adapter may become a very complex and tedious task. Further, there exist unavoidable differences across browsers with respect to the rendering of a page, concurrent request execution and pipelining techniques, the number of concurrently used TCP connections to the server, etc. Therefore, along with the primarily used browser imitation approach, an alternative approach of real browser integration has been used for the implementation of the HTTP adapter. These methods are described in the following two sections.

9.2.1. Browser Imitation

In this approach, the behaviour of a real browser is imitated in the adapter by implementing the set of actions required to retrieve and load a Web page (e.g., establishing a TCP connection to the server, requesting the page resource specified in the abstract HTTP request, parsing and rendering the received page, generating requests for the recognized embedded objects, and receiving the response data). With this method, parts of the source code


needed to carry out the corresponding measurements to obtain concrete values of abstract workload characteristics for Web pages (introduced in Sec. 4.3.1) are easily accessible. On the other hand, as already stated above, the browser imitation may become rather complex to implement, in particular because of the different types of Internet media that have to be handled by a corresponding browser plug-in (which then has to be implemented in the adapter as well) and the different types of script objects (e.g., JavaScript, ActionScript) which require an implementation of the corresponding script engine/virtual machine (like, e.g., the Tamarin engine for Adobe ActionScript or TraceMonkey/SpiderMonkey, Mozilla's JavaScript engine in Firefox) to interpret and execute the scripts in the adapter. Furthermore, as has already been mentioned above, it is questionable how realistic such a browser imitation will be, considering the particular differences between Web browsers with respect to the rendering of the page content, etc.

The first prototype of the UniLoG.HTTP adapter has been implemented at the TKRN working group based on the browser imitation approach in [Gah08]. To obtain concrete values of workload characteristics for Web pages, in order to provide the pool of Web sites required for load generation, different measurement points have been set in the adapter's source code. A lot of effort has been spent on the validation of the proposed measurement procedure (see [KoW11] for further details). Specifically, an instance of the Firefox browser with reduced functionality has been used to obtain the workload characteristics for a predefined set of Web pages. The functional set of the browser has been reduced [1] to exactly match the functional set supported by the browser imitation component in the UniLoG.HTTP adapter, in order to allow a comparison of the obtained workload characteristics.
The metrics obtained in this way have been compared with the results of measurements for the same set of Web pages using the browser imitation component in the UniLoG.HTTP adapter in order to check the validity of its implementation [Jah10]. The browser imitation component of the UniLoG.HTTP adapter makes use of the WinInet library (see https://msdn.microsoft.com/en-us/library/windows/desktop/aa385483.aspx) to execute the actions required to retrieve a Web page (managing the server connection, generating and injecting HTTP requests according to the specifications in the abstract HTTP requests of the UBA, and receiving and processing the response data). The choice of WinInet is motivated by the fact that this library is used internally by the Microsoft Internet Explorer (IE) browser. Thus, using the library enables us to imitate the behaviour of the IE browser in a very realistic manner (although the restrictions on the interpretation of scripts, etc., discussed at the beginning of this section, still remain). In particular, the values of the User-Agent and Accept request header fields used by Internet Explorer 8.0 have been extracted from the registry key HKLM/Software/Microsoft/Windows/CurrentVersion/Internet Settings and are specified in the adapter every time a new HTTP request is created via the corresponding HttpOpenRequest() WinInet API call. These settings are of particular importance because they declare the types of objects supported by the HTTP user agent which is represented by our "imitated browser" in the adapter: Web servers may vary the set of objects sent to the client depending on the User-Agent and Accept fields in the header of the HTTP requests generated by the client. Furthermore, similar to the approach applied in [WeX06], a set of flags (INTERNET_FLAG_NO_CACHE_WRITE, INTERNET_FLAG_PRAGMA_NOCACHE, INTERNET_FLAG_RELOAD, INTERNET_FLAG_KEEP_CONNECTION) has to be passed directly in the HttpOpenRequest() calls in order to instruct the adapter to retrieve the objects from the original server and not from the local cache or a proxy, using, if possible, a keep-alive connection. These measures help to ensure that the characteristics of the Web traffic induced by the requests generated in the adapter correspond to the values of the workload characteristics specified in the abstract HTTP requests.

[1] By means of altering the User-Agent and Accept request header fields and using an advertisement blocker to prevent downloads from non-origin servers.

9.2.2. Browser Integration

In this method, a real browser (e.g., Mozilla Firefox or Microsoft Internet Explorer) is integrated in the control path of the UniLoG.HTTP adapter thread (cf. Fig. 9.1). The actions required to request and completely load the page specified by the URL in the abstract HTTP request are all performed by the chosen real browser, in particular:

• While rendering a page, not only plain text/html, image, CSS, and JavaScript files but also other types of Internet media (MIME types) like, e.g., video and sound files, Flash, Java applets, etc., can be loaded and handled by the browser or the corresponding plug-in according to the media type.

• Script objects (e.g., JavaScript or ActionScript) are not only downloaded from the server but can also be interpreted and executed by the browser.


The scripts may induce further HTTP requests either indirectly, by changing the HTML DOM tree of the page (so that the browser has to reload the affected regions), or directly, by using the XMLHttpRequest primitive to request more content data from the server.

• Content (the different types of objects embedded into the page) can be downloaded from non-origin servers of the page as well. Recall that the browser imitation established one single TCP connection to the origin server of the page and was therefore not able to retrieve any content from non-origin and third-party servers (like, e.g., Content Delivery Networks (CDNs), social networking, analytics, advertising, and tracking cookie providers).

• Optimizations in rendering a page are employed, e.g., concurrent TCP connections can be established to the same server, the pipelining technique can be used on the same TCP connection, and requests to distinct servers can be parallelized while rendering a page.

Because the actions described above are carried out by the real browser, the programming effort remaining in the adapter is reduced to 1) the remote control of the browser instance (in order to inject HTTP requests to the URLs and at the times specified in the abstract HTTP requests), and 2) the monitoring of the ready state of the browser instance (in order to determine the times when the browser has received a requested page or has finished its rendering, and to obtain the result/status codes of the corresponding HTTP requests to be used in the S-states of the UBA). In order to implement the remote control of an Internet Explorer browser instance, the Component Object Model (COM) interface IWebBrowser2 available on Windows operating systems [MSDN4] has been used in the UniLoG.HTTP adapter. A new browser instance is created using the global COM call

CoCreateInstance(CLSID_InternetExplorer, NULL, CLSCTX_LOCAL_SERVER, IID_IWebBrowser2, (void**)&pBrowser);

which returns a handle (a reference) to a new IWebBrowser2 object in the pBrowser parameter. This reference can be used to execute different methods available for the browser object. For example, in order to invoke the URL specified in the bstrURL parameter the pBrowser->Navigate(...) method can be used:


// prepare the URL to be requested as an OLE string:
BSTR bstrURL = SysAllocString(L"www.example.com");

// instruct the browser not to use the cache and history
VARIANT vFlag;
VariantInit(&vFlag);
vFlag.vt = VT_I4;
vFlag.lVal = navNoHistory | navNoWriteToCache | navNoReadFromCache;

// empty VARIANT for the unused optional parameters
VARIANT vEmpty;
VariantInit(&vEmpty);

// invoke the specified URL in the browser control
HRESULT hr1 = pBrowser->Navigate(bstrURL, &vFlag /* don't use cache */,
                                 &vEmpty, &vEmpty, &vEmpty);

The parameter vFlag can be used to prevent the browser object from writing to and reading from the local cache (in order to induce traffic which corresponds to the workload metrics of the page obtained when the local cache is empty). The pBrowser->put_Visible() method can be used to show a new frame for the browser object. However, we do not make use of this call because the browser object is used in UniLoG.HTTP to generate Web traffic and not to view Web pages. The pBrowser->Navigate(...) method can only be used to initiate the HTTP request to the specified URL and does not return any information on the status or progress of loading the page. In order to monitor the loading state of the page, a call to the pBrowser->get_ReadyState() or pBrowser->get_Busy() methods can be made in a polling loop (until a desired state of the browser object is reached, e.g., the state READYSTATE_COMPLETE, which is returned by the pBrowser->get_ReadyState() call when the requested page and all its embedded objects have been completely loaded). In general, the browser object may be in one of the following five "ready states" while processing a request to a Web page:

READYSTATE_UNINITIALIZED: The request has been created, but not initialized (i.e., the open method of the HTTP request has not been called).

READYSTATE_LOADING: The browser object has opened an HTTP request (and the HTTP request header fields have been prepared), but the send method has not yet been called.

READYSTATE_LOADED: The send method has been called for the HTTP request. No response data is available yet.

READYSTATE_INTERACTIVE: Some data has been received and the browser can, in principle, start rendering the page, although the complete response data is not available yet.


READYSTATE_COMPLETE: All the data of the server response has been completely loaded. The point in time when this status is returned can be used to calculate the page load time (see below).

Note that there is an ongoing discussion in the Web community on the definition of the term page load time and the corresponding metrics to be used for it (i.e., at what point in time a page can be treated as loaded). Here, the adapter can use the time when the ready state READYSTATE_COMPLETE is returned for the base object of the page in order to treat it as loaded (and to leave the corresponding S-state of the UBA). Alternatively, the adapter can use a further polling loop with the pBrowser->get_Busy() method call to determine whether the browser object is still busy loading additional embedded objects of the page. When the browser has finished loading the embedded objects (pBrowser->get_Busy() returns VARIANT_FALSE), the adapter can leave the corresponding S-state of the UBA. Finally, the browser window can be closed using the Quit() method and the underlying browser object/instance can be destroyed by means of the Release() method.

Due to the integration of a real browser we expect to obtain more realistic values for the abstract workload characteristics than the concrete values of the metrics obtained using the browser imitation approach. Furthermore, if new types of embedded objects are developed and used in Web pages in the future, the integrated browser will most likely provide support for them and thus minimize the effort needed to extend the UniLoG.HTTP adapter. However, a substantial effort can be required to implement algorithms for the estimation of concrete values of the abstract workload characteristics of Web sites defined in Sec. 4.3.1.
Because the portions of the source code required to set the measurement points are not easily accessible, the workload characteristics must be obtained from the traffic which is induced by the HTTP requests and captured in an appropriate format for further processing. This topic is discussed in detail in section 9.3.
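The completion detection described above amounts to a bounded polling loop over the browser's ready state. The following minimal, platform-independent sketch illustrates the idea; a stub stands in for the Windows-specific IWebBrowser2 COM interface, so the types and the stub's behaviour are assumptions made for illustration only.

```cpp
#include <chrono>
#include <thread>

// Subset of the ready states reported by a browser object.
enum ReadyState { READYSTATE_UNINITIALIZED, READYSTATE_LOADING,
                  READYSTATE_LOADED, READYSTATE_INTERACTIVE,
                  READYSTATE_COMPLETE };

// Stub standing in for the IWebBrowser2 COM interface: it reports
// READYSTATE_COMPLETE only after a given number of polls.
struct BrowserStub {
    int pollsUntilComplete;
    ReadyState get_ReadyState() {
        if (pollsUntilComplete > 0) { --pollsUntilComplete; return READYSTATE_INTERACTIVE; }
        return READYSTATE_COMPLETE;
    }
};

// Poll the browser until the page is completely loaded or a timeout
// (expressed in polling intervals) expires; returns true iff loaded.
bool waitUntilLoaded(BrowserStub& browser, int maxPolls,
                     std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
    for (int i = 0; i < maxPolls; ++i) {
        if (browser.get_ReadyState() == READYSTATE_COMPLETE) return true;
        std::this_thread::sleep_for(interval);  // avoid busy waiting
    }
    return false;
}
```

In the adapter, the time at which such a loop observes READYSTATE_COMPLETE would be recorded as the page load time and used to leave the corresponding S-state of the UBA.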

9.3. Construction of the Pool of Web Sites

As has already been stated in Sec. 9.1, a repository (a pool) of Web pages, each identified by the corresponding page URL, must be provided in order to put the UniLoG.HTTP adapter into operation. Because the abstract Web workload characteristics (cf. Sec. 4.3.1) can be used in the definition of abstract HTTP requests, each pool entry must contain concrete (e.g., measured or


estimated) values of these abstract workload metrics in order to enable the allocation of a corresponding real HTTP request for each abstract HTTP request. In order to generate realistic and representative Web workloads, such a repository must fulfil the following requirements:

Comprehensive: the repository should contain a large number of pages (e.g., in the order of 10^6 entries) in order to provide a large number of possible combinations of values for the abstract workload characteristics. In this way we aim to increase the goodness-of-fit of the "best possible match" returned by the request allocation algorithm presented in Sec. 9.1.

Representative: the pool should consist of pages which are frequently visited by real HTTP user agents. The traffic induced by HTTP requests to such popular Web pages will be more realistic than the traffic induced by requests to less popular or rarely used Web pages.

Stable: the characteristics of the workload (and traffic) induced by HTTP requests to the pages from the pool should remain constant over a certain period of time (e.g., at least during one day). In the ideal case, the values of the workload characteristics in the pool entries should not depend on the time and geographical place ("vantage point") of their measurement. To approximate this behaviour we can include the URLs of the main landing pages of Web sites, which are not expected to change very frequently over time. However, there are counterexamples, e.g., the main pages of news sites may change many times a day.

Up-to-date: as absolute stability of the pool can hardly be achieved, it should at least be ensured that the pool entries (more precisely, the values of the abstract workload characteristics) can be kept up-to-date (e.g., by means of a corresponding pool update function). This ensures that the traffic and workload induced by the HTTP requests from the UniLoG.HTTP adapter correspond to the values of the abstract workload characteristics in the pool entries (and thus to the experimenter's specifications in the abstract HTTP requests).

Recall from [KoW11] that the browser imitation version of the adapter makes use of active measurements for the estimation of the abstract workload characteristics numberOfObjects, requestSize, replySize, and inducedServerLoad, as the points of the browser code needed for the corresponding measurements are easily accessible. In contrast, when the UniLoG.HTTP adapter uses an integrated real browser (e.g., Firefox or Internet Explorer), portions of the browser code needed for


measurements of abstract workload characteristics are not easily accessible. Web development tools coming, e.g., with the Firefox or IE browser also do not provide such functionality. Therefore, the estimation cannot be made internally in the browser and must be done externally, e.g., on the basis of the HTTP traffic generated during the page delivery. The induced traffic must be captured and stored appropriately (e.g., by means of an external traffic capture and analysis tool like Wireshark / Tshark) in order to analyse its characteristics.
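The request allocation algorithm of Sec. 9.1 has to pick, for each abstract HTTP request, the pool entry whose measured characteristics fit best. As a purely illustrative sketch (this is not the actual algorithm; the entry fields, the relative-deviation distance, and the equal weighting are assumptions), such a best-match selection over the pool could look as follows:

```cpp
#include <limits>
#include <string>
#include <vector>

// Illustrative pool entry: a page URL plus measured values of the
// abstract workload characteristics (names follow Sec. 4.3.1).
struct PoolEntry {
    std::string url;
    double requestSize;      // bytes
    double replySize;        // bytes
    double numberOfObjects;  // embedded objects
};

// Relative squared deviation, so that characteristics of different
// magnitudes contribute comparably to the distance.
static double relDev(double wanted, double have) {
    if (wanted == 0.0) return (have == 0.0) ? 0.0 : 1.0;
    double d = (have - wanted) / wanted;
    return d * d;
}

// Return the index of the pool entry that best matches the values
// specified in an abstract HTTP request (smaller distance = better fit).
std::size_t bestMatch(const std::vector<PoolEntry>& pool,
                      double requestSize, double replySize, double numberOfObjects) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::infinity();
    for (std::size_t i = 0; i < pool.size(); ++i) {
        double dist = relDev(requestSize, pool[i].requestSize)
                    + relDev(replySize, pool[i].replySize)
                    + relDev(numberOfObjects, pool[i].numberOfObjects);
        if (dist < bestDist) { bestDist = dist; best = i; }
    }
    return best;
}
```

The larger and more diverse the pool, the smaller the distance of the best match tends to be, which is exactly the motivation behind the "comprehensive" requirement above.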

9.3.1. Estimation of Abstract Workload Characteristics

In order to obtain concrete values of the abstract workload characteristics, an approach of actively initiated observations similar to the method applied, e.g., in [Cha10] or [BMS11] has been used in this thesis. In this approach, a defined set of URLs is used to make the browser download everything that is required to display the corresponding Web pages. Each time a single Web page is requested, all traffic sent and received by the client machine (i.e., including the requests to all embedded objects linked into the page from its origin and non-origin servers) is captured and stored in a packet capture file (a separate file is used for each page). The main advantages of this approach are, among others [Cha10]:

1. it allows the captured traffic to be correctly associated with the requested Web pages,

2. it enables full access to all transmitted content without the need for the anonymization that comes with real user traces, and

3. it yields valuable insight into the workload properties of Web applications.

Consider that the workloads produced by requesting only the home pages identified by a predefined set of URLs are always artificial and do not represent any user group's real usage of the Web. In contrast, passive observations of HTTP traffic from real users may yield potentially more representative and realistic workloads when the measurements are conducted at network nodes with a high concentration of user traffic (for example, at a router or gateway in a CDN [IhP11], in a data center [EGRSS11], or at a Broadband Remote Access Server (BRAS) connecting DSL customers to the ISP's backbone [SAMFU12]). However, various privacy and security issues may arise when the results of passive measurements are to be made public or handed over to a third


party, and the corresponding traffic traces become subject to anonymization. Depending on how the anonymization is done, more or less information will be removed from the trace, so that the modelling process can be severely hindered. For example, when the TCP payload is completely removed from the trace (which may also be done for capacity reasons, as is the case with the CAIDA traces [CAIDA]), the HTTP headers will not be available to our modelling and the numberOfObjects and inducedServerLoad characteristics cannot be reliably estimated (as the information at least from the HTTP header fields http.request.method and http.response.code is required). In a set of traces available from the Waikato Internet Traffic Storage (at the WAND network research group at the University of Waikato in New Zealand [WITS]), TCP packets are truncated four bytes after the end of the transport header. This would just enable us to recognize packets containing HTTP GET request methods, but would still prevent us from recognizing the HTTP response codes (as they are truncated by the anonymization procedure). There have been attempts in HTTP traffic analysis to identify the HTTP request/response pairs in TCP streams without the HTTP headers available, using the information from the TCP header only (e.g., [ViV09]). However, this approach works reliably only in the case of older HTTP/1.0 traffic and requires strong assumptions on the timing of the request/response pairs in order to identify them in the case of persistent HTTP/1.1 connections. Remember that in HTTP/1.1 many request/response pairs can be transmitted over the same TCP connection, and the client can send the next GET request before it has received the corresponding server response when pipelining is activated.
The authors of [SAMFU12] note that nearly 60% of all HTTP requests are persistent and demonstrate that ignoring pipelined and persistent HTTP/1.1 connections leads to a strong underestimation of the number of HTTP requests (because only the first HTTP request sent over each TCP connection is considered) and can lead to significant errors in HTTP traffic analysis and modelling. The issues with HTTP traffic analysis presented in [SAMFU12] motivated us to pay special attention to the choice of an appropriate format for the traffic capture and report file to be used for the estimation of our abstract Web workload characteristics. For example, the authors in [BMS11] used a Firefox-based measurement agent in connection with the Firebug extension (ver. 1.7X.0b1) and the Net:Export (ver. 0.8b10) and Firestarter (ver. 0.1.a5) add-ons to automatically export a log of all the requests and responses involved in rendering a Web page. The Firebug extension generates a report in the HTTP Archive (HAR) format [Odv15] that provides a detailed record of the actions performed by the browser in loading the page.


In particular, all request/response pairs are identified and disclosed in a HAR report, and the responses are associated with the corresponding requests. However, only the information from the HTTP headers (e.g., content length, content type) is included in the report details; the information from the TCP header is not available. According to [SAMFU12], the fields in the HTTP header can often bear wrong values and thus lead to potential measurement and modelling errors: in the measurements in [SAMFU12], the number of bytes reported in the Content-Length header field was at least 3.2 times higher than the number of actually transferred bytes, and the Content-Type header differed for 35% of the HTTP volume. For these reasons, we dropped the HAR format despite its convenience and decided to use PCAP capture files (which contain information from the Ethernet, IP, TCP, and HTTP headers) as the primary input for the estimation of our abstract workload characteristics. By using PCAP right from the beginning, we also facilitate the future extension of our estimation algorithms towards using results of passive measurements publicly available in PCAP format as input (we aim here in the first place at the traffic traces available from CAIDA [CAIDA] and the WAND network research group [WITS]).

The modules involved in the analysis of page workload characteristics are presented in Fig. 9.3. Recall that the AdapterThread creates a new RequestThread which has a corresponding instance of the browser object at its disposal in order to execute real HTTP requests. In the browser integration approach we extend the RequestThread with the capability to start and stop the network analyser TShark for capturing the induced HTTP traffic. TShark is initialized and started immediately before the RequestThread creates a new browser object via the IWebBrowser2 interface.
The RequestThread monitors the state of the browser object in order to track the progress of page loading, and terminates the traffic capture with TShark after the requested page has been loaded completely. Notice that the result of the traffic capture with TShark is a raw PCAP capture file containing all traffic passing the Ethernet interface of the Web client. Therefore, we have to take the necessary precautions (e.g., switching off software and OS updates, etc.) to prevent traffic which has not been induced by loading the requested Web page from being included in the capture file. Furthermore, the algorithms for the estimation of the abstract Web workload characteristics of the page do not need all the information from the raw PCAP file (e.g., the HTTP payload is not needed, as we do not have to parse the source code of the page in our analysis); only a set of


[Figure 9.3 depicts the UniLoG.HTTP adapter components involved in the analysis: the Adapter Thread and the Request and Event Mapper exchanging abstract requests (ti, ri) and abstract reactions (tj, ej) with the Generator; the Request Threads, each driving a browser engine (e.g., IE) which exchanges real HTTP requests (t*i, r*i) and responses (t*j, e*j) with the Web servers over TCP/IP/Ethernet; the TShark traffic capture and the filtered PCAP report file; and the Analyzer and Pool Updater, which compute the characteristics and update the UniLoG.HTTP pool.]

Figure 9.3.: Modules involved in the analysis of induced HTTP traffic and estimation of the abstract Web workload characteristics for Web pages in the UniLoG.HTTP adapter (own Fig.).

selected IP, TCP, and HTTP header fields is required for our algorithms. Thus, the RequestThread instructs TShark (by means of the corresponding configuration of the TShark call) to generate a reduced report file which contains only the following header fields from the original raw PCAP capture file:

frame.number: the unique number of the Ethernet frame from the raw PCAP file.

frame.time_relative: the time stamp from the Ethernet frame header, specified relative to the first frame in the capture file.

ip.src: the source IP address from the IP packet header.

ip.dst: the destination IP address from the IP packet header.


ip.proto: the transport protocol number (e.g., ip.proto = 6 for TCP) from the IP packet header. This field is used to filter out non-TCP traffic from the PCAP report file.

tcp.srcport: the source TCP port number from the TCP packet header.

tcp.dstport: the destination TCP port number from the TCP packet header. The fields ip.src and ip.dst can be used in conjunction with tcp.srcport and tcp.dstport to identify HTTP traffic from or to a specific Web server or host.

tcp.len: the TCP payload length from the TCP packet header (used in the estimation of the replySize and requestSize characteristics).

tcp.seq: the TCP sequence number from the TCP header.

tcp.ack: the TCP acknowledgement number from the TCP header. The fields tcp.seq and tcp.ack can be used to remove packet duplicates from the PCAP report file.

http.request.method: the HTTP request method to be used to serve the request on the Web server (e.g., GET, POST, PUT, etc.). This field is filled only for packets which bear HTTP requests.

http.response.code: the HTTP status code from the server response. This field is filled only for packets transferring an HTTP response from the Web server.

The http.request.method and http.response.code fields are indispensable for reliably recognizing the request/response pairs contained in the captured TCP streams and for determining the numberOfObjects and inducedServerLoad characteristics. Note that the set of header fields included in the PCAP report file (see the list above) can easily be extended (e.g., when information from the HTTP header fields Content-Type, Content-Length, or Content-Encoding is to be taken into account, with the restrictions discussed in [SAMFU12]) by means of the corresponding command line parameters in the call to TShark.

After the TShark capture process has been terminated, the generated PCAP report file is used as the input to the Analyser component (cf. Fig. 9.3), which executes the algorithms for the estimation of the workload characteristics described in the following. The results of the estimation algorithms are transferred to the Pool Updater, which refreshes the corresponding fields of the page entry in the pool.
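Such a reduced report file can be produced by invoking TShark in field-extraction mode. The following sketch assembles a corresponding TShark command line with exactly the field list described above (the input file name is a placeholder; assembling the call as a string, rather than via the actual process-creation API of the adapter, is an assumption made for illustration):

```cpp
#include <string>
#include <vector>

// Build a TShark command line that reads a raw PCAP capture and emits
// a tab-separated report containing only the header fields required by
// the estimation algorithms.
std::string buildTsharkCommand(const std::string& rawPcap) {
    const std::vector<std::string> fields = {
        "frame.number", "frame.time_relative",
        "ip.src", "ip.dst", "ip.proto",
        "tcp.srcport", "tcp.dstport",
        "tcp.len", "tcp.seq", "tcp.ack",
        "http.request.method", "http.response.code"
    };
    // -r: read from file, -Y: display filter (TCP only),
    // -T fields / -e: emit only the selected fields, tab-separated.
    std::string cmd = "tshark -r " + rawPcap + " -Y tcp -T fields -E separator=/t";
    for (const std::string& f : fields)
        cmd += " -e " + f;
    return cmd;
}
```

Extending the report file by further fields (e.g., http.content_length) then amounts to appending additional -e parameters to this call.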


Let us now describe the algorithms used to estimate the values of the abstract workload characteristics requestSize, replySize, numberOfObjects, and inducedServerLoad for each requested Web page. Note that the algorithms expect a PCAP report file generated by TShark as input and process this file line by line (i.e., packet by packet). Before the main processing can begin, a preprocessing procedure is executed in order to ensure that the following preconditions are met for the packets in the report file:

1. As the UniLoG.HTTP adapter has to deal with HTTP workloads, only the TCP packets (marked by the ip.proto = 6 field value in the IP header) are considered from the PCAP report file. This means that UDP packets bearing, e.g., DNS queries and responses are filtered out. Furthermore, we sort out the non-HTTP traffic (from the TCP packets captured at the Web client) by extracting only the TCP packets to and from the well-known TCP port numbers 80, 3128, 3132, 5985, 8080, 8088, 11371, 1900, 2869, and 2710, which are usually used to offer HTTP and HTTP Secure (HTTPS) services on Web servers. Note that large HTTP packets can be fragmented into a number of TCP segments and the TShark packet dissector for HTTP marks only the first of these segments as an HTTP packet. So, the computation of replySize and requestSize cannot simply rely on the PCAP HTTP filter and must inspect all underlying TCP segments from the PCAP report file.

2. The workload characteristics of Web pages should be determined for the case of a correct page transmission. In the case of transmission errors, missing, incorrect, or duplicate packets are handled by TCP internally in order to guarantee reliable delivery. But the duplicate TCP segments and the ACK segments generated by the TCP instance at the Web client (or at the Web server) to indicate missing or incorrect TCP packets can still remain in the PCAP report file, because PCAP captures all (Ethernet) packets at the network interface layer of the Web client.

3. In order to remove duplicate TCP segments, we establish a list of packets for each TCP connection where each packet is uniquely identified by the key comprising the packet's destination IP address, source and destination TCP port numbers, and TCP sequence (SEQ) and acknowledgement (ACK) numbers. For each TCP packet from the PCAP report file, the packet's key is added to the list only if it is not already contained in the list. Otherwise, the packet is rejected and not considered in further calculations.


4. TCP segments with no content (i.e., with zero payload) are eliminated. This helps us to establish per-connection lists of TCP packets which are used primarily for data transmission and not for protocol and/or error control in TCP. Furthermore, this operation ensures that no two TCP packets per connection have the same combination of SEQ and ACK numbers. For example, consider that the TCP SEQ number is not changed when the last generated TCP segment had no content. Such empty TCP packets can be generated to acknowledge the receipt of a number of bytes (segments) when the receiving instance itself does not have any data (bytes) to send (which is very often the case with Web page downloads). If we added such an ACK segment to the packet list, the subsequent TCP segment with content relevant for our calculations (e.g., the next generated GET request) would be rejected by the preprocessing procedure because its key would be identical to the key of the ACK segment (the SEQ number remains unchanged because the last generated ACK had no content, and the ACK number also remains unchanged when no packets were received since the last generated ACK segment).

In summary, the preprocessing procedure ensures that only the TCP packets which are relevant for the computation of the page characteristics are left in the report file, by means of the following operations:

1. Filtering out the packets which are not TCP segments or do not represent HTTP traffic,

2. Removing TCP packets with zero payload length (e.g., "stand-alone" ACK segments generated by the TCP instance in the Web client or in the Web server which are not attached (piggy-backed) to data segments),

3. Sorting out duplicate TCP segments (whose key is already contained in the packet list used for each TCP connection found in the given PCAP report file).

After the preprocessing is completed, the packets are forwarded to the main part of the algorithm for the calculation of the page characteristics.
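The three preprocessing operations can be sketched in a few lines. The packet record below is a simplification of the actual report-file records (field names are assumptions made for illustration); the duplicate key is exactly the tuple described above.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <vector>

// Simplified packet record holding the report-file fields used here.
struct Packet {
    std::string ipSrc, ipDst;
    int srcPort, dstPort;
    unsigned long seq, ack;
    unsigned long tcpLen;  // TCP payload length (tcp.len)
};

// Well-known ports used to offer HTTP(S) services (see operation 1).
static bool isHttpPort(int port) {
    static const std::set<int> httpPorts =
        {80, 3128, 3132, 5985, 8080, 8088, 11371, 1900, 2869, 2710};
    return httpPorts.count(port) > 0;
}

// Keep only HTTP-related TCP segments that carry payload and whose
// (dst IP, src port, dst port, SEQ, ACK) key has not been seen before.
std::vector<Packet> preprocess(const std::vector<Packet>& packets) {
    using Key = std::tuple<std::string, int, int, unsigned long, unsigned long>;
    std::set<Key> seen;
    std::vector<Packet> kept;
    for (const Packet& p : packets) {
        if (!isHttpPort(p.srcPort) && !isHttpPort(p.dstPort)) continue;  // op. 1
        if (p.tcpLen == 0) continue;        // op. 2: stand-alone ACKs, empty segments
        Key key{p.ipDst, p.srcPort, p.dstPort, p.seq, p.ack};
        if (seen.insert(key).second)        // true iff key not present yet
            kept.push_back(p);              // op. 3: otherwise duplicate, reject
    }
    return kept;
}
```

Note that the order of operations matters: dropping zero-payload segments before the duplicate check is what prevents a "stand-alone" ACK from shadowing the key of the next data-carrying segment on the same connection.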
Consider that the preprocessing procedure will have to be extended for the computation of the inducedServerLoad characteristic (see explanations in the following).

requestSize is implemented as a counter accumulating the total amount of data (in bytes) sent by the Web client to the server (or servers) involved in the delivery of a Web page. TCP packets used to transmit the HTTP GET requests (and also POST requests when form data is submitted) from the client to the server(s) contribute to the requestSize. So, after the next TCP packet has passed the preprocessing step, we only have to check whether its source IP address is equal to the local IP address of the Web client. If yes, we add the length of its payload (specified in the tcp.length field of the TCP header) to the requestSize counter. Consider that in this way the value of the requestSize counter becomes the total amount of data sent by means of different HTTP request methods (e.g., GET, POST, etc.) to the origin and also to the non-origin servers of the page, so that objects linked into the page from other (non-origin) servers are also considered.

replySize is computed as the total amount of data (in bytes) loaded by the Web client during the retrieval and rendering of a Web page, including the size of all its embedded objects linked from origin and/or non-origin servers. The calculation of the replySize is similar to the calculation of the requestSize, but we have to consider only the incoming TCP traffic at the Web client. Again, we can assume that only TCP packets used for the transmission of requests to the page, its embedded objects and the corresponding replies pass the preprocessing step. For each such TCP packet, we compare its destination IP address with the local IP address of the Web client. If they are identical, the length of the payload (from the tcp.length header field) is added to the replySize counter. Note that we cannot simply take the value from the http.content_length header of the HTTP server response because this field can contain values which strongly differ from the actually transmitted amount of HTTP data [SAMFU12].

numberOfObjects has been determined by the analysis of the HTML source code of the page in the browser imitation approach.
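The requestSize and replySize counters described above can be sketched as follows (Python; packets as dicts with TShark-style field names, an illustrative representation rather than the adapter's actual code):

```python
def accumulate_sizes(packets, client_ip):
    """Accumulate requestSize / replySize over the preprocessed TCP packets.

    `client_ip` is the local IP address of the Web client; only the
    direction of each packet relative to the client matters.
    """
    request_size = 0  # bytes sent by the client (GET/POST requests, ...)
    reply_size = 0    # bytes received by the client (server replies)
    for pkt in packets:
        if pkt["ip.src"] == client_ip:
            request_size += pkt["tcp.length"]   # outgoing HTTP data
        elif pkt["ip.dst"] == client_ip:
            reply_size += pkt["tcp.length"]     # incoming HTTP data
    return request_size, reply_size
```

Note that the actual payload lengths are summed, deliberately not the http.content_length header values, for the reason given above.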
In the current browser integration approach we analyse the HTTP requests and replies from the PCAP report file used to retrieve each of the linked objects in the page. The number of GET requests generated by the client or the number of the corresponding server responses with HTTP/1.1 200 OK status code can be used to determine the value of numberOfObjects. We decided to accumulate the number of GET requests in order to also consider embedded objects which are temporarily not available at the moment of our measurement (linked, e.g., from a non-origin server which is not active, so that status codes other than HTTP/1.1 200 OK are obtained by the client). For each outgoing packet (ip.src is equal to the client's local IP address) from the PCAP report file, we check if the http.request.method field contains the GET request method. If yes, we increment the numberOfObjects counter by one. After all packets in the report file have been processed, we decrement the accumulated number of GET requests by one in order to exclude the first GET request to the main page object itself (as it cannot be considered an embedded or linked object). Also here, we follow the suggestions from [SAMFU12] for the analysis of persistent HTTP traffic and parse the whole TCP stream of each connection in order to find the HTTP request/response pairs transmitted over it. Not considering persistent connections may lead to a significant underestimation of the number of GET requests [SAMFU12].

inducedServerLoad requires a more complex procedure for its computation. Generally, we will not be able to measure the processing time required to serve HTTP requests directly on the Web server(s) involved in the delivery of the page. However, we can measure the request service time Δt between the generation of a GET request and the receipt of the corresponding server reply HTTP/1.1 200 OK at the client for each request/response pair (on each connection to each of the involved servers). The request service time Δt contains not only the time tISL needed by the server to process the GET request and to generate a corresponding reply, but also the round trip time tRTT between the client and the server. Assuming that the round trip time tRTT can be properly estimated, we can use the difference between the measured request service time Δt and the round trip time tRTT as an approximation for the server load tISL induced by the GET request in each request/response pair, i.e. tISL = Δt − tRTT.
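The GET-based determination of numberOfObjects can be sketched accordingly (same illustrative packet representation; a hypothetical helper, not the adapter's code):

```python
def count_objects(packets, client_ip):
    """numberOfObjects = number of GET requests issued by the client,
    minus one for the GET request to the main page object itself."""
    gets = sum(1 for pkt in packets
               if pkt["ip.src"] == client_ip
               and pkt.get("http.request.method") == "GET")
    # the first GET retrieves the main page, not an embedded object
    return max(gets - 1, 0)
```

Counting requests instead of 200 OK responses keeps temporarily unavailable embedded objects in the count, as motivated above.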
In order to accumulate the server load induced by the request/response pairs transmitted over the same TCP connection, the computation algorithm creates a FCFS queue for HTTP requests and responses in the corresponding entry of the connection list (cf. Fig. 9.4). Each connection is uniquely identified by the client IP address and port number and the server IP address and port number (however, the client IP address is fixed and the server port number is always one of the well-known TCP port numbers used for HTTP(S) services). Further, if the current TCP packet from the report file contains a GET request (specified in the http.request.method field), its time stamp (e.g., trequest) is inserted at the end of the queue of the corresponding connection (over which this request has been sent). If the current TCP packet contains an HTTP server response code (specified in the http.response.code field), the time stamp of the corresponding GET request is always taken (and removed) from the head of the FCFS queue of the corresponding connection.

Figure 9.4.: List of the connections established to the server(s) involved in the delivery of the page www.example.com; for each connection, tRTT = tACK/SYN − tSYN is measured on TCP connection establishment, and for each request/response pair tISL = tresponse − trequest − tRTT is accumulated into the per-connection inducedServerLoad counter (own Fig.).

The server load induced by the request/response pair is calculated as tISL = tresponse − trequest − tRTT, where trequest is the time stamp of the generated GET request, tresponse is the time stamp of the received server response code, and tRTT is the estimated round trip time between the client and the particular server. Notice that in HTTP/1.1 the server must generate the response messages in the order of generation of the corresponding GET requests (FCFS discipline), so that the described procedure is valid even when pipelining is used. The value of tISL obtained for the current request/response pair is added to a separate per-connection inducedServerLoad counter which accumulates the server load induced by all request/response pairs transmitted over the same connection (cf. Fig. 9.4, bottom right). After all packets from the report file have been processed, we obtain the inducedServerLoad for a particular server by cumulating the values of the inducedServerLoad counters for all its connections. Finally, the inducedServerLoad characteristic for the complete Web page is obtained
by adding together the accumulated values of the inducedServerLoad counters for all servers involved in the delivery of the page.

The remaining issue is the estimation of the round trip time tRTT for each connection. We decided to estimate the round trip time for each connection separately, even when many connections to the same server are used concurrently. The time stamps of the TCP SYN and ACK/SYN segments used by the TCP client and server to establish a new TCP connection (“three-way handshake”) are used to approximate the round trip time to the server as tRTT = tACK/SYN − tSYN (cf. Fig. 9.4, bottom left). Because the TCP SYN and ACK/SYN segments have zero payload length, they would be removed from the report file at the preprocessing step. So, we have to extend the preprocessing procedure by the calculation of tRTT for each connection found in the report file, immediately before the TCP segments of zero length are removed. For each outgoing TCP segment with the SYN flag set (TCP connection request), we create a new connection in the per-server connection list (which is also used for the removal of duplicate packets in the preprocessing step), store the time stamp tSYN of the generated TCP SYN segment in the connection object, and put the connection in the “SYN sent” state. For each incoming TCP segment with the ACK/SYN flags set (TCP connection acknowledgement), we identify the corresponding connection from the per-server connection list and, if this connection is already in the “SYN sent” state, we calculate the round trip time tRTT as the difference between the time stamp tACK/SYN of the TCP ACK/SYN segment received from the server and the time stamp tSYN of the TCP SYN segment generated by the client. The calculated round trip time tRTT is stored in the connection object for later use in the estimation of the inducedServerLoad characteristic (cf. Fig. 9.4, bottom left).
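Both per-connection computations, the handshake-based RTT estimation and the FCFS matching of request/response pairs, can be sketched in one pass (Python; illustrative TShark-style field names and a simplified connection key, not the adapter's actual implementation):

```python
from collections import deque

def estimate_induced_server_load(packets, client_ip):
    """Return {connection_key: accumulated t_ISL} for one page download.

    connection_key = (server_ip, server_port, client_port); `packets` are
    the raw packets in capture order (zero-length SYN/ACK/SYN segments
    still included, since the RTT must be measured before they are removed).
    """
    conns = {}  # key -> {"state", "t_syn", "rtt", "pending", "isl"}
    for pkt in packets:
        outgoing = pkt["ip.src"] == client_ip
        key = ((pkt["ip.dst"], pkt["tcp.dstport"], pkt["tcp.srcport"]) if outgoing
               else (pkt["ip.src"], pkt["tcp.srcport"], pkt["tcp.dstport"]))
        syn = pkt.get("tcp.flags.syn", False)
        ack = pkt.get("tcp.flags.ack", False)
        if outgoing and syn and not ack:
            # client SYN: create the connection object in "SYN sent" state
            conns[key] = {"state": "SYN sent", "t_syn": pkt["timestamp"],
                          "rtt": 0.0, "pending": deque(), "isl": 0.0}
        elif not outgoing and syn and ack and key in conns \
                and conns[key]["state"] == "SYN sent":
            # server ACK/SYN: t_RTT = t_ACK/SYN - t_SYN
            conns[key]["rtt"] = pkt["timestamp"] - conns[key]["t_syn"]
            conns[key]["state"] = "established"
        elif outgoing and pkt.get("http.request.method") == "GET" and key in conns:
            # enqueue the GET time stamp (FCFS queue per connection)
            conns[key]["pending"].append(pkt["timestamp"])
        elif not outgoing and "http.response.code" in pkt \
                and key in conns and conns[key]["pending"]:
            # match the response to the oldest outstanding GET on this connection
            c = conns[key]
            t_request = c["pending"].popleft()
            c["isl"] += pkt["timestamp"] - t_request - c["rtt"]
    return {key: c["isl"] for key, c in conns.items()}
```

Summing the returned per-connection values per server, and then over all servers, yields the inducedServerLoad of the complete page as described above.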
We notice that the load induced by DNS requests to the DNS server(s) is explicitly not considered in the computation of the inducedServerLoad characteristic (the corresponding DNS traffic is removed from the PCAP report file at the preprocessing step).

numberOfServers is determined at the preprocessing step by accumulating the number of unique hosts (each identified by its IP address) involved in the delivery of a page in the corresponding numberOfServers counter. Each time a new TCP connection is found in the report file (see the procedure for the estimation of inducedServerLoad), we check whether a connection to the corresponding host already exists in the list of connections before we create a new connection object and insert it into the list. In case the host was unknown to the preprocessing procedure, we have found a new host involved in the delivery of the page, and the value of the numberOfServers counter is incremented by one. It is also possible to use routing information (e.g., from the OIX route-views BGP table [Mey15]) in order to identify the number of unique network prefixes and autonomous systems (ASs) involved in the delivery of the page, if needed (cf. [Cha10]). The number of TCP connections established to the server(s) during the delivery of the page may also be introduced as a further abstract workload characteristic (e.g., numberOfConnections). Finally, we notice that the described algorithms for the computation of the requestSize, replySize, numberOfObjects, and inducedServerLoad characteristics need only a single read cycle through the PCAP report file.
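Once the per-page connection list exists, numberOfServers and numberOfConnections follow directly from it. In this sketch, the assumed key layout (server IP address as the first tuple element) is an illustration, not the adapter's actual data structure:

```python
def count_servers_and_connections(connection_keys):
    """Derive numberOfServers / numberOfConnections from the per-page
    connection list; keys are assumed to start with the server IP address."""
    number_of_connections = len(connection_keys)
    number_of_servers = len({key[0] for key in connection_keys})
    return number_of_servers, number_of_connections
```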

9.3.2. Measurement Results

In this section we present the construction of a repository of Web pages using the Alexa Web statistics service [Alexa], which ranks Web sites according to their popularity among a large number of users. The use of the Alexa service allows us to optimally fulfil the requirements on our repository specified at the beginning of Sec. 9.3, in particular:

Comprehensive: the Alexa Web statistics service generates a daily ranking list of the 1 million most popular sites, which seems to be sufficiently large to construct a comprehensive pool for UniLoG.HTTP. The list is provided for free download from Alexa's Web site and can be used to initially populate the repository with the URLs of the corresponding home pages.

Representative: because the Alexa ranking includes the most popular Web sites, the traffic generated by requesting the corresponding page URLs is representative of a large number of users. However, the sample considered in the ranking is restricted to the set of users who installed the Alexa toolbar required to gather the Web traffic statistics.

Stable: the restriction to include in the ranking only the URLs of home pages guarantees some degree of stability in the resulting pool (with respect to reachability of the URLs and validity of the corresponding pool entries). Further, this restriction allows simple replication of the measurement results in other places and at other times and does not introduce personal bias with respect to navigation preferences.


Up-to-date: in order to keep the pool entries up-to-date, a corresponding pool update function has been implemented in the UniLoG.HTTP adapter using the browser integration approach (cf. Sec. 9.2.2) and the algorithms for the estimation of abstract Web workload characteristics presented in Sec. 9.3.1.

Measurements using the new pool update function have been conducted in [Wei12] using a Deutsche Telekom Entertain VDSL Internet connection with a maximum available data rate of 16 Mbit/s in downstream and 1024 kbit/s in upstream direction. In the following, we present a summary of the pool update results and compare them to the results of measurements obtained in [KoW11] using the browser imitation approach.

First, in order to obtain results which can be appropriately compared to the results in [KoW11], the empty repository was initialized with the URLs of the top 1000 home pages from the Alexa ranking dated May 3, 2010, because this version had been used in [KoW11]. Next, the UniLoG.HTTP adapter was instructed to traverse the entire pool five times using the UBA model for the HTTP user agent presented in Fig. 4.19 (see Sec. 4.3.1). In order to force the retrieval of each page from the pool, the values of the ObjectURL attribute of the abstract GetPage requests to be generated in the R-state RGetPage were specified to be taken from a trace containing exactly the same 1000 top sites from the Alexa ranking used to initialize the repository. During the execution of this UBA, each abstract GetPage request induced a retrieval of the corresponding Web page from the pool while the pool update function (consisting of the traffic capture with TShark and the algorithms for the estimation of abstract workload characteristics) was activated. After each page had been retrieved exactly five times, the mean values of the obtained abstract workload characteristics per page (requestSize, replySize, numberOfObjects, inducedServerLoad, etc.) have been stored back into the corresponding pool entries.

For a total of 837 out of 1000 pages the values of abstract workload characteristics have been successfully estimated and updated in the repository. Compare that in March 2010 a total of 979 pages had been successfully loaded and the corresponding characteristics updated in the repository. As we used the Alexa ranking list from May 2010, the corresponding Web sites may have been switched off or moved to other URLs in the meantime, or they could also have been just temporarily unavailable at the time of measurement. Further, consider that the Alexa ranking also contains some pages used only for Web statistics (“Webbugs”) or user tracking, so that the corresponding sub-domain or Web page to be loaded is missing. The estimation of the page characteristics is also not


Characteristic           | Mean    | Median  | Maximum    | Total
-------------------------|---------|---------|------------|------------
pages requested          | –       | –       | –          | 837
pages requested, old     | –       | –       | –          | 979
requestSize [Byte]       | 35.706  | 26.849  | 244.942    | 29.885.897
requestSize, old [Byte]  | 23.897  | 15.912  | 212.364    | 23.394.924
replySize [Byte]         | 805.345 | 597.497 | 5.148.080  | 674.074.112
replySize, old [Byte]    | 905.613 | 285.534 | 40.655.464 | 886.594.719
numberOfObjects          | 70,91   | 57      | 429        | 59.352
numberOfObjects, old     | 38,58   | 25      | 345        | 37.248
numberOfServers          | 15,1    | 12      | 77         | 12.642
numberOfConnections      | 47,77   | 41      | 206        | 39.984

Table 9.1.: Summary of measurement results for abstract Web workload characteristics of Web pages, cf. [KoW11, Wei12] (own Tab.).

possible in case the site caused a critical error in the browser during the download, e.g., due to violations of the XHTML or W3C standards in the page source code.

The mean value of the requestSize characteristic has been estimated at 35.706 Byte, compared to 23.897 Byte in the old measurement (cf. Tab. 9.1). The reasons for the increase of the requestSize value are twofold. First, the real browser provides support for a wide set of (actually, nearly all possible) types of linked objects. So, compared to the browser imitation approach, more requests per page are issued to retrieve the linked objects. Second, the value of requestSize is now directly measured using the HTTP traffic data and is no longer based on an estimation using fixed values for the size of the request to the main page and the size of the requests to the embedded objects (as was the case in the browser imitation approach [KoW11]). The new median and maximum of the requestSize (26.849 Byte and 244.942 Byte) are also greater than the corresponding median and maximum of the requestSize in the old measurement (15.912 Byte and 212.364 Byte). The total amount of data sent with the requests amounts to 29.885.897 Byte and thus exceeds the total size of the requests in the old measurement (23.394.924 Byte), although fewer pages have been requested in the new measurement (837 vs. 979 in the old one).

The mean value of the replySize characteristic (805.345 Byte) is far below the mean value from the old measurement (905.613 Byte), but the new median (597.497 Byte) is significantly higher than the median in the
old measurement (285.534 Byte). This discrepancy is explained by the maximum value of 40.655.464 Byte for the replySize in the old measurement, which is very high compared to the maximum value of 5.148.080 Byte for the replySize in the new measurement. Inspecting the affected entries in the repository based on the old measurement and comparing them with the corresponding entries in the new repository revealed that in the old measurement a significantly larger amount of data had been downloaded for a number of selected pages (of which the page with the maximum replySize is the most prominent example). There may be several reasons for the observed reduction of the total data amount retrieved during the download of such pages. For example, the provider of an affected site may have switched to more storage-saving data formats for the linked objects or activated extended content and transfer encoding techniques (e.g., gzip compression), which may have led to the significantly lower total replySize of 674.074.112 Byte compared to the total replySize of 886.594.719 Byte at the time of the old measurement, when these techniques had not been used by the affected sites.

Using the new estimation methods, significantly higher values for the numberOfObjects characteristic have been obtained compared to the old measurement (cf. Tab. 9.1). The new mean value amounts to 70,91 embedded objects per page, while it had been estimated at 38,58 objects per page in the old measurement. The median (57) and the maximum value (429) are also higher than the corresponding median (25) and maximum value (345) from the old measurement. Even the total number (59.352) of embedded objects in all 837 requested pages is higher than the total number of objects (37.248) in the old measurement, although fewer pages were evaluated in the new measurement (837 instead of 979). The results attest that a considerably higher number of embedded objects can be detected and requested using the new adapter implementation.
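As an aside, each Mean / Median / Maximum / Total entry discussed here can be reproduced from the per-page values of a characteristic with a small helper (a sketch for illustration, not the adapter's code):

```python
from statistics import mean, median

def summarize(per_page_values):
    """Aggregate one abstract workload characteristic over all
    successfully measured pages (one row of Tab. 9.1)."""
    return {"mean": mean(per_page_values),
            "median": median(per_page_values),
            "maximum": max(per_page_values),
            "total": sum(per_page_values)}
```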
As a consequence, the employed browser integration approach leads to more realistic Web workloads compared to the rather limited browser imitation approach. The increased number of objects per page has a direct influence on the classification of the requested pages into different complexity classes. Remember that nine complexity classes C1–C9 have been used in [KoW11] and the class boundaries have been specified by means of two threshold values (o1 = 10 and o2 = 20) for the numberOfObjects characteristic and two threshold values (s1 = 50 kByte and s2 = 150 kByte) for the replySize characteristic. If we used the same class boundaries for the new measurement, the fraction of sites put into the classes C5, C6, and C7 would inevitably increase and reach almost 85%. In particular, nearly 73% (609 out of 837) of the pages requested in the new measurement would be placed into the class C7, which is specified to include sites with more than 20 objects and a replySize larger than 150 kByte (this was the case for ca. 51% (504 out of 979) of the pages in the old measurement). For this reason, the threshold values used to specify the class boundaries should be adjusted to create classes with approximately uniform capacity (i.e., a uniformly distributed number of pages among the classes).

An even stronger discrepancy between the old and the new measurement can be observed for the inducedServerLoad characteristic (cf. Tab. 9.2). In [KoW11], four different classes L0–L3 (named “Immediate”, “Fast”, “Slow”, and “Annoyingly Slow”) have been used to characterize the induced server load. The class boundaries have been specified by means of the three threshold values l1 = 10 ms, l2 = 30 ms, and l3 = 100 ms, and each page has been assigned to the corresponding class using the minimum of the delays induced on the Web server by the requests to the embedded objects². If we kept the class boundaries unchanged, 97% of the pages from the new measurement would be assigned to the class L0 (“Immediate”), while only 16% of the pages belonged to that class in the old measurement (cf. Tab. 9.2).

Classes ↓                        | inducedServerLoad, old (browser imitation) | inducedServerLoad, new (browser integration)
---------------------------------|--------------------------------------------|---------------------------------------------
Not assigned                     | 328                                        | 8
L0 (0ms, 10ms] “Immediate”       | 159                                        | 812
L1 (10ms, 30ms] “Fast”           | 66                                         | 5
L2 (30ms, 100ms] “Slow”          | 120                                        | 2
L3 (100ms, ∞] “Annoyingly Slow”  | 306                                        | 10

Classes ↓          | inducedServerProcessingTime
-------------------|----------------------------
P0 (0sec, 1sec]    | 194
P1 (1sec, 3sec]    | 173
P2 (3sec, 10sec]   | 303
P3 (10sec, ∞]      | 167

Table 9.2.: Number of pages in different classes of inducedServerLoad and inducedServerProcessingTime, cf. [KoW11, Wei12] (own Tab.).

² The use of the minimum of the delays was motivated by the aim of minimizing the fraction of waiting time in the overall server processing time induced/required to process requests to the embedded objects of the page when only objects from the same (origin) server have been considered.


However, a large number (328, or 34%) of pages in the old measurement couldn't be classified because of the failed RTT estimation (which was then based on an ICMP echo request message exchange between the Web client and Web server and failed in case the server didn't respond to ICMP requests). But the main reason for the classification of the immense fraction of sites (97%) into the class L0 is definitely the use of an integrated real browser engine in the adapter, with all its optimizations in the page rendering process (e.g., concurrent TCP connections, pipelining, etc.) and support for analytics, user tracking and ad services. In particular, very small objects (so-called “Webbugs”) used for analytics and user tracking can also be downloaded from non-origin servers. Because of their tiny size, these objects most likely represent the “fastest” objects (i.e., objects with the smallest delay induced on a server involved in the delivery of the page). And because such objects are included in nearly every Web page today, the minimum delay characteristic used to determine the inducedServerLoad class of the page becomes heavily distorted (cf. Tab. 9.2).

So, if the inducedServerLoad characteristic is to be used as before and calculated according to its previous definition, i.e., using the minimum of delays induced by the requests to embedded objects, we definitely have to adjust at least the class boundaries l1–l3 to take into account the distribution of server delays from the new measurement results. A good alternative would be to extend the definition of the inducedServerLoad characteristic used in [KoW11] to accumulate the (total) processing time induced on the origin and non-origin server(s) by the requests to the main page and all its embedded objects. To avoid ambiguity, the resulting new server load characteristic is referred to as inducedServerProcessingTime.
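The two classification schemes discussed here can be sketched as simple threshold functions (Python). The thresholds are taken from the text; the exact bin-to-class numbering C1–C9 used in [KoW11] is not specified here, so the row-major mapping below is only an example:

```python
def complexity_class(number_of_objects, reply_size,
                     o1=10, o2=20, s1=50_000, s2=150_000):
    """Bin a page by numberOfObjects and replySize [Byte] using the
    thresholds o1, o2 and s1, s2; the 3x3 bins are numbered row-major
    for illustration, which need not match the C1-C9 labels of [KoW11]."""
    obj_bin = 0 if number_of_objects <= o1 else 1 if number_of_objects <= o2 else 2
    size_bin = 0 if reply_size <= s1 else 1 if reply_size <= s2 else 2
    return "C%d" % (3 * obj_bin + size_bin + 1)

def processing_time_class(t_seconds, p1=1.0, p2=3.0, p3=10.0):
    """Classify inducedServerProcessingTime [s] into P0..P3 (cf. Tab. 9.2)."""
    if t_seconds <= p1:
        return "P0"
    if t_seconds <= p2:
        return "P1"
    if t_seconds <= p3:
        return "P2"
    return "P3"
```

Adjusting the class boundaries, as argued above, then amounts to changing the default threshold arguments.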
In the new measurement, most of the sites exhibit a value of inducedServerProcessingTime between 1 and 10 sec, and the threshold values p1 = 1 sec, p2 = 3 sec, and p3 = 10 sec seem to be appropriate class boundaries to provide classes of approximately uniform capacity (see Tab. 9.2).

We note that the use of the new estimation algorithms for the analysis of the induced Web/HTTP traffic on the packet layer in the browser integration approach not only improves the accuracy of the measured abstract Web workload characteristics but also allows completely new workload characteristics to be obtained. E.g., along with the new inducedServerProcessingTime characteristic mentioned above, concrete values have been obtained for the numberOfServers and numberOfConnections characteristics (as defined in Sec. 9.3.1) from the measurement in [Wei12] (cf. Tab. 9.1, last two lines). On average, 15 servers were involved in the delivery of a page and 48 TCP connections had to be established to retrieve the main page and all its
embedded objects from the origin and also from the non-origin servers. In the extreme case, a maximum of 77 servers were involved in the delivery and a total of 206 connections have been used by the browser to retrieve the page. During the whole measurement, a total of 12.642 servers have been contacted and 39.984 connections have been established to retrieve 837 pages. Consider that for each of the 837 pages we accumulated the number of distinct servers involved in its delivery; in the calculation of the total number of servers contacted in the whole measurement, we simply summed up the number of servers for each page and did not remove multiply counted servers. The results confirm that concurrent TCP connections are very frequently used by the browser to retrieve Web pages and demonstrate the strong limitation of the browser imitation approach formerly employed in [KoW11], which used a single TCP connection to retrieve the main object of the page and the objects linked from the same server in serial mode (i.e., the objects have been requested one after another without the use of concurrent connections or pipelining).

A series of further significant workload metrics could be obtained if we determined the total time required on the client side to load a page (e.g., the page load time loadTime), which, however, usually depends on the geographical point of its measurement. For instance, the quotient replySize / loadTime would characterize the data rate required at the client to retrieve a page, while the concurrent connections to different servers and the inactive periods of the client are taken into account.
Note that, e.g., the quotient replySize / inducedServerProcessingTime would not really make sense (neither as a maximum nor as a minimum boundary) to characterize the amount of data retrieved by the client per second, for the following reasons:

• the inducedServerProcessingTime part has been estimated for the server side,
• it is independent of the geographical point of its measurement (more precisely, it is independent of the geographical point of generation of the HTTP GET requests used for its measurement),
• it does not include the inactive times of the client or the RTT between the client and the server(s).

Therefore, the value of the characteristic obtained by simple summation of the processing times induced on each particular server involved in the delivery of the page can strongly exceed the total time needed at the client to load the page.


At this point, the quotient inducedServerProcessingTime / loadTime may deliver a significant measure to characterize the effective use of concurrent TCP connections to the servers involved in the delivery of a page. However, there is an ongoing discussion in the Web community on appropriate metrics for page load times, which apparently are a difficult issue. The first (and more technical) problem is that the points in time when the corresponding events are fired by the browser are not well-defined and also vary among different browsers (suggestions include, e.g., “above-the-fold” time, time to first “paint”, time-to-first-byte, etc. [Wel11]). Further, latency measurement of a service based on the latencies of the services it invokes is a related problem which has been studied in the broader context of performance prediction in SOA-based solutions, e.g., in [MSKGE11].

In this section we demonstrated that the experimenter can create a comprehensive and stable repository of representative Web sites with little effort and in a short time using the pool update function provided with UniLoG.HTTP. Executing the pool update function immediately before the actual experiment in the particular test bed is strongly recommended to ensure that the characteristics of the Web workloads generated by the UniLoG.HTTP adapter will correspond to the values of the abstract workload characteristics stored in the repository (and thus conform to the specifications of the experimenter in the UBA). The results of the measurements presented in this section are based on the Alexa ranking list of popular Web sites, which contains the URLs of home pages only and does not include the URLs of pages delivered after some user interaction with the site (e.g., searching or navigating activity, submitting a form, etc.).
Further, the home pages of an increasing number of Web sites may nowadays be personalized by means of an automatic log-in procedure (especially for Web sites of Internet shops, booking services, social networks, etc.). The first page shown to the user will be a completely different (and much more complex) one in case the automatic log-in procedure is activated by the user. Therefore, it would be interesting to clarify whether these facts significantly bias the obtained measurement results. These issues have been outside the scope of this thesis, and we leave them for further studies. Some preliminary research can be found, e.g., in [BMS11].

Part IV.

Applications of the UniLoG Load Generator

10. Estimation of QoS Parameters for RTP/UDP Video Streaming in WLANs

Wireless Local Area Networks (WLANs) according to the IEEE 802.11 standard have become very widely deployed these days. Beginning with laptops, notebooks, and PDAs, nowadays also other portable and mobile devices such as smart phones, tablets, eBook readers, and even (smart) TV sets and video cameras mostly include a WLAN interface. While the private use of WLANs has become nearly ubiquitous in Home Area Networks (HANs), wireless access points are also often provided in corporate offices to enable legitimate access to specific parts of the corporate network (and sometimes also to the Internet) for different types of mobile devices of employees, business partners, or visitors. Furthermore, public access points (so-called “hotspots”) are very often provided in various public establishments in many developed urban areas throughout the world, e.g., in congress and conference centres, public libraries, airports, trains, planes, retail stores, hotels, coffee shops, and shopping malls, in order to enable visitors or customers to access Internet services using private devices equipped with WLAN adapters. The maximum data rate supported by WLAN devices has been steadily increased over the different IEEE 802.11 standards (e.g., from 54 Mbit/s with 802.11g in the 2.4 GHz frequency band, up to 600 Mbit/s with 802.11n in the 2.4 and/or 5 GHz band, and further up to 6933 Mbit/s with 802.11ac, dependent, of course, on the chosen channel bandwidth, modulation, and number of antennas and data streams being used). Therefore, it has become possible in the meantime to use VoD services with high-quality video (and audio) content also in WLANs.
However, a number of different WLAN users quite often share a single wireless access point and thus (at least with an IEEE 802.11g WLAN in a single 2.4 GHz frequency band) they also share the available data rate in the cell spanned by the access point, so that a certain background load may exist from the point of view of each particular streaming user.

© Springer Fachmedien Wiesbaden GmbH 2017
A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_10


For the case study in this section we picked a video streaming scenario which quite often takes place in home area or small business (corporate) networks. In particular, we observe a set of quantitative characteristics of an RTP video stream transmitted under different IP background loads in a WLAN. The VoD server is located in the wired part (segment) of the experimental network and can provide video films for streaming in HD quality (720p or 1080p). The consumer of the video stream is located in the wireless part (segment) of the experimental network at the "VoD client" station. In this scenario, traffic from other service users in the WLAN (for example, navigating Web pages, reading email, watching videos, or transmitting data back-ups to the back-up server in the wired network segment) is modelled using a number of UniLoG load generators which produce different IP background load streams in the WLAN. The case study should help us, among other things, to answer the following questions:

1. At which level of IP background load in the WLAN does the quality of the observed RTP/UDP video stream degrade significantly?

2. Does the type of background load play an important role (e.g., CBR vs. VBR background load, load consisting of "small" vs. load consisting of "large" IP packets, etc.)?

3. At which level of IP background load does the streaming service become "unusable"?

10.1. Experimental Network

The experimental network for the case study consists of a WLAN segment and a Fast Ethernet LAN segment (cf. Fig. 10.1). The Fast Ethernet segment is established using the 100 Mbit/s Fast Ethernet D-Link DGS-1008D switch. The WLAN segment is provided by the IEEE 802.11g WLAN access point D-Link DWL-2100 AP. The access point supports a maximum data rate of 54 Mbit/s. It has been configured to use channel number 1 in the 2.4 GHz band (with OFDM modulation) and to transmit frames with the maximum allowed signal strength (100 mW). The distance between the access point and the VoD client, between the access point and the load generators, and between each two of the three WLAN stations has been chosen such that the hidden station problem should not occur in the WLAN segment. This allowed us to switch off the request-to-send / clear-to-send (RTS/CTS) mechanism on each WLAN station and on the access point itself (by setting the corresponding threshold value for the RTS/CTS function to 2346 byte).

[Figure 10.1 shows the experimental network: the VoD server (LIVE555 Media Server) and the load sink (UniLoG.IP) are attached via 100 Mbit/s links to the Fast Ethernet segment (D-Link DGS-1008D switch); the VoD client (VLC Media Player 1.1.9) and the two load generators (UniLoG.IP) reside in the 54 Mbit/s IEEE 802.11g WLAN (D-Link DWL-2100 AP); the BBB stream (RTP/UDP) and the background IP load streams #1, #2, #3 traverse the WLAN.]

Figure 10.1.: Experimental network: 100 Mbit/s Fast Ethernet, 54 Mbit/s IEEE 802.11g WLAN, transmission of the BBB video stream by means of RTP over UDP (own Fig.).

The VoD server is located in the 100 Mbit/s Fast Ethernet segment and uses the LIVE555 Media Server software (freely available at http://www.live555.com/mediaServer/) to provide the "Big Buck Bunny" (BBB for short) video stream, available from the "Peach open movie project" [BBB] at www.bigbuckbunny.org, in HD quality (1280x720p) with a frame rate of 24 Hz and a GOP pattern of (24,2). The video stream has been compressed using the H.264/AVC encoding scheme and encapsulated into the corresponding MPEG transport stream container file (.ts) for the network transmission. The LIVE555 Media Server is responsible for the encapsulation of the video and audio frames into RTP packets and can deliver the resulting stream by means of both the UDP and the TCP protocol.
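To make the role of the MPEG transport stream container more concrete, the following sketch parses the 4-byte header of a single 188-byte TS packet according to the MPEG-TS format (the function and the synthetic packet are illustrative and not part of the LIVE555 or UniLoG toolchain):

```python
def parse_ts_header(packet: bytes) -> dict:
    """Parse the 4-byte header of a single 188-byte MPEG-TS packet."""
    if len(packet) != 188 or packet[0] != 0x47:  # 0x47 is the TS sync byte
        raise ValueError("not a valid MPEG-TS packet")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "pid": ((b1 & 0x1F) << 8) | b2,         # 13-bit packet identifier
        "payload_unit_start": bool(b1 & 0x40),  # first byte of a new PES packet
        "continuity_counter": b3 & 0x0F,        # 4-bit per-PID counter
    }

# A synthetic TS packet carrying PID 0x0100 with continuity counter 5:
pkt = bytes([0x47, 0x41, 0x00, 0x15]) + bytes(184)
hdr = parse_ts_header(pkt)
```

A .ts file as produced for the BBB stream is simply a concatenation of such 188-byte packets, which the server then slices into RTP payloads.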


The streaming user is located in the WLAN segment at the "VoD client" station and makes use of the "Video LAN Client" (VLC) media player software (freely available at http://www.videolan.org/vlc/) to receive, decode, and play back the video stream. Note that in this case study the VLC media player on the VoD client side has been instructed to request the BBB video stream over the UDP protocol (RTP over UDP delivery). The video streaming software in the VoD client makes use of the Session Description Protocol (SDP) [RFC4566] to specify the initialization parameters of the streaming session and to describe the requested streaming media. Further, it uses the RTSP protocol [RFC2326] to control the streaming session. The server part of the video streaming software (VoD server) uses the SDP protocol to locate the streaming media requested by the VoD client and the RTP protocol [RFC3550] to encapsulate the frames from the video and audio streams into RTP packets. The resulting RTP packets are injected into the experimental network as UDP datagrams at the transport layer in the VoD server (i.e., the RTP over UDP encapsulation/delivery variant is used). The resulting RTP video stream consists of a total of 14314 video frames, the total size of the video data is 421 MByte, and the total duration of the video playback (without any background load) is 9 min 56 sec. Traffic generated in the observed WLAN by users of different services in our experimental network (e.g., watching videos, transmitting data back-ups to the back-up server in the wired network segment, or simply navigating Web pages or reading email) is modelled by means of two UniLoG load generators used in combination with the corresponding UniLoG.IP adapters. The load generators (and the UniLoG.IP adapters) are installed at the hosts "Load generator 1" and "Load generator 2" in the WLAN segment.
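The stream parameters given above (421 MByte of video data over 9 min 56 sec of playback) imply a mean video bitrate of roughly 5.7 Mbit/s, which is consistent with the zero-background-load IP throughput reported later in Sec. 10.4. A quick back-of-the-envelope check (assuming decimal MByte, i.e., 10^6 byte):

```python
def mean_bitrate_mbit(total_bytes: float, duration_s: float) -> float:
    """Mean bitrate in Mbit/s for a given amount of data and playback duration."""
    return total_bytes * 8 / duration_s / 1e6

# BBB stream: 421 MByte of video data, 9 min 56 sec of playback
rate = mean_bitrate_mbit(421e6, 9 * 60 + 56)  # ~5.65 Mbit/s
```

The small remaining difference to the measured IP throughput is plausibly accounted for by RTP, UDP, and IP headers as well as RTSP/RTCP traffic.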
Parts of the UniLoG.IP adapter required to model the peer IP service user (e.g., to provide a receiver for the generated IP background load streams) have been installed at the "Load sink" host in the Fast Ethernet segment. Note that the UniLoG.IP adapter components installed at the "Load sink" do not generate any IP packets during the experiments. The VoD client receives the BBB RTP/UDP video stream from the VoD server over the WLAN while "Load generator 1" and "Load generator 2" inject background IP load streams into the WLAN targeting the "Load sink" in the Fast Ethernet segment. Hence, in this case study configuration, the VoD client is in contention with the load generators when accessing the shared wireless communication medium (i.e., WLAN channel number 1) by means of the CSMA/CA protocol. The network and protocol analysis software TShark (freely available at www.wireshark.org) has been used on the VoD client side in order to capture the RTP/UDP/IP video traffic data received by the VLC video player and to obtain a set of quantitative metrics (described later in Sec. 10.3) for the quality of the transmitted video film on the basis of the captured traffic. The "VoD client" station represents the "central coordination point" in this case study and, therefore, we installed the UniLoG management station at the VoD client as well, in order to remotely control the UniLoG.IP load generators and to be able to execute the scripts which analyse the captured video traffic and compute the QoS metrics. The experiments have been conducted in the laboratory of the TKRN research group at the University of Hamburg during the working hours of the university staff (from about 9 a.m. to 6 p.m.). It should be noted that beacon frames from eleven other active access points have been observed in the laboratory area. Three of these access points transmitted data on the same channel 1 as the access point D-Link DWL-2100 AP used in our experimental network. Therefore, we cannot generally exclude the influence of interfering transmissions from other WLAN devices during our experiments.
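Purely for illustration, a capture and a subsequent RTP-stream summary of the kind described could be obtained with TShark invocations along the following lines (the interface name, capture filter, and file name are placeholder assumptions; the exact invocations used in the experiments are not given here):

```shell
# Capture the incoming video traffic on the WLAN interface of the VoD client
tshark -i wlan0 -f "udp" -w bbb_stream.pcap

# Afterwards, summarise the RTP streams found in the capture
# (packet counts, lost packets, mean jitter per stream)
tshark -r bbb_stream.pcap -q -z rtp,streams
```

The `-z rtp,streams` statistics already report per-stream loss and jitter figures, which can serve as a cross-check for self-written analysis scripts.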

10.2. Configuration of the Background Load

The case study covers four series of experiments. In all four series the QoS metrics of the video streaming service are measured at the "VoD client", but a different combination of the IP background load type (CBR or VBR) and the size of the IP payload (50 or 1480 byte) comprising the IP background load streams is used in each of the series. The size of the IP payload remains constant during a whole experiment series. The choice of the type of IP background load (CBR or VBR) is made by means of the specification of the inter-arrival times between the IPv4 packets in the stream to be generated: the inter-arrival times are specified as a constant for CBR loads and according to an exponential distribution for VBR loads. In each of the four experiment series, the type of background load (CBR or VBR) and the IP packet size (50 or 1480 byte) is held constant while the level of background load (i.e., the data rate of the streams generated by the load generators) is increased from experiment to experiment. In particular, aggregated background loads with a required mean throughput of 1, 2, 3, 4, 8, 12, 16, 20, 24, and 28 Mbit/s on the IP layer are used in each experiment series. For this purpose, a set of corresponding UBAs (using the request type definition presented in Fig. 7.2 in Sec. 7.2.1) has been prepared to specify CBR and VBR IP traffic loads with a data rate (or required mean throughput in the case of VBR) on the IP layer of 0.5, 1, 1.5, 2, 4, 6, 8, 10, 12, and 14 Mbit/s. The corresponding parameter values used to specify the IP background load streams in the case study are presented in Tab. 10.1. For both CBR and VBR traffic, the value of the request attribute payloadLength of the abstract IP request type InjectIPPacket in a single R-state of the UBA has been set to 50 or to 1480 byte. The inter-arrival times between subsequent InjectIPPacket requests are modelled in the corresponding single D-state by means of a constant value for CBR traffic and an exponential distribution for VBR traffic. In a single experiment, "Load generator 1" and "Load generator 2" both execute the same UBA in order to achieve the specified level of utilization in the WLAN by generating the corresponding aggregated IP background load.

Mean IP Throughput [Mbit/s]                  |   0.5 |    1.0 |    1.5 |    2.0 |    4.0
CBR (50 byte payload): IAT: Constant [ms]    | 1.120 |  0.560 |  0.373 |  0.280 |  0.140
VBR (50 byte payload): IAT: Exp(λ), rate λ   | 892.9 | 1785.7 | 2680.1 | 3571.4 | 7142.9
CBR (1480 byte payload): IAT: Constant [ms]  |    24 |     12 |      8 |      6 |      3
VBR (1480 byte payload): IAT: Exp(λ), rate λ |  41.7 |   83.3 |    125 |  166.7 |  333.3

Mean IP Throughput [Mbit/s]                  |     6.0 |     8.0 |    10.0 |    12.0 |   14.0
CBR (50 byte payload): IAT: Constant [ms]    |   0.093 |   0.070 |   0.056 |   0.047 |  0.040
VBR (50 byte payload): IAT: Exp(λ), rate λ   | 10752.7 | 14285.7 | 17857.1 | 21276.6 |  25000
CBR (1480 byte payload): IAT: Constant [ms]  |       2 |     1.5 |     1.2 |     1.0 |  0.857
VBR (1480 byte payload): IAT: Exp(λ), rate λ |     500 |   666.7 |   833.3 |    1000 | 1166.7

Table 10.1.: Parameters of the IP background load streams in the case study (own Tab.).
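The entries in Tab. 10.1 can be reproduced from the target IP-layer throughput: each generated IP packet carries the payload plus a 20-byte IPv4 header, the constant CBR inter-arrival time is the packet size divided by the throughput, and the VBR rate λ is the reciprocal of that time. A small sketch of this arithmetic (the helper names are ours, not part of UniLoG; the table rounds some values, e.g., the IATs for the 50-byte series, to three decimals):

```python
def cbr_iat_ms(payload_bytes: int, throughput_mbit: float) -> float:
    """Constant inter-arrival time [ms] so that IP packets of the given
    payload (plus a 20-byte IPv4 header) yield the target IP-layer throughput."""
    packet_bits = (payload_bytes + 20) * 8
    return packet_bits / (throughput_mbit * 1e6) * 1e3

def vbr_rate(payload_bytes: int, throughput_mbit: float) -> float:
    """Rate lambda [1/s] of the exponential inter-arrival time distribution
    yielding the same mean throughput."""
    return 1e3 / cbr_iat_ms(payload_bytes, throughput_mbit)

iat = cbr_iat_ms(50, 0.5)   # 1.12 ms, first CBR(50) entry in Tab. 10.1
lam = vbr_rate(1480, 6.0)   # 500/s, first VBR(1480) entry in the lower half
```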

In the two experiment series with an IP payload length of 1480 byte, we started with an experiment using an aggregated traffic load of 1 Mbit/s, ramped up the traffic load in each subsequent experiment to the next load level of 2, 3, 4, 8, 12, 16, 20, and 24 Mbit/s, correspondingly, and finally ended with an experiment using a 28 Mbit/s load. Each complete experiment series has been repeated ten times (so that the experiments at a specific traffic load level have also been repeated ten times). In the two experiment series with an IP payload length of 50 byte, we used UBA models specifying aggregated traffic loads only up to 4 Mbit/s due to the fact that IEEE 802.11g WLANs can support a maximum IP throughput of only 4.9 Mbit/s for such small payload lengths [Rec12]. Consider, for example, that the number of "small" packets with 50 byte payload required to achieve a given throughput of 4 Mbit/s in the WLAN is (not considering the protocol overhead in the WLAN) approximately 30 times higher than the number of "large" packets with 1480 byte payload. The protocol overhead induced by the CSMA/CA channel access mechanism as well as the additional acknowledgement scheme employed in the WLAN for each frame further diminish the achievable throughput for "small" packets.
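The factor of roughly 30 follows directly from the payload sizes: to carry the same payload throughput, a stream of 50-byte payloads needs 1480/50 ≈ 29.6 times as many packets (and hence WLAN frames, channel acquisitions, and MAC-layer ACKs) as a stream of 1480-byte payloads. A quick check of this claim:

```python
def packets_per_second(payload_bytes: int, payload_throughput_mbit: float) -> float:
    """Number of packets per second needed to carry the given payload throughput
    (protocol overhead deliberately ignored, as in the text)."""
    return payload_throughput_mbit * 1e6 / (payload_bytes * 8)

small = packets_per_second(50, 4.0)    # 10000 packets/s
large = packets_per_second(1480, 4.0)  # ~338 packets/s
ratio = small / large                  # ~29.6
```

Since the per-frame MAC overhead (backoff, interframe spaces, ACK) is paid per packet, this 30-fold packet count is what pushes the 50-byte series against the 4.9 Mbit/s ceiling so early.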

10.3. Streaming Quality Metrics

A number of quantitative metrics characterising the quality of the transmitted video stream have been extracted from the RTP video traffic captured at the VoD client side. Because we used TShark to passively measure the RTP traffic on the VoD client side, no modifications to the VoD server software were required. The following streaming quality metrics have been considered in this case study:

IP throughput: The (mean) throughput on the IP layer is defined as the amount of data sent by the VoD client to the VoD server and received from it per second during the streaming session. Note that not only the IP packets received by the VoD client from the VoD server but also the IP packets sent to the VoD server (containing, e.g., different RTSP requests, RTP Control Protocol (RTCP) receiver reports, etc.) contribute to the IP throughput. Furthermore, the IP header length also contributes to the IP throughput.

Jitter: We use the definition of jitter from the RTP protocol [RFC3550], which specifies jitter as the variability in the delays of RTP packets from the same packet stream. In our measurements we extract the jitter values from the RTCP receiver reports (RR) and consider the jitter values from the audio stream only, as the jitter values from video streams encoded with the H.264 scheme may not take into account the different order of generation, transmission, and decoding of the different types of video frames (I-, P-, and B-frames) within the same GOP (cf. RFC 3984).

Number of sequence errors: A sequence error occurs when an RTP packet arrives at the VoD client and its RTP sequence number is not equal to the RTP sequence number of the last received RTP packet incremented by one.

Packet loss: The VoD client reports (in the corresponding RR referring to the video stream) the difference between the number of received and the number of expected RTP video packets.

Number of duplicates: An RTP packet is treated as a duplicate if at least one other RTP packet received during the last two seconds has the same RTP sequence number.

In order to obtain statistically significant results for the streaming quality metrics, each experiment has been repeated ten times.
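The jitter values reported in the RTCP receiver reports are computed with the running estimator defined in RFC 3550, Sec. 6.4.1: for consecutive packets the difference of relative transit times is filtered with gain 1/16. The following illustrative re-implementation works in RTP timestamp units (the example data are synthetic):

```python
def rtp_jitter(arrivals, timestamps):
    """Interarrival jitter estimate as defined in RFC 3550, Sec. 6.4.1.
    'arrivals' and 'timestamps' are in the same clock units (e.g. RTP
    timestamp ticks); returns the running estimate after the last packet."""
    j = 0.0
    for i in range(1, len(arrivals)):
        # D(i-1, i): difference of the relative transit times of two packets
        d = (arrivals[i] - arrivals[i - 1]) - (timestamps[i] - timestamps[i - 1])
        j += (abs(d) - j) / 16.0  # first-order filter with gain 1/16
    return j

# Packets stamped every 160 ticks; the third packet arrives 80 ticks late:
ts = [0, 160, 320, 480]
rx = [1000, 1160, 1400, 1480]
jitter = rtp_jitter(rx, ts)  # 9.6875 ticks
```

The gain of 1/16 makes the estimate a smoothed, noise-reduced indicator rather than an instantaneous delay variation, which is why single delay spikes raise it only gradually.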

10.4. Results and Discussion

Increasing the background load level from experiment to experiment within a series, we observed a significant decrease in the video streaming quality both in terms of the quantitative characteristics (by monitoring the values of the streaming quality metrics described in the previous section) and in a qualitative manner (by assessing the quality of the video playback as perceived by a human user). From the qualitative point of view, the video streaming service becomes definitely unacceptable (i.e., unusable) when the background load leads to utilization levels above the 50% mark of the nominal data rate of 54 Mbit/s in the IEEE 802.11g WLAN. Remember that we instructed the VLC video player to request the video from the VoD server as an RTP stream encapsulated in UDP datagrams for the transport, so that no significant protocol overhead is introduced at the transport layer (cf. the explanations below). At utilization levels in the WLAN of 50% of the nominal data rate and above, the VoD client receives just too little audio and video data, so that the VLC video player software becomes unable to properly decode and play back the video frames (and also parts of the audio track). As a consequence, the video playback often stalls or becomes "frozen", while single video frames may contain strong artefacts. Note that in the WLAN every single frame is acknowledged by a corresponding ACK frame, thus reducing the maximum achievable throughput for the video stream (measured on the transport layer). This throughput will be further significantly reduced when additional protocol overhead is added on the transport layer. For example, when the TCP protocol is used to transport the RTP video stream, the VoD client generates an ACK segment for each TCP segment received from the VoD server, and the corresponding WLAN frame bearing this TCP ACK segment is also acknowledged at the MAC layer by the WLAN access point. This double acknowledgement mechanism significantly reduces the effective throughput in the WLAN in the case of TCP transport. The qualitative observations are confirmed by the quantitative characteristics of the BBB video stream. Thus, the mean IP throughput of the BBB stream decreases from 5.7 Mbit/s (at zero background load in the WLAN) to 4 Mbit/s at the 24 Mbit/s load level, and further down to nearly 2 Mbit/s at the 28 Mbit/s CBR background load consisting of IP packets with 1480 byte payload (cf. Fig. 10.2, left part). In the same experiments, the values of jitter increase from 2 ms under zero background load up to 18 ms under a background load level of 24 Mbit/s. The values of the characteristics "number of sequence errors" and "number of lost video packets" escalate at load levels between 2 and 3 Mbit/s for background loads with 50 byte payload per packet, and between 20 and 24 Mbit/s for background loads with 1480 byte payload per packet (cf. Fig. 10.2, right part). Comparing the plots for the experiment series with "small" packets to those with "large" packet payload, we can observe that in the series with small packets the values of all quality metrics escalate much earlier, i.e., already at background load levels of 2 Mbit/s, whereas in the series with CBR/VBR background loads of large packets the quality metrics escalate much later, beginning at load levels of 12 Mbit/s.
This behaviour is due to the fact that the number of small IP packets (and thus WLAN frames) with 50 byte payload required to generate a given background load level, e.g., of 4 Mbit/s, is approximately 30 times larger than in the case where this background load level is generated by means of large IP packets with 1480 byte payload each. The protocol overhead induced by the CSMA/CA channel access mechanism as well as the additional acknowledgement mechanism used in the WLAN to confirm the receipt of each frame significantly reduce the achievable data throughput, especially for small packets. Thus, the maximum achievable IP throughput on the network layer in an IEEE 802.11g WLAN for IP packets with 50 byte payload is limited to approximately 4.9 Mbit/s [Rec12]. In the plots of the video quality metrics from the experiment series with 1480 byte packet payload we can observe a significant decrease of the IP

[Figure 10.2 contains four plots of the quality metrics vs. the aggregated mean background load (0 to 28 Mbit/s), each with curves for CBR 50, VBR 50, CBR 1480, and VBR 1480 byte payload: IP throughput [Mbit/s] (top left), audio sequence errors (top right), jitter [ms] (bottom left), and number of lost video packets (bottom right).]

Figure 10.2.: Quality metrics for the BBB video stream observed under different background loads, 95% confidence intervals (own Fig.).

throughput of the video stream beginning at background load levels of 12 Mbit/s for both CBR and VBR background loads (cf. Fig. 10.2). This behaviour is consistent with the observable increase in the values of the other three quality metrics (jitter, number of sequence errors, and number of lost packets) setting in at the background load level of 12 Mbit/s. Beginning with the load level of 12 Mbit/s, the mean IP throughput of the video stream degrades significantly, and the values of jitter as well as the numbers of sequence errors, lost packets, and packet duplicates (not shown) strongly increase (cf. Fig. 10.2). The particular fluctuations in the values of the quality metrics may be due to possible signal collisions on channel 1, which was used in this case study by the D-Link DWL-2100 AP access point while three other access points in the area of the experimental network also used this channel (and thus may have influenced the experiments). Further, the measurement results obtained in the case study show that

the streaming quality is affected only marginally when VBR loads are used instead of CBR traffic loads (given, of course, that the traffic loads have the same mean IP throughput and use the same IP payload size). This is mainly due to the use of an exponential distribution to specify the inter-arrival times between IP packets in our VBR traffic loads. It would therefore be interesting to study the influence of background loads with a more strongly "bursty" structure on the observed RTP video stream. The results reveal the apparently strong influence of the payload length chosen for the IP packets comprising the background loads on the quality of the observed video stream. Note that the payload length used mainly determines the period of time during which the sending WLAN station occupies the wireless communication medium. An interesting observation can be made for the stream characteristics under the 1480 byte background load at higher levels of utilization (from 16 Mbit/s upwards). The plots of the IP throughput, the number of sequence errors, and the number of lost packets may suggest that the observed BBB video stream is more resistant against VBR loads than against CBR loads at high load levels in the WLAN. This may be due to the higher probability for the I-frames in the video stream to be (periodically) hit by packet loss events in the case of CBR loads specified with constant packet inter-arrival times and constant packet payload length. Finally, we should remark that the presented experimental set-up can be extended to study the quality of video streaming not only from VoD servers located in the Fast Ethernet segment (which is directly connected to the observed WLAN), but also from VoD servers at a longer distance from the VoD client (here, the distance is measured in terms of the induced network transmission delays).
For example, the observed VoD server may be located in the backhaul network of a partner ISP, in a CDN of a cooperating video streaming provider, or elsewhere in the Internet. In order to be able to model different properties of wide area networks (e.g., variable delay, loss, duplication, and re-ordering of packets) during the transmission of the observed video stream, an appropriate network emulator (e.g., netem, freely available for Linux systems at http://www.linuxfoundation.org/collaborate/workgroups/networking/netem) can be installed in the Fast Ethernet segment of the case study network and used for the corresponding experiment series. Results of similar experimental studies can be found, e.g., in [CrD05, BBM10].
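As an illustration, WAN-like impairments of the kind listed above could be configured with netem roughly as follows (the interface name and all parameter values are placeholder assumptions, not taken from the case study):

```shell
# Emulate a distant VoD server: 50 ms mean delay with 10 ms variation,
# plus 0.5% packet loss and 0.1% packet duplication on the outgoing interface
tc qdisc add dev eth0 root netem delay 50ms 10ms loss 0.5% duplicate 0.1%

# Remove the emulation again after the experiment series
tc qdisc del dev eth0 root netem
```

Since netem acts as a queueing discipline on a single interface, installing it on the host forwarding traffic towards the WLAN segment would subject the complete video stream to the emulated WAN conditions.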

11. Estimation of QoS Parameters for RTP/TCP Video Streaming in WLANs

A large number of studies in the computer networks research field come to the conclusion that the TCP protocol is not suitable for the transmission of traffic generated by real-time applications. The main argument is that the congestion-controlled reliable delivery used by TCP may lead to excessive end-to-end delays of TCP packets and result in violations of the real-time requirements (e.g., in missed play-out deadlines) of particular video frames [WKST08]. The conventional wisdom for media streaming is to use UDP, rather than TCP, as the transport protocol. Therefore, a number of UDP-based streaming solutions and alternatives to TCP transport have been proposed by different network research groups, e.g.:

• TCP-friendly rate control (TFRC) [RFC3448, RFC5348, LeC08, LCWT10, EBB10],
• Congestion control without reliability (DCCP) [KHF06],
• Stream Control Transmission Protocol (SCTP) [RFC2960, RFC4960, DRR11],
• RTP: a transport protocol for real-time applications [RFC3550].

The proposed protocols have been designed to favour timely data delivery over reliability in the first place, in order to take into account the requirements of real-time media. Moreover, they may provide (minimalistic) mechanisms for congestion control and TCP-friendliness. However, despite the mentioned shortcomings of TCP, a significant fraction of the commercial streaming traffic is carried over TCP (cf. [GTC06, WKST08, EBB10]). This is mainly due to the fact that many commercial streaming media services (like, e.g., QuickTime, RealNetworks, and Windows Media Services) use TCP for the delivery of streaming media in order to


be able to pass through restrictive NATs and firewalls that may block the traversal of traffic from other protocols, in particular from the UDP protocol (cf. [GTC06]). The fraction of the streaming traffic carried by means of TCP is growing further because of the emerging HTTP-based adaptive video streaming solutions, like, e.g., Microsoft's Smooth Streaming, Apple's HTTP Live Streaming, and Adobe's HTTP Dynamic Streaming, which use the standard HTTP protocol over TCP for the transport of video data over unicast connections in unmanaged networks [BAB11, RLB11, JSZ14]. In network practice, TCP represents a mature and widely tested protocol which is also "TCP-friendly" by default (in contrast to video streams transmitted over UDP, which does not employ any sending rate control). Finally, the performance of the TCP protocol stack can be fine-tuned, e.g., by an appropriate choice of the buffer sizes in the TCP sender and receiver instances as well as of the congestion control parameters (e.g., the initial congestion window init_cwnd and the slow start threshold ssthresh). The objective of the case study in this section is to investigate the streaming session of a pull-based RTSP video streaming service using the RTP over TCP protocol to transmit an H.264-coded VBR video over a unicast connection in unmanaged networks. For this purpose we picked a video streaming scenario which quite often takes place in home area or small business (corporate) networks. In particular, we observe a set of quantitative characteristics of an RTP video stream transmitted under different TCP background loads in a WLAN. The VoD server is located in the wired part (segment) of the experimental network and can provide video films for streaming in HD quality (720p or 1080p).
The consumer of the video stream is located in the wireless part (segment) of the experimental network at the "VoD client" station and starts playing out the video after a specified initial start-up delay of 3 sec, during which the arriving video frames are stored in the internal play-out buffer of the VoD client. In our chosen scenario, traffic from other service users in the WLAN (for example, navigating Web pages, reading email, watching videos, or transmitting data back-ups to the back-up server in the wired network segment) is modelled using a set of UniLoG load generators in combination with UniLoG.TCP adapters which produce a specified number of background TCP load streams. The case study should help us, among other things, to answer the following questions:

1. Which level of TCP background load and/or which number of TCP background load streams in the experimental WLAN leads to a significant degradation in the transmission quality of the observed BBB RTP/TCP video stream?

2. At which level of TCP background load does the streaming service become "unusable"?
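The role of the 3 sec start-up delay can be made concrete with a small play-out model: at a fixed frame rate, frame i is due for display at start-up delay + i × frame period; if it has not arrived by then, playback stalls until it does. The following sketch is a hypothetical model, not the actual VLC buffering logic; for exact integer arithmetic it assumes a 40 ms frame period (25 fps) rather than the 24 fps of the BBB stream:

```python
def count_stalls(arrival_ms, frame_period_ms=40, startup_ms=3000):
    """Count play-out deadline misses; all times in integer milliseconds.
    A missed frame pauses playback until it arrives, shifting later deadlines."""
    stalls, shift = 0, 0
    for i, t in enumerate(arrival_ms):
        deadline = startup_ms + i * frame_period_ms + shift
        if t > deadline:
            stalls += 1
            shift += t - deadline  # playback resumes only when the frame arrives
    return stalls

# 200 frames arriving at line rate, with a 5-second gap before frame 100
# (as TCP delivers in order, all later frames are delayed as well):
arrivals = [i * 40 + (5000 if i >= 100 else 0) for i in range(200)]
n = count_stalls(arrivals)  # a single stall, absorbed partly by the buffer
```

The model illustrates why a larger start-up delay trades initial waiting time against robustness: gaps shorter than the buffered play-out margin cause no visible stall at all.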

11.1. Experimental Network

The experimental network for this case study consists (similar to the network used for the case study presented in Chapter 10) of two segments, a wireless and a wired one (cf. Fig. 11.1). But in contrast to the case study from Chapter 10, the wired segment is here provided by a 1 Gbit/s Gigabit Ethernet LAN. The WLAN segment is, again, provided by the same IEEE 802.11g WLAN access point D-Link DWL-2100 AP used in the case study in Chapter 10. The access point supports a maximum data rate of 54 Mbit/s and has been configured to use channel 1 in the 2.4 GHz band (with OFDM modulation) and to transmit frames with the maximum allowed signal strength (100 mW). The distance between the access point and the VoD client, between the access point and the load generators, and between each two of the three WLAN stations has been chosen such that the hidden station problem should not occur in the WLAN segment. This allowed us to switch off the request-to-send / clear-to-send (RTS/CTS) mechanism on each WLAN station and on the access point itself (by setting the corresponding threshold value for the RTS/CTS function to 2346 byte). The VoD server (Intel Quad Core 2.83 GHz, 3.25 GB RAM, Broadcom 57xx Gigabit Ethernet NIC) is located in the Gigabit Ethernet segment and uses the LIVE555 Media Server software (freely available at http://www.live555.com/mediaServer/) to provide the "Big Buck Bunny" video stream, available from the "Peach open movie project" [BBB] at www.bigbuckbunny.org, in HD quality (1280x720p) with a frame rate of 24 Hz and a GOP pattern of (24,2). The video stream has been compressed using the H.264/AVC encoding scheme and encapsulated into the corresponding MPEG transport stream container file (.ts) for the network transmission. The LIVE555 Media Server is responsible for the encapsulation of the video and audio frames into RTP packets and can deliver the resulting stream by means of both the UDP and the TCP protocol. The streaming user is located in the WLAN segment at the "VoD client" station (Intel Core 2 2.13 GHz, 1 GB RAM, X-Micro 802.11g WLAN USB) and makes use of the "Video LAN Client" (VLC) media player software


Figure 11.1.: Experimental network: 1 Gbit/s Gigabit Ethernet, 54 Mbit/s IEEE 802.11g WLAN, transmission of the BBB video stream by means of RTP over TCP (own Fig.).

(freely available at http://www.videolan.org/vlc/) to receive, decode, and play back the video stream. Note that in this case study the VLC media player on the VoD client side has been instructed to request the BBB video stream over the TCP protocol (RTP over TCP delivery). The video streaming software in the VoD client makes use of the SDP protocol [RFC4566] to specify the initialization parameters of the streaming session and to describe the requested streaming media. Further, it uses the RTSP protocol [RFC2326] to control the streaming session. The server part of the video streaming software (VoD server) uses the SDP protocol to locate the streaming media requested by the VoD client and the RTP protocol [RFC3550] to encapsulate the frames from the video and audio streams into RTP packets. The resulting RTP packets are injected into the experimental network as TCP segments at the transport layer in the VoD server (i.e., the RTP over TCP encapsulation variant is used). The resulting RTP video stream consists of a total of 14314 video frames, the total size of the video data is 421 MByte, and the total duration of the video playback (without any background load) is 9 min 56 sec.


Traffic induced in the observed WLAN by users of different services in our experimental network (e.g., watching videos, transmitting data back-ups to a back-up server in the wired network segment, or simply navigating Web pages or reading email) is modelled by means of a number of UniLoG load generators which are used in combination with the corresponding UniLoG.TCP adapters. The load generators and the UniLoG.TCP adapters are installed at the host "Load generator" (Intel Quad Core 2.83 GHz, 3.25 GB RAM, Broadcom 57xx Gigabit Ethernet NIC) in the Gigabit Ethernet segment. The parts of the UniLoG.TCP adapter required to model the peer TCP service user (e.g., to provide a receiver for the generated TCP background load streams) have been installed at the "Load sink" host (Intel Core 2 2.13 GHz, 1 GB RAM, X-Micro 802.11g WLAN USB) in the WLAN segment. Note that the UniLoG.TCP adapter components installed at the "Load sink" generate TCP acknowledgements for the TCP segments of the background load streams injected by the load generator at the "Load generator" host in the Gigabit Ethernet segment. The VoD client receives the BBB RTP/TCP video stream from the VoD server over the WLAN, while the "Load generator" injects background TCP load streams into the Gigabit Ethernet segment which target the "Load sink" station in the WLAN segment. Note that the number of TCP streams contributing to the background load can be controlled by specifying the number of UniLoG.TCP generators to be used in an experiment. Thus, in this particular network configuration, the following observations hold:

1. The observed BBB video stream and the generated TCP background load streams are multiplexed already in the Gigabit Ethernet switch, so that the TCP traffic reaching the access point AP is already aggregated.

2. Because of the congestion control mechanisms used by the TCP protocol for each established connection / stream, we expect that the observed BBB RTP/TCP video stream and the generated TCP background load streams will fairly share the available capacity in the wireless segment.

3. The access point and the "Load sink" compete for the WLAN channel number 1 when accessing the wireless communication medium by means of the CSMA/CA protocol, since the receiving TCP protocol instance in the "Load sink" station generates TCP acknowledgements for the TCP segments comprising the TCP background load streams injected by the load generator.


The network and protocol analysis software TShark (freely available at www.wireshark.org) has been used on the VoD client side to capture the RTP/TCP/IP video traffic data received by the VLC video player. The captured traffic data has then been used to obtain a set of quantitative metrics (as introduced in Sec. 10.3 for the case study described in Chapter 10) for the quality of the transmitted video film. The "VoD client" station represents the "central coordination point" in this case study; therefore, we installed the UniLoG management station at the VoD client as well, in order to remotely control the UniLoG.TCP load generators and to execute the scripts which analyse the captured video traffic and compute the QoS metrics. As with the case study on the quality estimation of RTP/UDP video streaming in WLANs presented in Chapter 10, the experiments of this case study have been conducted in the laboratory of the TKRN research group at the University of Hamburg. It should be noted that beacon frames from eleven other active access points have been observed in the laboratory area. Three of these access points transmitted data on the same channel 1 as the access point D-Link DWL-2100 AP used in our experimental network. Therefore, also in this case study, we cannot generally exclude the influence of interfering transmissions from other WLAN devices during the experiments.
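As an illustration of this post-processing step, per-packet arrival times and IP packet lengths can be extracted from such a capture with TShark's field output. The following sketch is illustrative only (it is not one of the actual analysis scripts of this thesis; the field names follow Wireshark's display-filter naming, and the capture file name is hypothetical):

```python
import subprocess

TSHARK_FIELDS = ["frame.time_epoch", "ip.len"]

def tshark_cmd(pcap_file):
    """Build a TShark command line which prints one tab-separated
    line per captured packet: arrival time [s] and IP length [Byte]."""
    cmd = ["tshark", "-r", pcap_file, "-T", "fields"]
    for field in TSHARK_FIELDS:
        cmd += ["-e", field]
    return cmd

def parse_field_output(output):
    """Parse the TShark field output into (time_s, length_byte) tuples."""
    packets = []
    for line in output.splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0] and parts[1]:
            packets.append((float(parts[0]), int(parts[1])))
    return packets

# usage on a real capture file (hypothetical file name):
#   out = subprocess.run(tshark_cmd("vod_client.pcap"),
#                        capture_output=True, text=True).stdout
#   packets = parse_field_output(out)
```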

11.2. Configuration of the TCP Background Load

The case study covers six series of experiments. In all six series the QoS metrics of the video streaming service are measured at the "VoD client", but a different number of TCP background load streams is used in each series. In all experiment series the size of the TCP payload has been chosen to be 1460 Byte and remains identical throughout all experiments of a series. The required mean TCP throughput of each single TCP background load stream is specified by means of the corresponding inter-arrival times between the TCP requests in the stream to be generated. The inter-arrival times corresponding to required mean throughputs of 0.5, 1, 2, 4, 8, 10, 12, and 16 Mbit/s are presented in Tab. 11.1. In each of the six experiment series, the number of TCP background load streams is constant, while the level of TCP background load (i.e., the required mean throughput of the TCP streams generated by the load generators) is increased from experiment to experiment within the series. For this purpose, a set of corresponding UBAs has been prepared which specify TCP traffic loads with required mean throughputs on the TCP layer of 0.5, 1, 2, 4, 8, 10, 12, and 16 Mbit/s.

Mean TCP throughput [Mbit/s]:   0.5    1.0    2.0    4.0    8.0    10.0   12.0   16.0
Constant IAT [ms]:              23.3   11.7   5.8    2.9    1.46   1.16   0.97   0.73

Table 11.1.: Parameters of the TCP background load streams (CBR, 1460 Byte payload) in the case study (own Tab.).

For example, in the first experiment of the series with 3 background load streams, three corresponding UniLoG.TCP load generators execute the same UBA in order to generate three background TCP load streams with a specified mean throughput of 0.5 Mbit/s per stream. In the second experiment of the series, all three load generators execute a UBA to generate three TCP streams with a specified mean throughput of 1 Mbit/s per stream, and so forth. Finally, in the last experiment, three TCP streams with a specified mean throughput of 16 Mbit/s per stream are generated. Note that the TCP throughput effectively achieved by each TCP background load stream can differ from its specified mean throughput due to the limited WLAN capacity and TCP congestion control. Each experiment series has been repeated ten times (so that every experiment using a specific combination of the number of background load streams and the specified mean throughput per stream has been executed ten times).
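The constant inter-arrival times in Tab. 11.1 follow directly from the fixed payload size: with 1460 Byte of TCP payload per request, IAT = payload size in bits / required mean throughput. A minimal sketch of this calculation (for illustration only; not part of the UniLoG tool set):

```python
PAYLOAD_BYTE = 1460  # TCP payload size used in all experiment series

def iat_ms(throughput_mbit_s):
    """Constant inter-arrival time (in ms) between TCP requests that
    yields the required mean throughput for a CBR stream."""
    payload_bits = PAYLOAD_BYTE * 8
    return payload_bits / (throughput_mbit_s * 1e6) * 1e3

# reproduce Tab. 11.1: 0.5 Mbit/s -> 23.36 ms, ..., 16 Mbit/s -> 0.73 ms
for rate in (0.5, 1.0, 2.0, 4.0, 8.0, 10.0, 12.0, 16.0):
    print(f"{rate:5.1f} Mbit/s -> IAT = {iat_ms(rate):6.2f} ms")
```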

11.3. Measurement Results and Discussion

In this section we present measurement results for the video quality metrics (as introduced in Sec. 10.3 for the case study described in Chapter 10) when the BBB video stream is transmitted using RTP over the TCP protocol in the experimental network. The values of the metrics IP throughput, jitter, number of sequence errors, number of lost packets, and number of duplicate packets have been obtained from the RTP video traffic captured at the VoD client side. Because TShark has been used to passively measure the RTP traffic on the VoD client side, no modifications to the VoD server software were required. In particular, results


for the quality metrics for streaming without any background load in the WLAN are presented in Sec. 11.3.1. Further, results for the quality metrics obtained for the BBB video stream under different numbers of TCP background load streams in the WLAN are described in Sec. 11.3.2.
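The jitter values reported in the following are interarrival jitter estimates in the sense of RFC 3550, i.e., a smoothed mean deviation of the packet spacing. A sketch of this standard estimator (the concrete analysis scripts of the case study are not reproduced here):

```python
def rtp_jitter(arrival_times, rtp_timestamps, clock_rate=90000):
    """Interarrival jitter estimate as defined in RFC 3550, Sec. 6.4.1.
    arrival_times: packet arrival times in seconds;
    rtp_timestamps: RTP timestamps in media clock ticks (90 kHz for video).
    Returns the final running jitter estimate in seconds."""
    j = 0.0
    for i in range(1, len(arrival_times)):
        # difference of transit times of consecutive packets, D(i-1, i)
        d = ((arrival_times[i] - arrival_times[i - 1])
             - (rtp_timestamps[i] - rtp_timestamps[i - 1]) / clock_rate)
        j += (abs(d) - j) / 16.0  # exponential smoothing with gain 1/16
    return j

# a perfectly regular stream yields zero jitter
print(rtp_jitter([0.0, 0.04, 0.08], [0, 3600, 7200]))  # 0.0
```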

11.3.1. Streaming in the Experimental Network without Background Load

For the interpretation of the streaming results in our experimental network we have to consider the maximum throughput achievable in a WLAN for data transfers over TCP. A theoretical calculation of the maximum achievable TCP throughput in IEEE 802.11g WLANs with OFDM modulation can be found, e.g., in [Rec12], Sec. 8.6. For example, for a single TCP stream consisting of segments with the maximum possible payload size of 1460 Byte, the achievable throughput amounts to 24.4 Mbit/s. This calculation considers the time required for the transmission of TCP and MAC acknowledgements, while the additional overhead for beacon frames and back-off times after collisions during channel access in the CSMA/CA protocol is not taken into account and can significantly reduce the actually achieved throughput in the WLAN. For this reason, we measured the achievable TCP throughput in our experimental network by means of the Iperf tool (freely available, e.g., at https://iperf.fr/ or http://www.nwlab.net/art/iperf/). To be precise, a single throughput measurement has been executed before each of the 400 experiments described in Sec. 11.3.2. If the measured TCP throughput in the WLAN was below 20 Mbit/s, the main streaming experiment was not started and a further TCP throughput measurement was initiated instead. Over all 400 experiments, the minimum and maximum TCP throughput achieved by a single TCP stream (generated and evaluated by the Iperf tool) amounted to 21.3 Mbit/s and 22.9 Mbit/s, respectively. The mean throughput over all 400 experiments was 22.3 Mbit/s. Considering the transmission of the BBB video in our experimental network, we first obtained the quantitative streaming characteristics for the BBB streaming by means of RTP over TCP without any background load.
The TCP protocol instance at the VoD client used a receive buffer size RCVBUFF of 17520 Byte (which is the default receive buffer size in Windows XP Professional SP3 for links with data rates between 10 and 100 Mbit/s, and hence also for our 54 Mbit/s WLAN segment). The TCP protocol instance at the VoD server used a default send buffer size SNDBUFF of 8 KByte.


The plot of the IP throughput in Fig. 11.2 reveals the strongly variable structure of the BBB video stream. The mean IP throughput, measured in intervals of 1 sec at the VoD client side in the experimental network without background load, amounted to 5.8 Mbit/s. During the streaming, fragments with high motion intensity in the video led to significant peaks in the throughput plot of up to 18.5 Mbit/s in particular measurement intervals (cf. Fig. 11.2). As can be seen from the summary of the streaming statistics, even in the case of no background load in the experimental network, the VoD server was forced to skip a number of RTP frames during the video streaming, so that 520 RTP sequence errors and 13172 packet losses have been induced and recorded at the VoD client side.

[Figure: IP throughput of the video stream (RTP over TCP) at the VoD client and its mean value [Mbit/s] over the playback time [s], together with the following streaming statistics:]

Throughput, max: 18.537 Mbit/s
Throughput, mean: 5.886 Mbit/s
Throughput, med: 4.9 Mbit/s
Jitter, mean: 0.001033 s
Jitter, med: 0.000789 s
#RTPPackets: 314864
#ReceiverReports: 291
#RTPSeqErr: 520
#TCP Duplicates: 0
#RTPLostPackets: 13172 (≈4.183%)
RTT, mean: 0.004376 s
RTT, max: 0.520042 s
RTT, med: 0.003879 s
Q 95%: 0.008229 s
Q 98%: 0.009762 s
Q 99%: 0.010988 s
Q 99.99%: 0.021134 s
RTT > 0.020 s: 22 (of 165039)

Figure 11.2.: IP throughput and streaming statistics of the BBB video stream (RTP over TCP) at the VoD client side, no background load, TCP receive buffer size RCVBUFF in the VoD client set to the default value of 17520 Byte (own Fig.).
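The IP throughput plot in Fig. 11.2 can be reproduced from a packet capture by summing the captured packet sizes over 1 s intervals. A simplified sketch of such a post-processing step (illustrative; the input tuples would come from the TShark capture):

```python
from collections import defaultdict

def throughput_per_interval(packets, interval=1.0):
    """packets: iterable of (arrival_time_s, ip_length_byte) tuples from
    a capture. Returns {interval_index: throughput in Mbit/s}."""
    bins = defaultdict(int)
    for t, length in packets:
        bins[int(t // interval)] += length          # sum bytes per bin
    return {k: v * 8 / (interval * 1e6) for k, v in sorted(bins.items())}

# e.g. three packets in second 0, one packet in second 1
sample = [(0.1, 1500), (0.4, 1500), (0.9, 1500), (1.2, 1500)]
print(throughput_per_interval(sample))  # {0: 0.036, 1: 0.012}
```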

It should be noted that the maximum size of the TCP receive buffer RCVBUFF of 17520 Byte used by the VoD client did not represent a bottleneck for the transmission of our BBB video stream. Recall that the maximum receive buffer size RCVBUFF determines the maximum achievable data rate of the receiving TCP protocol instance, which can be approximated by the quotient of the maximum TCP receive buffer size RCVBUFF and the mean TCP round-trip time RTT_mean in the particular network. In our experimental network the mean TCP round-trip time RTT_mean was measured to be 0.004376 s (cf. Fig. 11.2), hence the maximum achievable data rate of the receiving TCP protocol instance amounts to RCVBUFF / RTT_mean = 17520 Byte / 0.004376 s ≈ 32 Mbit/s. This available data rate is significantly higher than the mean throughput of 5.8 Mbit/s of the BBB video stream and also higher than the maximum throughput of 18.5 Mbit/s required by the BBB stream in particular measurement intervals.
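The receive-buffer bound discussed above can be checked with a one-line calculation (a sketch for illustration, using the default buffer size and the measured mean RTT):

```python
def max_rate_mbit_s(rcvbuf_byte, rtt_s):
    """Upper bound on the achievable data rate of a TCP receiver:
    at most one receive buffer of data can be delivered per round trip."""
    return rcvbuf_byte * 8 / rtt_s / 1e6

# default Windows XP receive buffer and the measured mean RTT
print(max_rate_mbit_s(17520, 0.004376))  # ≈ 32 Mbit/s
```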

[Figure: IP throughput of the video stream (RTP over TCP) at the VoD client [Mbit/s] over the playback time [s], together with the following streaming statistics:]

Throughput, max: 8.099 Mbit/s
Throughput, mean: 3.664 Mbit/s
Throughput, med: 3.582 Mbit/s
Jitter, mean: 0.003448 s
Jitter, med: 0.000856 s
#RTPPackets: 55570
#ReceiverReports: 83
#RTPSeqErr: 1069
#TCP Duplicates: 124
#RTPLostPackets: 31217 (≈56%)
RTT, mean: 0.020256 s
RTT, max: 0.887832 s
RTT, med: 0.002286 s
Q 95%: 0.128477 s
Q 98%: 0.378638 s
Q 99%: 0.526944 s
Q 99.99%: 0.852465 s
RTT > 0.010 s: 258 (of 3943)

Figure 11.3.: IP throughput and streaming statistics of the BBB video stream (RTP over TCP) at the VoD client side, no background load, TCP receive buffer size RCVBUFF in the VoD client increased to 65535 Byte (own Fig.).

An interesting result has been obtained by increasing the maximum TCP receive buffer size RCVBUFF to the value of 65535 Byte (which is the maximum possible buffer size without activating the TCP window scaling option). Contrary to our expectations based on the video streaming behaviour in wired Ethernet networks, increasing the receive buffer size in the WLAN did not yield any improvement of the streaming quality. Instead, it resulted in a drastic degradation of all streaming quality metrics and a crash of the VLC video player software after ca. 3 minutes, because too many RTP packets with video data had been lost in the WLAN (cf. Fig. 11.3). The reason for this behaviour may be the increased number of TCP segments which the sending TCP protocol instance at the VoD server is allowed to generate before the next TCP acknowledgement has to be produced by the receiving TCP protocol instance at the VoD client side. In this way, the probability strongly increases that a lost and retransmitted TCP segment arrives too late at the receiving TCP protocol instance in the VoD client and thus produces an RTP packet loss, and hence potential losses of many video frames (cf. the number of RTP packet losses of nearly 56% reported in Fig. 11.3). Therefore, we suggest that increasing the TCP send buffer size (SNDBUFF in the VoD server) and decreasing (rather than increasing) the TCP receive buffer size (RCVBUFF in the VoD client), combined with a longer initial playout delay in the video player software (e.g., 5 sec instead of the 3 sec initial playout buffer used), would be more appropriate measures to improve the quality of BBB video streaming in the WLAN.

11.3.2. Streaming in the Experimental Network under Background Load

In each series of experiments in this case study, the number of TCP background load streams has been kept constant, and the TCP throughput required of each single stream has been specified by means of the corresponding data rate and stepped up from experiment to experiment within the series, starting with 0.5 Mbit/s per stream, then 1, 2, 4, 8, 10, 12, and finally 16 Mbit/s per stream. As long as the background load contains only one single TCP stream, the capacity of the WLAN is sufficient to transmit the BBB video stream, and the values of the mean IP throughput of the video stream for the different specified throughputs of the background load stream are all nearly identical at ca. 5.8 Mbit/s (cf. Fig. 11.4). When a second TCP stream is added to the background load and each of these two streams requires a specified data rate of 8 Mbit/s (so that both streams together require a total data rate of 16 Mbit/s in the WLAN), the remaining capacity in the WLAN shrinks and the throughput of the BBB video stream slightly decreases to 5.1 Mbit/s. With three TCP streams comprising the background load, the capacity of our experimental WLAN is exhausted already at a specified data rate of 4 Mbit/s per stream. The BBB video throughput decreases to nearly 5 Mbit/s under this load. A further increase of the specified data rate per background load stream does not induce any further decrease of the video throughput, because the capacity of the WLAN is (uniformly) shared by the three background load streams and the BBB video stream. In the case of five background load streams with a required data rate of 2 Mbit/s per stream, the video throughput amounts to 5.06 Mbit/s and the video quality is still acceptable from our point of view.
Increasing the data rate per stream to 4 Mbit/s leads to a significant decrease of the BBB video throughput down to 4.2 Mbit/s, and the perceived video quality is no longer acceptable. In the case of zero background load we measured a jitter value of 0.8 ms (cf. Fig. 11.5). With one single background load stream the jitter values increase at most up to 1.3 ms. The jitter values are well suited to assess the perceived video streaming quality in general and to

[Figure: IP throughput of the video stream [Mbit/s] vs. specified data rate of a single TCP background load stream [Mbit/s], curves for 1-6 TCP background load streams.]

Figure 11.4.: IP throughput of the BBB video stream observed under different numbers of TCP background load streams, 95% confidence intervals (own Fig.).

determine the level of background load at which the video quality becomes unacceptable. As soon as the jitter values exceed the 2 ms mark, the delay variations become so strong that the playback quality of the BBB video film degrades drastically. The 2 ms jitter mark is reached in the case of three background load streams with a specified data rate of 8 Mbit/s each, and in the case of four background load streams already with a specified data rate of 4 Mbit/s per stream. Further, we observed that when the background load consists of more than one stream, the jitter values do not increase further beyond a certain specified data rate per background load stream (e.g., for three streams in the background load, the jitter values remain on the same level for specified data rates per stream greater than or equal to 8 Mbit/s). In order to find out the reasons for the occurrence of RTP sequence errors, we analysed the traffic from the RTP streaming sessions captured at the VoD client side (and partly at the VoD server side). A close look into the traces revealed that a small fraction (less than 0.01%) of the RTP packets had been lost already before their entry into and transmission in the WLAN. It can be assumed that these packet losses occurred in the access point AP itself, e.g., due to an overflow of one of its ingress buffers. In case of such losses, the lost TCP segment(s) have been requested by the receiving TCP instance in the VoD client for retransmission from the VoD server (as there is no TCP protocol instance installed in the AP). The retransmission mechanism

[Figure: jitter in the RTP/TCP video stream [ms] vs. specified data rate of a single TCP background load stream [Mbit/s], curves for 1-6 TCP background load streams.]

Figure 11.5.: Jitter values in the BBB video stream observed under different numbers of TCP background load streams, 95% confidence intervals (own Fig.).

allowed, on the one hand, to avoid a sequence error at the RTP layer. On the other hand, the induced retransmission delay contributed to the total delay of the lost TCP segment at the VoD client side (and possibly of other RTP packets stored in the play-out buffer of the video player software). In case the deadline of the RTP packet transported in the retransmitted TCP segment was missed in the play-out buffer, significant errors in the video playback have been observed. When we gradually increase the number of TCP streams and the specified data rate per stream in the background load from experiment to experiment, a significant increase of the number of RTP sequence errors can be observed (cf. Fig. 11.6). At first glance this was a bit surprising, because the TCP end-to-end error control mechanism should provide for the retransmission of lost segments and restore the initial packet order at the receiver side in case of TCP segment reordering during the transmission. Therefore, we actually did not expect any RTP sequence errors. However, we found out that with the RTCP Receiver Reports (RR) the VoD server periodically (usually every 5 s) received the current values of the jitter, the fraction of packet losses, and time stamps for the estimation of the round-trip time, and thus was able to dynamically adjust its video data rate according to the utilization level of the network. In our case study, the BBB video film has been encapsulated into the MPEG-II transport stream, and the VoD server (LIVE555 Media Server software) apparently just skipped RTP frames from the stream in case of increasing

[Figure: number of RTP sequence errors in the video stream vs. specified data rate of a single TCP background load stream [Mbit/s], curves for 1-6 TCP background load streams.]

Figure 11.6.: Number of RTP sequence errors in the BBB video stream observed under different numbers of TCP background load streams, 95% confidence intervals (own Fig.).

round-trip time. For this reason, an increasing number of RTP sequence errors has been observed at the VoD client side under increasing background loads. The fact that the VoD server skips RTP frames can be confirmed by observing the size of the PCAP file containing the traffic of the RTP streaming session captured at the VoD client side. For example, when the BBB video was streamed under no background load in the WLAN, the resulting size of the PCAP file captured at the VoD client amounted to ca. 450 MByte, while it decreased to 387 MByte under three streams with 4 Mbit/s per stream and further down to 360 MByte under four streams with 4 Mbit/s per stream in the background load. The plot of the RTP packet losses in Fig. 11.7 reveals that even when the BBB video stream is transmitted without any background load in the WLAN, nearly 11,000 RTP packets (out of the total number of 314864 packets in the streaming session) have been reported as lost in the last RTCP Receiver Report (RR). These RTP packets missed their play-out deadline and, therefore, are simply skipped in the demultiplexer module of the video player software at the VoD client. On the other hand, the H.264 video coding scheme used for our BBB video provides for some degree of error tolerance, so that only a few non-significant errors can be observed during the video playback under zero background load. The number of lost RTP packets rises strongly when more than three TCP streams are included in the background load. For example, under a background load consisting of five

[Figure: number of lost RTP packets in the video stream vs. specified data rate of a single TCP background load stream [Mbit/s], curves for 1-6 TCP background load streams.]

Figure 11.7.: Number of lost RTP packets in the BBB video stream observed under different numbers of TCP background load streams, 95% confidence intervals (own Fig.).

TCP streams each with a specified data rate of 2 Mbit/s, nearly 30,000 RTP packets are reported as lost in the RTCP Receiver Reports. Similar to our previous observations (made for the IP throughput and jitter characteristics), we note that once a certain value of the specified data rate per TCP background load stream is reached, the number of lost RTP packets does not increase further when the specified data rate per stream is gradually increased from experiment to experiment within a series. Thus, the maximum number of lost RTP packets is reached with one TCP background load stream at a specified data rate of 8 Mbit/s, and with three TCP background load streams already at a specified data rate of 4 Mbit/s (cf. Fig. 11.7). Finally, we discuss the results for the characteristic "number of duplicate TCP segments", which is illustrated in Fig. 11.8 for the exemplary case in which the background load consists of only one single TCP stream. We have been able to observe duplicate TCP segments in the traffic captured from our BBB RTP streaming session because the traffic capture has been made with the TShark utility directly at the Ethernet layer, and thus before the TCP error control mechanism came into play on the VoD client (receiver) side. Duplicate TCP segments appeared in the WLAN when a particular WLAN MAC frame containing a TCP segment had been received by the VoD client but the corresponding WLAN MAC acknowledgement from the VoD client

[Figure: number of duplicate TCP segments in the video stream vs. specified data rate of the single TCP background load stream [Mbit/s].]

Figure 11.8.: Duplicate TCP segments in the BBB video stream observed under one TCP background load stream, 95% confidence intervals (own Fig.).

to the access point AP has been lost. As a result, after the corresponding retransmission timer in the access point expired, the access point retransmitted the lost WLAN frame containing the same TCP segment. In this case, no RTP packet duplicates could be observed at the RTP layer in the video player software, because the duplicate TCP segments had already been eliminated in the receiving TCP protocol instance at the VoD client side. A careful examination of the plot in Fig. 11.8 reveals that the number of TCP duplicates is relatively small but varies very strongly. Considering the size of the 95% confidence intervals, we can still observe a small increase in the number of duplicate TCP segments when the specified data rate of the background TCP stream is increased from experiment to experiment. The strong variation of the number of packet duplicates is probably due to the effects of the CSMA/CA mechanism used for channel access in the WLAN and, further, due to the influence of other active access points in the neighbourhood which may use WLAN channels overlapping with channel 1 used during the experiments in our network.
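Counting duplicate TCP segments in such a capture can be approximated by remembering the byte ranges already seen per stream. The following is a simple heuristic sketch (not the exact analysis performed with TShark in the case study):

```python
def count_tcp_duplicates(segments):
    """segments: iterable of (seq, payload_len) pairs of one TCP stream,
    in capture order. Counts segments whose byte range has been seen
    before, i.e., retransmitted or duplicated segments."""
    seen = set()
    dups = 0
    for seq, length in segments:
        key = (seq, length)
        if key in seen:
            dups += 1
        else:
            seen.add(key)
    return dups

# the third segment repeats the byte range of the first one
print(count_tcp_duplicates([(1, 1460), (1461, 1460), (1, 1460)]))  # 1
```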

11.4. Conclusions

In the presented case study we observed a set of quantitative metrics (IP throughput, RTP jitter, number of RTP sequence errors, lost RTP video


packets, and duplicate video packets) of a unicast RTSP streaming session which uses RTP over TCP for the transport of video packets in a WLAN under different TCP background loads. The experiment results have shown that RTP sequence errors and packet losses can occur already at zero background load in the WLAN (cf. Sec. 11.3.1). This is mainly due to the strong scene variability in the BBB video and hence in the amount of video data (and thus in the size of the video frames and the number of resulting RTP packets) to be transmitted over the WLAN. During fragments of very high motion intensity in the BBB video, the amount of video data to be transmitted peaked at approximately 18.5 Mbit/s, which is far beyond the average BBB throughput of 5.8 Mbit/s. The number of RTP packets to be sent over the WLAN per second in these video fragments is also much greater than the mean packet rate of the video. The LIVE555 video streaming server generated bursts of RTP packets in order to prevent the latency of the encapsulated video frames from increasing strongly. In cases where the higher latency still led to misses of the play-out deadlines of particular video frames, the corresponding RTP packets transporting such frames were rejected by the VLC player in the VoD client and registered in the values of the quality metrics "number of RTP sequence errors" and "number of lost RTP packets". When additional TCP background load streams are generated and injected into the experimental network, the available capacity is shared by all TCP streams existing in the WLAN. The experiments have shown that the capacity of the WLAN is exhausted already when 3 TCP background load streams with a specified required throughput of 4 Mbit/s per stream are injected into the experimental network.
In this case, a further increase of the specified throughput per stream does not lead to any further changes of the observed streaming quality metrics, due to the round-trip-time-based TCP rate control mechanisms employed by each of the background load streams and by the BBB video stream itself. Under background load, the estimation of the round-trip time for the observed video stream delivers higher values, which are reported to the VoD server by means of the Receiver Reports (RR) from the VoD client. The LIVE555 media server reacted to the increasing round-trip time by adjusting the sending rate of the video stream, i.e., it skipped a number of RTP video packets from the stream, so that a number of RTP sequence errors and corresponding RTP losses occurred at the VoD client side. Generally, we can state that, due to the H.264 video coding and RTP video packetisation scheme used, the observed video streaming session has been tolerant of the increasing packet delays and quite robust to


the increasing jitter and number of RTP packet losses resulting from the background load in the WLAN, at least up to a certain load level (e.g., 4 Mbit/s per stream with three TCP background load streams). In order to provide a smooth video playback in practice, special attention must be paid to the proper dimensioning of the RTSP and TCP buffers in the VoD client and the VoD server. We note that the BBB video used in the case study has been chosen precisely because of its strong VBR nature resulting from the periods of high motion intensity. Therefore, it would be rather difficult to avoid errors during the video playback when the residual capacity (i.e., the data rate) available for the transmission of the video stream is just above the mean throughput of 5.8 Mbit/s required for the BBB video. Reserving capacity in a WLAN sufficient to serve the maximum throughput requirement (18.5 Mbit/s) of the BBB video for the whole period of the video playback (9 min 56 sec) can hardly be an option. In practice, the provider of the video streaming service may employ different scalable encoding schemes to avoid load peaks and/or use adaptive streaming techniques to throttle the sending rate required for the BBB stream in periods of higher motion intensity in the original video [RBV12, JSZ14, LHC15]. Further measures (like the use of forward error correction techniques, resource reservation, etc.) are possible to facilitate streaming in managed networks (e.g., IP-multicast networks used to deliver IPTV services to users) [BAB11]. Similar to the case study presented in Chapter 10, the case study presented in this chapter can also be extended to model advanced properties of wide area networks (e.g., variable delay, loss, duplication, and re-ordering of packets) during the transmission of the observed BBB video stream.
To this end, an appropriate network emulator (e.g., netem, freely available for Linux at http://www.linuxfoundation.org/collaborate/workgroups/networking/netem) can be installed in the Gigabit Ethernet segment of the case study network. Results of similar experimental studies can be found, e.g., in [WKST08, EBB10].

Part V.

Results and Conclusions

12. Summary and Outlook

The accurate and realistic modelling and generation of network workload, which may consist of a mix of many complex traffic sources, is a difficult and challenging task. The analysis and generation of network workload, in particular in large-scale networks, is aggravated by the heterogeneity and large number of network devices and protocols in use, as well as by the different types of applications and services, which may evolve strongly over time. Furthermore, the purpose of workload modelling and, therefore, the objectives of the corresponding experimental tests and case studies may vary, e.g., from performance evaluation to analyses of network neutrality and security mechanisms. Therefore, in order to keep up with perpetually emerging new requirements and the corresponding technical challenges, the networking research community needs to continuously improve its methods and tools for workload modelling and generation.

12.1. Summary of Results

At this point, we summarise the obtained results and emphasise the major contributions of this thesis to the research field of workload modelling and generation in IP-based networks. In this thesis, a unified approach for workload modelling and generation with general applicability in IP-based networks has been elaborated, and a set of corresponding tools for the specification and generation of synthetic (artificial) workloads has been developed. The architecture of the Unified Load Generator UniLoG proposed in the thesis can be used for the generation of realistic workloads and traffic according to various workload and traffic models at different service interfaces in IP-based networks (e.g., HTTP, TCP/UDP, or IPv4 service interfaces). An enhanced degree of flexibility of the UniLoG architecture has been achieved by means of the formal automata-based workload description technique developed in the context of this thesis and integrated into the UniLoG load generator for the specification of workload and traffic models. In this way, existing models proposed in the networking research community can be described by means of corresponding User Behavior Automata (UBAs) and used for load generation in UniLoG.

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3_12

A set of workload models for VoIP, video, and HTTP traffic sources has been developed by the author of this thesis and provided in the form of corresponding UBAs for UniLoG, but the tool is in no way restricted to the use of these models. The extensibility of the UniLoG architecture is achieved by means of adapter components, which inject the generated workload or traffic as a sequence of requests at a real target service interface specified by the experimenter. Adapters for selected application (HTTP), transport (TCP and UDP), and network (IPv4) service interfaces have been developed in the context of this thesis, and their functionality as well as the corresponding performance characteristics have been presented in Chapters 7–9. Furthermore, the UniLoG architecture provides a high degree of scalability with respect to the number of emulated virtual service users, which can be increased at three different extension levels: 1) integrating a number of virtual users into a single aggregated UBA, 2) using a number of load agents, each executed on a different processor core in a multi-core system, or 3) mapping the virtual users onto different geographically distributed load agents and using the UniLoG system for geographically distributed load generation developed in the context of this thesis (cf. Chapter 6). With the embedded workload specification technique, the UniLoG load generator provides a high level of abstraction and flexibility during load modelling. Results of experimental tests and measurements of the performance characteristics of the UniLoG.IPv4 (cf. Chapter 7) and UniLoG.TCP adapters (cf. Chapter 8) confirmed that the UniLoG load generator is able to guarantee remarkably effective generation of traffic loads in real time at this high level of abstraction.
This is one major contribution of this thesis to the state of the art in the networking research community. Furthermore, the analyses and investigations performed during the experimental tests with the implemented UniLoG.IPv4, UniLoG.TCP, and UniLoG.HTTP adapters revealed several interesting aspects and peculiarities of the packet processing procedures in the Windows and Linux networking subsystems, whose understanding can be exploited to further improve the performance characteristics of workload generators. The following important results have been obtained in the thesis in the field of workload specification and modelling:

• A formal UBA-based workload description technique has been elaborated, implemented, and integrated into the UniLoG load generator for the specification of workload and traffic models to be used for load generation.


• The original UBA concept introduced in [WoK90] has been generalised, and a set of important extensions has been proposed in order to make the UBA representation sufficiently precise, complete, and generally applicable to the specification of different state-of-the-art workload models. In particular, the author introduced elementary states, an aggregation function for elementary states, the context of the UBA, and context expressions as a means to describe context changes in the automaton.

• The concrete use of XML Schema Definition (XSD) for the specification of the different UBA components (such as states and transitions, types of abstract requests and reactions, different UBA parameters, and context expressions) has been presented.

• The workload specification technique has been implemented in a corresponding tool, called LoadSpec, which can be used to specify different workload models in the form of UBAs to be used for load generation in the UniLoG load generator.

• Finally, concrete UBA models have been developed using the LoadSpec tool for 1) voice traffic sources with different types of codecs for CBR traffic (e.g., G.711 and G.729.1) or VBR traffic (e.g., G.723.1 and iLBC), 2) video traffic sources, considering the different types of video frames, different GOP structures, and fragments of different motion intensity in the original video, and 3) Web/HTTP traffic sources, taking into account application-level details such as the location and structure of the requested Web pages. In particular, a comprehensive statistical analysis of the frame lengths has been conducted, and concrete values for the parameters of the frame size distributions have been obtained by the author for a chosen VBR H.264-encoded video (Big Buck Bunny (BBB)).
This could be a valuable contribution of the thesis to the networking research community, as the obtained parameter values for the proposed Log-normal and Gamma distributions (chosen to fit the empirical distributions of the frame size) can be used by researchers to parameterise their workload generators appropriately in this case. To emphasise how the UniLoG load generator advances the state of the art in workload generation, we finally reported several experimental results related to the study of “hot topics” such as performance and QoS analysis of video streaming applications in home or small business corporate networks. The case studies presented in Chapters 10 and 11 use different configurations of the experimental network and different transport techniques


for the video stream (RTP over UDP, or RTP over TCP), so that different types of background loads had to be generated using UniLoG. In particular, interesting insights into the effects of TCP fairness could be obtained in the experimental tests of the case study in Chapter 11. In both case studies, we presented the obtained measurement results for the video streaming quality characteristics and provided a concluding discussion which may be of interest to providers of video streaming services in practice. Finally, through the combination of the integrated workload specification technique, a set of workload models, the corresponding adapters, and the distributed load generation function, UniLoG allows one to coordinate and combine the tasks of load modelling, specification, and generation in one single, coherent approach. Therefore, UniLoG can be expected to provide a highly universal and effective tool for workload generation in IP-based networks.

12.2. Outlook on Future Work

Finally, we should mention the issues that require further research and investigation. From the point of view of the author of this thesis, potential topics of (our) future work include, e.g.:

• development of additional application-level workload models for different types of applications and services (e.g., IPTV or VoD services),

• design of additional adapters for load generation at application (e.g., SIP), transport (e.g., RTP), or network (e.g., IPv6) service interfaces,

• extension of the trace-based modelling function in the specification technique by the capability of directly reading and writing packet traces in the PCAP format (in order to be able to support arbitrary payload patterns from the trace),

• evaluation of a cooperative design of user-space and kernel-space components in the UniLoG architecture in order to further improve the performance of the load generation process, in particular on multi-core platforms.
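Regarding the PCAP extension mentioned among the future work topics: the classic pcap file layout (a 24-byte global header followed by a 16-byte header per record) is simple enough to be read directly, without external libraries. The following sketch is an illustrative helper, not part of UniLoG:

```python
import struct

def read_pcap(stream):
    """Yield (timestamp, packet_bytes) for each record of a classic pcap
    stream (magic 0xa1b2c3d4); the byte order of all header fields is
    derived from the magic number."""
    global_header = stream.read(24)
    magic = struct.unpack("<I", global_header[:4])[0]
    endian = "<" if magic == 0xa1b2c3d4 else ">"
    while True:
        record_header = stream.read(16)
        if len(record_header) < 16:
            break  # end of stream (or truncated record header)
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII",
                                                             record_header)
        yield ts_sec + ts_usec / 1e6, stream.read(incl_len)
```

Arbitrary payload patterns from a trace could then be replayed by handing each packet to the appropriate adapter at the recorded timestamps.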

A. Context Expression Functions

The following set of functions is provided for the definition of context expressions in the UBA (a dash in the “Max. Args” column means that the function accepts an arbitrary number of arguments):

| Function | Min. Args | Max. Args | Result/Comment |
|---|---|---|---|
| abs(v) | 1 | 1 | Absolute value of v. abs(-4.3) returns 4.3 |
| mod(v, d) | 2 | 2 | Remainder of v/d. mod(5.2,2.5) returns 0.2 |
| ipart(v) | 1 | 1 | Integer part of v. ipart(3.2) returns 3 |
| fpart(v) | 1 | 1 | Fractional part of v. fpart(3.2) returns 0.2 |
| min(v, ...) | 1 | – | Minimum of the numbers passed. min(3,2,-5,-2,7) returns -5 |
| max(v, ...) | 1 | – | Maximum of the numbers passed. max(3,2,-5,-2,7) returns 7 |
| pow(a, b) | 2 | 2 | Value a raised to the power b. pow(3.2,1.7) returns 3.2^1.7 |
| sqrt(a) | 1 | 1 | Square root of a. sqrt(16) returns 4 |
| sin(a) | 1 | 1 | Sine of a radians. sin(1.5) returns around 0.997 |
| sinh(a) | 1 | 1 | Hyperbolic sine of a. sinh(1.5) returns around 2.129 |
| asin(a) | 1 | 1 | Arc-sine of a in radians. asin(0.5) returns around 0.524 |
| cos(a) | 1 | 1 | Cosine of a radians. cos(1.5) returns around 0.0707 |
| cosh(a) | 1 | 1 | Hyperbolic cosine of a. cosh(1.5) returns around 2.352 |
| acos(a) | 1 | 1 | Arc-cosine of a in radians. acos(0.5) returns around 1.047 |
| tan(a) | 1 | 1 | Tangent of a radians. tan(1.5) returns around 14.101 |
| tanh(a) | 1 | 1 | Hyperbolic tangent of a. tanh(1.5) returns around 0.905 |
| atan(a) | 1 | 1 | Arc-tangent of a in radians. atan(0.3) returns about 0.291 |
| atan2(y, x) | 2 | 2 | Arc-tangent of y/x, with quadrant correction. atan2(4,3) returns about 0.927 |
| log(a) | 1 | 1 | Base 10 logarithm of a. log(100) returns 2 |
| pow10(a) | 1 | 1 | 10 raised to the power of a. pow10(2) returns 100 |
| ln(a) | 1 | 1 | Base e logarithm of a. ln(2.8) returns around 1.030 |
| exp(a) | 1 | 1 | e raised to the power of a. exp(2) returns around 7.389 |
| logn(a, b) | 2 | 2 | Base b logarithm of a. logn(16,2) returns 4 |
| ceil(a) | 1 | 1 | Rounds a up to the nearest integer. ceil(3.2) returns 4 |
| floor(a) | 1 | 1 | Rounds a down to the nearest integer. floor(3.2) returns 3 |
| rand() | 0 | 0 | Returns a uniformly distributed number between 0 up to but not including 1. |
| random(a, b) | 2 | 2 | Returns a uniformly distributed number between a up to and including b. |
| srand(a) | 1 | 1 | Seeds the random number generator with a value. Return value is unknown |
| randomize() | 0 | 0 | Seeds the random number generator with a value based on the current time. Return value is unknown |
| deg(a) | 1 | 1 | Returns a radians converted to degrees. deg(3.14) returns around 179.909 |
| rad(a) | 1 | 1 | Returns a degrees converted to radians. rad(180) returns around 3.142 |
| if(c, t, f) | 3 | 3 | Evaluates and returns t if c is not 0.0; else evaluates and returns f. if(0.1, 2.1, 3.9) returns 2.1 |
| select(c, n, z[, p]) | 3 | 4 | Returns n if c is less than 0.0. Returns z if c is 0.0. If c is greater than 0.0 and only three arguments were passed, returns z. If c is greater than 0.0 and four arguments were passed, returns p. select(3, 1, 4, 5) returns 5 |
| equal(a, b) | 2 | 2 | Returns 1.0 if a is equal to b, else 0.0. equal(3,2) returns 0.0 |
| above(a, b) | 2 | 2 | Returns 1.0 if a is above b, else 0.0. above(3,2) returns 1.0 |
| below(a, b) | 2 | 2 | Returns 1.0 if a is below b, else 0.0. below(3,2) returns 0.0 |
| avg(a, ...) | 1 | – | Returns the average of the values passed. avg(3,3,6) returns 4 |
| clip(v, min, max) | 3 | 3 | Clips v to the range from min to max: if v is less than min, returns min; if v is greater than max, returns max; otherwise returns v. clip(3,1,2) returns 2 |
| clamp(v, min, max) | 3 | 3 | Clamps v to the range from min to max, wrapping around if needed. clamp(8.2,1.3,4.7) returns 1.4 |
| poly(x, c1, ...) | 2 | – | Evaluates the polynomial with value x and coefficients c1, c2, ... poly(4,6,9,3,1,4) returns 2168, the same as 6*4^4 + 9*4^3 + 3*4^2 + 1*4^1 + 4*4^0 |
| and(a, b) | 2 | 2 | Returns 0.0 if either a or b is 0.0, else 1.0. and(2.1,0.0) returns 0.0 |
| or(a, b) | 2 | 2 | Returns 0.0 if both a and b are 0.0, else 1.0. or(2.1,0.0) returns 1.0 |
| not(a) | 1 | 1 | Returns 1.0 if a is 0.0, else 0.0. not(0.3) returns 0.0 |
| for(init, test, inc, a1, ...) | 4 | – | Acts like a for loop in C: first init is evaluated, then test; as long as test is not 0.0, the action statements (a1 to an) are evaluated, then the inc statement, then test again. The result is the result of the final action statement. for(x=0,below(x,11),x=x+1,y=y+x) returns 55.0 (if y was initially 0.0) |
| many(expr, ...) | 1 | – | Treats several subexpressions as a single object (function); mainly intended for use with 'for'. Example: for(many(j=5,k=1), above(j*k,0.001), many(j=j+5,k=k/2), 0) |
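To make the semantics of the less standard functions concrete, here is a small Python sketch (an illustrative re-implementation, not the original UniLoG/LoadSpec code) of clip, clamp, select, and poly, checked against the examples from the table:

```python
import math

def clip(v, lo, hi):
    # Saturate v at the range boundaries.
    return max(lo, min(hi, v))

def clamp(v, lo, hi):
    # Wrap v around into the range instead of saturating.
    return lo + (v - lo) % (hi - lo)

def select(c, n, z, p=None):
    # n for negative c, z for zero c, p (if given, else z) for positive c.
    if c < 0.0:
        return n
    if c == 0.0:
        return z
    return z if p is None else p

def poly(x, *coefficients):
    # Horner evaluation; coefficients come highest power first.
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result

assert clip(3, 1, 2) == 2
assert math.isclose(clamp(8.2, 1.3, 4.7), 1.4)
assert select(3, 1, 4, 5) == 5
assert poly(4, 6, 9, 3, 1, 4) == 2168
```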

Bibliography [3GPP2]

3rd Generation Partnership Project 2 (“3GPP2”), 3GPP2 C.R1002-0 v1.0. “cdma2000 Evaluation Methodology (V14)”, Revision 1.0, June 2003, available online http://www.3gpp2. org/public_html/specs/C.R1002-0_v1.0_041221.pdf, last requested April 29, 2016.

[Aal11]

W. M. P. van der Aalst, “Process Mining : Discovery, Conformance and Enhancement of Business Processes”, Verlag Springer, Berlin, Heidelberg, 2011.

[AbS10]

A. Abhari, M. Soraya, “Workload generation for YouTube”, Multimedia Tools and Applications, Volume 46, Number 1, January 2010, pp. 91–118.

[ACC02]

C. Amza, A. Chanda, A. L. Cox, et al., “Specification and Implementation of Dynamic Web Site Benchmarks”, Proc. of the IEEE International Workshop on Workload Characterization (WWC-5), Austin, Texas, USA, November 25, 2002, pp. 3–13.

[AcP12]

G. Aceto and A. Pescap´e, “On the recent use of email through traffic and network analysis”, Performance Evaluation Review, Volume 39, Number 4, March 2012, pp. 61–70.

[AEPV04]

S. Avallone, D. Emma, A. Pescap´e, G. Ventre, “A distributed multi-platform architecture for traffic generation”, in Proc. of the 2004 International Symposium on Performance Evaluation of Computer and Telecommunication Systems, (SPECTS’04), San Jose, California, USA, July 25–29, 2004, pp. 659–670.

[AEPV05]

S. Avallone, D. Emma, A. Pescap´e, G. Ventre, “Performance evaluation of an open distributed platform for realistic traffic generation”, Int. Journal on Performance Evaluation, Volume 60, 2005, pp. 359–392.

[AlA11]

V. A. F. Almeida and J. M. Almeida, “Internet Workloads, Measurement, Characterization, and Modeling”, IEEE Internet Computing, March/April 2011, pp. 15–18.

© Springer Fachmedien Wiesbaden GmbH 2017 A. Kolesnikov, Load Modelling and Generation in IP-based Networks, DOI 10.1007/978-3-658-19102-3

292

Bibliography

[AlD94]

R. Alur, and D. L. Dill, “A theory of timed automata”, Theoretical Computer Science, Volume 126, Number 2, 1994, pp. 183–235.

[Alexa]

Alexa service, online http://s3.amazonaws.com/ alexa-static/top-1m.csv.zip, retrieved February 10, 2015.

[AnT06]

A. H. Ang, W. H. Tang, “Probability Concepts in Engineering: Emphasis on Applications to Civil and Environmental Engineering”, John Wiley & Sons, 2. Edition, March 2006.

[APF08a]

G. Antichi, A. Di Pietro, D. Ficara, S. Giordano, G. Procissi, and F. Vitucci, “BRUNO: A High Performance Traffic Generator for Network Processor”, Proc. of 2008 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS’08), Edinburgh, Scotland, UK, June 16–18, 2008, pp. 526–533.

[APF08b]

G. Antichi, A. Di Pietro, D. Ficara, S. Giordano, G. Procissi, and F. Vitucci, “Design of a High Performance Traffic Generator on Network Processor”, Proc. of the 11th EUROMICRO Conference on Digital System Design Architectures, Methods and Tools (DSD’08), Parma, Italy, September 3–5, 2008, pp. 438–441.

[APV04]

S. Avallone, A. Pescape, and G. Ventre, “Analysis and Experimentation of Internet Traffic Generator”, Proc. of the International Conference on Next Generation Teletraffic and Wired/Wireless Advanced Networking (New2an’04), St. Petersburg, Russia, February 02–06, 2004, pp. 70–75.

[ArW97]

M. Arlitt and C. Williamson, “Internet Web Servers: Workload Characterization and Performance Implications”, IEEE/ACM Transactions on Networking, Volume 5, Number 5, October 1997, pp. 631–645.

[AWL12]

A. Abdollahpouri, B. E. Wolfinger, J. Lai, and C. Vinti, “Modeling the Behaviour of IPTV Users with Application to Call Blocking Probability Analysis”, Praxis der Informationsverarbeitung und Kommunikation (PIK), Volume 35, Number 2, June 2012, pp. 75–81.

Bibliography

293

[AX4000]

Spirent Federal Systems, AX/4000 Broadband Test System, available online http://www.spirentfederal.com/IP/ Products/AX_4000/Overview/, last requested March 29, 2016.

[BAB11]

A. C. Begen, T. Akgul, and M. Baugher, “Watching Video over the Web, Part I, Streaming Protocols”, IEEE Internet Computing, March/April 2011, pp. 54–63.

[BaC98]

P. Barford, and M. Crovella, “Generating Representative Web Workloads for Network and Server Performance Evaluation”, Proc. of SIGMETRICS’98, Madison, Wisconsin, USA, June 24–26, 1998, pp. 151–160.

[BaF07]

S. Bali and V. Frost, “An algorithm for fitting MMPP to IP traffic traces”, IEEE Communications Letters, Volume 11, Number 2, 2007, pp. 207-209.

[Bai99]

G. Bai, “Load Measurements and Modelling for Distributed Multimedia Applications in High-Speed Networks”, Dissertation, Fachbereich Informatik, Universit¨ at Hamburg, 1999.

[BAS96]

Bavarian Archive for Speech Signals (BAS), online http://www.phonetik.uni-muenchen.de/forschung/bay_ arch_sprsig/index.html.

[BASK75]

F. Baskett, K. M. Chandy, R. R. Muntz, and F. G. Palacios, “Open, closed, and mixed networks of queues with different classes of customers”, Journal of the ACM, Volume 22, Number 2, April 1975, pp. 248–260.

[BBB]

“Big Buck Bunny” – free computer animated movie from the Peach open movie project, available online http://www.bigbuckbunny.org, last requested March 29, 2016.

[BBCR06]

R. Bolla, R. Bruschi, M. Canini, M. Repetto, “A High Performance IP Traffic Generation Tool Based On The Intel IXP2400 Network Processor”, book chapter in “Distributed Cooperative Laboratories: Networking, Instrumentation, and Measurements”, Springer Berlin Heidelberg, 2006, pp. 127—142.

[BBM10]

E. Brosh, S. A. Baset, V. Misra, D. Rubenstein, and H. Schulzrinne, “The Delay-Friendliness of TCP for Real-Time Traffic”, IEEE/ACM Transactions on Networking, Volume 18, Number 5, October 2010, pp. 1478–1491.

294

Bibliography

[BDP10]

A. Botta, A. Dainotti, A. Pescap´e, “Do You Trust Your SoftwareBased Traffic Generator?”, IEEE Communications Magazine, September 2010, pp. 158–165.

[BDP12]

A. Botta, A. Dainotti, A. Pescap´e, “A tool for the generation of realistic network workload for emerging networking scenarios”, Computer Networks, Volume 56, 2012, pp. 3531–3547.

[Bei13]

A. Beifuß, “Leistungs- und Pr¨azisionssteigerung des Lastgenerierungsprozesses von UniLoG unter Verwendung echtzeitf¨ordernder Maßnahmen durch das Betriebssystem”, Fachtagung des GI/GMA-Fachausschusses Echtzeitsysteme (real-time), Boppard am Rhein, November 21–22, 2013, pp. 39–48.

[BGPS05]

N. Bonelli, S. Giordano, G. Procissi, and R. Secchi, “BRUTE: A High Performance and Extensible Traffic Generator”, Proc. of the 2005 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS’05), Cherry Hill, New Jersey, USA, July 24–28, 2005, pp. 839–845.

[BMPR10]

R. Birke, M. Mellia, M. Petracca, D. Ross, “Experiences of VoIP traffic monitoring in a commercial ISP”, International Journal of Network Management, Volume 20, Issue 5, September 2010, pp. 339–359.

[BMS11]

M. Butkiewicz, H. V. Madhyastha, V. Sekar, “Understanding Website Complexity: Measurements, Metrics, and Implications”, Proc. of the Internet Measurements Conference IMC’11, Berlin, Germany, November 2–4, 2011, pp. 313–328.

[BMS14]

M. Butkiewicz, H. V. Madhyastha, V. Sekar, “Characterizing Web Page Complexity and Its Impact”, IEEE/ACM Transactions on Networking, Volume 22, Number 3, June 2014, pp. 943–956.

[BMW05]

S. Boyden, A. Mahanti, C. Williamson, “Characterizing the Behaviour of Real Video Streams”, in Proc. of the 2005 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS’05), Cherry Hill, New Jersey, USA, July 24–28, 2005, pp. 783–791.

[BPGP12]

N. Bonelli, A. Di Pietro, S. Giordano, and G. Procissi, “Flexible high performance traffic generation on commodity multi-core platforms”, Proc. of the 4th International Conference on Traffic

Bibliography

295

Monitoring and Analysis (TMA’12), LNCS 7189, 2012, pp. 157–170. [BRUNO]

BRUTE The Browny and RobUst Traffic Engine (BRUTE), available online http://netgroup.iet.unipi.it/software/ brute/, last requested April 11, 2016.

[BRUTE]

The Browny and RobUst Traffic Engine (BRUTE), available online http://netgroup.iet.unipi.it/software/brute/, last requested April 11, 2016.

[CAIDA]

The CAIDA Anonymized Internet Traces 2015 Dataset, available online http://www.caida.org/data/passive/passive_ 2015_dataset.xml, last requested June 18, 2015.

[CaM10]

M. C. Calzarossa, L. Massari, “Analysis of web logs: challenges and findings”, in Proc. of the 2010 IFIP WG 6.3/7.3 International Conference on Performance Evaluation of Computer and Communication Systems (PERFORM’10), Lecture Notes in Computer Science, Vol. 6821, Springer, pp. 227–239.

[CCG04]

J. Cao, W. Cleveland, Y. Gao, K. Jeffay, F. Smith, and M. Weigle, “Stochastic Models for Generating Synthetic HTTP Source Traffic”, Proc. of INFOCOM’04, Hong Kong, China, March 7–11, 2004, Volume 3, pp. 1546–1557.

[Cha10]

J. Charzinski, “Traffic Properties, Client Side Cachability and CDN Usage of Popular Web Sites”, Proc. of MMB & DFT 2010, Essen, March 15-17, 2010, pp. 136–150.

[Cisco1]

Cisco IOS NetFlow, online http://www.cisco.com/c/en/us/ products/ios-nx-os-software/ios-netflow/index.html, last requested April 6, 2016.

[Cisco2]

Cisco Systems, “Traffic Analysis for Voice over IP”, Technology White Paper, available online http://www.cisco.com/c/en/us/td/docs/ios/solutions_ docs/voip_solutions/TA_ISD.html, last requested April 15, 2016.

[CMCS08]

G. Casale, N. Mi, L. Cherkasova, and E. Smirni, “How to Parameterize Models with Bursty Workloads”, SIGMETRICS Performance Evaluation Review, Volume 36, Number 2, August 2008, pp. 38–44.

296

Bibliography

[CKR09]

M. Cha, H. Kwak, P. Rodriguez, Y.-Y. Ahn, and S. Moon, “Analyzing the Video Popularity Characteristics of Large-Scale User Generated Content Systems”, IEEE/ACM Transactions on Networking (TON), Volume 17, Number 5, October 2009, pp. 1357–1370.

[CMT90]

M. Calzarossa, R. Marie, K. S. Trivedi, “System performance with user behavior graphs”, Journal Performance Evaluation, Volume 11, Issue 3, September 1990, pp. 155–164.

[Con06]

J. Cong, “Load Specification and Load Generation for Multimedia Traffic Loads in Computer Networks”, Dissertation, Fachbereich Informatik, Universit¨ at Hamburg, 2006, erschienen in: B. E. Wolfinger (Hrsg.), Berichte aus dem Forschungsschwerpunkt Telekommunikation und Rechnernetze, Shaker Verlag, Aachen, 2006.

[CrB96]

M. Crovella, A. Bestavros, “Self-similarity in world wide web traffic: Evidence and possible causes”, IEEE/ACM Transactions on Networking, Volume 5, Number 6, December 1997, pp. 835–846.

[CrD05]

N. Cranley, and M. Davis, “Performance Evaluation of Video Streaming with Background Traffic over IEEE 802.11 WLAN Networks”, Proc. of WMuNeP’05, October 13, 2005, Montreal, Quebec, Canada, pp. 131–139.

[CRCM08] M. Cha, P. Rodriguez, J. Crowcroft, S. Moon, X. Amatriain, “Watching television over an IP network”, in Proc. of the 8th ACM SIGCOMM Conference on Internet Measurement (IMC’08), Vouliagmeni, Greece, October 20–22, 2008, pp. 71– 84. [CSK11]

Y. Choi, J. A. Silvester, H. Kim, “Analyzing and Modeling Workload Characteristics in a Multiservice IP Network”, IEEE Internet Computing, March/April 2011, pp. 35–42.

[DBP07]

A. Dainotti, A. Botta, and A. Pescap´e, “Do you know what you are generating?”, Proc. of the 3rd International Conference on Emerging Networking Experiments and Technologies (CoNEXT’07), Poster Session, New York, USA, December 10-13, 2007.

Bibliography

297

[DeM93]

L. Deng and J. Mark, “Parameter estimation for markov modulated poisson processes via the em algorithm with time discretization”, Telecommunication Systems, Volume 1, Number 1, 1993, pp. 321-338.

[Den95]

S. Deng, “Traffic Characteristics of Packet Voice”, Proc. of the IEEE International Conference on Communications, Volume 3, pp. 1369–1374, 1995.

[DJCME92] P. Danzig, S. Jamin, R. C´aceres, D. Mitzel and D. Estrin, “An Empirical Workload Model for Driving Wide-Area TCP/IP Network Simulations”, Wiley Journal of Internetworking: Research and Experience, Volume 3, Number 1, March 1992. [DPDK]

Data Plane Development Kit, available online http://dpdk. org/, last requested April 7, 2016.

[DPRPV08] A. Dainotti, A. Pescap´e, P. S. Rossi, F. Palmieri, G. Ventre, “Internet traffic modeling by means of Hidden Markov Models”, Computer Networks, Volume 52, 2008, pp. 2645–2662. [DRR11]

T. Dreibholz, E. P. Rathgeb, I. R¨ ungeler, R. Seggelmann, M. T¨ uxen, R. R. Stewart, “Stream Control Transmission Protocol: Past, Current, and Future Standardization Activities”, IEEE Communications Magazine, Volume 49, Number 4, pp. 82–88, 2011.

[EBB10]

G. Eason, E. Brosh, S. A. Baste, V. Misra, D. Rubenstein, H. Schulzrinne, “The Delay-Friendliness of TCP for Real-Time Traffic”, ACM/IEEE Transactions on Networking, Volume 18, Number 5, pp. 1478–1491, 2010.

[EGRSS11] J. Erman, A. Gerber, K. K. Ramakrishnan, S. Sen, and O. Spatscheck, “Over The Top Video: The Gorilla in Cellular Networks”, Proc. of the Internet Measurements Conference IMC’11, Berlin, Germany, November 2–4, 2011, 127–136. [EGRWC15] P. Emmerich, S. Gallenm¨ uller, D. Raumer, F. Wohlfart, and G. Carle, “MoonGen: A Scriptable High-Speed Packet Generator”, in Proc. of the 2015 ACM Conference on Internet Measurement (IMC’15), Tokyo, Japan, October 28–30, 2015, pp. 275–287. [EGT11]

J. Erman, A. Gerber, M. T. Hajiaghayi, D. Pei, S. Sen, and O. Spatscheck, “To Cache or Not to Cache – The 3G Case”,

298

Bibliography IEEE Internet Computing, Volume 15, Number 2, March/April 2011, 27–34.

[ENW96]

A. Erramilli, O. Narayan, and W. Willinger, “Experimental queueing analysis with long-range dependent packet traffic”, IEEE/ACM Transactions On Networking, Volume 4, Number 2, April 1996, pp. 209–223.

[FCS08]

N. Fonseca, M. Crovella, K. Salamatian, “Long range mutual information”, ACM SIGMETRICS Performance Evaluation Review, Volume 36, Issue 2, September 2008, pp. 32–37.

[Fer72]

D. Ferrari, “Workload Characterization and Selection in Computer Performance Measurement”, IEEE Computer, Volume 5, Number 4, July 1972, pp. 18–24.

[Fer84]

D. Ferrari, “On the Foundations of Artificial Workload Design”, ACM SIGMETRICS Performance Evaluation Review, Volume 12, Number 3, August 1984, pp. 8–14.

[FGB03]

W. Feng, A. Goel, A. Bezzaz, W. Feng, J. Walpole, “TCPivo: A High-Performance Packet Replay Engine”, in Proc. of the ACM SIGCOMM 2003 Workshop on Models, Methods, and Tools for Reproducible Network Research (MoMeTools’03), Karlsruhe, Germany, August 25–27, 2003, pp. 57–64.

[FGV06]

A. M. Faber, M. Gupta, C. Viecco, “Revisiting Web Server Workload Invariants in the Context of Scientific Web Sites”, in the Proc. of the 2006 ACM/IEEE Conference on Supercomputing, Tampa, FL, USA, November 2006, article number 110.

[FlP01]

S. Floyd and V. Paxson, “Difficulties in Simulating the Internet”, IEEE/ACM Transactions on Networking, Volume 9, Number 4, August 2001, pp. 392–403.

[G.711]

G.711 : Pulse code modulation (PCM) of voice frequencies, ITU Recommendation G.711, online http://www.itu.int/ rec/T-REC-G.711/en.

[G.723.1]

G.723.1 : Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s, ITU Recommendation G.723.1, online http://www.itu.int/rec/T-REC-G.723.1/e.

[G.729.1]

G.729.1 : G.729-based embedded variable bit-rate coder: An 8-32 kbit/s scalable wideband coder bitstream interoperable

Bibliography

299

with G.729, ITU Recommendation G729.1, online http://www. itu.int/rec/T-REC-G.729.1/. [Gah08]

A. Gahr, “Bereitstellung und Einsatz von Modellen und Werkzeugen zur Erzeugung realit¨ atsnaher synthetischer Webserver-Lasten”, Diploma Thesis, Computer Science Dept., University of Hamburg, 2008.

[GaW94]

M. W. Garrett, W. Willinger, “Analysis, Modelling and Generation of Self-Similar VBR Video Traffic”, ACM SIGCOMM Computer Communication Review, Volume 24, Issue 4, October 1994, pp. 269–280.

[GCR01]

S. Gadde, J. Chase, and M. Rabinovich, “Web Caching and Content Distribution; A View from the Interior”, Computer Communications, Volume 24, Number 2, 2001, pp. 222–231.

[GJCG13]

S. Gramatikov, F. Jaureguizar, J. Cabrera, N. Garc´ıa, “Stochastic modelling of peer-assisted VoD streaming in managed networks”, Computer Networks, Volume 57, Number 9, June 2013, pp. 2058–2074 .

[GJR11]

V. Gopalakrishnan, R. Jana, K. K. Ramakrishnan, D. F. Swayne, and V. A. Vaishampayan, “Understanding Couch Potatoes: Measurement and Modelling of Interactive Usage of IPTV at large scale”, Proc. of the 11th ACM SIGCOMM Conference on Internet Measurement (IMC’11), Berlin, Germany, November 2–4, 2011.

[GrS05]

C. Grimm and G. Schl¨ uchtermann, “Verkehrstheorie in IPNetzen: Modelle, Berechnungen, statistische Methoden”, 1. Auflage, Bonn, H¨ uthig, 2005.

[GSMAMR] 3GPP Specification series, ANSI-C code for the Adaptive Multi Rate (AMR) speech codec, online, http://www.3gpp. org/DynaReport/26-series.htm. [GTC06]

L. Guo, E. Tan, S. Chen, Z. Xiao, O. Spatscheck, X. Zhang, “Delving into Internet Streaming Media Delivery: A Quality and Resource Utilization Perspective”, Proc. of the Internet Measurements Conference (IMC’06), October 25–27, 2006, Rio de Janeiro, Brazil.

[Had06]

L. Hadji, “A Unified Load Generator for Geographically Distributed Generation of Network Traffic”, Master Thesis, Dalarna University, Sweden, October 2006.

[Has06]

G. Hasslinger, “Validation of Gaussian Traffic Modeling Using Standard IP Measurement”, Proc. of the 13th GI/ITG Conference on Measurement, Modelling and Evaluation of Computer and Communication Systems (MMB’06), Nürnberg, Germany, March 27–29, 2006, pp. 1–16.

[Hec11]

S. Heckmüller, “Einsatz von Lasttransformationen und ihre Invertierung zur realitätsnahen Lastmodellierung in Rechnernetzen”, Dissertation, Fakultät für Mathematik, Informatik und Naturwissenschaften, Universität Hamburg, 2011, erschienen in: B. E. Wolfinger (Hrsg.), Berichte aus dem Forschungsschwerpunkt Telekommunikation und Rechnernetze, Band 7, Shaker-Verlag, Aachen, 2011.

[Hef80]

H. Heffes, “A Class of Data Traffic Processes – Covariance Function Characterization and Relating Queueing Results”, Bell Systems Technical Journal, Volume 59, 1980, pp. 437–488.

[HeL86]

H. Heffes and D. Lucantoni, “A Markov modulated characterization of packetized voice and data traffic and related statistical multiplexer performance”, IEEE Journal on Selected Areas in Communications, Volume 4, Number 6, 1986, pp. 856–868.

[HGB06]

H. Hassan, J. M. Garcia, and C. Bockstal, “Aggregate Traffic Models for VoIP Applications”, Proc. of the International Conference on Digital Telecommunications (ICDT’06), Cap Esterel, Côte d’Azur, France, August 29–31, 2006, pp. 70–75.

[HHCW10] T. Y. Huang, P. Huang, K. T. Chen, P. J. Wang, “Could Skype Be More Satisfying? A QoE-Centric Study of the FEC Mechanism in an Internet-Scale VoIP System”, IEEE Network, March/April 2010, pp. 42–48.

[HLR07]

C. Huang, J. Li, and K. W. Ross, “Can Internet Video-on-Demand Be Profitable?”, Proc. of the 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM’07), Kyoto, Japan, August 27–31, 2007, pp. 133–144.

[HMS00]

D. Hand, H. Mannila, and P. Smyth, “Principles of Data Mining”, MIT Press, 2000.

[HoW06]

S. Hong, S. F. Wu, “On Interactive Internet Traffic Replay”, Chapter in “Recent Advances in Intrusion Detection”, Lecture Notes in Computer Science Series, Volume 3858, 2006, pp. 247–264.

[ICDD00]

A. Iyengar, J. Challenger, D. Dias, P. Dantzig, “High-Performance Web Site Design Techniques”, Journal IEEE Internet Computing, Volume 4, Issue 2, March 2000, pp. 17–26.

[IEEE1588] Precision Time Protocol (PTP), online http://www.ieee1588.com/index.html, retrieved September 23, 2014.

[IEEE802.16] IEEE 802.16 Broadband Wireless Access Working Group, “Multi-hop Relay System Evaluation Methodology (Channel Model and Performance Metric)”, proposal submitted 19.02.2007, online http://www.ieee802.org/16/relay/docs/80216j-06_013r3.pdf, last requested April 29, 2016.

[IEEE802.3x] IEEE Standards for Local and Metropolitan Area Networks: Supplements to Carrier Sense Multiple Access With Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications – Specification for 802.3 Full Duplex Operation and Physical Layer Specification for 100 Mbit/s Operation on Two Pairs of Category 3 or Better Balanced Twisted Pair Cable (100BASE-T2). Institute of Electrical and Electronics Engineers, ISBN 1-55937-905-7, 1997.

[IhP11]

S. Ihm and V. S. Pai, “Towards Understanding Modern Web Traffic”, Proc. of the Internet Measurements Conference IMC’11, Berlin, Germany, November 2–4, 2011, pp. 295–312.

[Int14]

ARK - Intel Product Information, Desktop Processors, online http://ark.intel.com/DesktopProducts, retrieved September 22, 2014.

[iPerf3]

iPerf - The network bandwidth measurement tool, Active measurements in TCP, UDP and SCTP, online https://iperf.fr/, last requested April 17, 2016.

[IPv6S]

Google IPv6 statistics, online http://www.google.com/intl/ en/ipv6/statistics.html, retrieved January 27, 2015.

[IXIA]

Ixia Optixia 40- and 100-Gigabit Ethernet Test Module, available online http://www.ixiacom.com/products/40100-gbe-load-modules, last requested March 29, 2016.

[IXIA2]

“The World’s First 400GbE Test Solution”, Ixia 400GbE Load Modules, available online http://www.ixiacom.com/products/400gbe-load-modules, last requested March 29, 2016.

[IXP2400]

“Intel IXP2400 Network Processor: Flexible, High-Performance Solution for Access and Edge Applications”, available online http://www.intel.com/design/network/papers/ixp2400.pdf, last requested April 11, 2016.

[Jah10]

S. Jahnke, “Last-/Verkehrsmessungen und realitätsnahe Lastgenerierung für Web-Server-Zugriffe”, Diploma Thesis, University of Hamburg, 2010.

[Jai91]

R. Jain, “The Art of Computer Systems Performance Analysis – Techniques for Experimental Design, Measurement, Simulation, and Modeling”, Wiley-Interscience, 1991.

[JSZ14]

J. Jiang, V. Sekar, and H. Zhang, “Improving Fairness, Efficiency, and Stability in HTTP-Based Adaptive Video Streaming With FESTIVE”, IEEE/ACM Transactions on Networking, Volume 22, Number 1, February 2014, pp. 326–340.

[KAB08]

N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, “OpenFlow: enabling innovation in campus networks”, in ACM SIGCOMM Computer Communication Review, Volume 38, Issue 2, April 2008, pp. 69–74.

[KASS14]

A. van Kesteren, J. Aubourg, J. Song, H. R. M. Steen, “XMLHttpRequest Level 1, W3C Working Draft 30 January 2014”, W3C, available online http://www.w3.org/TR/ XMLHttpRequest/, last access May 3, 2015.

[KHF06]

E. Kohler, M. Handley, S. Floyd, “Congestion Control without Reliability”, Proc. of the ACM SIGCOMM’06, Pisa, Italy, September 11–15, 2006, pp. 27–38.

[KiF08]

E. Kienzle, and J. Friedrich, “Programmierung von Echtzeitsystemen”, Carl Hanser Verlag, November 2008.

[Kin90]

P. J. B. King, “Computer and Communication Systems Performance Modelling”, Prentice Hall International (UK) Ltd, 1990.

[KLL03]

A. Klemm, C. Lindemann, M. Lohmann, “Modeling IP traffic using the batch Markovian arrival process”, Performance Evaluation Journal, Volume 54, Number 2, 2003, pp. 149–173.

[KoB13]

A. Kolesnikov, A. Beifuss, “UniLoG: Lastgenerierung unter Verwendung eines Echtzeitbetriebssystems”, in: B. E. Wolfinger, K. D. Heidtmann (Hrsg.), Leistungs-, Zuverlässigkeits- und Verlässlichkeitsbewertung von Kommunikationsnetzen und verteilten Systemen, 7. GI/ITG-Workshop MMBnet 2013, 5.–6. September 2013, Hamburg, Bericht 299 des Fachbereichs Informatik an der Universität Hamburg, 2013, pp. 93–94.

[KoK09]

A. Kolesnikov, M. Kulas, “Lastgenerierung an IP-Schnittstellen mit dem UniLoG.IP-Adapter”, in: B. E. Wolfinger, K. D. Heidtmann (Hrsg.), Leistungs-, Zuverlässigkeits- und Verlässlichkeitsbewertung von Kommunikationsnetzen und verteilten Systemen, 5. GI/ITG-Workshop MMBnet 2009, 10.–11. September 2009, Hamburg, Bericht 287 des Departments Informatik an der Universität Hamburg, 2009, pp. 24–35.

[KoK10]

A. Kolesnikov and M. Kulas, “Load Modeling and Generation for IP-Based Networks: A Unified Approach and Tool Support”, Proc. of the 15th International GI/ITG Conference on Measurement, Modeling and Evaluation of Computing Systems, and Dependability and Fault Tolerance (MMB & DFT 2010), Essen, Germany, March 15–17, 2010, pp. 91–106.

[Kol12]

A. Kolesnikov, “UniLoG: A Unified Load Generation Tool”, Proc. of the 16th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems and Dependability and Fault Tolerance (MMB & DFT 2012), Kaiserslautern, Germany, March 19–21, 2012, pp. 253–257.

[Kön12]

H. König, “Protocol Engineering”, Springer-Verlag Berlin Heidelberg, 2012.

[KoW11]

A. Kolesnikov and B. E. Wolfinger, “Web Workload Generation According to the UniLoG Approach”, Proc. of the 17th GI/ITG Conference on Communication in Distributed Systems, Kiel, Germany, March 8–11, 2011, pp. 49–60.

[KrH95]

M. Krunz, H. Hughes, “A traffic model for MPEG-coded VBR streams”, ACM SIGMETRICS Performance Evaluation Review, Volume 23, Issue 1, May 1995, pp. 47–55.

[KRL08]

R. El Abdouni Khayari, M. Rücker, A. Lehmann, and A. Musovic, “ParaSynTG: A Parameterized Synthetic Trace Generator for Representation of WWW Traffic”, Proc. of the 2008 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS’08), Edinburgh, Scotland, UK, June 16–18, 2008, pp. 317–323.

[KUTE]

KUTE – Kernel-based Traffic Engine, online http://caia.swin.edu.au/genius/tools/kute/, last access January 19, 2016.

[KüT06]

S. Künzli and L. Thiele, “Generating Event Traces Based on Arrival Curves”, in Proc. of the 13th GI/ITG Conference on Measuring, Modelling and Evaluation of Computer and Communications Systems (MMB’06), Nürnberg, Germany, March 27–29, 2006, pp. 81–98.

[KWK09]

A. Kolesnikov, B. E. Wolfinger, and M. Kulas, “UniLoG – ein System zur verteilten Lastgenerierung in Netzen”, Fachtagung des GI/GMA-Fachausschusses Echtzeitsysteme (real-time), Boppard am Rhein, November 19–20, 2009, pp. 11–20.

[LAJ07]

L. Le, J. Aikat, K. Jeffay, and F. D. Smith, “The Effects of Active Queue Management and Explicit Congestion Notification on Web Performance”, IEEE/ACM Transactions On Networking, Volume 15, Number 6, December 2007, pp. 1217–1230.

[LBFE09]

G. Y. Lazarou, J. Baca, V. S. Frost, J. B. Evans, “Describing Network Traffic Using the Index of Variability”, IEEE/ACM Transactions On Networking, Volume 17, Number 5, October 2009, pp. 1672–1683.

[LCWT10] H. Luo, S. Ci, D. Wu, H. Tang, “End-to-end optimized TCP-friendly rate control for real-time video streaming over wireless multi-hop networks”, Journal on Visual Communication and Image Representation, Volume 21, Number 2, pp. 98–106, 2010.

[Lea10]

N. Leavitt, “Network-Usage Changes Push Internet Traffic to the Edge”, IEEE Computer, Volume 43, Number 10, October 2010, pp. 13–15.

[LeC08]

S. Lee, and K. Chung, “Buffer-driven adaptive video streaming with TCP-friendliness”, Computer Communications, Volume 31, 2008, pp. 2621–2630.

[LFJ97]

B. O. Lee, V. S. Frost, and R. Jonkman, “NetSpec Source Models for telnet, ftp, voice, video, and WWW traffic”, technical report ITTC-TR-10980-19, January 1997, available online www.ittc.ku.edu/netspec/usage/traffic_netspec.ps, last requested April 15, 2016.

[LHC15]

C.-F. Lai, R. H. Hwang, H. C. Chao, M. M. Hassan, and A. Alamri, “A Buffer-Aware HTTP Live Streaming Approach for SDN-Enabled 5G Wireless Networks”, IEEE Network, January/February 2015, pp. 49–55.

[LMN90]

D. Lucantoni, K. Meier-Hellstern, M. Neuts, “A single server queue with server vacations and a class of non-renewal arrival processes”, Advances in Applied Probability, Volume 22, 1990, pp. 676–705.

[loadstorm] LoadStorm PRO, A cloud load testing tool, online http://loadstorm.com/pro/, last requested April 18, 2016.

[Low03]

S. H. Low, “A Duality Model of TCP and Queue Management Algorithms”, IEEE/ACM Transactions On Networking, Volume 11, Number 4, August 2003, pp. 525–536.

[LTWW94] W. Leland, M. Taqqu, W. Willinger, D. Wilson, “On the self-similar nature of Ethernet traffic (extended version)”, IEEE/ACM Transactions On Networking, Volume 2, Number 1, February 1994, pp. 1–15.

[Luc91]

D. Lucantoni, “New results on the single server queue with a batch Markovian arrival process”, Stochastic Models, Volume 7, Number 1, 1991, pp. 1–46.

[MAD04]

D. Menascé, V. A. F. Almeida, and L. W. Dowdy, “Performance by Design – Computer Capacity Planning by Example”, Prentice Hall, 2004.

[MAWI]

MAWI Working Group Traffic Archive, available online http://mawi.wide.ad.jp/mawi/, last requested March 29, 2016.

[MBM09]

M. Menth, A. Binzenhöfer, and S. Mühleck, “Source Models for Speech Traffic Revisited”, IEEE/ACM Transactions on Networking (ToN), Volume 17, Number 4, August 2009, pp. 1042–1051.

[Men03]

D. A. Menascé, “Workload Characterization”, IEEE Internet Computing, September/October 2003, pp. 89–92.

[MeV00]

D. A. Menascé, V. A. F. Almeida, “Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning”, Prentice Hall PTR, Upper Saddle River, NJ, USA, 2000.

[Mey15]

D. Meyer, “University of Oregon Route Views Archive Project”, online http://archive.routeviews.org/, last access June 24, 2015.

[MGEN]

Multi-Generator (MGEN), U.S. Naval Research Laboratory, Networks and Communication Systems Branch, online http://www.nrl.navy.mil/itd/ncs/products/mgen, last access January 19, 2016.

[MiG98]

V. Misra, W. B. Gong, “A Hierarchical Model for Teletraffic”, in Proc. of the 37th IEEE Conference on Decision & Control, Tampa, Florida, USA, December 1998, pp. 1674–1679.

[Mil12]

D. L. Mills, “Executive Summary: Computer Network Time Synchronization”, available online at http://www.eecis.udel.edu/~mills/exec.html, last updated May 12, 2012, retrieved October 14, 2014.

[MJS08]

A. Milani, J. Jassó, S. Suriani, “Soft User Behavior Modeling in Virtual Environments”, in Proc. of the 3rd International Conference on Convergence and Hybrid Information Technology (ICCIT’08), Busan, Korea, November 11–13, 2008, pp. 1182–1187.

[MMC-TR] M. Menth, A. Binzenhöfer, and S. Mühleck, “MMC transition matrices for Source Models for Speech Traffic Revisited”, online http://www3.informatik.uni-wuerzburg.de/TR/mmc/.

[MMV05]

M. Meiss, F. Menczer, and A. Vespignani, “On the Lack of Typical Behavior in the Global Web Traffic Network”, Proc. of the 14th International World Wide Web Conference (WWW’05), Chiba, Japan, May 10–14, 2005, pp. 510–518.

[MPEG4IP] MPEG4IP - Open Streaming Video and Audio, available online http://mpeg4ip.sourceforge.net/.

[MSDN1]

“Acquiring high-resolution time stamps”, Microsoft Developer Network (MSDN), online http://msdn.microsoft.com/en-us/library/windows/desktop/dn553408(v=vs.85).aspx, retrieved September 9, 2014.

[MSDN2]

“Scheduling Priorities”, Microsoft Developer Network (MSDN), online http://msdn.microsoft.com/en-us/library/ms685100(VS.85).aspx, retrieved September 20, 2014.

[MSDN3]

“TCP/IP Raw Sockets”, Microsoft Developer Network (MSDN), online https://msdn.microsoft.com/en-us/library/windows/desktop/ms740548%28v=vs.85%29.aspx, retrieved January 29, 2015.

[MSDN4]

“IWebBrowser2 interface”, Microsoft Developer Network (MSDN), online https://msdn.microsoft.com/en-us/library/aa752127%28v=vs.85%29.aspx, retrieved May 12, 2015.

[MSKGE11] G. Mann, M. Sandler, D. Krushevskaja, S. Guha, E. Even-Dar, “Modeling the parallel execution of black-box services”, Proc. of the 3rd USENIX conference on Hot topics in cloud computing (HotCloud’11), 2011, pp. 20–25.

[MSS05]

M. Mandjes, I. Saniee, and A. L. Stolyar, “Load Characterization and Anomaly Detection for Voice Over IP Traffic”, IEEE Transactions on Neural Networks, Volume 16, Number 5, September 2005, pp. 1019–1026.

[Napatech] Napatech Network Management Solutions, online http://www.napatech.com/solutions/network-management, last requested April 16, 2016.

[netmap]

L. Rizzo, “netmap - the fast packet I/O framework”, available online http://info.iet.unipi.it/~luigi/netmap/, last requested April 13, 2016.

[netperf]

Netperf Homepage, online http://www.netperf.org/netperf/, last requested April 17, 2016.

[NIST13]

NIST/SEMATECH e-Handbook of Statistical Methods, http://www.itl.nist.gov/div898/handbook/, retrieved May 02, 2014.

[ntop1]

ntop, “Introducing PF_RING DNA (Direct NIC Access)”, available online http://www.ntop.org/pf_ring/introducing-pf_ring-dna-direct-nic-access/, last requested April 14, 2016.

[ntop2]

ntop, “Building a 10 Gbit Traffic Generator using PF_RING and Ostinato”, available online http://www.ntop.org/pf_ring/building-a-10-gbit-traffic-generator-using-pf_ring-and-ostinato/, last requested April 14, 2016.

[Odv15]

J. Odvarko, “HTTP archive specification”, online http://www.softwareishard.com/blog/har-12-spec/, retrieved May 26, 2015.

[OhC05]

R. Ohri, E. Chlebus, “Measurement Based E-mail Traffic Characterization”, in Proc. of the 2005 International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS’05), Cherry Hill, New Jersey, USA, July 24–28, 2005, pp. 761–771.

[OSPG05]

R. Peña-Ortiz, J. Sahuquillo, A. Pont, J. A. Gil, “Modeling users’ dynamic behaviour in e-business environments using navigations”, International Journal of Electronic Business (IJEB), Volume 3, Number 3–4, 2005, pp. 225–242.

[OSPG09]

R. Peña-Ortiz, J. Sahuquillo, A. Pont, J. A. Gil, “Dweb model: Representing Web 2.0 dynamism”, Computer Communications, Volume 32, Number 6, April 2009, pp. 1118–1128.

[PaF95]

V. Paxson and S. Floyd, “Wide area traffic: the failure of Poisson modeling”, IEEE/ACM Transactions on Networking, Volume 3, Number 3, 1995, pp. 226–244.

[Pax94]

V. Paxson, “Empirically derived analytic models of wide-area TCP connections”, IEEE/ACM Transactions on Networking, Volume 2, Issue 4, August 1994, pp. 316–336.

[PDT09]

R. S. Prasad, C. Dovrolis, and M. Thottan, “Router Buffer Sizing for TCP Traffic and the Role of the Output/Input Capacity Ratio”, IEEE/ACM Transactions on Networking, Volume 17, Number 5, October 2009, pp. 1645–1658.

[PEA05]

P. Pragtong, T. Erke, and K. Ahmed, “Analysis and Modeling of VoIP Conversation Traffic in the Real Network”, Proc. of the 5th International Conference on Information, Communications and Signal Processing (ICICS’05), Bangkok, Thailand, December 6–9, 2005, pp. 388–392.

[PFFG06]

M. Paredes-Farrera, M. Fleury, M. Ghanbari, “Precision and accuracy of network traffic generators for packet-by-packet traffic analysis”, in Proc. of the 2nd International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (TRIDENTCOM 2006), Barcelona, Spain, 2006, pp. 32–37.

[PMW10]

N. Parvez, A. Mahanti, and C. Williamson, “An Analytic Throughput Model for TCP NewReno”, IEEE/ACM Transactions on Networking, Volume 18, Number 2, April 2010, pp. 448–461.

[QGL09]

T. Qiu, Z. Ge, S. Lee, J. Wang, J. Xu, and Q. Zhao, “Modeling User Activities in a Large IPTV System”, Proc. of the 9th ACM SIGCOMM Conference on Internet Measurement (IMC’09), Chicago, Illinois, USA, November 4–6, 2009, pp. 430–441.

[Rab89]

L. R. Rabiner, “A tutorial on Hidden Markov Models and selected applications in speech recognition”, Proceedings of the IEEE, Volume 77, Number 2, 1989, pp. 257–285.

[RBV12]

H. Riiser, H. S. Bergsaker, P. Vigmostad, P. Halvorsen, C. Griwodz, “A Comparison of Quality Scheduling in Commercial Adaptive HTTP Streaming Solutions on a 3G Network”, Proc. of the MoVid’12, February 24, 2012, Chapel Hill, North Carolina, USA, pp. 25–30.

[RCW12]

A. Rajabi, D. R. Cheriton, J. W. Wong, “MMPP Characterization of Web Application Traffic”, in Proc. of the 2012 IEEE 20th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS’2012), Washington, DC, USA, August 7–9, 2012, pp. 107–114.

[RDC11]

L. Rizzo, L. Deri, and A. Cardigliano, “10 Gbit/s Line Rate Packet Processing Using Commodity Hardware: Survey and new Proposals”, available online at http://luca.ntop.org/10g.pdf.

[Rec12]

J. Rech, “Wireless LANs: 802.11-WLAN-Technologie und praktische Umsetzung im Detail”, 4th Edition, Hannover, Heise, 2012.

[RFC791]

J. Postel, “Internet Protocol”, RFC 791, September 1981.

[RFC792]

J. Postel, “Internet Control Message Protocol”, RFC 792, September 1981.

[RFC793]

J. Postel, “Transmission Control Protocol”, RFC 793, September 1981.

[RFC1122] R. Braden, “Requirements for Internet Hosts – Communication Layers”, RFC 1122, October 1989.

[RFC1962] D. Rand, “The PPP Compression Control Protocol (CCP)”, RFC 1962, June 1996.

[RFC2003] C. Perkins, “IP Encapsulation within IP”, RFC 2003, October 1996.

[RFC2326] H. Schulzrinne, A. Rao, and R. Lanphier, “Real Time Streaming Protocol (RTSP)”, RFC 2326, April 1998.

[RFC2474] K. Nichols, S. Blake, F. Baker, and D. Black, “Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers”, RFC 2474, December 1998.

[RFC2616] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, “Hypertext Transfer Protocol – HTTP/1.1”, RFC 2616, June 1999.

[RFC2960] R. Stewart, Q. Xie, K. Morneault, C. Sharp, H. Schwarzbauer, T. Taylor, I. Rytina, M. Kalla, L. Zhang, V. Paxson, “Stream Control Transmission Protocol”, RFC 2960, October 2000.

[RFC3168] K. Ramakrishnan, S. Floyd, and D. Black, “The Addition of Explicit Congestion Notification (ECN) to IP”, RFC 3168, September 2001.

[RFC3448] M. Handley, S. Floyd, J. Padhye, J. Widmer, “TCP Friendly Rate Control (TFRC): Protocol Specification”, RFC 3448, January 2003.

[RFC3550] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications”, RFC 3550, July 2003.

[RFC3692] T. Narten, “Assigning Experimental and Testing Numbers Considered Useful”, RFC 3692, January 2004.

[RFC3951] S. Andersen, A. Duric, H. Astrom, R. Hagen, W. Kleijn, J. Linden, “Internet Low Bit Rate Codec (iLBC)”, RFC 3951, December 2004.

[RFC3954] B. Claise, “Cisco Systems NetFlow Services Export Version 9”, RFC 3954, October 2004.

[RFC3984] S. Wenger, M. M. Hannuksela, T. Stockhammer, M. Westerlund, D. Singer, “RTP Payload Format for H.264 Video”, RFC 3984, February 2005.

[RFC4960] R. Stewart, Ed., “Stream Control Transmission Protocol”, RFC 4960, September 2007.

[RFC5246] T. Dierks, E. Rescorla, “The Transport Layer Security (TLS) Protocol Version 1.2”, RFC 5246, August 2008.

[RFC5348] S. Floyd, M. Handley, J. Padhye, J. Widmer, “TCP Friendly Rate Control (TFRC): Protocol Specification”, RFC 5348, September 2008.

[RFC5905] D. Mills, U. Delaware, J. Martin, J. Burbank, W. Kasch, “Network Time Protocol Version 4: Protocol and Algorithms Specification”, RFC 5905, June 2010.

[RFC6176] S. Turner, T. Polk, “Prohibiting Secure Sockets Layer (SSL) Version 2.0”, RFC 6176, March 2011.

[RFC6184] Y.-K. Wang, R. Even, T. Kristensen, R. Jesup, “RTP Payload Format for H.264 Video”, RFC 6184, May 2011.

[RFC6864] J. Touch, “Updated Specification of the IPv4 ID Field”, RFC 6864, February 2013.

[RLB11]

A. Rao, Y. Lim, C. Barakat, A. Legout, D. Towsley, W. Dabbous, “Network Characteristics of Video Streaming Traffic”, Proc. of the ACM CoNEXT 2011, December 6–9, 2011, Tokyo, Japan, Article No. 25.

[ROB01]

J. W. Roberts, “Traffic Theory and the Internet”, IEEE Communications Magazine, Volume 39, Number 1, January 2001, pp. 94–99.

[Ros95]

O. Rose, “Statistical properties of MPEG video traffic and their impact on traffic modeling in ATM systems”, in Proc. of the 20th Conference on Local Computer Networks, Minneapolis, MN, USA, October 16–19, 1995, pp. 397–406.

[RRB07a]

C. Rolland, J. Ridoux, and B. Baynat, “Catching IP traffic burstiness with a lightweight generator”, Proc. IFIP NETWORKING’07, Atlanta, May 14–18, 2007, pp. 924–934.

[RRB07b]

C. Rolland, J. Ridoux, and B. Baynat, “LiTGen, a lightweight traffic generator: application to P2P and mail wireless traffic”, in Proc. of the 8th International Conference on Passive and Active Network Measurement, PAM 2007, Louvain-la-neuve, Belgium, April 5–6, 2007, pp. 52–62.

[RSR09]

F. Ramos, F. Song, P. Rodriguez, R. Gibbens, J. Crowcroft, and I. H. White, “Constructing an IPTV Workload Model”, Poster Session at the ACM SIGCOMM 2009 Conference (SIGCOMM 2009), Barcelona, Spain, August 17–21, 2009.

[RTOS]

On Time Informatik GmbH, Win32 API Compatible Embedded RTOS, online http://www.on-time.com/, last retrieved February 16, 2016.

[RUDE]

RUDE & CRUDE, Real-time UDP Data Emitter, online http://rude.sourceforge.net/, last access January 19, 2016.

[SAMFU12] F. Schneider, B. Ager, G. Maier, A. Feldmann, and S. Uhlig, “Pitfalls in HTTP Traffic Measurements and Analysis”, Proc. of the 13th Passive and Active Measurement Conference (PAMS’2012), Vienna, Austria, March 13–14, 2012, pp. 242–251.

[SaV01]

K. Salamatian, S. Vaton, “Hidden Markov Modeling for network communication channels”, in Proc. of the ACM SIGMETRICS 2001, Volume 29, 2001, pp. 92–101.

[Sch07]

M. Schwengel, “Verteilte Lastgenerierung, Architekturen, Realisierungen und Fallstudien”, VDM Verlag Dr. Müller, Saarbrücken, 2007.

[SCK03]

W. Shi, E. Collins, and V. Karamcheti, “Modeling object characteristics of dynamic Web content”, Journal of Parallel and Distributed Computing, Number 63, 2003, pp. 963–980.

[Sie09]

G. Siegmund, “Technik der Netze 2. Neue Ansätze: SIP in IMS und NGN”, 6., völlig neu bearbeitete und erweiterte Auflage, Hüthig Verlag, Heidelberg, 2009.

[SKR07]

P. Svoboda, W. Karner, M. Rupp, “Modeling E-Mail Traffic for 3G Mobile Networks”, in Proc. of the 18th International Symposium on Personal, Indoor and Mobile Radio Communications, Athens, Greece, September 3–7, 2007, pp. 1–5.

[Skype]

Skype softphone application, online http://www.skype.com/en/.

[SoB04]

J. Sommers and P. Barford, “Self-Configuring Network Traffic Generation”, Proc. of the 4th ACM SIGCOMM Conference on Internet Measurement (IMC’04), Taormina, Sicily, Italy, October 25–27, 2004, pp. 68–81.

[SPT07]

K. Sleurs, J. Potemans, J. Theunis, D. Li, E. Van Lil, A. Van de Capelle, “Evaluation of Network Traffic Workload Scaling Techniques”, Computer Communications, Volume 30, May 2007, pp. 3096–3106.

[SPV04]

P. Salvador, A. Pacheco, R. Valadas, “Modeling IP traffic: joint characterization of packet arrivals and packet sizes using BMAPs”, Elsevier Computer Networks, Volume 44, October 2004, pp. 335–352.

[Sri16]

P. Srivats, “OSTINATO - Packet Traffic Generator and Analyzer”, available online http://ostinato.org/, last requested April 13, 2016.

[SRS03]

U. Sarkar, S. Ramakrishnan, and D. Sarkar, “Modeling Full-Length Video Using Markov-Modulated Gamma-Based Framework”, IEEE/ACM Transactions on Networking (TON), Volume 11, Number 4, August 2003, pp. 638–649.

[SrW86]

K. Sriram and W. Whitt, “Characterizing superposition arrival processes in packet multiplexers for voice and data”, IEEE Journal on Selected Areas in Communications, Volume 4, Number 6, September 1986, pp. 833–846.

[SSLL09]

M. E. Sousa-Vieira, A. Suárez-González, J. C. López-Ardao, and C. López-García, “Efficient On-Line Generation of the Correlation Structure of F-ARIMA Processes”, in Proc. of the 16th International Conference on Analytical and Stochastic Modeling Techniques and Applications (ASMTA 2009), Madrid, Spain, June 9–12, 2009, LNCS 5513, Springer-Verlag, Berlin Heidelberg 2009, pp. 131–143.

[StW98]

W. R. Stevens, G. R. Wright, “TCP/IP illustrated”, Volume 1: The protocols, Reading, Mass., Addison-Wesley, 1998.

[TCPrp]

“Tcpreplay - Pcap editing and replaying utilities”, online at http://tcpreplay.appneta.com/, last requested March 28, 2016.

[TPC-W]

TPC-W, a transactional web e-Commerce benchmark, available online http://www.tpc.org/tpcw/default.asp, last requested April 18, 2016.

[ViV06]

K. V. Vishwanath and A. Vahdat, “Realistic and Responsive Network Traffic Generation”, Proc. of the ACM SIGCOMM’06, Pisa, Italy, September 11–15, 2006, pp. 111–122.

[ViV09]

K. V. Vishwanath and A. Vahdat, “Swing: Realistic and Responsive Network Traffic Generation”, IEEE/ACM Transactions on Networking, Volume 17, Number 3, June 2009, pp. 712–725.

[VNI15]

Cisco Visual Networking Index: Forecast and Methodology, 2014–2019 White Paper, available online at http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html, last requested March 3, 2016.

[VYW02]

A. Vahdat, K. Yocum, K. Walsh, P. Mahadevan, D. Kostic, J. Chase, and D. Becker, “Scalability and accuracy in a large-scale network emulator”, in Proc. of the 5th Symposium on Operating Systems Design and Implementation (OSDI), Boston, Massachusetts, USA, December 9–11, 2002.

[WAW05]

A. Williams, M. Arlitt, C. Williamson, and K. Barker, “Web workload characterization: Ten years later”, Springer, Heidelberg, 2005.

[Wei12]

M. Weinschenk, “Messung und Analyse von Lastcharakteristiken des Webverkehrs für die Generierung realitätsnaher synthetischer Weblasten”, Bachelor Thesis, University of Hamburg, Department of Informatics, Hamburg, 2012.

[Wel11]

M. Welsh, “Measuring the mobile Web is hard”, available online http://matt-welsh.blogspot.com/2011/08/measuring-mobile-web-is-hard.html, last retrieved July 22, 2015.

[WeX06]

J. Wei and C. Xu, “sMonitor: A Non-Intrusive Client-Perceived End-to-End Performance Monitor of Secured Internet Services”, Proc. of USENIX’06, Boston, May 30 – June 3, 2006, pp. 243–248.

[WinPcap]

WinPcap – The industry standard Windows packet capture library, online http://www.winpcap.org/, last access September 2, 2014.

[Wire]

Wireshark – network protocol analyser, online http://www.wireshark.org, retrieved February 11, 2015.

[WITS]

“WITS: Waikato Internet Traffic Storage”, WAND Network Research Group, available online http://wand.net.nz/wits/index.php, last requested March 29, 2016.

[WKST08] B. Wang, J. Kurose, P. Shenoy, D. Towsley, “Multimedia Streaming via TCP: An Analytic Performance Study”, ACM Transactions on Multimedia Computing, Communications and Applications, Volume 4, Number 2, Article 16, May 2008.

[WoK90]

B. E. Wolfinger and J. J. Kim, “Load Measurements as a Basis for Modeling the Load of Innovative Communication Systems with Service Integration”, Proc. of the 2nd IEEE Workshop on Future Trends of Distributed Computing Systems, Cairo, Egypt, September 30–October 2, 1990, pp. 14–21.

[Wol99]

B. E. Wolfinger, “Characterization of Mixed Traffic Load in Service-Integrated Networks”, System Science Journal, Volume 25, Number 2, April 1999, pp. 65–86.

[WTSW95] W. Willinger, M. S. Taqqu, R. Sherman, D. V. Wilson, “Self-Similarity Through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level”, in Proc. of the ACM SIGCOMM, Philadelphia, PA, USA, August 1995, pp. 100–113.

[Wu12]

J. Wu, “Advances in K-means Clustering: A Data Mining Thinking”, Verlag Springer, Berlin, Heidelberg, 2012.

[WWT02]

W. Wei, B. Wang, D. Towsley, “Continuous-time Hidden Markov Models for network performance evaluation”, Performance Evaluation, Volume 49, Number 4, 2002, pp. 129–146.

[Xer13]

“Xerces-C++ XML Parser”, http://xerces.apache.org/xerces-c/, The Apache XML Project, retrieved June 27, 2013.

[XLite]

CounterPath’s XLite 4.1 Softphone, available online http://www.counterpath.com/x-lite.html, retrieved March 15, 2014.

[XML08]

“Extensible Markup Language (XML) 1.0 (Fifth Edition)”, http://www.w3.org/TR/REC-xml/, W3C Recommendation, November 26, 2008, retrieved June 27, 2013.

[XSD12]

“W3C XML Schema Definition Language (XSD) 1.1 Part 2: Datatypes”, http://www.w3.org/TR/xmlschema11-2/, W3C Recommendation, April 5, 2012, retrieved October 10, 2013.

[YZZZ06]

H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng, “Understanding User Behavior in Large-Scale Video-on-Demand Systems”, Proc. of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems (EuroSys’06), Leuven, Belgium, April 18–21, 2006, pp. 333–344.

[ZKA05]

S. Zander, D. Kennedy, G. Armitage, “KUTE - A High Performance Kernel-based UDP Traffic Engine”, Technical Report CAIA-TR-050118A, Centre for Advanced Internet Architectures (CAIA), Swinburne University of Technology, Melbourne, Australia, January 2005.

[ZNA03]

M. Zukerman, T. D. Neame, and R. G. Addie, “Internet Traffic Modeling and Future Technology Implications”, in Proc. of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2003), Volume 1, San Francisco, California, USA, March 30 – April 3, 2003, pp. 587–596.