Program Semantics, Specification and Verification: Theory and Applications


Copyright ОАО «ЦКБ «БИБКОМ» & ООО «Aгентство Kнига-Cервис»

8th International Computer Science Symposium in Russia

Fourth Workshop Program Semantics, Specification and Verification: Theory and Applications

Yekaterinburg, Russia, June 24, 2013

Proceedings

Valery Nepomniaschy, Valery Sokolov (Eds.)

Yaroslavl 2013


UDC 519.68
BBK В185.2я43
С30

Program Semantics, Specification and Verification: Theory and Applications. Proceedings of the IV International Workshop PSSV 2013. Yekaterinburg, Russia, June 24, 2013 / Edited by Valery Nepomniaschy, Valery Sokolov. Yaroslavl, Yaroslavl State University, 2013. 80 pages.

The Workshop on Program Semantics, Specification and Verification: Theory and Applications is the leading event in Russia in the field of applying formal methods to software analysis. The proceedings of the fourth workshop are dedicated to formalisms for program semantics, formal models and verification, programming and specification languages, etc.

ISBN 9785839709355

© Yaroslavl State University, 2013


Preface

The volume contains the papers selected for presentation at the Fourth International Workshop on Program Semantics, Specification and Verification: Theory and Applications (PSSV 2013), affiliated with the 8th International Symposium Computer Science in Russia (CSR 2013). The Workshop took place on June 24, 2013 in Yekaterinburg, Russia. The First, Second and Third Workshops (PSSV 2010, PSSV 2011 and PSSV 2012) were affiliated with the 5th, 6th and 7th International Symposiums Computer Science in Russia (CSR 2010, CSR 2011 and CSR 2012), which were successfully held in Kazan, St. Petersburg and Nizhni Novgorod, Russia.

The topics of the Workshop include formal models of programs and systems, methods of formal semantics of programming languages, formal specification languages, methods of deductive program verification, the model checking method, static analysis of programs, formal approaches to testing and validation, program testing, analysis and verification tools.

Eleven papers were submitted to PSSV 2013. The Program Committee accepted 8 papers as regular ones. We are grateful to Irina Adrianova and Alexei Promsky for their considerable efforts in preparing the proceedings.

May 2013

Valery Nepomniaschy Valery Sokolov


Organization

Program Chairs

Valery Nepomniaschy

Institute of Informatics Systems, Novosibirsk, Russia

Valery Sokolov

Yaroslavl State University, Yaroslavl, Russia

Program Committee

Natasha Alechina

University of Nottingham, UK

Sergey Baranov

St. Petersburg Institute for Informatics and Automation, Russia

Nina Evtushenko

Tomsk State University, Russia

Vladimir Itsykson

St. Petersburg State Polytechnic University, Russia

Andrei Klimov

Keldysh Institute of Applied Mathematics, Moscow, Russia

Vladimir Klebanov

Karlsruhe Institute of Technology, Germany

Victor Kuliamin

Institute for System Programming, Moscow, Russia

Irina Lomazova

Higher School of Economics, Moscow, Russia

Andrey Rybalchenko

Technical University, Munich, Germany

Nikolay Shilov

Institute of Informatics Systems, Novosibirsk, Russia

Vladimir Zakharov

Moscow State University, Russia


Table of Contents

Yet Another Defect Detection: Combining Bounded Model Checking and Code Contracts
Marat Akhin, Mikhail Belyaev and Vladimir Itsykson

One-dimensional Resource Workflow Nets
Vladimir A. Bashkin and Irina A. Lomazova

A Formal Model and Verification Problems for Software Defined Networks
E.V. Chemeritsky, R.L. Smelyansky and V.A. Zakharov

A Formal Approach for Generation of Test Scenarios Based on Guides
P. Drobintsev, I. Nikiforov, V. Kotlyarov and A. Letichevsky

Common Knowledge in Well-structured Perfect Recall Systems
N.O. Garanina

Automatic C Program Verification Based on Mixed Axiomatic Semantics
I.V. Maryasov, V.A. Nepomniaschy, A.V. Promsky and D.A. Kondratyev

An Algorithm for the Equivalence Problem in one Class of Gateway Recursive Program Models
A.E. Molchanov

Verification of UCM-Specifications of Distributed Systems Using Coloured Petri Nets
N.V. Vizovitin, V.A. Nepomniaschy and A.A. Stenenko


Yet Another Defect Detection: Combining Bounded Model Checking and Code Contracts

Marat Akhin, Mikhail Belyaev and Vladimir Itsykson

Saint-Petersburg State Polytechnical University, Russia
{akhin,belyaev}@kspt.icc.spbstu.ru [email protected]

Abstract

Bounded model checking (BMC) of C/C++ programs is a research topic that has received much attention in recent years. In this work-in-progress paper we present our approach to this problem. It is based on combining several recent results in BMC, namely: the use of LLVM as a baseline for model generation, the employment of the high-performance Z3 SMT solver to do the formula heavy-lifting, and the use of various function summaries to improve analysis efficiency and expressive power. We have implemented a basic prototype; experiment results on a set of simple test BMC problems are satisfactory.

Keywords: Bounded Model Checking, Satisfiability Modulo Theories, LLVM, Function Contracts, Function Summaries

1 Introduction

Nowadays, different kinds of software are used almost everywhere, from nuclear power plants and aircraft to smart homes and kitchen appliances. Errors in software may lead to serious harm and losses, so the problem of software verification is of great current interest.

Model checking in general and bounded model checking (BMC) in particular have received a lot of attention in the last decade as one of the ways to solve this problem [10, 1, 12, 21]. This group of methods is based on exhaustive exploration of the state space of a given program to find errors (e.g., memory leaks, violations of specification, etc.). While the theory behind these methods is well-established and complete, their practical application is somewhat limited due to several inherent problems. In this work-in-progress paper we describe our approach to two of these problems.¹

Generation of software model for BMC. We use an approach very similar to LLBMC [21] and extract a model from the internal LLVM compiler representation (LLVM IR) instead of the C/C++ source code itself. This allows us to work with a much better defined (both syntactically and semantically) language and to use existing LLVM optimizations and analyses.

Interprocedural analysis. In classic BMC, nested function calls are fully inlined, which leads to increased analysis complexity for most real-world programs. One possible way to cope with this is to use function summaries: small descriptions of function behavior used in place of function calls. Our approach supports function summaries in three flavors:

• Complete function behavior specifications written in a special DSL called PanLang [17].
• Partial function contracts inspired by ACSL [6].
• Function summaries created from Craig interpolants [22].

¹ This work was supported in part by the Ministry of Education and Science of the Russian Federation (project #8.8491.2013) within the government assignment for higher education establishments.

We have developed a proof-of-concept prototype based on our approach and applied it to a small set of BMC problems. Results show that, while still incomplete in places, it can be successfully used in practice.

The rest of the paper is organized as follows. We give a brief outline of the frameworks used and of our approach in sections 2 and 3 respectively. Section 4 is devoted to some of the more interesting implementation details. Experiment results are presented in section 5, followed by conclusions and plans for future work in section 6.

2 Basics

2.1 Bounded Model Checking

Model checking is a well-known technique for verifying different correctness properties of finite-state systems. In essence, model checking exhaustively explores the state space and finds violations of a given correctness property. The main problem is that, for any large enough system, the state space becomes intractably hard to explore completely.

Several approaches have been proposed to deal with this problem, bounded model checking (BMC) being one of them [7]. BMC considers only a part of the state space by analyzing paths up to a given length bound; for software systems, the number of loop iterations and/or nested function calls is usually used as the bound. Once the bounded state space is created, it can be efficiently converted to a propositional formula that is later checked for correctness using a SAT or SMT procedure.
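The idea of exploring paths only up to a length bound can be sketched in a few lines. The following is a minimal explicit-state illustration, not the SAT/SMT encoding used by real BMC tools; the function `bounded_check` and the toy counter system are hypothetical names introduced here only for the example.

```python
from collections import deque


def bounded_check(initial, successors, is_bad, bound):
    """Explore every path of at most `bound` steps from `initial`.

    Returns a counterexample path ending in a bad state, or None if no
    bad state is reachable within the bound.
    """
    queue = deque([(initial, [initial])])
    while queue:
        state, path = queue.popleft()
        if is_bad(state):
            return path
        # len(path) - 1 is the number of steps taken so far.
        if len(path) - 1 < bound:
            for nxt in successors(state):
                queue.append((nxt, path + [nxt]))
    return None


# Toy system (an assumption for illustration): a counter that may be
# incremented by one or doubled at every step.
def successors(x):
    return [x + 1, x * 2]


def is_bad(x):
    return x == 6


print(bounded_check(1, successors, is_bad, 2))  # no violation within 2 steps
print(bounded_check(1, successors, is_bad, 3))  # violation found with bound 3
```

With bound 2 the check reports nothing, while bound 3 exposes the violating path. This shows why the verdict of a bounded check is only meaningful relative to the chosen bound.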

2.2 Satisfiability Modulo Theories

The Satisfiability Modulo Theories (SMT) problem is a well-known problem in mathematical logic and computer science [4]. The classic definition of SMT is that of a decision problem for logical formulae using predicates based on external theories expressed in classic first-order logic with equality.

What brings SMT to different areas of computer science is that it is a well-known NP-complete problem² which is expressive enough to be easily converted to and from other NP-complete problems and, at the same time, is simple enough to be manipulated by hand. This advantage led to the creation of a large set of tools, known as SMT solvers, which can be used to solve SMT problems in reasonable time. In the most general case that is not possible, but recent research in this area shows that, for a wide set of problems, SMT problems can be solved very efficiently.

An annual competition of solvers, called SMT-COMP [3], takes place every summer at different conferences and tests the best tools on different subsets of SMT. The most efficient SMT solvers according to SMT-COMP 2011/2012 [11] are: Z3 [16], CVC [5, 2], MathSAT [8] and OpenSMT [9].

² Actually, the complexity of the problem depends on the theories used and can be NP-hard or even undecidable for some theories.


2.3 LLVM Compiler System

The LLVM compiler system [19] is a robust and scalable modern compiling system that was created to replace another widely used compiler system, GCC. It is built around a well-defined intermediate code representation (LLVM IR), and all the analyses, transformations and optimizations in LLVM operate on LLVM IR.

LLVM IR is a typed assembly language with inherent support for the static single assignment (SSA) property [15]. It is a meta assembler with an unlimited number of registers, designed to be the middle point between language constructs and machine code. LLVM supports any language that can be transformed to LLVM IR, hence LLVM-based tools can potentially be applied to a very wide range of input languages.

LLVM also provides a framework for writing code analyses and optimizations in a clear and concise way, as well as a simple way of using existing analyses and optimizations. This framework is based on so-called compiler passes that operate on LLVM IR without the need to access either the source or the machine code.

To implement a BMC tool for C/C++, one needs a simple program model, preferably with the SSA property, that confines memory access to as few kinds of operations as possible and can express the full power of C/C++ at the same time. LLVM IR fits these requirements perfectly. Of course, there are some drawbacks. For example, presenting the analysis results to the user in terms of the original source code (which is unavailable in LLVM IR) is not possible out-of-the-box, but it can be done by external means.

3 Bounded Model Checking

Our approach to bounded model checking of C programs has been largely inspired by LLBMC [21] and recent works on Craig interpolation and its applications to model checking [20, 22]. In our work we attempt to combine the best of these trends in BMC. Let us briefly outline them and highlight the major similarities and differences with our approach.

3.1 LLBMC

LLBMC is a recent addition to the family of BMC tools like CBMC [10], SMT-CBMC [1] and ESBMC [12]. Like other BMC tools, it analyzes program executions only up to a finite length, by unrolling loops and inlining function calls, thus turning a generally undecidable problem of checking program correctness into a decidable one. The main distinguishing features of LLBMC are as follows:

• LLBMC works with LLVM IR, which allows it to use the full power of the LLVM compiler infrastructure to simplify the analyzed program (e.g., by using a large number of predefined IR optimizations). It also greatly reduces the analysis surface³, as LLVM IR is very simple and concise compared to C/C++.
• It employs a flat byte-array memory model that closely resembles that of the C/C++ language. By using it instead of a more traditional typed memory model, LLBMC can successfully analyze programs with extensive use of low-level memory operations.

³ The number of different operations the analysis has to handle.

Our approach is very similar to LLBMC in that it is also based on LLVM and approximates memory as a continuous byte-array.


The main difference that sets our approach apart is that, instead of inlining function calls to create a single function that is analyzed as a whole, we employ function summaries that come in one of three flavors: complete function specifications, partial function contracts, or interpolation-based summaries. These summaries are then used in place of function calls to support interprocedural analysis. Recent results show that function summaries can significantly improve analysis performance without any loss in precision or recall [22].

3.2 Interpolation-based Function Summaries

When the user has not specified anything about a function, we use Craig interpolation⁴ [14] to derive a safe over-approximation of the input/output relation [20]. This effectively eliminates the need to analyze a function more than once, thus greatly reducing the computational cost of the analysis.

The main difficulty that arises when using interpolation-based function summaries lies in how the target formula B for interpolation is selected. For checking simple properties like "function returns positive integer value" the target formula B = \result > 0 is self-evident, but in other cases it might not be that easy to derive (e.g., "this function does not leak memory" in the presence of global variables).
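The defining conditions of a Craig interpolant I for A → B (A → I, I → B, and I mentions only symbols shared by A and B) can be checked by brute force on small propositional formulae. The sketch below only verifies a candidate interpolant; it does not compute one, and it is not how interpolating provers work internally. The helper names and the formulae A, B and I are assumptions chosen for illustration.

```python
from itertools import product


def is_valid(formula, variables):
    """True if `formula` holds under every truth assignment to `variables`."""
    return all(
        formula(dict(zip(variables, values)))
        for values in product([False, True], repeat=len(variables))
    )


def is_interpolant(a, a_vars, b, b_vars, i, i_vars):
    """Check the three Craig interpolant conditions for A -> B."""
    all_vars = sorted(set(a_vars) | set(b_vars))
    shared = set(a_vars) & set(b_vars)
    return (
        set(i_vars) <= shared                                       # shared symbols only
        and is_valid(lambda env: (not a(env)) or i(env), all_vars)  # A -> I
        and is_valid(lambda env: (not i(env)) or b(env), all_vars)  # I -> B
    )


# Illustrative formulae: A = p and q, B = q or r, candidate I = q.
A = lambda env: env["p"] and env["q"]
B = lambda env: env["q"] or env["r"]
I = lambda env: env["q"]

print(is_interpolant(A, ["p", "q"], B, ["q", "r"], I, ["q"]))
```

Here q is accepted as an interpolant, whereas p would be rejected because it is not a symbol of B, and the constant true would be rejected because it does not imply B.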

3.3 User-defined Function Specifications

Support for user-defined function specifications is another feature that distinguishes our approach from LLBMC. In a nutshell, function specifications can be viewed as, and used for the same purposes as, interpolation-based summaries, the only difference being that they are provided by the end-user. Depending on whether we have a complete or a partial specification, we can use it either to completely replace the interpolation-based function summary or to strengthen our Craig interpolants.

To provide a complete function specification, one can use the PanLang DSL [17], which is a specially designed C-like language for specifications. Here is an example of a malloc specification in PanLang:

    function void* malloc(unsigned size): {
        if (size < 0) {
            action defect;
            Result = 0;
        } else if (size == 0) {
            Result = 0;
        } else {
            Result = new Heap(size);
        }
    };

To specify partial function contracts, we use an ACSL-inspired language [6] where contracts are written as special comments, e.g.:

    // @requires size > 0
    // @ensures \result != \nullptr
    void* safe_malloc(size_t size) ...

⁴ A Craig interpolant for A → B is a formula I such that A → I, I → B, and I contains only symbols common to both A and B.


Figure 1: High-level overview of the prototype

4 Implementation

We have developed a proof-of-concept prototype tool based on our approach. We use Clang for code lexing and parsing, LLVM infrastructure and passes for code analysis and transformation, and the Z3 SMT solver for logic reasoning. A high-level overview of the prototype is shown in Figure 1.

4.1 Predicate extraction

First, we convert the analyzed program to LLVM IR. Different intrinsic functions are used to capture additional semantics which cannot be captured by LLVM IR. For example, the anno function is used to preserve information about ACSL-like annotations. Calls to these intrinsics are processed separately from calls to regular functions.

After LLVM IR conversion, for every function we extract a set of logic predicates that describe the function's behavior, combine them with existing function summaries and also embed various assertion checks.⁵ The resulting SMT formula is then passed to the Z3 solver for verification.

Let us illustrate this process on a simple safe_malloc example (see Figures 2 and 3).

    #include <stdlib.h>

    // @requires size > 0
    // @ensures \result != \nullptr
    void* safe_malloc(unsigned size) {
        void* res = malloc(size);
        if (res) return res;
        else exit(-1);
    }

Figure 2: safe_malloc example in C

    define i8* @safe_malloc(i32 %size) {
    entry:
      call void @"anno"(
        c"@requires (size > 0)")
      %0 = call i8* @"malloc"(i64 2048)
      %c = icmp eq i8* %0, null
      br i1 %c, label %if.e, label %if.t

    if.t:
      call void @"anno"(
        c"@ensures (result != 0)")
      ret i8* %0

    if.e:
      call void @exit(i32 -1)
      unreachable
    }

Figure 3: safe_malloc example in LLVM IR

Every regular LLVM IR instruction is converted to a single predicate; terminator instructions⁶ are converted to a set of predicates corresponding to the possible execution paths. In the current prototype every alternative (i.e., every path) is analyzed separately; therefore, after predicate extraction is complete, we get a set of possible predicate states at every instruction. Predicate states at the ret %0 and call @exit instructions of the safe_malloc example are shown in Figures 4 and 5 respectively.

⁵ At the moment of writing, the prototype tool supports only pointer checks and contract assertions.

    (
      @R (size > 0) is true,
      %0 is malloc(2048),
      c is (%0 == null),
      @P c is false,
      \result_safe_malloc is %0
    )

Figure 4: Predicate state at ret %0

    (
      @R (size > 0) is true,
      %0 is malloc(2048),
      c is (%0 == null),
      @P c is true
    )

Figure 5: Predicate state at call @exit

After all predicate states are collected, we run a set of checkers that assert program correctness. For example, at the ret %0 instruction we have to check whether the @ensures \result != \nullptr contract holds; it holds if \result != \nullptr is always true in the corresponding predicate state(s). We use the classic way of expressing such a query in SMT: the query is always true if there is no satisfying assignment for its negation. That is, we ask Z3 to check the SAT/UNSAT property for the assumption \result = \nullptr in the given predicate state(s).

If the assumption is UNSAT, the original query is always true, i.e., the @ensures contract holds and there is no error. If Z3 succeeds in finding a SAT assignment, we have a property violation, which is lifted to the original query and reported as an error.

⁶ Instructions that indicate the next basic block to execute.
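The "validity by refuting the negation" pattern described above can be illustrated without an SMT solver by enumerating a tiny finite domain. This is a sketch under stated assumptions: brute-force enumeration stands in for Z3, the pointer domain of eight values is arbitrary, and `find_model` is a hypothetical helper, not part of the prototype.

```python
from itertools import product


def find_model(constraints, domains):
    """Return an assignment satisfying all constraints, or None.

    None plays the role of an UNSAT answer over the finite domains.
    """
    names = list(domains)
    for values in product(*(domains[name] for name in names)):
        env = dict(zip(names, values))
        if all(constraint(env) for constraint in constraints):
            return env
    return None


POINTERS = range(0, 8)  # toy pointer domain; 0 plays the role of null
domains = {"p0": POINTERS, "result": POINTERS}

# Predicate state at `ret %0` (cf. Figure 4), restricted to the toy domain.
state_at_ret = [
    lambda env: not (env["p0"] == 0),       # @P c is false, with c = (%0 == null)
    lambda env: env["result"] == env["p0"],  # \result is %0
]

# Negation of the @ensures query: \result = \nullptr.
negated_query = lambda env: env["result"] == 0

# No model exists, so the contract holds on this path.
print(find_model(state_at_ret + [negated_query], domains))

# Dropping the path predicate makes the negation satisfiable: a violation.
print(find_model([state_at_ret[1], negated_query], domains))
```

The first query has no model, which corresponds to the UNSAT case in which the contract holds. The second query omits the null check and yields a witness with a null result, which corresponds to a reported defect.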


There are several difficulties that have to be resolved when trying to model C/C++ code in SMT using different theories (e.g., booleans or bit-vectors), even when implementing only intraprocedural analysis. The most significant of them are simulating memory and handling loops. Sections 4.2 and 4.3 describe how we deal with these problems.

4.2 Simulating memory in SMT

Pointers and direct access to memory have always been the most notorious parts of C/C++, causing the largest number of issues when implementing any kind of analysis for these languages. However, they also produce the largest number of program defects [13], so it is imperative to support them in real-world verification and analysis tools.

The two main problems with pointers are alias analysis, which is hard (that is, NP-hard), and pointer arithmetic, which does not make it any simpler. As BMC analyzes bounded executions, the first problem becomes solvable⁷, but the second one must be solved by careful memory simulation.

We represent memory as an incremental set of memory states for each program execution path. A single memory state is simulated as a functional array⁸; the zeroth index of the array is used to represent the null pointer, and the last one represents any other incorrect pointer.

Functional arrays in SMT can be represented in three ways: the uninterpreted functions (UF) theory, the array theory, and anonymous lambda-functions in the analysis tool itself. Our prototype supports all three; the experiments demonstrate that the array-based approach produces the best and the most stable results overall.

Another aspect of simulating memory using arrays is how the memory elements themselves are represented. They can be either atomic (every basic type maps to a single array element) or byte-based (a basic type maps to a sequence of bytes). The byte-based approach is more accurate, but is also much more resource-consuming. The atomic approach is several times faster and introduces imprecision only when a program performs low-level byte-aware tasks (e.g., binary serialization). Our tool supports both encodings, but uses the faster atomic approach by default.

The final important aspect of memory simulation is representing the initial memory state. One can use either an uninitialized (blank) memory array or a fixed array (e.g., filled with illegal pointers). The former provides a more accurate representation for functions that expect the memory to be initialized (e.g., functions accepting pointers as arguments) and makes the analysis more precise. The latter loses accuracy on such functions, but is generally simpler and faster.
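A functional array in the sense used above (an object with load and store operations, where store produces a new state) can be sketched as an immutable map. The class below is only an illustration of the atomic encoding with reserved null and invalid indices; the class name, the array size and the use of None for an uninitialized cell are assumptions, not details of the prototype.

```python
class Memory:
    """Immutable functional array: `store` returns a new memory state."""

    NULL = 0  # index 0 represents the null pointer

    def __init__(self, size, cells=None):
        self.size = size
        self.invalid = size - 1  # last index stands for any other bad pointer
        self._cells = dict(cells or {})

    def _check(self, ptr):
        # A pointer check: reject null, the reserved invalid index,
        # and anything outside the array.
        if ptr == self.NULL or ptr == self.invalid or not 0 < ptr < self.size:
            raise ValueError(f"bad pointer {ptr}")

    def store(self, ptr, value):
        self._check(ptr)
        updated = dict(self._cells)
        updated[ptr] = value
        return Memory(self.size, updated)  # the old state stays untouched

    def load(self, ptr):
        self._check(ptr)
        return self._cells.get(ptr)  # None models an uninitialized cell


m0 = Memory(16)
m1 = m0.store(4, 42)
print(m1.load(4), m0.load(4))
```

Because every store yields a fresh state, each execution path can keep its own chain of memory states, which mirrors the incremental set of memory states per path described in the text.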

4.3 Handling loops

Handling loops is a particularly difficult task for all kinds of program analysis. There are several known ways of dealing with loops. First, one can determine loop variants and invariants and use them to infer the number of loop iterations. Second, an approach used in classic BMC [7] is to iteratively analyze the loop until an impossible number of iterations is reached.⁹ Last, loops can be unrolled fully or partially. While the last approach has some impact on program semantics during analysis, a carefully selected unrolling factor (i.e., BMC bound) can minimize it.

⁷ Here we consider only intraprocedural analysis; pointers to global variables still present a huge problem.
⁸ An object with load and store operations.
⁹ That is, the loop does not have any more iterations.


We decided to use loop unrolling as the simplest and, at the same time, an effective way of dealing with loops. To do that, we created a custom LLVM pass, because the standard loop unrolling pass supports only a limited number of loop configurations. The custom pass also allows us to implement custom annotations to perform user-guided unrolling on a loop-by-loop basis.
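The effect of unrolling a loop up to a bound can be shown on source text. The sketch below is not the custom LLVM pass mentioned above; it is a hypothetical source-level illustration that turns a while loop into nested conditionals and ends with an unwinding assumption (`assume` is an illustrative marker, not a real function).

```python
def unroll(condition, body, bound, indent=""):
    """Return source lines for `while condition: body` unrolled `bound` times."""
    if bound == 0:
        # Unwinding assumption: paths that would need more iterations are cut off.
        return [f"{indent}assume(not ({condition}))"]
    lines = [f"{indent}if {condition}:"]
    lines += [f"{indent}    {stmt}" for stmt in body]
    lines += unroll(condition, body, bound - 1, indent + "    ")
    return lines


print("\n".join(unroll("i < n", ["s += i", "i += 1"], 2)))
```

For bound 2 the loop body appears twice in nested conditionals, followed by the assumption that the loop condition no longer holds. Replacing that assumption with an assertion would instead check whether the chosen bound is sufficient.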

4.4 Prototype limitations

We do not currently support full interprocedural analysis. Out of the three flavors of function summaries, we fully support only partial function contracts. PanLang specifications and Craig interpolation are being developed as sub-projects and have not been integrated into the main prototype yet.

The prototype also does not support some of the more advanced C/C++ features, such as function pointers, floating point arithmetic, unions, etc. In our future work we plan to investigate the impact these features (and their support in our tool or lack thereof) have on BMC efficiency.

5 Experiments

In this section we present the results of our preliminary experiments. We compare the efficiency of different aspects of memory representation on a set of example programs.

5.1 Test suite

Our test suite consists of several dozen example C programs (78 at the moment of writing) without any special preprocessing done to them. For every example we also have a set of expected defects and their locations. These are created with respect to the prototype limitations and thus contain only those defects that our approach is able to find.

The suite can be divided into two major parts. The first part contains the examples we created during prototype development; they represent the cases that had proven difficult for our tool to analyze. The other part is based on the NECLA [18] benchmarks from NEC laboratories. Our tool cannot fully analyze all of the NECLA benchmarks due to the limited number of supported defects and the lack of full interprocedural analysis. However, after enhancing the examples with function summaries via function contracts, we have successfully verified most of them.

5.2 Comparing memory models

As we mentioned in section 4.2, different memory representations show different efficiency. We tested the prototype on our test suite in different modes; the results of this comparison are shown in Table 1. We provide the aggregate results for the whole suite (the Overall column), as well as the results for several memory-heavy examples (the Heavy column), for different memory representations (UF, AT, Lambda-based) and initial memory states (Fixed, Uninit).

Experiments show that the performance of different representations depends on whether the test program contains any errors or not (i.e., whether the SMT solver has to deduce the SAT or the UNSAT property). UF- and lambda-based approaches exhibit similar performance and tend to outperform the array-based approach by a small margin on SAT problems, but fall behind on UNSAT. As our test suite contains a relatively small number of errors, SAT examples have a minuscule impact on the results of the comparison.


Configuration   Overall, s   Heavy, s
UF/Fixed        23.263       11.945
AT/Fixed        19.456        8.965
Lam/Fixed       20.467       10.027
UF/Uninit       33.590       21.486
AT/Uninit       29.286       17.890
Lam/Uninit      30.755       19.168

Table 1: Results of memory representation comparison

The memory representation with an uninitialized initial state degrades performance in all cases, but the fixed version leads to incorrect results for examples with functions that depend on the call-site memory state. All of the fixed-state variants fail to report the correct defects in 3 out of 78 test cases.

Unfortunately, we cannot present the numbers for byte-based memory models, as our current test suite contains some examples that are too complex to analyze in reasonable time when using byte arrays. The comparative performance of different memory representations for byte-based arrays on a test suite subset is similar to that for atomic arrays.

6 Conclusion

In this work-in-progress paper we have presented our approach to BMC based on a combination of three components: LLVM, Z3 and function summaries. Using these tools and methods together allows us to deal successfully with such problems of BMC as overcomplicated C/C++ semantics and state space explosion in interprocedural analysis.

We have implemented a prototype BMC tool for C/C++ and evaluated it on a set of BMC problems. Results show that our approach can be used to successfully analyze different C/C++ programs with satisfactory time and resources.

In future work, we plan to enhance our prototype: extend the now-incomplete support for function summaries, optimize our interaction with the Z3 SMT solver, broaden the support for advanced C/C++ features, and do a more complete evaluation.

References

[1] Alessandro Armando, Jacopo Mantovani, and Lorenzo Platania. Bounded model checking of software using SMT solvers instead of SAT solvers. Int. J. Softw. Tools Technol. Transf., 11(1):69-83, Jan 2009.
[2] Clark Barrett, Christopher Conway, Morgan Deters, Liana Hadarean, Dejan Jovanovic, Tim King, Andrew Reynolds, and Cesare Tinelli. CVC4. CAV'11, pages 171-177, 2011.
[3] Clark Barrett, Leonardo De Moura, and Aaron Stump. SMT-COMP: Satisfiability modulo theories competition. CAV'05, pages 503-516, 2005.
[4] Clark Barrett, Roberto Sebastiani, Sanjit A Seshia, and Cesare Tinelli. Satisfiability modulo theories. Handbook of Satisfiability, 185:825-885, 2009.
[5] Clark Barrett and Cesare Tinelli. CVC3. CAV'07, pages 298-302, 2007.
[6] Patrick Baudin, Jean C. Filliâtre, Thierry Hubert, Claude Marche, Benjamin Monate, Yannick Moy, and Virgile Prevosto. ACSL: ANSI/ISO C Specification Language. Preliminary design, version 1.4, May 2008.


[7] Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. Symbolic model checking without BDDs. TACAS'99, pages 193-207, 1999.
[8] Roberto Bruttomesso, Alessandro Cimatti, Anders Franzen, Alberto Griggio, and Roberto Sebastiani. The MathSAT 4 SMT solver. CAV'08, pages 299-303, 2008.
[9] Roberto Bruttomesso, Edgar Pek, Natasha Sharygina, and Aliaksei Tsitovich. The OpenSMT solver. Pages 150-153, 2010.
[10] Edmund Clarke, Daniel Kroening, and Flavio Lerda. A tool for checking ANSI-C programs. TACAS'04, pages 168-176, 2004.
[11] David R Cok, Alberto Griggio, and Roberto Bruttomesso. The 2012 SMT competition. 2012.
[12] Lucas Cordeiro, Bernd Fischer, and Joao Marques-Silva. SMT-based bounded model checking for embedded ANSI-C software. ASE'09, pages 137-148, 2009.
[13] Coverity. Open source report 2011. http://www.coverity.com/library/pdf/coverity-scan-2011-open-source-integrity-report.pdf, 2011. [Online; accessed 25-March-2013].
[14] William Craig. Three uses of the Herbrand-Gentzen theorem in relating model theory and proof theory. The Journal of Symbolic Logic, 22(3):269-285, Sep 1957.
[15] Ron Cytron, Jeanne Ferrante, Barry K Rosen, Mark N Wegman, and F Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. ACM TOPLAS, 13(4):451-490, 1991.
[16] Leonardo De Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. Pages 337-340, 2008.
[17] V. Itsykson and M. Glukhikh. A language for program component specification. Scientific Journal of SPBSPU, (3):63-71, 2010.
[18] Franjo Ivancic and Sriram Sankaranarayanan. NECLA static analysis benchmarks.
[19] Chris Lattner and Vikram Adve. LLVM: A compilation framework for lifelong program analysis & transformation. CGO'04, pages 75-86, 2004.
[20] K. L. McMillan. Applications of Craig interpolants in model checking. TACAS'05, pages 1-12, 2005.
[21] Florian Merz, Stephan Falke, and Carsten Sinz. LLBMC: Bounded model checking of C and C++ programs using a compiler IR. VSTTE'12, pages 146-161, 2012.
[22] Ondrej Sery, Grigory Fedyukovich, and Natasha Sharygina. Interpolation-based function summaries in bounded model checking. HVC'11, pages 160-175, 2012.


One-dimensional Resource Workflow Nets∗

Vladimir A. Bashkin1 and Irina A. Lomazova2,3

1 Yaroslavl State University, Yaroslavl, 150000, Russia, v [email protected]
2 National Research University Higher School of Economics, Moscow, 101000, Russia
3 Program Systems Institute of the Russian Academy of Science, Pereslavl-Zalessky, 152020, Russia, i [email protected]

Abstract

Workflow Petri nets with an additional (unbounded) resource place are studied. Resource tokens can be consumed and/or produced by transitions, hence a net can have an infinite number of different reachable states. A net with a certain initial resource is called sound if it properly terminates and, moreover, adding any extra initial resource does not violate its proper termination. An (unmarked) net is sound if it is sound for some initial resource. We prove the decidability of both marked and unmarked soundness for one-dimensional workflow nets and present an algorithm for computing the minimal sound resource.

1 Introduction

Petri nets are a popular formalism for modeling and analysis of distributed systems. In this paper we consider workflow systems, or, to be more precise, workflow processes. To model workflow processes a special subclass of Petri nets, called WF-nets [1, 2], is used.

In the context of WF-nets a crucial correctness criterion is soundness [1]. We say that a workflow case execution terminates properly iff its firing sequence (starting from the initial marking with a single token in the initial place) terminates with a single token in the final place (i.e. there are no “garbage” tokens after the termination). A model is called sound iff a process can terminate properly starting from any reachable marking. Soundness of WF-nets is decidable [1]. Moreover, a number of decidable variations of soundness are established, for example, k-soundness [7], structural soundness [11] and soundness of nested models [9].

One of the important aspects of workflow development concerns resource management. Resources here is a general notion for executives (people or devices), raw materials, finances, etc. To take resources into account, different extensions of the base formalism were introduced, leading to different versions of soundness. In [3, 4] a specific class of WFR-nets with decidable soundness is studied. In [8] a more general class of Resource-Constrained Workflow Nets (RCWF-nets) is defined. Informally, the authors impose two constraints on resources. First, they require that all resources that are initially available are available again after all cases terminate. Second, they also require that for any reachable marking, the number of available resources does not exceed the number of initially available resources.
∗ This work is supported by the Basic Research Program of the National Research University Higher School of Economics, Russian Fund for Basic Research (projects 11-01-00737, 12-01-31508, 12-01-00281), and by Federal Program “Human Capital for Science and Education in Innovative Russia” (Contract No. 14.B37.21.0392).


In [8] it is proven that for RCWF-nets with a single resource type soundness can be effectively checked in polynomial time. In [10] it is proven that soundness is also decidable in the general case (by reduction to the home-space problem).

In all the papers mentioned above resources are assumed to be permanent, i.e. they are used (blocked) and released later on. Resources are neither created nor destroyed. Hence the process state space is explicitly bounded.

To study a more general case of arbitrary resource transformations (that can arise, for example, in open and/or adaptive workflow systems), in [6] we defined a notion of WF-nets with resources (RWF-nets). RWF-nets extend RCWF-nets from [8] in such a way that resources can be generated or consumed during a process execution without any restrictions (cf. [5]). For RWF-nets we defined notions of resources and controlled resources and studied the problem of soundness-preserving resource replacement (which is important for adaptive workflows).

Unfortunately, even sound RWF-nets are not bounded in general, hence existing soundness checking algorithms cannot be applied here. In [6] the decidability of soundness for RWF-nets was declared an open problem.

Here we consider a restricted case: RWF-nets with a single resource place (1-dim RWF-nets). One resource type is sufficient for many practical applications (memory and money are typical examples of such resources). However, 1-dim RWF-nets cannot be reduced to finite-state WF-nets with resources, such as RCWF- or WFR-nets.

In this paper we prove decidability of soundness for marked as well as unmarked 1-dim RWF-nets. We also present an algorithm for computing the minimal sound resource for a given sound 1-dim RWF-net.

The paper is organized as follows. In Section 2 basic definitions of multisets, Petri nets and RWF-nets are given. In Section 3 the class of 1-dim RWF-nets is defined and studied, and algorithms for checking marked soundness, checking soundness and finding the minimal sound resource are given. Section 4 contains some conclusions. Proofs can be found in the Appendix.

2 Preliminaries

Let S be a finite set. A multiset m over a set S is a mapping m : S → Nat, where Nat is the set of natural numbers (including zero). For two multisets m, m′ we write m ⊆ m′ iff ∀s ∈ S : m(s) ≤ m′(s) (the inclusion relation). The sum and the union of two multisets m and m′ are defined as usual: ∀s ∈ S : (m + m′)(s) = m(s) + m′(s), (m ∪ m′)(s) = max(m(s), m′(s)). By M(S) we denote the set of all finite multisets over S.

Let P and T be disjoint sets of places and transitions and let F : (P × T) ∪ (T × P) → Nat. Then N = (P, T, F) is a Petri net. A marking in a Petri net is a function M : P → Nat, mapping each place to some natural number (possibly zero). Thus a marking may be considered as a multiset over the set of places. Pictorially, P-elements are represented by circles, T-elements by boxes, and the flow relation F by arcs. Places may carry tokens represented by filled circles.

For a transition t ∈ T the preset •t and the postset t• are defined as the multisets over P such that •t(p) = F(p, t) and t•(p) = F(t, p) for each p ∈ P. Similarly, for a place p ∈ P we define •p and p• as the multisets over T such that •p(t) = F(t, p) and p•(t) = F(p, t) for each t ∈ T.

A transition t ∈ T is enabled in a marking M iff ∀p ∈ P : M(p) ≥ F(p, t). An enabled transition t may fire yielding a new marking M′ =def M − •t + t•, i.e. M′(p) = M(p) − F(p, t) + F(t, p) for each p ∈ P (denoted $M \xrightarrow{t} M'$, or just M → M′). We say that M′ is reachable from M iff there is a sequence M = M1 → M2 → · · · → Mn = M′. For a Petri net N by R(N, M0) we denote the set of all markings reachable from its initial marking M0.

A Petri net with resources is a tuple N = (Pc, Pr, T, Fc, Fr) s.t.
• Pc is a finite set of control places;
• Pr is a finite set of resource places, Pc ∩ Pr = ∅;
• T is a finite set of transitions, Pc ∩ T = Pr ∩ T = ∅;
• Fc : (Pc × T) ∪ (T × Pc) → Nat is a multiset of control arcs;
• Fr : (Pr × T) ∪ (T × Pr) → Nat is a multiset of resource arcs;
• ∀t ∈ T ∃p ∈ Pc : Fc(p, t) + Fc(t, p) > 0.

In Petri nets with resources we divide the places of a Petri net into control and resource ones. Note that all transitions are necessarily linked to control places; this guarantees the absence of “uncontrolled” resource modifications. A marking is also divided into control and resource parts. For a multiset c + r, where c ∈ M(Pc) and r ∈ M(Pr), we write c|r. For a net N a resource is a multiset over Pr. A controlled resource is a multiset over Pc ∪ Pr.

Workflow nets (WF-nets) are a special subclass of Petri nets designed for modeling workflow processes. To study resource dependencies in workflow systems we consider WF-nets with resources. A net with resources N is called a WF-net with resources (RWF-net) iff
1. There is one source place i ∈ Pc and one sink place o ∈ Pc s.t. •i = o• = ∅;
2. Every node from Pc ∪ T is on a path from i to o, and this path consists of nodes from Pc ∪ T.

Figure 1: WF-net with resources.

Fig. 1 represents an example of a RWF-net, where resource places r1 and r2 are depicted by ovals and resource arcs by dotted arrows. Every RWF-net N = (Pc, Pr, T, Fc, Fr) contains its control subnet Nc = (Pc, T, Fc), which forms a RWF-net with the empty set of resources.
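The enabling and firing rules defined in this section can be sketched in a few lines of Python. This is only an illustration: markings and the pre- and postsets of a transition are represented as `collections.Counter` multisets, and the place names `i`, `o`, `r` and the transition used below are hypothetical, not taken from Fig. 1.

```python
from collections import Counter

def enabled(marking, pre):
    """t is enabled in M iff M(p) >= F(p, t) for every place p."""
    return all(marking[p] >= n for p, n in pre.items())

def fire(marking, pre, post):
    """Return M' = M - pre(t) + post(t); raise if t is not enabled in M."""
    if not enabled(marking, pre):
        raise ValueError("transition is not enabled")
    new_marking = Counter(marking)
    new_marking.subtract(pre)   # remove the tokens consumed by t
    new_marking.update(post)    # add the tokens produced by t
    return +new_marking         # unary plus drops places with zero tokens

# Hypothetical transition t: consumes one token from control place "i" and
# two tokens from resource place "r", produces one token in control place "o".
pre_t = Counter({"i": 1, "r": 2})
post_t = Counter({"o": 1})

m0 = Counter({"i": 1, "r": 3})   # the marking i|3r in the c|r notation
m1 = fire(m0, pre_t, post_t)     # one token in "o", one token left in "r"
```

Keeping the control part and the resource part in one multiset mirrors the notation c|r, which is just the sum c + r.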


A marked net is a net together with some initial marking. A marked RWF-net (N, c|r) is called sound iff ∀s ∈ M(Pr), ∀M ∈ R(N, c|r + s) we have:
1. ∃s′ ∈ M(Pr) : o|s′ ∈ R(N, M);
2. c′|r′ ∈ R(N, M) ⇒ c′ = o ∨ c′ ∩ o = ∅.
Thus soundness for a RWF-net means that, first, this workflow net can terminate properly from any reachable state and, moreover, adding any extra resource does not violate the proper termination property.

Let N be a RWF-net. By C(N) we denote the set of all control markings reachable in Nc, i.e. C(N) = R(Nc, i). Further we call a RWF-net N sound (without indicating any concrete resource) iff the marked RWF-net (N, i|r) is sound for some initial resource r.

Corollary 1. [5] For a sound RWF-net N the set C(N) of all its reachable control markings is finite and can be effectively constructed.

Let N be a RWF-net, c ∈ C(N). Denote
res(c) =def {r ∈ M(Pr) | (N, c|r) is sound}, the set of all sound resources for c;
mres(c) =def {r ∈ res(c) | ∄r′ ∈ res(c) : r′ ⊂ r}, the set of all minimal sound resources for c.

Proposition 1. [5] For any sound RWF-net N and any control marking c ∈ C(N) the set mres(c) is finite.

The questions of computability of mres(c) and decidability of soundness for RWF-nets remain open. In the next section positive answers to these two questions are given for a restricted case: RWF-nets with a single resource place.

3 Soundness of 1-dim RWF-nets

Let N = (Pc, Pr, T, Fc, Fr) be a RWF-net with Pr = {pr}, i.e. with just one resource place. By 1-dim RWF-nets we call the subclass of RWF-nets with a single resource place. An example of such a net is given in Fig. 2.

If the control subnet of N is not sound, then N is also not sound. So we suppose that the control subnet of N is sound, and hence bounded. It is easy to note that a bounded control subnet can be represented as an equivalent finite automaton (a transition system). This automaton is a directed graph with two distinguished nodes: a source node i and a sink node o. In this control automaton the states are exactly the elements of C(N), and the transitions are the transitions of the given net N. But now it will be more convenient to consider a transition t as a pair (c1, c2) of control states, where $c_1 \xrightarrow{t} c_2$.

Every transition t of the automaton is labeled by an integer δ(t), defining the “resource effect” of firing the transition. A positive δ(t) means that the firing of t increments the marking of the (single) resource place pr by δ(t); a negative δ(t) means that t is enabled in a state (c|r) iff r(pr) ≥ |δ(t)|, and that the firing of t decrements the resource by |δ(t)|. Formally,
\[ \delta(t) =_{def} \begin{cases} -F_r(p_r, t) & \text{for } F_r(p_r, t) > 0; \\ F_r(t, p_r) & \text{for } F_r(t, p_r) > 0. \end{cases} \]
The value δ(t) is called the effect of t (denoted Eff(t)). Note that for simplicity we exclude loops, where both Fr(pr, t) > 0 and Fr(t, pr) > 0; such a loop can be simulated by two sequential transitions.

Figure 2: 1-dim RWF-net N1.
Figure 3: Control automaton for N1 from Fig. 2.

A support of t is the resource needed for firing t. It is defined as:
\[ \mathrm{Supp}(t) =_{def} \begin{cases} 0, & \delta(t) \ge 0; \\ |\delta(t)|, & \delta(t) < 0. \end{cases} \]
Thus, a 1-dim RWF-net N can be transformed to a control automaton Aut(N), which can be considered as a one-counter net or, alternatively, a 1-dim Vector Addition System with States (VASS) with a specific workflow structure: one source state, one sink state, every state is reachable from the source state, and the sink is reachable from every state. Note that Aut(N) is behaviorally equivalent to N in the branching-time semantics.

Consider the example depicted in Fig. 3. States are denoted by octagons, labelled with the corresponding control markings of the net. Transitions are labelled with the corresponding names and effects.

Recall some basic notions from graph theory. A walk is an alternating sequence of nodes and arcs, beginning and ending with a node, where the nodes that precede and follow an arc are the head and the tail of this arc. A walk is closed if its first and last nodes coincide. A path is a walk where no arcs are repeated (nodes may be repeated). A simple path is a path where no nodes are repeated. A cycle is a closed path. A simple cycle is a closed simple path. A walk in a control automaton corresponds to some sequence of firings in the 1-dim RWF-net.
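A control automaton with effects can be sketched as a list of labelled edges together with a one-step successor function on states (c|r). The automaton `AUT` below is a small hypothetical example, not the automaton of Fig. 3, and the names `Transition`, `supp` and `successors` are introduced only for this illustration.

```python
from typing import NamedTuple

class Transition(NamedTuple):
    name: str
    src: str     # control state c1
    dst: str     # control state c2
    delta: int   # resource effect delta(t)

def supp(t: Transition) -> int:
    """Supp(t): 0 if delta(t) >= 0, |delta(t)| otherwise."""
    return 0 if t.delta >= 0 else -t.delta

def successors(automaton, state):
    """All states reachable from (c|r) by firing exactly one transition."""
    c, r = state
    return [(t.dst, r + t.delta)
            for t in automaton
            if t.src == c and r >= supp(t)]

# Hypothetical automaton with source "i" and sink "o".
AUT = [
    Transition("a", "i", "p", -2),
    Transition("b", "p", "p", +1),
    Transition("c", "p", "o", -1),
]

blocked = successors(AUT, ("i", 1))   # "a" needs 2 resource tokens
started = successors(AUT, ("i", 2))
choice = successors(AUT, ("p", 1))
```

A state is represented as a pair of a control state and a counter value, which is the one-counter reading of c|r for a single resource place.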


Now we inductively define an effect and a support of a walk. Let t be a transition and σ a walk such that the tail of t is the head of the first transition of σ. Let tσ denote the walk constructed by linking t and σ. We define:

Eff(tσ) =def Eff(t) + Eff(σ);
Supp(tσ) =def Supp(t) + (Supp(σ) ⊖ Eff(t)),

where ⊖ denotes the truncated subtraction. A positive (resp., negative) walk is a walk with a positive (resp., negative) effect. Obviously, the effect of a cycle does not depend on the choice of a starting node.

A node q is called a positive generator iff there exists a simple positive path from q to q (a simple positive cycle) with a zero support. A node q is called a negative generator iff there exists a simple negative path θ from q to q (a simple negative cycle) such that Supp(θ) = −Eff(θ).

Lemma 1. Any simple positive (negative) cycle contains at least one positive (resp., negative) generator.

Proof. Note that without loss of generality we can consider only cycles of even length with alternating positive and negative arcs. Then the proof is straightforward, by induction on the length of a cycle.

Let (N, i|r0) be an initially marked 1-dim RWF-net. By abuse of notation we also denote by N the control automaton of N, represented as a one-counter net. Recall that i ∈ C(N) denotes the initial control state, r0 ∈ Nat denotes the initial value of the counter (the single resource place), and R(N, (i|r0)) denotes the set of all reachable states. It is easy to see that there are exactly two kinds of possible undesirable (not sound) behaviours of a RWF-net: a deadlock and a livelock.

Definition 1. A state (c|r) ∈ C(N)×Nat is a deadlock state iff c ≠ o and there is no transition t ∈ T s.t. $(c|r) \xrightarrow{t} (c'|r')$ for some c′, r′. A finite set L ⊂ C(N)×Nat of states is a livelock iff |L| > 1 and
1. for any (c|r), (c′|r′) ∈ L there exists σ ∈ T* s.t. $(c|r) \xrightarrow{\sigma} (c'|r')$;
2. for any (c|r) ∈ L and t ∈ T s.t. $(c|r) \xrightarrow{t} (c''|r'')$ we have (c″|r″) ∈ L.
A livelock state is a state that belongs to some livelock. Note that by definition (o|r) ∉ L for any r. Obviously,

Proposition 2. If a state (c|r) is a deadlock state then:
1. ∀t ∈ T s.t. $c \xrightarrow{t} c'$ for some c′ we have δ(t) < 0;
2. r < min{|δ(t)| : $c \xrightarrow{t} c'$ for some c′}.

So deadlocks can occur (1) only for control states with only negative outgoing transitions; (2) only for a finite number of different resources, namely when there are not enough resources for firing any of the subsequent transitions. Since the control subnet of the RWF-net N is sound, we immediately get:

Proposition 3. If L ⊂ C(N)×Nat is a livelock then there is a state (c|r) ∈ L and a negative transition t ∈ T with $c \xrightarrow{t} c'$, such that δ(t) < −r.
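The effect of a walk and the generator condition can be illustrated on plain lists of transition effects. The sketch below rests on an assumption: the helper `min_resource` computes the least counter value from which every transition of the walk can fire in order, which is the operational quantity a support is intended to capture (for a single transition it coincides with Supp(t)). It is obtained by simulation rather than by unfolding the inductive definition, and all names are hypothetical.

```python
def eff(walk):
    """Eff of a walk: the sum of the effects of its transitions."""
    return sum(walk)

def min_resource(walk):
    """Least initial counter value with which the whole walk can fire in order.

    Assumption: this is the operational reading of a support, computed by
    tracking the lowest level the counter reaches along the walk.
    """
    need, level = 0, 0
    for delta in walk:
        level += delta
        need = max(need, -level)
    return need

def starts_at_negative_generator(cycle):
    """Check the condition Supp(theta) = -Eff(theta) for a negative cycle,
    where the cycle is listed starting from the candidate node q."""
    return eff(cycle) < 0 and min_resource(cycle) == -eff(cycle)

# The same hypothetical two-arc cycle, read from its two different nodes.
from_q = [+1, -3]
from_other_node = [-3, +1]
```

The two rotations of the cycle have the same effect but behave differently: only the first one starts at a node satisfying the negative generator condition, which is in line with Lemma 1 asserting the existence of at least one such node on a negative cycle.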


Figure 4: Modified RWF-net N̄.

Proposition 4. The sets of deadlock and livelock states are finite.

Proof. (deadlocks) The set of “potential deadlock” control states (nodes with only negative outgoing transitions) is finite. For a given “potential deadlock” control state the set of applicable deadlock states (natural numbers smaller than the smallest resource required by a subsequent transition) is also finite.

(livelocks) First note that if (c|r), (c|r + x) ∈ L with x > 0 then L is not a livelock. Indeed, in this case the transition sequence $(c|r) \xrightarrow{\sigma} (c|r + x)$ corresponds to a positive cycle that can generate an infinite number of states, which contradicts the finiteness of livelocks. So every control state can occur in a certain livelock at most once. Now assume the converse: there are infinitely many livelocks. Then there are infinitely many livelocks with the same set of control states, which differ only in their resource values. Hence this set includes a livelock with an arbitrarily large resource, and we can take a livelock with a resource big enough to reach the final state o. This implies that o belongs to the livelock, which contradicts the definition of a livelock.

All deadlock states can be detected by checking control states with only negative outgoing transitions. All livelock states can be detected by checking finite systems of states that are closed under transition firings (strongly connected components) and satisfy the property from Prop. 3.

Theorem 1. Soundness is decidable for marked 1-dim RWF-nets.

Proof. The following proof is similar to the proof of decidability of structural soundness in [11]. For a given 1-dim RWF-net N construct the modified RWF-net N̄ by adding a new initial place ī and two new transitions, as depicted in Fig. 4. Obviously, the original 1-dim RWF-net (N, i|r) is sound iff no deadlock and no livelock is reachable in the 1-dim RWF-net (N̄, ī|r). Since the sets of deadlocks and livelocks are finite and computable, the problem of soundness of a marked 1-dim RWF-net can be reduced to a finite number of instances of the reachability problem for a 1-counter Petri net. This reachability problem is decidable.

The proof of Theorem 1 gives us only a semidecision procedure for soundness of a net. One can check the soundness of a given initial marking, but if the answer is negative, it is not known whether there exists a larger sound marking.
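The deadlock part of the detection step can be sketched as follows, using the characterization from Prop. 2 on the control automaton. The edge-list representation, the function name `deadlock_states` and the sample automaton are assumptions made for this illustration; the sketch also assumes that every control state other than the sink has at least one outgoing transition, as is the case in a control automaton with the workflow structure.

```python
def deadlock_states(transitions, sink="o"):
    """Enumerate the deadlock states (c, r) of a one-counter control automaton.

    transitions: iterable of (src, dst, delta) triples.
    A state (c, r) with c != sink is a deadlock iff every outgoing transition
    of c is negative and r is smaller than the least required resource.
    """
    outgoing = {}
    for src, _dst, delta in transitions:
        outgoing.setdefault(src, []).append(delta)

    result = []
    for c, deltas in outgoing.items():
        if c == sink or any(d >= 0 for d in deltas):
            continue  # a transition with a non-negative effect can always fire
        threshold = min(-d for d in deltas)
        result.extend((c, r) for r in range(threshold))
    return result

# Hypothetical automaton: "i" has only negative outgoing transitions.
EDGES = [
    ("i", "p", -2),
    ("i", "q", -3),
    ("p", "p", +1),
    ("p", "o", -1),
    ("q", "o", 0),
]
found = deadlock_states(EDGES)
```

The result is finite by construction, as in the proof of Prop. 4: there are finitely many control states and, for each of them, finitely many counter values below the threshold.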


Definition 2. An unmarked RWF-net N is called sound iff (N, i|r) is sound for some resource r.

Prop. 2 gives us only a necessary condition for a deadlock reachable from some initial marking. Now we prove a stronger theorem, which gives a necessary and sufficient condition for the existence of a soundness-violating deadlock (i.e. a deadlock that is reachable from an infinite number of different initial markings).

Theorem 2. An unmarked 1-dim RWF-net is not sound with deadlocks (livelocks) iff there exist a deadlock (resp., livelock) state (c|r), a negative generator q and a simple path $q \xrightarrow{\sigma} c$ such that Eff(σ) + Supp(σ) ≤ r.

Let us prove the deadlock case (the livelock case can be proved in a similar way).

Proof. (⇐) It is sufficient to show that for any (large enough) initial resource r0 there exists a larger initial resource r0 + x such that a deadlock is reachable from (i|r0 + x). Consider an arbitrary (large enough) initial resource r0 s.t. $(i|r_0) \xrightarrow{\tau} (q|s)$ for some path τ and resource s (it is always possible to find such a resource since the control net is sound, and therefore any control state is reachable for some sufficiently large initial resource). Let θ = q c1 . . . cj q be a simple negative cycle with generator q, i.e. Supp(θ) = −Eff(θ). Denote z = (−s) mod Supp(θ), so that s + z is a multiple of Supp(θ), and consider the larger initial resource r0 + z + Supp(σ). We have
\[ (i \mid r_0 + z + \mathrm{Supp}(\sigma)) \xrightarrow{\tau} (q \mid s + z + \mathrm{Supp}(\sigma)) \xrightarrow{\theta^{(s+z)/\mathrm{Supp}(\theta)}} (q \mid \mathrm{Supp}(\sigma)) \xrightarrow{\sigma} (c \mid \mathrm{Eff}(\sigma) + \mathrm{Supp}(\sigma)) \]
and hence a deadlock.

(⇒) Assume the converse: the net is unsound with a deadlock, but for any deadlock state there is no negative generator satisfying the conditions of the theorem. The number of deadlock states is finite, hence some deadlock state (c|r) is reachable from an infinite number of different initial states (initial resource values). Every transition sequence σ = t1.t2. . . . .tn from (i|r0) to (c|r) corresponds to a walk σ in the control automaton graph. Since there are infinitely many deadlock-generating initial states, the set of corresponding walks is also infinite. Each of these walks can be decomposed into a sequence of alternating simple cycles and acyclic simple paths:
\[ \sigma = \tau_1 (\theta_1)^{k_1} \tau_2 (\theta_2)^{k_2} \dots \tau_{n-1} (\theta_{n-1})^{k_{n-1}} \tau_n. \]
Note that this decomposition is not unique: ababa can be considered both as (ab)²a and as a(ba)². To fix the idea we consider the “decomposition from the right to the left”, so a(ba)².


Let us show that among these walks there is a walk with a negative last cycle θn−1. Indeed, if the last cycle is positive (or neutral) with an effect x, we can consider a larger initial resource r0 + x · kn−1 and a shorter walk $\sigma' = \tau_1 (\theta_1)^{k_1} \tau_2 (\theta_2)^{k_2} \dots \tau_{n-1} \tau_n$ having the same ending, a deadlock. Now the new walk σ′ can be decomposed into simple cycles and simple paths, then the last cycle (if it is again positive) can be removed by increasing the initial resource, and so on. At the end of this process we will obtain either a walk with a negative “last cycle” or a completely acyclic walk (a simple path from i to c). There are only finitely many acyclic paths in the graph, but infinitely many deadlock-generating initial markings (and hence deadlock-generating walks from i to c), so we necessarily obtain a walk with a negative last cycle.

Consider such a deadlock walk σ″, ending with a suffix θ^k τ, where θ is a negative cycle and τ is acyclic. Let θ = c1 c2 . . . ci . . . cm c1, where ci is a negative generator (by Lemma 1 such ci always exists). The path (ci . . . cm c1)τ is a simple path (remember that we decompose “from the right to the left” and hence θτ cannot contain cycles other than θ). Since the final state of the whole walk σ″ is (c|r), for any suffix φ of σ″ we have Eff(φ) + Supp(φ) ≤ r. This holds for (ci . . . cm c1)τ as well. But this is a simple path that leads from a negative generator to a deadlock control state. Q.E.D.

All simple (negative) cycles can be found by Tarjan's algorithm, and deadlock and livelock states can be found by searching for states satisfying Prop. 2 and Prop. 3 respectively. The set of simple paths is finite (and easily computable). Hence,

Corollary 2. Soundness is decidable for unmarked 1-dim RWF-nets.

Theorem 3. The minimal sound resource is computable for sound unmarked 1-dim RWF-nets.

Proof. Here we propose a plain (and hence possibly not the most efficient) algorithm for computing the minimal resource r such that (N, i|r) is sound.

Algorithm 1. (minimal sound resource)
Input: a sound 1-dim RWF-net N.
Output: the minimal resource r such that (N, i|r) is sound.
Step 0: Let r := 0.
Step 1: Check whether (N, i|r) is sound by checking whether a deadlock or a livelock is reachable in the marked net (N̄, ī|r) (Th. 1). If (N, i|r) is sound, then return r; otherwise let r := r + 1 and repeat Step 1.
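The control structure of Algorithm 1 is a linear search over initial resources. A minimal sketch is given below, assuming the soundness check of Theorem 1 is available as a function; the parameter `is_sound` is a hypothetical oracle standing for that check, and the threshold oracle in the usage lines is made up for illustration only.

```python
from typing import Callable

def minimal_sound_resource(is_sound: Callable[[int], bool]) -> int:
    """Linear search of Algorithm 1.

    is_sound(r) is assumed to decide whether the marked net (N, i|r) is sound.
    The loop terminates only if some r is sound, i.e. if the unmarked net is
    sound, which is the precondition stated for the input.
    """
    r = 0                      # Step 0
    while not is_sound(r):     # Step 1: soundness check for the current r
        r += 1
    return r

# Usage with a made-up oracle: pretend the net is sound exactly for r >= 3.
result = minimal_sound_resource(lambda r: r >= 3)
```

The first sound value found is also the minimal one, since the search visits resources in increasing order.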

4 Conclusion

In this paper we have investigated the soundness property for workflow nets with one (unbounded) resource place. We have proved that soundness is decidable for marked and unmarked nets, and that the minimal sound resource can be effectively computed. Our decision algorithms use a reduction to the reachability problem for unbounded Petri nets (which is EXPSPACE-hard) and hence cannot be considered efficient. However, the inefficiency could be unavoidable, since RWF-nets are expressively rather close to ordinary Petri nets (VASS).

Further research will concern decidability of soundness for the general case of RWF-nets. It is also quite interesting to apply some alternative notions of soundness to our infinite-state workflow nets. The so-called relaxed soundness is of particular interest. Relaxed soundness has been proposed as a property weaker than soundness. In a relaxed sound WF-net, only execution sequences including each task at least once should terminate properly. Some other interesting variants are k-soundness, generalized soundness and structural soundness.

References

[1] W.M.P. van der Aalst. The Application of Petri Nets to Workflow Management. The Journal of Circuits, Systems and Computers, 8(1):21–66, 1998.
[2] W.M.P. van der Aalst, K.M. van Hee. Workflow Management: Models, Methods and Systems. MIT Press, 2002.
[3] K. Barkaoui, L. Petrucci. Structural Analysis of Workflow Nets with Shared Resources. In Proc. Workflow Management: Net-based Concepts, Models, Techniques and Tools (WFM98), volume 98/7 of Computing Science Reports, pages 82–95, Eindhoven University of Technology, 1998.
[4] K. Barkaoui, R. Ben Ayed, Z. Sbaï. Workflow Soundness Verification based on Structure Theory of Petri Nets. International Journal of Computing and Information Sciences, Vol. 5, No. 1, 2007. P.51–61.
[5] V. A. Bashkin, I. A. Lomazova. Petri Nets and resource bisimulation. Fundamenta Informaticae, Vol. 55, No. 2, 2003. P.101–114.
[6] V. A. Bashkin, I. A. Lomazova. Resource equivalence in workflow nets. In Proc. Concurrency, Specification and Programming, 2006, volume 1, pages 80–91. Berlin, Humboldt-Universität zu Berlin, 2006.
[7] K. van Hee, N. Sidorova, M. Voorhoeve. Generalized Soundness of Workflow Nets is Decidable. In Proc. ICATPN 2004, volume 3099 of Lecture Notes in Computer Science, pages 197–216. Springer, 2004.
[8] K. van Hee, A. Serebrenik, N. Sidorova, M. Voorhoeve. Soundness of Resource-Constrained Workflow Nets. In Proc. ICATPN 2005, volume 3536 of Lecture Notes in Computer Science, pages 250–267. Springer, 2005.
[9] K. van Hee, O. Oanea, A. Serebrenik, N. Sidorova, M. Voorhoeve, I.A. Lomazova. Checking Properties of Adaptive Workflow Nets. Fundamenta Informaticae, Vol. 79, No. 3, 2007. P.347–362.
[10] N. Sidorova, C. Stahl. Soundness for Resource-Constrained Workflow Nets is Decidable. In BPM Center Report BPM-12-09, BPMcenter.org, 2012.
[11] F. L. Tiplea, D. C. Marinescu. Structural soundness of workflow nets is decidable. Information Processing Letters, Vol. 96, pages 54–58. Elsevier, 2005.


A Formal Model and Verification Problems for Software Defined Networks∗

E.V. Chemeritsky, R.L. Smelyansky, V.A. Zakharov

Faculty of Computational Mathematics and Cybernetics, Moscow State University, Moscow, RU-119899, Russia
[email protected]

Abstract

Software-defined networking (SDN) is an approach to building computer networks that separates and abstracts the data planes and control planes of these systems. In a SDN a centralized controller manages a distributed set of switches. A set of open commands for packet forwarding and flow-table updating was defined in the form of a protocol known as OpenFlow. In this paper we describe an abstract formal model of SDN, introduce a tentative language for specification of SDN forwarding policies, and formally set up model-checking problems for SDN.

1 Introduction to Software Defined Networks

Since the very beginning of computer-based telecommunication engineering, networks have been built out of special-purpose devices (routers, switches, firewalls, gateways) that run sophisticated distributed algorithms to provide such functionality as topology discovery, routing, traffic monitoring and balancing, access control, etc. A typical enterprise network might include hundreds or thousands of such devices, and in most of them the hardware and software components are closed and proprietary. These networks are managed through a set of complex heterogeneous interfaces that are used to configure the network devices separately. As the size of networks increases and network protocols become more involved, management of traditional networks tends to be a remarkably complex and error-prone activity. The difficulty of tuning network devices coherently is one of the most severe obstacles in the development of novel network technologies such as data centers and cloud computing.

To cope with these principal difficulties a new kind of network architecture referred to as Software Defined Networks (SDN) has emerged recently. In a SDN a centralized controller manages a distributed set of switches. Switches connect with each other via their ports through datapath channels. Every switch processes packets that enter its ports; packet processing is guided by a flow-table, a set of packet forwarding rules. A rule includes a pattern, an instruction, a priority, a counter, and a time-out. A packet consists of a header and a payload, but only its header is taken into account when the packet is processed by a switch. Upon receiving a packet a switch matches its header against the patterns of forwarding rules. If several different rules match a packet then the rule with the highest priority is triggered. If no rules match a packet then a default rule is invoked. When a rule is triggered its instruction takes effect. An instruction is a list of actions. Typical actions include forwarding the packet out of one of the ports on the switch, dropping the packet, forwarding the packet to the controller for more general processing, or rewriting some header fields. As soon as a rule completes processing of a packet it increments the counter. A packet forwarding rule is removed from the table when its timeout is exceeded.

∗ This research is supported by the Skolkovo Foundation Grant N 79, July, 2012
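The packet processing performed by a switch, as described in this section, can be sketched as a priority-based lookup in a flow table. This is a simplified illustration and not the OpenFlow data model: the class `Rule`, the dictionary-based patterns (a missing field acts as a wildcard) and the action strings are assumptions, and time-outs are omitted.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    pattern: dict      # header field -> required value; absent field = wildcard
    actions: list      # hypothetical action names, e.g. ["forward:2"], ["drop"]
    priority: int = 0
    counter: int = 0   # number of packets processed by this rule

def matches(rule, header):
    """A header matches a pattern iff it agrees on every field of the pattern."""
    return all(header.get(k) == v for k, v in rule.pattern.items())

def process(flow_table, default_rule, header):
    """Trigger the matching rule of highest priority, or the default rule."""
    candidates = [r for r in flow_table if matches(r, header)]
    if candidates:
        rule = max(candidates, key=lambda r: r.priority)
    else:
        rule = default_rule
    rule.counter += 1          # the triggered rule counts the packet
    return rule.actions        # the instruction is a list of actions

table = [
    Rule({"dst": "10.0.0.2"}, ["forward:2"], priority=10),
    Rule({"src": "10.0.0.7"}, ["drop"], priority=1),
]
default = Rule({}, ["controller"])

first = process(table, default, {"src": "10.0.0.7", "dst": "10.0.0.2"})
second = process(table, default, {"src": "10.0.0.7", "dst": "10.0.0.9"})
third = process(table, default, {"src": "10.0.0.1", "dst": "10.0.0.9"})
```

Only the header is passed to `process`, reflecting the remark that the payload plays no role when a switch handles a packet.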


The controller is a general-purpose machine capable of performing arbitrary computation. It communicates with all switches in a network via a control channel. Through this channel the controller keeps track of the current configuration of the network by receiving messages from switches, and it configures and manages the network by sending commands to switches. Messages the controller receives from switches include all packets forwarded to the controller, statistics (the number of packets that match specified rules), information about the removal of certain rules from flow tables, and switch status information. By sending commands to switches the controller is able to add, delete and modify packet forwarding rules in flow tables, to request various information from a switch (configuration, capabilities, statistics), or to send specified packets from specified ports of a switch. While interacting with the switches the controller is intended to enforce a forwarding policy, i.e. a desirable pattern of communication between the nodes of a network. More details on the concept of SDN and its implementation can be found in [1].

SDNs can both simplify existing network applications and serve as a platform for developing new ones (see [2]). The main advantage of this network architecture is that programmers are able to control the behaviour of the whole network by appropriately configuring the packet forwarding rules installed on the switches. Nevertheless, bugs are likely to remain a problem, since the complexity of the software will increase. Moreover, SDN allows multiple controllers (and, hence, multiple applications) to operate on the same network simultaneously. This may result in conflicting rules that violate the forwarding policy of the whole system. The solution is to develop a toolset able 1) to check the correctness of a single application operating on the controller w.r.t. a specified forwarding policy, 2) to check the consistency of forwarding policies implemented by various applications, and 3) to monitor and check the correctness and safety of the entire SDN.

Surprisingly, only a few researchers have attempted to apply formal techniques to the verification of SDN behaviour. The authors of [3] introduced a relational model of communication networks and developed a BDD-based toolkit to verify reachability properties of packet routing. Similar models of networks were considered in [4, 5, 6, 7]; the authors of these papers used other techniques (SAT solving, manipulations with DNFs) to verify the same class of reachability properties. The main deficiency of these models is that they do not take the controller into account; instead, they check only snapshots of the data plane. In the models developed in [8, 9, 10], an SDN is regarded as an automaton (transition system) which passes from one state to another as the switches forward packets, send messages to controllers, or update their flow-tables. Automata-theoretic models are verified by means of static analysis. However, such models are in rather poor agreement with symbolic techniques, which are unavoidable even when the analysis of local networks is concerned. As for the specification language, the authors of all these papers used temporal logics (CTL or LTL) just to specify paths in the data plane routed by switches. We think that when reachability properties are concerned, the transitive closure operator is more suitable for this purpose.

Thus, our main contribution to the study of verification problems for SDNs includes:
• introduction of a combined (relational and automata-based) formal model which captures the most essential features of SDN behaviour;
• introduction of a tentative language for the specification of SDN forwarding policies, which uses the transitive closure operator to specify reachability properties of the packet forwarding relation and temporal operators to specify the behaviour of the SDN as a whole;
• formal setting of the model checking problem for SDNs.

2 A formal model of Software Defined Networks

In this section we present a formal model of SDNs. It is a finite discrete-time model; i.e. our abstraction omits such issues as real time and arithmetic. As a consequence, it does not capture such features as the deletion of forwarding rules from flow-tables at the expiration of the respective time-outs, or requests and messages that refer to counters. For the sake of simplicity we also ignore the priorities of forwarding rules, but this is not a fundamental limitation. Unlike the SDN models introduced in [8, 9, 10], our model deals with paths in the data plane routed by forwarding rules (per-flow model) rather than with individual packets that traverse a network of switches (per-packet model).

The semantics of the SDN model is defined in terms of a packet forwarding relation on packet states. A packet state is specified by the header of the packet and the location of the packet in the network. A packet forwarding relation specifies how packet states can change while packets traverse the network. When applied to a packet, a forwarding rule changes the packet state by modifying its header and by transmitting the packet from one port of the switch to another. Packet states also change when packets are sent from one switch to another via a network channel. The packet forwarding relation allows some packets to be sent to the controller via control channels. Such packet states are regarded as messages addressed to the controller, which is viewed as a transducer with a set of packet states as the input alphabet and a set of commands as the output alphabet. Upon receiving a message (i.e. a packet state) from a switch, the controller moves to another control state and outputs a finite sequence of commands to switches. These commands update the flow tables of the network and thus modify the packet forwarding relation. An observer may consider the behaviour of an SDN as an alternating sequence of messages delivered to the controller (events) and packet forwarding relations computed by the network.

These concepts are defined formally as follows. To avoid ambiguity we use the term "network" for a set of switches communicating with each other via datapath channels, and the term "SDN" for a distributed system which consists of a network and a controller communicating via control channels.

A packet header is a Boolean vector $h = (h_1, h_2, \dots, h_N)$. All headers are assumed to have the same length $N$. The set of all packet headers is denoted by $H$, $H = \{0, 1\}^N$. Components of the header $h$ are denoted by $h[i]$, $1 \le i \le N$. The OpenFlow protocol does not assign any specific meaning to components of packet headers, but other protocols (such as TCP, UDP, etc.) regard segments of a packet header as addresses, ports, flags, etc.

A port of a switch is a Boolean vector $p = (p_0, p_1, p_2, \dots, p_k)$. Its components are denoted by $p[i]$, $0 \le i \le k$. If $p[0] = 1$ then $p$ is an input port, otherwise it is an output port. All switches in the network are assumed to be identical and to have the same number of ports. The set of all ports of a switch is denoted by $P$, and the sets of its input and output ports by $IP$ and $OP$ respectively. The output port $p = \langle 0, 0, \dots, 0 \rangle$ is the drop port. It is denoted by $drop$; packets arriving at this port are dropped. The output port $p = \langle 0, 1, 1, \dots, 1 \rangle$ is the control output port. It is denoted by $octr$; packets arriving at this port are sent to a controller. The input port $p = \langle 1, 1, 1, \dots, 1 \rangle$ is the control input port. It is denoted by $ictr$; only commands from the controller arrive at this port.

All switches of a network are enumerated, and the name of each switch is a Boolean vector $w = (w_1, w_2, \dots, w_m)$. Its components are denoted by $w[i]$, $1 \le i \le m$. The set of such vectors is denoted by $W$.

A pair $\langle h, p \rangle$, $h \in H$, $p \in P$, is called a local packet state. A pair $\langle p, w \rangle$, $p \in P$, $w \in W$, is called a node. A triple $\langle h, p, w \rangle$, $h \in H$, $p \in P$, $w \in W$, is called a packet state. The set of all packet states is denoted by $S$.
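The definitions above can be illustrated with a minimal Python sketch that encodes headers, ports and switch names as tuples of bits. The toy sizes `N`, `K`, `M` and the names `bit_vectors`, `is_input` and `packet_states` are assumptions made for this example only; they are not part of the model.

```python
from itertools import product

# Assumed toy sizes: header length N, port vector (p0, ..., pK), switch-name length M.
N, K, M = 2, 2, 1

def bit_vectors(n):
    """All Boolean vectors of length n, as tuples of 0/1."""
    return list(product((0, 1), repeat=n))

HEADERS = bit_vectors(N)        # H = {0,1}^N
PORTS = bit_vectors(K + 1)      # P: vectors (p0, p1, ..., pk)
SWITCHES = bit_vectors(M)       # W

def is_input(p):
    """p[0] = 1 marks an input port, p[0] = 0 an output port."""
    return p[0] == 1

DROP = (0,) * (K + 1)           # drop port <0, 0, ..., 0>
OCTR = (0,) + (1,) * K          # control output port <0, 1, ..., 1>
ICTR = (1,) * (K + 1)           # control input port <1, 1, ..., 1>

IP = [p for p in PORTS if is_input(p)]
OP = [p for p in PORTS if not is_input(p)]

def packet_states():
    """S = H x P x W as a list of triples (h, p, w)."""
    return [(h, p, w) for h in HEADERS for p in PORTS for w in SWITCHES]
```

Representing every component as a tuple of bits keeps the sketch close to the Boolean-vector view that the symbolic (BDD-based) treatment relies on.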


A header pattern is a vector $z = (\sigma_1, \sigma_2, \dots, \sigma_N)$, where $\sigma_i \in \{0, 1, *\}$, $1 \le i \le N$. A port pattern is a vector $y = (\delta_1, \delta_2, \dots, \delta_k)$, where $\delta_i \in \{0, 1, *\}$, $1 \le i \le k$. Patterns are used for selecting appropriate rules from flow tables as well as for updating packet headers.

An action $a$ is either a forwarding action $OUTPUT(y)$, where $y$ is a port pattern that selects ports from $OP$, or a header modification action $SET\_FIELD(z)$, where $z$ is a header pattern. An instruction is any finite sequence of actions. A rule is a tuple $r = \langle (z, y), \alpha \rangle$, where $z$, $y$ are header and port patterns, and $\alpha$ is an instruction. A flow-table of a switch is a finite set of rules.

Both the topology and the functionality of network components are defined by means of binary relations on packet states and nodes. These relations are specified by Quantified Boolean Formulae. When dealing with patterns we use two auxiliary functions $U_\sigma(u, v)$ and $E_\sigma(u)$, where $\sigma \in \{0, 1, *\}$ and $u$, $v$ are Boolean variables, such that
• if $\sigma = *$, then $U_\sigma(u, v) = (u \equiv v)$ and $E_\sigma(u) = 1$,
• if $\sigma \in \{0, 1\}$, then $U_\sigma(u, v) = (u \equiv \sigma)$ and $E_\sigma(u) = (u \equiv \sigma)$.

An action $a = OUTPUT(y)$ sends packets, without changing their headers, to all output ports that match a pattern $y = (\delta_1, \delta_2, \dots, \delta_k)$. This action computes the relation

$$R_a(\langle h, p \rangle, \langle h', p' \rangle) = \bigwedge_{i=1}^{N} (h[i] \equiv h'[i]) \wedge \bigwedge_{i=1}^{k} U_{\delta_i}(p'[i], p[i])$$

on the set of local packet states $H \times P$.

An action $b = SET\_FIELD(z)$ modifies the headers of packets following a pattern $z = (\sigma_1, \sigma_2, \dots, \sigma_N)$: a bit $h[i]$ in a header remains intact if $*$ is in position $i$ of $z$, otherwise it is changed to $z[i]$. This action computes the relation

$$R_b(\langle h, p \rangle, \langle h', p' \rangle) = \bigwedge_{i=1}^{N} U_{\sigma_i}(h'[i], h[i]) \wedge \bigwedge_{i=1}^{k} (p[i] \equiv p'[i])$$

on the set $H \times P$.

An instruction $\alpha$ computes a sequential composition of its actions. If $\alpha$ is empty then a packet has to be dropped by default, i.e. sent to the port $drop$. Therefore, we assume that every instruction always ends with a forwarding action. The relation $R_\alpha$ computed by a sequence $\alpha$ is defined as follows:
1. if $\alpha$ is empty then $R_\alpha = \mathit{false}$;
2. if $\alpha = a, \beta$ then the relation $R_\alpha$ is defined by one of the formulae below, depending on $a$:
(a) if $a$ is a forwarding action then

$$R_\alpha(\langle h, p \rangle, \langle h', p' \rangle) = R_a(\langle h, p \rangle, \langle h', p' \rangle) \vee R_\beta(\langle h, p \rangle, \langle h', p' \rangle),$$

(b) if $a$ modifies packet headers then

$$R_\alpha(\langle h, p \rangle, \langle h', p' \rangle) = \exists h'' \, \bigl(R_a(\langle h, p \rangle, \langle h'', p \rangle) \wedge R_\beta(\langle h'', p \rangle, \langle h', p' \rangle)\bigr).$$
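The relations computed by actions and instructions can be sketched directly as Python predicates over local packet states. This is only an illustration under stated assumptions: patterns are tuples over `0`, `1` and `'*'`, an action is a hypothetical pair such as `('OUTPUT', y)` or `('SET_FIELD', z)`, and the existential quantifier over intermediate headers is evaluated by brute-force enumeration rather than symbolically.

```python
from itertools import product

def U(sigma, u, v):
    """U_sigma(u, v): u equals v if sigma is '*', otherwise u equals sigma."""
    return u == v if sigma == '*' else u == sigma

def E(sigma, u):
    """E_sigma(u): always true for '*', otherwise u equals sigma."""
    return True if sigma == '*' else u == sigma

def R_output(y, src, dst):
    """R_a for a = OUTPUT(y): header unchanged, port bits 1..k follow pattern y.
    As in the formula, bit 0 of the port is not constrained."""
    (h, p), (h2, p2) = src, dst
    return h == h2 and all(U(y[i - 1], p2[i], p[i]) for i in range(1, len(p)))

def R_set_field(z, src, dst):
    """R_b for b = SET_FIELD(z): header rewritten by z, port bits 1..k unchanged."""
    (h, p), (h2, p2) = src, dst
    return all(U(z[i], h2[i], h[i]) for i in range(len(h))) and p[1:] == p2[1:]

def R_instr(alpha, src, dst, headers):
    """R_alpha for a sequence of actions, following the recursive definition."""
    if not alpha:
        return False                              # case 1: empty instruction
    (kind, pattern), beta = alpha[0], alpha[1:]
    if kind == 'OUTPUT':                          # case 2(a): R_a or R_beta
        return R_output(pattern, src, dst) or R_instr(beta, src, dst, headers)
    h, p = src                                    # case 2(b): exists h''
    return any(R_set_field(pattern, src, (h2, p))
               and R_instr(beta, (h2, p), dst, headers)
               for h2 in headers)

# Example with 2-bit headers and ports (p0, p1, p2).
HEADERS = list(product((0, 1), repeat=2))
alpha = [('SET_FIELD', (1, '*')), ('OUTPUT', (0, 1))]
src = ((0, 1), (1, 0, 0))
```

In this example the instruction first sets the first header bit to 1 and then forwards the packet to the ports whose bits 1..2 equal (0, 1), so `R_instr(alpha, src, ((1, 1), (0, 0, 1)), HEADERS)` holds while the same target with the unmodified header `(0, 1)` is not related to `src`.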


A packet forwarding rule is a triple $r = (y, z, \alpha)$, where $y = (\delta_1, \delta_2, \dots, \delta_k)$ is a port pattern, $z = (\sigma_1, \sigma_2, \dots, \sigma_N)$ is a header pattern, and $\alpha$ is an instruction. This rule applies the instruction $\alpha$ to all packets whose port and header match the patterns. Its effect is specified by the relation $R_r$ on the set of local packet states $H \times P$:

$$R_r(\langle h, p \rangle, \langle h', p' \rangle) = \mathrm{precond}_r(\langle h, p \rangle) \wedge R_\alpha(\langle h, p \rangle, \langle h', p' \rangle),$$

where

$$\mathrm{precond}_r(\langle h, p \rangle) = \bigwedge_{i=1}^{k} E_{\delta_i}(p[i]) \wedge \bigwedge_{j=1}^{N} E_{\sigma_j}(h[j])$$

is the precondition of the rule $r$.

A flow-table $tab$ is a pair $(D, \beta)$, where $D = \{r_1, r_2, \dots, r_n\}$ is a set of forwarding rules and $\beta$ is a default instruction. A switch applies rules from its flow-table to those packets which arrive at its input ports. If all rules from the set $D$ are inapplicable to a packet then the default instruction $\beta$ takes effect. The semantics of the flow-table $tab$ is specified by the binary relation

$$R_{tab}(\langle h, p \rangle, \langle h', p' \rangle) = \bigvee_{i=1}^{n} R_{r_i}(\langle h, p \rangle, \langle h', p' \rangle) \vee \Bigl(\neg \Bigl(\bigvee_{i=1}^{n} \mathrm{precond}_{r_i}(\langle h, p \rangle)\Bigr) \wedge R_\beta(\langle h, p \rangle, \langle h', p' \rangle)\Bigr)$$

on the set of local packet states $H \times P$. The set of all possible flow-tables is denoted by $Tab$.

The topology of a network is completely defined by a packet transmission relation $T \subseteq (OP \times W) \times (IP \times W)$. Although our model admits an arbitrary transmission relation of the type specified above, in practice $T$ is an injective function. Nodes that are involved in the relation $T$ are called internal nodes of the network; others are called external nodes. We denote by $In$ and $Out$ the sets of all external input nodes and external output nodes of a network respectively. External nodes of a switch are assumed to be connected to outer devices (controllers, servers, gateways, etc.) that are out of the scope of the SDN controller. Packets enter a network through the input nodes and leave a network through its output nodes.

When a set of switches $W$ and a topology $T$ are fixed, a network configuration is a total function $Net : W \to Tab$ which assigns flow-tables to the switches of the network. Finally, the semantics of a network at a given configuration $Net$ is specified by the relation

$$R_{Net}(\langle h, p, w \rangle, \langle h', p', w' \rangle) = \bigl(C_{Net}(\langle h, p, w \rangle, \langle h', p', w' \rangle) \wedge Out(p', w')\bigr) \vee \exists p'' \, \bigl(C_{Net}(\langle h, p, w \rangle, \langle h', p'', w \rangle) \wedge T(\langle p'', w \rangle, \langle p', w' \rangle)\bigr)$$

on the set of (global) packet states $S = H \times P \times W$, where

$$C_{Net}(\langle h, p, w \rangle, \langle h', p', w' \rangle) = \bigvee_{w \in W} \Bigl(R_{Net(w)}(\langle h, p \rangle, \langle h', p' \rangle) \wedge \bigwedge_{j=1}^{m} (w_j \equiv w'_j)\Bigr).$$

When $R_{Net}(s, s')$ holds for a pair of packet states $s = \langle h, p, w \rangle$ and $s' = \langle h', p', w' \rangle$, every packet with a header $h$ which arrives at a port $p$ of a switch $w$ is forwarded in one hop, with the header $h'$, either to an input port $p'$ of a switch $w'$ or to an outer device connected to an external output port $p'$ of the switch $w$.
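The flow-table semantics and the one-hop relation can be sketched as successor functions. In this illustration, which is an assumption-laden simplification rather than the symbolic encoding used in the paper, an instruction is modelled as a Python function from a local packet state to a set of local packet states, `T` is a dictionary from output nodes to input nodes (i.e. an injective function), and all names (`matches`, `table_step`, `one_hop`, `forward_to`) are hypothetical.

```python
def matches(pattern, bits):
    """Conjunction of E_sigma over all components: '*' matches any bit."""
    return all(s == '*' or s == b for s, b in zip(pattern, bits))

def precond(rule, h, p):
    """Precondition of a rule (y, z, alpha): port bits 1..k match y, header matches z."""
    y, z, _ = rule
    return matches(y, p[1:]) and matches(z, h)

def table_step(tab, h, p):
    """Successors of <h, p> under R_tab, where tab = (rules, default)."""
    rules, default = tab
    result, applicable = set(), False
    for rule in rules:
        if precond(rule, h, p):
            applicable = True
            result |= rule[2](h, p)
    if not applicable:                 # no rule applies: default instruction
        result |= default(h, p)
    return result

def one_hop(net, T, out_nodes, state):
    """Successors of <h, p, w> under R_Net."""
    h, p, w = state
    successors = set()
    for h2, p2 in table_step(net[w], h, p):
        if (p2, w) in out_nodes:       # packet leaves through an external output node
            successors.add((h2, p2, w))
        if (p2, w) in T:               # packet is transmitted to the next switch
            successors.add((h2,) + T[(p2, w)])
    return successors

def forward_to(port):
    """Instruction that forwards a packet unchanged to one output port."""
    return lambda h, p: {(h, port)}

# Example: two switches, 1-bit headers, ports (p0, p1, p2).
W0, W1 = (0,), (1,)
IN, A, B, DROP = (1, 0, 1), (0, 0, 1), (0, 1, 0), (0, 0, 0)
net = {W0: ([(('*', '*'), (1,), forward_to(A))], forward_to(DROP)),
       W1: ([], forward_to(B))}
T = {(A, W0): (IN, W1)}
out_nodes = {(B, W1)}
```

Here a packet with header `(1,)` entering switch `W0` is forwarded to `W1` in one hop and leaves the network through the external output node `(B, W1)` in the next hop, while a packet with header `(0,)` is sent to the drop port and has no successor.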


A controller is a reactive program which receives messages from switches via their control ports and generates response commands that change the content of flow-tables. A switch sends a message to the controller as a request for updating its flow-table: a message indicates that the flow-table of the switch has no appropriate rules to process a packet which has arrived at an input port of the switch. Therefore, a message may be viewed as a packet state.

A controller generates two types of commands: to add and to delete forwarding rules. A command $add(w, r)$, where $w \in W$ and $r$ is a forwarding rule, installs the rule $r$ in the flow-table of the switch $w$. A command $del(w, z, y)$, where $w \in W$ and $z$, $y$ are header and port patterns, deletes from the flow-table of the switch $w$ all forwarding rules $r = \langle (z', y'), \alpha \rangle$ whose patterns $z'$, $y'$ match the patterns $z$, $y$ respectively. We denote by $C$ the set of all possible commands. Commands of both types change network configurations; we write $Net' = update(cmd, Net)$ to indicate that a command $cmd$ changes a network configuration $Net$ to a network configuration $Net'$. If $\omega = cmd_1, cmd_2, \dots, cmd_n$ is a finite sequence of commands then we write $update(\omega, Net)$ for $update(cmd_n, update(\dots, update(cmd_2, update(cmd_1, Net)) \dots))$.

A formal model of a controller is a transducer $A = (S, C, Q, q_0, \Delta)$, where
• $S$ and $C$ are the input and output alphabets respectively,
• $Q$ is a set of control states,
• $q_0$, $q_0 \in Q$, is an initial control state, and
• $\Delta$, $\Delta \subseteq Q \times S \times C^* \times Q$, is a transition relation.

A quadruple $(q, s, \omega, q')$ from $\Delta$ means that the controller $A$, when receiving a message $s$ at the control state $q$, can generate a finite sequence of commands $\omega$ and move to the control state $q'$.

At every network configuration $Net$ a controller $A$ may receive only those messages that are triggered by packets entering the network via external nodes. In this case a message includes the modified header of such a packet and the input node of the switch which sends the message to the controller. To specify the set of messages $Event(Net)$ admissible at a network configuration $Net$ we consider the reflexive-transitive closure $R^*_{Net}$ of the one-hop forwarding relation $R_{Net}$. Then

$$Event(Net) = \{\langle x', y', z' \rangle : \exists x, y, z, x'' \, (\langle y, z \rangle \in In \wedge R^*_{Net}(\langle x, y, z \rangle, \langle x'', y', z' \rangle) \wedge R_{Net}(\langle x'', y', z' \rangle, \langle x', octr, z' \rangle))\}.$$

A formal model of an SDN is specified by the sets $W$, $P$, $H$ of switches, their ports and packet headers, a packet transmission relation $T$, and a controller $A$. A partial run of an SDN $M = (W, P, H, T, A)$ is a sequence (finite or infinite)

$$run = (Net_0, q_0) \xrightarrow{s_1} (Net_1, q_1) \xrightarrow{s_2} \cdots \xrightarrow{s_i} (Net_i, q_i) \xrightarrow{s_{i+1}} (Net_{i+1}, q_{i+1}) \xrightarrow{s_{i+2}} \cdots \qquad (*)$$

where for every $i$, $0 \le i$,
1. $Net_i$ is a network configuration, $q_i$ is a control state of $A$, and $s_i$ is a packet state,
2. $s_{i+1} \in Event(Net_i)$,
3. the transition relation of $A$ includes a quadruple $(q_i, s_{i+1}, \omega_i, q_{i+1})$ such that $Net_{i+1} = update(\omega_i, Net_i)$.
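The effect of commands and a single transition of a run can be sketched as follows. The sketch makes several assumptions that the text leaves open: a configuration is a dictionary from switch names to frozensets of rules `((z, y), alpha)`, a command is a tuple tagged `'add'` or `'del'`, and a rule pattern "matches" a command pattern when it agrees with it on every non-`'*'` position of the command pattern. The names `pattern_matches`, `update_seq` and `step` are hypothetical.

```python
def pattern_matches(general, specific):
    """True if `specific` agrees with `general` on every non-'*' position of `general`."""
    return all(g == '*' or g == s for g, s in zip(general, specific))

def update(cmd, net):
    """Configuration obtained from `net` by a single add or del command."""
    new = dict(net)
    if cmd[0] == 'add':
        _, w, rule = cmd
        new[w] = net[w] | {rule}
    elif cmd[0] == 'del':
        _, w, z, y = cmd
        new[w] = frozenset(
            r for r in net[w]
            if not (pattern_matches(z, r[0][0]) and pattern_matches(y, r[0][1])))
    return new

def update_seq(omega, net):
    """update(omega, Net): apply the commands of omega from left to right."""
    for cmd in omega:
        net = update(cmd, net)
    return net

def step(delta, net, q, s):
    """All SDN states (Net', q') reachable from (net, q) on message s,
    where delta is a list of quadruples (q, s, omega, q')."""
    return [(update_seq(omega, net), q2)
            for (q1, s1, omega, q2) in delta
            if q1 == q and s1 == s]

# Example: one switch and two rules with 2-bit header patterns.
w = (0,)
r1 = (((1, '*'), ('*',)), (('OUTPUT', (1,)),))
r2 = (((0, 0), ('*',)), (('OUTPUT', (0,)),))
net0 = {w: frozenset()}
net1 = update_seq([('add', w, r1), ('add', w, r2)], net0)
net2 = update(('del', w, (1, '*'), ('*',)), net1)
```

Because `update` returns a fresh dictionary, earlier configurations of a run remain available unchanged, which is convenient when a run is examined as a sequence of configurations.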


Pairs $(Net_i, q_i)$ are viewed as the states of the SDN, and packet states $s_{i+1}$ play the role of messages sent to the controller. A complete run is a partial run which is either infinite or ends with an SDN state $(Net_i, q_i)$ such that $Event(Net_i) = \emptyset$. Given an SDN $M$ and a network configuration $Net_0$, we write $Run(M, Net_0)$ to denote the set of all complete runs of $M$ which begin with the pair $(Net_0, q_0)$.

3 Specification of forwarding policies

Usually a wide range of requirements is imposed upon communication networks to guarantee their correct, safe and secure behaviour. We consider only those requirements that concern reachability properties. Certain packets have to reach their destination, whereas some other packets have to be dropped. Certain switches are forbidden for some packets, whereas some other switches must be traversed. Loops are not allowed. These and some other requirements constitute a forwarding policy. One of the aims of network engineering is to load switches with forwarding rules in such a way as to guarantee compliance with the forwarding policy. Since the flow-tables of switches are updated by the controller, this raises two problems that are fundamental in software engineering:
1. verification of an SDN against a forwarding policy: given an SDN $M$ and a set of initial network configurations $\mathcal{N}$, check that for every network configuration $Net$, $Net \in \mathcal{N}$, all runs from $Run(M, Net)$ satisfy a given forwarding policy;
2. implementation of a forwarding policy: given a forwarding policy and a set of initial network configurations $\mathcal{N}$, build a controller $A$ such that for every network configuration $Net$, $Net \in \mathcal{N}$, every run $run$, $run \in Run(M, Net)$, of the corresponding SDN $M$ satisfies this policy.

In order to apply formal methods to these problems one needs a formal language to specify forwarding policies. In this section we present a tentative variant of a specification language for SDN forwarding policies. Since the behaviour of an SDN evolves in time and all the states of this process may be significant for the forwarding policy, it is reasonable to use temporal logics to specify the properties of SDN behaviours. Yet the forwarding policies also refer to properties of network configurations at some stages of the SDN behaviour. These properties mostly concern the paths routed in a network by packet forwarding rules; they can be expressed in terms of the one-hop packet forwarding relation $R_{Net}$. We choose first-order logic with the transitive closure operator (FO[TC] in symbols) to specify the properties of network configurations. But these properties are formulated in terms of relationships between packet states. Therefore, we also need a simple language for expressing such relationships. Since packet states are thought of as Boolean vectors, the natural choice for this purpose is Boolean formulae. We now consider this multi-level language for the specification of forwarding policies in more detail.

Let $Var = \{X_1, X_2, \dots\}$ be a set of variables; they are evaluated over the set $S = H \times P \times W = \{0, 1\}^{N+k+m}$ of packet states. A packet state specification is any Boolean formula $\varphi$ constructed from a set of Boolean variables $\{X_i[j] : X_i \in Var, 1 \le j \le N + k + m\}$ and connectives $\neg$, $\wedge$. The set of such formulae is denoted by $L_0$.

A language for the specification of network configurations $L_1$ uses only three predicate symbols $R^{(2)}$, $I^{(1)}$, $O^{(1)}$ as its signature. It is the smallest language which satisfies the following rules:
1. if $\varphi \in L_0$ then $\varphi \in L_1$;


2. if $X, Y \in Var$ then the atomic formulae $R(X, Y)$, $I(X)$, $O(Y)$ are in $L_1$;
3. if $\psi(X, Y)$ is a formula in $L_1$ and includes exactly two free variables then $TC(\psi(X, Y)) \in L_1$;
4. if $\psi_1$ and $\psi_2$ are formulae in $L_1$ and $X \in Var$ then the formulae $(\neg\psi_1)$, $(\psi_1 \wedge \psi_2)$, $(\exists X \, \psi_1)$ are in $L_1$.

The semantics of $L_1$ is defined as follows. Let $Net$ be a network configuration, and $s = \langle h, p, w \rangle$ and $s' = \langle h', p', w' \rangle$ be a pair of packet states. Then
1. $Net \models R(X, Y)[s, s']$ iff $(s, s') \in R_{Net}$;
2. $Net \models I(X)[s]$ iff $\langle p, w \rangle \in In$;
3. $Net \models O(X)[s]$ iff $\langle p, w \rangle \in Out$.
The satisfiability relation for other formulae in $L_1$ is defined in the straightforward way.

Some facts about FO[TC] are worth mentioning. As follows from the results of [11], the model checking problem for FO[TC] is NLOGSPACE-complete. Moreover, as was shown in [12, 13], both the µ-calculus and PDL can be translated into FO[TC] (although the size of formulae may grow exponentially). As for network model checking against $L_1$ specifications, we proved

Theorem 1. The model checking problem $Net \models \psi$ for closed formulae $\psi$ in $L_1$ is PSPACE-complete.

The proof of PSPACE-hardness of the network model checking problem is based on the fact that packet headers may be viewed as configurations of a linear bounded Turing machine. In this case the commands of such a machine can be simulated by appropriate flow-table rules. In fact, a network with only one switch and a loop is sufficient to simulate any Turing machine operating on a tape whose size is bounded by the size of the packet header.

A forwarding policy, i.e. the desirable properties of SDN behaviour, can be specified by means of propositional temporal logics where formulae from $L_1$ serve as atomic propositions. Let $L_0(X)$ and $L_1(X)$ be the sets of all those formulae from $L_0$ and $L_1$ respectively which have $X$ as their only free variable. Then $LTL(L_1)$ is the smallest language which satisfies the following rules:
1. if $\psi(X) \in L_1(X)$ then $\psi(X) \in LTL(L_1)$;
2. if $\Phi(X), \Psi(X) \in LTL(L_1)$ then $(\neg\Phi(X))$, $(\Phi(X) \wedge \Psi(X))$, $(\mathbf{X}\,\Phi(X))$ and $(\Phi(X)\,\mathbf{U}\,\Psi(X))$ are in $LTL(L_1)$.

Formulae from $LTL(L_1)$ are evaluated on infinite sequences of network configurations $\{Net_i\}_{i=1}^{\infty}$ for a given packet state $s$ as follows:
• if $\psi(X) \in L_1(X)$ then $\{Net_i\}_{i=1}^{\infty} \models \psi(s)$ iff $Net_1 \models \psi(s)$,
• $\{Net_i\}_{i=1}^{\infty} \models \mathbf{X}\,\Phi(s)$ iff $\{Net_i\}_{i=2}^{\infty} \models \Phi(s)$,
• $\{Net_i\}_{i=1}^{\infty} \models \Phi(s)\,\mathbf{U}\,\Psi(s)$ iff $\{Net_i\}_{i=k}^{\infty} \models \Psi(s)$ for some $k$, $1 \le k$, and $\{Net_i\}_{i=j}^{\infty} \models \Phi(s)$ for every $j$, $1 \le j < k$,


• the semantics of the connectives $\neg$ and $\wedge$ is defined in the usual way.

A language for the specification of forwarding policies $L_2$ is the set of expressions $\varphi(X) \Rightarrow \Phi(X)$, where $\varphi(X) \in L_0(X)$ and $\Phi(X)$ is a temporal formula from $LTL(L_1)$. The semantics of these expressions is defined through the satisfiability relation on the runs of formal models of SDNs. Suppose that $(*)$ is a run of an SDN $M$ and $\varphi(X) \Rightarrow \Phi(X)$ is an expression from $L_2$. Then $run \models \varphi(X) \Rightarrow \Phi(X)$ iff $\{Net_i\}_{i=n}^{\infty} \models \Phi(s_n)$ for every $n$, $1 \le n$, such that $\varphi(s_n) = 1$.

A forwarding policy $FP$ can be specified by a constraint $\psi$ on initial network configurations, which is a closed formula from $L_1$, and by a finite set $\{\varphi_1(X) \Rightarrow \Phi_1(X), \dots, \varphi_n(X) \Rightarrow \Phi_n(X)\}$ of expressions from $L_2$. We say that an SDN $M$ implements a forwarding policy $FP$ iff for every network configuration $Net_0$ such that $Net_0 \models \psi$, every run $(*)$ from the set $Run(M, Net_0)$ satisfies all requirements $\varphi_i(X) \Rightarrow \Phi_i(X)$, $1 \le i \le n$. Thus, the model checking problem for SDNs is that of checking whether a given formal model of an SDN $M$ satisfies a specification of a given forwarding policy $FP$.
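To make the role of the transitive closure operator concrete, the following Python sketch evaluates two typical reachability properties on an explicitly enumerated one-hop relation. It is an illustration only: the relation is given as a finite set of pairs of opaque state labels, the closure is computed by naive iteration rather than with BDDs, and the function names are assumptions of this example.

```python
def transitive_closure(edges):
    """Transitive closure of a finite binary relation given as a set of pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

def has_loop(edges):
    """Some packet state X satisfies TC(R)(X, X), i.e. lies on a forwarding loop."""
    return any(a == b for (a, b) in transitive_closure(edges))

def reaches_output(edges, inputs, outputs):
    """For every X with I(X) there is a Y with O(Y) such that TC(R)(X, Y) holds."""
    tc = transitive_closure(edges)
    return all(any((x, y) in tc for y in outputs) for x in inputs)

# Example: a three-state forwarding path a -> b -> c.
R = {('a', 'b'), ('b', 'c')}
```

On this example `has_loop(R)` is false and `reaches_output(R, {'a'}, {'c'})` is true; adding the pair `('c', 'a')` creates a loop. The explicit enumeration is exponential in the number of header, port and switch bits, which is why a symbolic representation is needed for realistic networks.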

4 Conclusion

It is worth noting that if the controller of an SDN is a finite state machine then the model checking problem, as defined above, is decidable for such models of SDN. This is due to the fact that a network has only finitely many states and, hence, all runs of the SDN can be combined into a finite state transition system. Thus, the model checking problem for SDNs can be reduced to a finite model checking problem for PLTL. The main difficulty in using this observation in practice is that the size of the state space of this transition system may be doubly exponential in the size of the respective SDN description. So far we do not know how to cope with this problem. Nevertheless, we have built a BDD-based toolset for model checking network configurations against their specifications, i.e. closed formulae $\psi$ from $L_1$. Using this toolset we are able to check on-the-fly simple forwarding policy specifications of the form $true \Rightarrow \mathbf{G}\,\psi$, i.e. safety invariants of SDN behaviour.

We would like to thank the anonymous referee for the valuable comments that helped us to improve the paper.

References

[1] Open Flow Switch Specification. Version 1.3.0, June 25, 2012, http://www.openflow.org/wp/documents/.
[2] H. Kim, N. Feamster. Improving network management with software defined networking. Communications Magazine, IEEE, 2013, p. 114-119.
[3] E. Al-Shaer, W. Marrero, A. El-Atawy, K. El Badawi. Network Configuration in a Box: Toward End-to-End Verification of Network Reachability and Security. In the 17th IEEE International Conference on Network Protocols (ICNP'09), Princeton, New Jersey, USA, 2009.
[4] H. Mai, A. Khurshid, R. Agarwal, M. Caesar, P.B. Godfrey, S.T. King. Debugging the Data Plane with Anteater. In the Proceedings of the ACM SIGCOMM conference, 2011, p. 290-301.
[5] A. Khurshid, W. Zhou, M. Caesar, P.B. Godfrey. VeriFlow: Verifying Network-Wide Invariants in Real Time. In the Proceedings of International Conference "Hot Topics in Software Defined Networking" (HotSDN), 2012.
[6] P. Kazemian, G. Varghese, N. McKeown. Header space analysis: Static checking for networks. In the Proceedings of 9-th USENIX Symposium on Networked Systems Design and Implementation, 2012.
[7] S. Gutz, A. Story, C. Schlesinger, N. Foster. Splendid isolation: A Slice Abstraction for Software Defined Networks. In the Proceedings of International Conference "Hot Topics in Software Defined Networking" (HotSDN), 2012.
[8] M. Reitblatt, N. Foster, J. Rexford, D. Walker. Consistent updates for software-defined networks: change you can believe in! HotNets, v. 7, 2011.
[9] M. Reitblatt, N. Foster, J. Rexford, C. Schlesinger, D. Walker. Abstractions for Network Update. In the Proceedings of ACM SIGCOMM, 2012.
[10] M. Canini, D. Venzano, P. Peresini, D. Kostic, J. Rexford. A NICE way to Test OpenFlow Applications. In the Proceedings of Networked Systems Design and Implementation, April 2012.
[11] N. Immerman. Languages that capture complexity classes. SIAM Journal on Computing, v. 16, N 4, 1987, p. 760-778.
[12] N. Immerman, M. Vardi. Model checking and transitive closure logic. Lecture Notes in Computer Science, 1997, p. 291-302.
[13] N. Alechina, N. Immerman. Reachability logic: efficient fragment of transitive closure logic. Logic Journal of IGPL, 2000, v. 8, N 3, p. 325-337.


A Formal Approach for Generation of Test Scenarios Based on Guides

P. Drobintsev¹, I. Nikiforov¹, V. Kotlyarov¹ and A. Letichevsky²

¹ Saint-Petersburg State Polytechnic University [email protected]
² Glushkov Institute of Cybernetics, Kiev, Ukraine [email protected]

Abstract

The paper describes an approach for the generation of test scenarios based on the use of a formal model of the software under development and a guiding language, which together make it possible to reduce the set of tests to be generated within the specified test coverage rate. The high rate of test coverage is ensured by the use of both automatic branch coverage criteria and user-defined path criteria. In the general case, this approach, based on the integration of automatic branch coverage with user scenario coverage, provides a significant increase in the test coverage rate and, as a result, an increase in the quality of the software under development.

Keywords: requirements coverage, special coverage criteria, test scenarios generation, guides for trace generation, test automation

1 Introduction

One of the main problems in the development and test automation of industrial software is the handling of complicated and numerous requirements specifications. Documents specifying requirements are generally written in a natural language and may contain hundreds or thousands of requirements. Therefore, the task of formalizing requirements so as to describe behavioral scenarios unambiguously, with a level of detail sufficient for the development of automatic tests or manual test procedures, is highly complex and demands a huge effort.

The applicability of formal methods in the software industry is determined to a great extent by how adequate the formalization language is with respect to accepted engineering practices, which involve not only code developers and testers but also customers, project managers of different levels, marketing, and other specialists. It is clear that no logic language is suitable for an adequate formalization of requirements which would keep the semantics of the application under development and at the same time would satisfy all the people concerned [1].

In modern project documentation the formulation of initial requirements is either constructive, when the checking procedure or scenario of requirement coverage checking can be extracted from the natural-language text of the requirement, or unconstructive, when the functionality described in the requirement does not contain hints on how to check that the requirement is satisfied.

2 Requirement Coverage Checking

The procedure of requirement checking is an exact sequence of causes and results of some activities (coded with actions, signals, states), the analysis of which can demonstrate whether the given requirement is completely covered or not. Such a checking procedure can be used as a coverage criterion for a specific requirement; i.e., it can become a so-called criteria procedure. In the text below a sequence or a chain of events is used as such a criteria procedure. By tracking the coverage of a criteria procedure in a system behavioral scenario (hypothetical, implemented in the model, or in the real system), one can assert that the corresponding requirement is satisfied in the system being analyzed. A procedure for requirement checking (a chain) is formulated by providing the following information for all chain elements (events):
• conditions (causes) required for activating some activity;
• the activity itself, which shall be executed under the current conditions;
• consequences: observable (measurable) results of the activity execution.

Causes and results are described with signals, messages or transactions commonly used in communications between reactive system instances [2], as well as with variable states in the form of values or limitations on acceptable values. Tracking the state changes produced by chain activities makes it possible to observe the coverage of the corresponding chains. During the analysis it is acceptable to consider a direct transition from one state into another with a null activity and, in the case of non-determinism, alternative variants of state changes.

Problems with unconstructive formulations of requirements are resolved by developing requirement coverage checking procedures on user or intercomponent interfaces. Thus, chains containing sequences of events can serve as criteria of requirement coverage; in addition, the coverage criterion of some requirement may be specified not with one but with several chains.

In VRS/TAT technology [3] the Use Case Maps (UCM) notation [4] is used for the high-level description of an application model, while the tools for automation of checking and generation work with a model in the basic protocols language [5]. A UCM model (Fig. 1) contains a model description of two interacting instances. Each path on the graph from the event start to the event end represents one behavioral scenario. Each path contains a certain number of events (Responsibilities). Events in the diagram are marked with the × symbol, while Stub elements, which encode inserted diagrams, are marked with the ◇ symbol. As a result, each scenario contains a sequence of events. The variety of possible scenarios is specified by the variety of such sequences.

In these terms a chain is defined as a subsequence of events which is sufficient to conclude that the requirement is satisfied. A path in the UCM diagram containing the sequence of events of some chain is called a trace covering the corresponding requirement. Based on a given trace, tests can be generated which are needed for experimental evidence of requirement coverage.
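The relation between chains and traces can be sketched in a few lines of Python. The sketch assumes that a chain is covered when its events occur in the trace in the same order but not necessarily adjacently; the event names and requirement identifiers used below are hypothetical and are not taken from the UCM model of Fig. 1.

```python
def covers(trace, chain):
    """True if the events of `chain` occur in `trace` in the same order
    (not necessarily adjacent), i.e. the chain is a subsequence of the trace."""
    events = iter(trace)
    return all(event in events for event in chain)

def covered_requirements(trace, criteria):
    """Identifiers of requirements for which at least one criteria chain is
    covered by the trace; `criteria` maps an identifier to a list of chains."""
    return sorted(req for req, chains in criteria.items()
                  if any(covers(trace, chain) for chain in chains))

# Hypothetical trace and coverage criteria.
trace = ["start", "a", "b", "c", "end"]
criteria = {
    "REQ.1": [["a", "c"]],
    "REQ.2": [["c", "a"], ["x"]],
    "REQ.3": [["x"], ["start", "end"]],
}
```

With these inputs `covered_requirements(trace, criteria)` yields `["REQ.1", "REQ.3"]`: the only chains of `REQ.2` either list the events in the wrong order or mention an event that never occurs in the trace.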

3 Traceability Matrix

The formalization of verification project requirements starts with the creation of a Traceability matrix (TRM) [1] (the TRM for a sample project is presented in table format in Fig. 2). The Identifier and Requirements columns contain the requirement identifier used in the initial requirements document and the natural-language text of the requirement to be formalized. The Traceability column contains chains of events sufficient for checking the coverage of the corresponding requirement, and the Traces column contains traces or behavioral scenarios used for test code generation.


Figure 1: UCM model of two instances: Receiver and UserPC

Figure 2: TRM (Traceability matrix)

For example, there are 2 chains in the Traceability column for covering the FREQ_GWR.3 requirement in the third row of the TRM. To satisfy the requirement it is enough to trace ACM_CAP signal sending in one of two possible scenarios:

FREQ_GWR.3-1: start, recACMCAP_SL recfwdACM_CAP_IP recACM_CAP_IP , good_new_cap_table, format_mpeg2, no_chanes, end

FREQ_GWR.3-2: start, , , good_new_cap_table, format_multicast, end

It should be noted that while the criteria chains are being formulated, a model of the verified functionality is created, which introduces many state variables, types, agents, instances, etc.


4 Generating and Selecting Scenarios which Satisfy Specified Coverage Criteria

Trace generation is performed by symbolic and concrete trace generators of the VRS system, which implements effective algorithms of Model Checking. The main problem of trace generation is the explosion of variant combinations while generating traces from basic protocols [6], which formalize scenario events, conditions of their implementation and the corresponding change of the model state after their implementation. Our solution here is filtering of the variants being generated based on numerous limitations specifically defined before the trace generation cycle. There are general and specific limitations. For example, commonly used general limitations are the maximal number of basic protocols used in a trace and the maximal number of traces generated in a single generation cycle. Also limitations on Goal and Visited states can be defined. Specific limitations are defined by sequences of events in the UCM model (so-called Guides) which guide the process of generation towards the user-preferable model behavior.

A guide is defined in terms of a state model which is presented in the form of a transition system. A transition system TS is a tuple $\langle Q, q_0, T, P, f \rangle$, where $Q$ is a set of states, $q_0 \in Q$ is the initial state, $T$ is a set of transition names, $P$ is a set of agents, and $f : Q \to P$ is a mapping which defines the current set of agents in a state of $Q$.

To simplify, let us assume that events in the model are mapped to names of TS transitions and agents can be presented as one process or a set of processes. A path in TS from a state $q_i$ to a state $q_j$ is defined as a sequence of transitions

$$q_i \xrightarrow{t_i(a_i)} q_{i+1} \xrightarrow{t_{i+1}(a_{i+1})} q_{i+2} \ldots q_j,$$

where $q_k \in Q \wedge t_k \in T \wedge a_k \in f(q_k)$ for each $k \in i..j$. A trace in TS is an ordered sequence $t_0(a_0), t_1(a_1), \ldots, t_n(a_n), \ldots$ such that there exists a path

$$q_0 \xrightarrow{t_0(a_0)} q_1 \xrightarrow{t_1(a_1)} \ldots \xrightarrow{t_n(a_n)} q_{n+1} \ldots$$

A guide is built from the following constructs:

• $a.n$ is a transition $a$ on the maximal distance $n$, which allows the set of traces $\{a, X_1 a, \ldots, X_1 \ldots X_n a\}$, where $X_1, \ldots, X_n$ are any non-empty symbols from $\{L \setminus a\}$;

• $\sim a$ is a prohibition of transition $a$, which allows any symbol from $\{L \setminus a\}$;

• $a; b$ (where $a$, $b$ are guides) is a concatenation of guides, which allows the set of traces $\{ab\}$;

• $a \vee b$ (where $a$, $b$ are guides) is a non-deterministic choice of guides, which allows the set of traces $\{a, b\}$;

• $a \| b$ is a parallel composition of guide $a$, which defines the language $Z^a$ with the alphabet of a set $X$, and guide $b$, which defines the language $Z^b$ with the alphabet of a set $Y$, where $X \cap Y = \emptyset$ because the sets of agents in $X$ and $Y$ do not intersect. Then the parallel composition of guides is the set presented by the language $Z^{a \| b} = Z^a_{\Uparrow Y} \cap Z^b_{\Uparrow X}$;

• $join(a_1, \ldots, a_n)$ is the set $\{S_n\}$ of all permutations of guides $a_1, \ldots, a_n$;

• $loop(a)$ is an iteration of guide $a$, i.e. $\{aa^*\}$.
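To make the distance construct concrete, the following Python sketch checks whether a finite trace belongs to the language of a guide of the form $a_1.n_1; a_2.n_2; \ldots$ (distance constructs joined by concatenation). It is only an illustration under simplifying assumptions: traces are lists of transition names, agents are ignored, and the function name `matches` is hypothetical rather than part of the VRS Guide Language.

```python
def matches(trace, guide):
    """Check a trace against a guide given as [(event, max_distance), ...].

    Each pair (a, n) stands for `a.n`: at most n symbols different from `a`
    may precede `a`. Pairs are concatenated with `;`.
    """
    pos = 0
    for event, max_distance in guide:
        try:
            # The skipped symbols must differ from `event`, so the first
            # occurrence at or after `pos` is the only candidate.
            hit = trace.index(event, pos)
        except ValueError:
            return False
        if hit - pos > max_distance:
            return False
        pos = hit + 1
    # The whole trace must be consumed by the guide.
    return pos == len(trace)


guide = [("a", 2), ("b", 1)]                         # a.2 ; b.1
near = matches(["x", "a", "b"], guide)               # one symbol before a
far = matches(["x", "y", "z", "a", "b"], guide)      # three symbols before a
print(near, far)
```

Because only the control points `a` and `b` are fixed, many detailed traces satisfy one guide, while the distance bound keeps the search from wandering arbitrarily far between them.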

Test scenario generation by Guides consists of two steps. In the first step, guides are created which guarantee that the specified criteria of system behavior coverage are satisfied. In the second step, guides in the UCM notation (Fig.3a) are translated into guides in the basic protocols language (Fig.3b), which control the trace generation process. It is important that only the main control points of a behavior are specified in guides, while the trace generated from the guide contains a detailed sequence of behavioral elements. This approach significantly reduces the impact of combinatorial explosion on the generation time while exploring the behavioral UCM model of the system under development. Usage of the method in real industrial projects reveals a set of problems which are not trivial from the theoretical point of view. For instance, in case of a multi-search, when guides describe traces which belong to far branches of the behavioral tree, the storage of covered states grows


Figure 3: Guides: (a) in UCM notation, (b) in basic protocols language (VRS Guide Language)

very fast and the search process slows down accordingly. For such cases an efficient solution is partitioning the initial set of guides into subsets for different experiments. Another problem is overstating the distance between transitions. In such cases the breadth-first search and greedy algorithms can help. To avoid permutations that are unnecessary from the test scenario point of view, the join operator is used. This operator allows permutation of its arguments but prohibits interleaving.
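The difference between permuting whole guides and interleaving their events can be sketched in Python as follows. The sketch assumes that each guide is a plain list of event names; the helper names `join` and `interleavings` are illustrative and do not reproduce the VRS implementation.

```python
from itertools import permutations


def join(*guides):
    """All orderings of whole guides; events of one guide stay contiguous."""
    return [[event for guide in order for event in guide]
            for order in permutations(guides)]


def interleavings(x, y):
    """All merges of x and y that keep the internal order of each."""
    if not x:
        return [list(y)]
    if not y:
        return [list(x)]
    return ([[x[0]] + rest for rest in interleavings(x[1:], y)] +
            [[y[0]] + rest for rest in interleavings(x, y[1:])])


joined = join(["a", "b"], ["c", "d"])
merged = interleavings(["a", "b"], ["c", "d"])
print(len(joined), len(merged))
```

For two guides of two events each, join yields two sequences while free interleaving yields six, and the gap widens quickly as guides get longer, which is why prohibiting interleaving reduces the number of generated scenarios.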

5 Automatic and Manual Processes of Guides Creation

Two approaches to guides creation from a high-level system description in the UCM language are possible: manual and automatic. The automatic approach makes it possible to generate numerous guides which cover the system behavior according to the branches criterion [7]. Each guide contains information about key points of a behavioral scenario, starting from the initial model state modeled by the StartPoint element and ending in a final state modeled by the EndPoint element. Guides are generated by the UCM-to-MSC generator [8].

The automatic approach to guides creation can be considered a fast way to obtain a test set which satisfies the branches criterion; however, this approach is not always sufficient. Some functionality can be checked only using the paths criterion. As this criterion often deals with uncertainties in the selection of a test set sufficient for the specified requirements coverage, it is usually applied only partly, to cover specific requirements. Guides creation for covering a subset of requirements by the paths criterion requires more information during generation, which is added manually. Besides that, the customer and the test engineer may want to check specific scenarios of the system behavior in order to check some specific requirements. Such scenarios are specified manually and are created with the UCM Events Analyzer (UCM EVA) tool [8].

An important point in guides creation for industrial systems is minimization of the related manual work. The GDL (Guides Definition Language) language [9], developed for guides description, is intended for that. As a result, guides specified with GDL provide automated translation into test scenarios and then into test code.


6 Guides Definition Language

Let us recall that a UCM diagram is a directed graph with nodes from the set $U$ of diagram elements and arcs which specify accessibility of one diagram element from another:

$$U = U_{start} \cup U_{end} \cup U_{int},$$

where $U$ is a set of UCM model elements, $U_{start}$ is a set of UCM start elements (StartPoint), $U_{end}$ is a set of UCM end elements (EndPoint), and $U_{int}$ is a set of UCM intermediate elements.

A path (scenario) on a UCM diagram is defined as:

$$\pi = u_1 \to u_2 \to \ldots \to u_n, \quad u_i \in U, \quad i \in [1..n]$$

The accessibility relation from the node $u_i$ to the node $u_j$ is defined through the existence of a path $\pi_{ij}$. An element $u_j \in U_{int}$ is accessible if paths $\pi_{u_{start} u_j}$ and $\pi_{u_j u_{end}}$ exist. For an arbitrary UCM diagram let us assume that all elements $u_j \in U_{int}$ are accessible, based on the UCM diagram creation principles.
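The accessibility condition can be checked mechanically. The Python sketch below assumes a diagram given as an adjacency dictionary with a single start and a single end element (the definition above allows sets of them); the graph, the element names and the function names are hypothetical.

```python
from collections import deque


def reachable(graph, source):
    """Set of nodes reachable from `source` (including it) by breadth-first search."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def accessible_elements(graph, start, end):
    """Intermediate elements lying on at least one start-to-end path."""
    reverse = {}
    for node, successors in graph.items():
        for nxt in successors:
            reverse.setdefault(nxt, []).append(node)
    forward = reachable(graph, start)      # a path from start to u exists
    backward = reachable(reverse, end)     # a path from u to end exists
    return (forward & backward) - {start, end}


# Hypothetical diagram: u3 is a dead end, u4 cannot be reached from start.
diagram = {"start": ["u1"], "u1": ["u2", "u3"], "u2": ["end"],
           "u3": [], "u4": ["end"]}
print(sorted(accessible_elements(diagram, "start", "end")))
```

A diagram built according to the UCM creation principles would make every intermediate element accessible, so a check of this kind is mainly useful for validating hand-edited models.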

A branch in a UCM diagram is defined as a path which satisfies the following:

• Each branch starts either from a start point of the UCM diagram, or from the start of an alternative behavior element (OrFork), or from the start of a concurrent behavior element (AndFork).

• Each branch finishes either in an end point element (EndPoint) of the UCM diagram, or in the nearest end of an alternative behavior element (OrJoin), or in the nearest end of a concurrent behavior element (AndJoin).

Consider that:

$BR_{u_i}$ is a set of UCM branches accessible from the element $u_i$;

$RB_{u_i}$ is a set of UCM branches from which the element $u_i$ can be accessed;

$BU_{u_i}$ is a set of UCM branches not accessible from the element $u_i$;

$U_{u_i}B$ is a set of UCM branches from which the element $u_i$ can not be accessed.

Thus, the whole set of branches is $UB = BR_{u_i} \cup RB_{u_i} \cup BU_{u_i} \cup U_{u_i}B$.

The GDL language is based on three types of specifications.

1. M specifications make it possible to specify a UCM model subset by listing (selecting in the diagram) a number of UCM elements. So, the M specification describes only those UCM elements which will be used for analysis. Let us define the set of user selected UCM elements as

$$M = \{u_1, u_2, \ldots, u_m\}, \quad u_i \in U, \quad i \in [1..m].$$

Then the result of guide generator execution (for presenting each guide as an MSC [10]) with M specifications can be described as follows:

$$UCM \to MSC : (U, M) \to (GD),$$

where $GD = \{\pi_1\} \cup \{\pi_2\} \cup \ldots \cup \{\pi_n\}$ is a set of paths such that

$$u_i \in M \to u_i \in \{\pi_1\} \cup \{\pi_2\} \cup \ldots \cup \{\pi_n\}, \quad i \in [1..m],$$

and each branch of the UCM is covered at least once in the set of paths included in GD. Each path starts in a diagram StartPoint and ends in a diagram EndPoint.
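The three conditions imposed on GD can be written down as a small Python check. This is a sketch under stated assumptions: paths are lists of element names, single arcs are used as a simplified stand-in for branches, and the names `satisfies_m_spec`, `start` and `end` are hypothetical.

```python
def satisfies_m_spec(paths, selected, arcs, start="start", end="end"):
    """Check a candidate guide set GD against an M specification.

    paths    -- candidate GD, each path a list of element names
    selected -- the user selected elements M
    arcs     -- arcs of the diagram, used here in place of branches
    """
    # Every path must run from a StartPoint to an EndPoint.
    if not all(p and p[0] == start and p[-1] == end for p in paths):
        return False
    visited = {u for p in paths for u in p}
    covered = {(p[k], p[k + 1]) for p in paths for k in range(len(p) - 1)}
    # Every selected element occurs in some path, every arc is covered once.
    return set(selected) <= visited and set(arcs) <= covered


# Hypothetical diagram with one alternative after element A.
arcs = [("start", "A"), ("A", "B"), ("A", "C"), ("B", "end"), ("C", "end")]
full = [["start", "A", "B", "end"], ["start", "A", "C", "end"]]
print(satisfies_m_spec(full, {"A", "B"}, arcs))
print(satisfies_m_spec(full[:1], {"A", "B"}, arcs))
```

The second call fails only because the arcs through C stay uncovered, which shows that selecting elements in M does not relax the coverage requirement on the selected submodel.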


Figure 4: Example of S specification

2. Specifications of alternative choice S allow defining a subset of M by pointing at a single element. In this case the M set includes all elements which are accessible from the element u and all elements which lie on paths from the start points s to the element u:

$$S = s_1(u_1), s_2(u_2), \ldots, s_k(u_k) \subseteq U$$

The algorithm of guides generation using S specifications can be described as follows:

$$UCM \to MSC : (U, S) \to (GD),$$

where $GD = \{\pi_1\} \cup \{\pi_2\} \cup \ldots \cup \{\pi_g\}$ is a set of paths where each element from the set $BR_{u_i} \cup RB_{u_i}$ occurs in at least one of the paths $\pi_1, \pi_2, \ldots, \pi_g$ and no element from the set $BU_{u_i} \cup U_{u_i}B$ belongs to any of the paths $\pi_1, \pi_2, \ldots, \pi_g$.

In Fig.4 an example of S usage is presented. In this example formula S1 is equal to generate_output, and in this case only two responsibilities, initialize and generate_output, and the stub config will be included into the M set and used for further analysis; all other stubs will be ignored.

3. R specifications allow defining user specified scenarios. This gives the user an opportunity to generate additional guides used for description of special system behaviors. An algorithm of guides generation using an R specification can be described as follows:

$$UCM \to MSC : (U, R) \to (GD).$$

An R specification has the following format:

$$R = (u_1 * n_1 \to u_2 * n_2 \to \ldots \to u_r * n_r), \quad u_i \in U, \quad i \in [1..r],$$

where the integers $n_i \in [1..N]$ shall comply with limitations imposed by M and S specifications, represented within the T specification that combines them. Also each path obtained during guide generation shall contain the elements described in the R specification exactly in the order specified in the current specification.

In Fig.5 an example of R usage is presented. In this example formula R1 is equal to detect_CNI_change → unsuccessful_change; in this case only a path which leads through these two responsibilities will be included into the set of generated guides (start → detect_CNI_change → multicast → inform_CNI_change → u_request_new_mod_code → unsuccessful_change → end).
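The ordering requirement of an R specification can be sketched in Python. The sketch assumes one particular reading of $u_i * n_i$, namely that the element $u_i$ must occur $n_i$ times in the path before the next element of R is looked for; this reading and the function name `satisfies_r` are assumptions made for illustration. The path is the one quoted above for Fig.5.

```python
def satisfies_r(path, r_spec):
    """Check that `path` contains the elements of `r_spec` in the given order.

    r_spec is a list of (element, count) pairs standing for u1*n1 -> u2*n2 ...
    """
    expected = [element for element, count in r_spec for _ in range(count)]
    remaining = iter(path)
    # Each expected element must be found after the previous one.
    return all(element in remaining for element in expected)


path = ["start", "detect_CNI_change", "multicast", "inform_CNI_change",
        "u_request_new_mod_code", "unsuccessful_change", "end"]
r1 = [("detect_CNI_change", 1), ("unsuccessful_change", 1)]
print(satisfies_r(path, r1))
print(satisfies_r(path, list(reversed(r1))))
```

A guide generator could use such a predicate as a filter, keeping only those start-to-end paths that pass through the user selected responsibilities in the requested order.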


Figure 5: Example of R specification usage

Figure 6: Example of system behavior model

T specifications are defined as $T = \{M, S, R\}$. Formulas of this type specify a composition of the rules described in M, S and R specifications, which will be used in concrete test set generation.

Let us consider an example of how the GDL language is used for specification. Assume that the system behavior is described by the specification represented in Fig.6. From the branches coverage point of view, 4 test scenarios will be generated in the automatic mode: A,B to cover B; A,F to cover A and F; A,C(2),D to cover C and D; A,C(2),E to cover E. Here a number in parentheses is the automatically counted distance, in terms of the number of UCM arcs, between the two responsibilities that precede the parentheses in a guide; it makes it possible to hide alternative ways of the system behavior and thus to reduce the number of generated guides. As the tests above show, each system branch will be covered at least once. Therefore, the whole behavioral tree will be explored within the trace adjusting process. Assume that the user wants to reduce the time of generation and not to consider the whole


behavioral tree, which is usual when working with large software projects. To solve this, M and S specifications can be used. For example (Fig.6), let us remove the F branch from consideration. To do that, the specification M1=A,B,C,D,E shall be defined; the order of elements is not important here, as this specification defines a submodel of the system. Also the specification T1=M1 is defined for a test set. As a result, 2 tests will be generated: A,B,C,D and A,B,C,E, which together cover all diagram elements except the element F. Similarly, this task can be solved using the specification S1=B. This is preferable as it requires less user effort for formulating specifications. Thus, M and S specifications allow us to solve problems with partial model exploration, to reduce system behavioral tree exploration in case of scenario adjustment, and to reduce the test set.

In the tests above only linear scenarios of system behavior were described. Let us create tests with a cyclic behavior. Note that this can be important for quality assurance, as the user may be interested in the system behavior at the maximal number of cycles, which is defined in metadata inside diagram elements (assume that this number is 5 for this example). To create such a test set the specification R=C,D*5,E shall be used. This specification defines a fivefold cycle iteration, denoted by the character *, followed by an exit from the cycle. Two test sequences, A,B,C,D,D(2),C,E and A,F,C,D,D(2),C,E, will be obtained after test set generation from this specification. These two sequences cover a fivefold cycle passing through all possible paths starting in the start point.

The GDL language supports compositions of M, S and R specifications. This feature provides flexibility in defining the resulting specification of a test set and allows the user to specify a test set with a set of rules. Assume that for the example considered above the user wants to check the system behavior in B and C, that the sequence of these behaviors is important, and that it is required to limit the behavioral tree in order to reduce the search time. In this case the user can define specifications R1=B and R2=C for the sequences of behaviors and S1=B and S2=E to exclude the F and D branches from consideration. Thus the resulting specification of the test set is the following: T=R1,R2,S1,S2. As a result, the behavioral sequence A,B,C,E will be obtained from this specification, and this sequence can be used for test generation. Unlike test sets covering system behavior with the branches criterion, using specified tests while testing the presented system allows the user to reveal defects caused by incorrect data handling during iterative cycle passing.
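The effect of the S1=B specification in the example above can be reproduced with a short Python sketch. The graph below is only a guess at the structure of Fig.6, inferred from the scenarios listed in the text (A followed by B or F, then C, then D or E); the cycle through D is deliberately left out so that path enumeration terminates. All function names are hypothetical.

```python
# Hypothetical, acyclic reading of Fig.6; the real diagram may differ.
GRAPH = {"start": ["A"], "A": ["B", "F"], "B": ["C"], "F": ["C"],
         "C": ["D", "E"], "D": ["end"], "E": ["end"], "end": []}


def descendants(graph, node):
    """All nodes reachable from `node` by at least one arc."""
    seen, stack = set(), [node]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def s_spec(graph, u):
    """Submodel selected by S=u: u, everything after it, everything leading to it."""
    above = {v for v in graph if u in descendants(graph, v)}
    return descendants(graph, u) | above | {u}


def paths(graph, allowed, node="start", end="end"):
    """All paths from `node` to `end` that stay inside `allowed`."""
    if node == end:
        return [[node]]
    return [[node] + rest
            for nxt in graph[node] if nxt in allowed
            for rest in paths(graph, allowed, nxt, end)]


submodel = s_spec(GRAPH, "B")
tests = paths(GRAPH, submodel)
print(tests)
```

Under these assumptions the element F falls outside the selected submodel, and the two remaining paths correspond to the tests A,B,C,D and A,B,C,E mentioned in the text.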

7 Conclusions

The table below summarizes the difference between the conventional approach, where tests are created manually, and the described approach based on the GDL language with automatic test generation. The comparative analysis uses the following criteria:

• C1. Generation of a test suite which covers all branches. This criterion is important to obtain a simple test suite without manual work.

• C2. Ability to automatically cover a manually defined sub-model of a UCM. In case of large scale projects, generation of tests for the whole model may be of high complexity because of resource limitations, and in such a situation the solution is to generate a set of test suites.

• C3. Coverage of branches between two points manually selected by the user in the UCM model. In some cases testing of the whole model is not needed, and a possible solution is to generate a relatively small test suite which covers only particular selected points of the system behavior.

• C4. Manual selection of branches in alternatives to be covered. Many software systems have a set of operation modes. These modes are usually defined with a small set of particular alternatives, and if the user wants to cover only one alternative, then the branches which belong to it need to be selected for the respective test generation.

• C5. Ability to generate a complex test suite based on criteria C1-C4.

Table 1: Results of the comparison of approaches

Criterion    Conventional approach    GDL approach
C1           +                        +
C2           -                        +
C3           -                        +
C4           -                        +
C5           -                        +

Thus, using specifications during the test set creation process allows the user to solve the following problems which arise while developing specified test scenarios:

• Specifying user scenarios by defining behavioral points of interest on the UCM diagram; the path between the points is calculated by a verification system, which guarantees its correctness.

• Reducing the behavioral tree during its exploration in order to cover the behavior according to the branches criterion.

• Defining specific scenarios for cyclic constructions and parallel system behavior.

In the general case, user scenario coverage provides a significant increase of requirement coverage and, due to this, an increase of the quality of the developed software.

References

[1] Baranov S., Kotlyarov V., Letichevsky A. Industrial Technology of Mobile Devices Testing Automation Based on Verified Behavioral Models of Requirements Project Specifications // "Space, Astronomy and Programming" - SPbSU, St.Petersburg. - 2008. - pp. 134-145. (in Russian).

[2] Karpov Y.G. Automata Theory. - Piter, St.Petersburg, - 2003. - 208 p. (in Russian).

[3] Baranov S., Kotlyarov V., Letichevsky A., Drobintsev P. The technology of Automation Verification and Testing in Industrial Projects. / Proc. of St.Petersburg IEEE Chapter, International Conference, May 18-21, St.Petersburg, Russia, 2005. - pp. 81-86.

[4] Z.151: User Requirements Notation (URN) - Language Definition. http://www.itu.int/rec/T-REC-Z.151-200811-I/en

[5] Letichevsky A., Kapitonova J., Letichevsky A. jr., Volkov V., Baranov S., Kotlyarov V., Weigert T. Basic Protocols, Message Sequence Charts, and Verification of Requirements Specifications. Proc of ISSRE04 Workshop on Integrated-reliability with Telecommunications and UML Languages (ISSRE04:WITUL), 02 Nov 2004: IRISA, Rennes, France. - pp. 30-38.


[6] Letichevsky A. jr., Kolchin A. Test Scenarios Generation Based on Formal Model // Programming Problems. - 2010. - N 2-3. - pp. 209-215. (in Russian).

[7] Baranov S., Kotlyarov V., Weigert T. Verifiable Coverage Criteria For Automated Testing. SDL2011: Integrating System and Software Modeling // LNCS. - 2012. - Vol.7083. - P. 79-89.

[8] Anureev I., Baranov S., Beloglazov D., Bodin E., Drobintsev P., Kolchin A., Kotlyarov V., Letichevsky A., Letichevsky A. jr., Nepomniaschy V., Nikiforov I., Potienko S., Pryima L., Tyutin B. Tools for Supporting Integrated Technology of Analysis and Verification of Specifications for Telecommunication Applications // SPIIRAS Proc. - 2013. - N 1 - 28 P. (in Russian).

[9] Drobintsev P., Kotlyarov V., Chernorutsky I. Test Automation Based on User Scenarios Coverage. "Scientific and Technical Bulletin", St.Petersburg University, vol.4(152) - 2012, pp. 123-126. (in Russian).

[10] Recommendation ITU-T Z.120. Message Sequence Chart (MSC), 11/2000.


Common Knowledge in Well-structured Perfect Recall Systems∗

N.O. Garanina
A.P. Ershov Institute of Informatics Systems, Russian Academy of Science
6, Lavrentiev ave., 630090, Novosibirsk, Russia
[email protected]

Abstract

We investigate a model checking problem for the logic of common knowledge and fixpoints µPLCn in well-structured multiagent systems with perfect recall. In this paper we show that a perfect recall synchronous environment over a well-structured environment forms a well-structured environment provided with a special PRS-order. This implies that the model checking problem for the disjunctive fragment of µPLCn is decidable.

1 Introduction

Combinations of traditional program logics [14, 4] with logics of knowledge [5] are a well-known formalism for reasoning about multi-agent systems [9]. The focus of a number of studies is the development of model checking techniques for multi-agent systems specified by means of combined logics [2, 3, 11, 15, 16]. We investigate the model checking problem in trace-based synchronous perfect recall multiagent systems for various combinations of logics of knowledge with logics of time and actions. In such systems agents have a memory: their knowledge depends on states passed and on the previous actions. We can describe this kind of agents because semantics of knowledge is defined on traces, i.e. finite sequences of states and actions, and every agent can distinguish traces with different sequences of information available for it. Each element of a trace represents a state of the system at some moment of time. The problem was under study in [7, 18, 19, 8]. It has been demonstrated in [7] that the model checking problem in the class of finitely-generated trace-based synchronous systems with perfect recall is undecidable for logics Act-CTL-Cn , µPLKn , and µPLCn (where n > 1), but is decidable for logic of knowledge and actions Act-CTL-Kn (with a non-elementary lower bound). The paper [18] presents a direct (update+abstraction)-algorithm for model checking Act-CTL-Kn in perfect recall synchronous environments. This algorithm checks formulas in a special model whose elements are “knowledge” trees. In the paper [19] we demonstrate that this model, provided with a special sub-tree partial order, forms a well-structured labeled transition system [1, 6], where every formula of Act-CTL-Kn can be characterized by a finite computable set of maximal trees that enjoy the property. This feature of a tree model allows us to decrease the time upper bound of the algorithm, but it is still huge. 
In the paper [8] a new form of knowledge trees of the tree-model makes it possible to reduce the time complexity of the previous Act-CTL-Kn model checking algorithm exponentially.

∗ The research has been supported by Russian Foundation for Basic Research (grant 13-01-00643-a) and by Siberian Branch of Russian Academy of Science (Integration Grant n.15/10 “Mathematical and Methodological Aspects of Intellectual Information Systems”).


But the model checking problem for more powerful logics with common knowledge is undecidable and, to the best of our knowledge, no model checking algorithm has been suggested even for fragments of these logics. We think it is reasonable to try to use some special kind of multiagent systems in order to check some properties with common knowledge for perfect recall agents. In this paper we suggest using well-structured labeled transition systems [1, 6] extended with agents. This is a basis that serves to combine a knowledge-based approach to data transmission protocols [10] and well-structured transition systems as models for concurrent systems with unreliable communication [6].

In this paper, a perfect recall synchronous multiagent system (PRS-environment) is generated from a well-structured multiagent system with compatibility of agents and system actions. We show that this PRS-environment provided with a special order also forms a well-structured system. As a corollary, the model checking problem for a disjunctive fragment of the logic of common knowledge and fixpoints µPLCn is decidable.

2 Background Logics and Models

Let us recall slightly modified definitions of a combined Propositional Logic of Common Knowledge and Fixpoints µPLCn from [7]. This logic is a fusion of Propositional Logic of Common Knowledge (PLC) [5] and µ-Calculus (µC) [13]. Semantics of µPLCn is defined in terms of a satisfiability relation $\models$ in environments that are a special kind of labeled transition systems. Let $\{true, false\}$ be Boolean constants, $Prp$ and $Act$ be disjoint finite alphabets of propositional variables and action symbols, and let a finite set of natural numbers $[1..n]$ represent names of agents ($n \in \mathbb{N}$).

Definition 1. (µPLCn syntax) Syntax of µPLCn consists of formulas that are composed of Boolean constants, propositional variables, connectives $\neg$, $\wedge$, $\vee$, and the following constructions. Let $i \in [1..n]$ be an agent, $G \subseteq [1..n]$ be a group of agents, $a \in Act$ be an action, $x \in Prp$ be a propositional variable, and $\varphi$ be a formula of µPLCn. Then

• knowledge modalities: $K_i \varphi$ and $S_i \varphi$ (they are read as ‘agent i knows $\varphi$’ and ‘agent i supposes $\varphi$’);

• common knowledge modalities: $C_G \varphi$ and $J_G \varphi$ (they are read as ‘agents from G have common knowledge $\varphi$’ and ‘agents from G have joint hypothesis $\varphi$’);

• action modalities: $[a]\varphi$ and $\langle a \rangle \varphi$ (they are read as ‘after action a fact $\varphi$ is necessary’ and ‘after action a fact $\varphi$ is possible’);

• fixpoints: $\mu x.\varphi$ and $\nu x.\varphi$ (they are read as ‘the least fixpoint of $\varphi$’ and ‘the greatest fixpoint of $\varphi$’).

Our definition of an environment differs from the usual definitions in [5, 7] and others in that we include group indistinguishability relations in the definition, though they are expressed by individual indistinguishability relations.

Definition 2. (Environment) An environment is a tuple $E = (D, \sim, \approx, I, V)$, where

• the domain $D$ is a non-empty set of states (or worlds);

• for every agent $i \in [1..n]$, the indistinguishability relation $\overset{i}{\sim}$ is an equivalence relation on $D$, $\sim : [1..n] \to 2^{D \times D}$;

• for every group of agents $G \subseteq [1..n]$, the group indistinguishability relation $\overset{G}{\approx}$ is an equivalence relation on $D$, $\approx : 2^{[1..n]} \to 2^{D \times D}$, with $\overset{G}{\approx} = \left(\bigcup_{i \in G} \overset{i}{\sim}\right)^*$;

• the interpretation of actions $I$ is a total mapping $I : Act \to 2^{D \times D}$;

• the valuation $V$ is a total mapping $V : Prp \to 2^D$.

Every indistinguishability relation that is not an equality expresses the fact that an agent (a group of agents) has incomplete information about system states. The definition of an environment immediately implies the fact that a group of agents $G$ does not distinguish worlds $w$ and $w'$ of $E$, i.e. $w \overset{G}{\approx} w'$, iff there exists a finite sequence of worlds $ws \in D^+$ and a finite sequence of agents $as \in G^+$ such that $w$ is the first world in $ws$, $w'$ is the last world in $ws$, $|ws| = |as| + 1$ and $ws_j \overset{as_j}{\sim} ws_{j+1}$ for all $j \in [1..|as|]$.

Semantics of µPLCn follows semantics of logics PLC [5] and µC [13].

Definition 3. (µPLCn semantics) A satisfiability relation $\models$ between environments, worlds, and formulas is defined inductively with respect to the structure of formulas. For Boolean constants, propositional variables, and connectives the satisfiability relation is standard. For the knowledge, common knowledge and action modalities, and fixpoints we define the semantics as follows. Let $E$ be an environment, $w \in D$, $i \in [1..n]$, $G \subseteq [1..n]$, $a \in Act$, and $\varphi$ be a formula of µPLCn. If $S \subseteq D$ and $x \in Prp$, let us denote by $E_{S/x}$ an environment which agrees with $E$ everywhere, but $V(x) = S$. For a formula $\varphi$ without negative instances of $x$, $\lambda S.\, E_{S/x}(\varphi) : S \mapsto E_{S/x}(\varphi)$ is a monotonous non-decreasing mapping on $2^D$ with the least and the greatest fixpoints $\mu(\lambda S.\, E_{S/x}(\varphi))$ and $\nu(\lambda S.\, E_{S/x}(\varphi))$. Then

• $w \models_E K_i \varphi$ iff for every $w'$: $w \overset{i}{\sim} w'$ implies $w' \models_E \varphi$;

• $w \models_E S_i \varphi$ iff for some $w'$: $w \overset{i}{\sim} w'$ and $w' \models_E \varphi$;

• $w \models_E C_G \varphi$ iff for every $w'$: $w \overset{G}{\approx} w'$ implies $w' \models_E \varphi$;

• $w \models_E J_G \varphi$ iff for some $w'$: $w \overset{G}{\approx} w'$ and $w' \models_E \varphi$;

• $w \models_E [a]\varphi$ iff for every $w'$: $(w, w') \in I(a)$ implies $w' \models_E \varphi$;

• $w \models_E \langle a \rangle \varphi$ iff for some $w'$: $(w, w') \in I(a)$ and $w' \models_E \varphi$;

• $w \models_E \mu x.\varphi$ iff $w \in \mu(\lambda S.\, E_{S/x}(\varphi))$;

• $w \models_E \nu x.\varphi$ iff $w \in \nu(\lambda S.\, E_{S/x}(\varphi))$.

Semantics of a formula $\varphi$ in an environment $E$ is the set of all worlds of $E$ that satisfy this formula: $E(\varphi) = \{w \mid w \models_E \varphi\}$. Let us notice that modalities $K_i$ ($S_i$) are syntactic sugar for group modalities $C_G$ ($J_G$) with $G = \{i\}$. Further we consider just µPLCn normal formulas in which negation is used for propositionals only. Every µPLCn formula is equivalent to some normal formula due to “De Morgan” laws.
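For a finite environment the semantics of the group modalities can be evaluated directly. The following Python sketch assumes that worlds are hashable values, that each individual relation is given as a set of pairs, and that a formula is represented by the set of worlds satisfying it; the helper names and the four-world example are hypothetical.

```python
def from_partition(blocks):
    """Build an equivalence relation (set of pairs) from its classes."""
    return {(u, v) for block in blocks for u in block for v in block}


def group_relation(worlds, individual, group):
    """Reflexive-transitive closure of the union of the agents' relations."""
    relation = {(w, w) for w in worlds}
    for agent in group:
        relation |= individual[agent]
    changed = True
    while changed:                      # naive closure, fine for small examples
        changed = False
        for (a, b) in list(relation):
            for (c, d) in list(relation):
                if b == c and (a, d) not in relation:
                    relation.add((a, d))
                    changed = True
    return relation


def common_knowledge(worlds, individual, group, phi):
    """Worlds satisfying C_G phi; `phi` is the set of worlds satisfying phi."""
    relation = group_relation(worlds, individual, group)
    return {w for w in worlds
            if all(v in phi for (u, v) in relation if u == w)}


def joint_hypothesis(worlds, individual, group, phi):
    """Worlds satisfying J_G phi."""
    relation = group_relation(worlds, individual, group)
    return {w for w in worlds
            if any(v in phi for (u, v) in relation if u == w)}


WORLDS = {1, 2, 3, 4}
INDIVIDUAL = {1: from_partition([{1, 2}, {3}, {4}]),
              2: from_partition([{1}, {2, 3}, {4}])}
PHI = {1, 2}
k1 = common_knowledge(WORLDS, INDIVIDUAL, [1], PHI)      # K_1 as C_{1}
k2 = common_knowledge(WORLDS, INDIVIDUAL, [2], PHI)      # K_2 as C_{2}
c12 = common_knowledge(WORLDS, INDIVIDUAL, [1, 2], PHI)  # C_{1,2}
print(k1, k2, c12)
```

In this example each agent knows the fact in some world, yet it is common knowledge nowhere, because chaining the two agents' uncertainty connects worlds 1 and 2 to world 3, where the fact fails.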


We investigate trace-based perfect recall synchronous (PRS) environments generated from background finite environments. In PRS environments (1) states are sequences of worlds of initial environments with a history of generating actions; (2,3) an agent (a group of agents) does not distinguish such sequences if the background system performs the same sequence of actions, and the agent (the group) can not distinguish the sequences world by world; (4) there are transitions from one sequence to another with an action $a$ by extending the sequence with a state reachable by $a$ from the last state of the sequence; (5) propositionals are evaluated at the last state of sequences with respect to their evaluations in the background environment.

Definition 4. (PRS-environment) Let $E$ be an environment $(D, \sim, \approx, I, V)$. A trace-based Perfect Recall Synchronous environment generated by $E$ is another environment $E_{PRS} = (D_{PRS(E)}, \underset{prs}{\sim}, \underset{prs}{\approx}, I_{PRS(E)}, V_{PRS(E)})$:¹

(1) $D_{PRS(E)}$ is the set of all pairs $(ws, as)$, where non-empty $ws \in D^*$, $as \in Act^*$, $|ws| = |as| + 1$, and $(ws_j, ws_{j+1}) \in I(as_j)$ for every $j \in [1..|as|]$;

let $(ws, as), (ws', as') \in D_{PRS(E)}$:

(2) for every $i \in [1..n]$: $(ws, as) \overset{i}{\underset{prs}{\sim}} (ws', as')$ iff $as = as'$ and $ws_j \overset{i}{\sim} ws'_j$ for every $j \in [1..|ws|]$;

(3) for every $G \subseteq [1..n]$: $(ws, as) \overset{G}{\underset{prs}{\approx}} (ws', as')$ iff $as = as'$ and $ws_j \overset{G}{\approx} ws'_j$ for every $j \in [1..|ws|]$;

(4) for every $a \in Act$: $((ws, as), (ws', as')) \in I_{PRS(E)}(a)$ iff $as' = as^{\wedge} a$, $ws' = ws^{\wedge} w'$, and $(ws_{|ws|}, w') \in I(a)$;

(5) for every $p \in Prp$: $(ws, as) \in V_{PRS(E)}(p)$ iff $ws_{|ws|} \in V(p)$.

In PRS-environments agents have some kind of memory because an awareness expressed by an indistinguishability relation depends on the history of the system evolution. We examine the model checking problem for µPLCn in perfect recall synchronous environments generated from finite environments [18].

Definition 5. (Model checking problem for µPLCn) The model checking problem for µPLCn in perfect recall synchronous environments is to validate or refute $(ws, as) \models_{PRS(E)} \varphi$, where $E$ is a finite environment, $(ws, as) \in D_{PRS(E)}$, and $\varphi$ is a formula of µPLCn.
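Items (2) and (4) of the PRS construction can be illustrated with a short Python sketch. It assumes that a PRS state is a pair of tuples (worlds, actions), that an agent's background relation is a set of pairs, and that the interpretation maps an action name to a set of transitions; the three-world environment and all names are hypothetical.

```python
def prs_indistinguishable(trace1, trace2, relation):
    """Item (2): same actions and worldwise indistinguishable worlds."""
    (ws1, as1), (ws2, as2) = trace1, trace2
    return (as1 == as2 and len(ws1) == len(ws2) and
            all((u, v) in relation for u, v in zip(ws1, ws2)))


def prs_successors(trace, action, interpretation):
    """Item (4): extend the trace by one world reachable via `action`."""
    ws, acts = trace
    return [(ws + (target,), acts + (action,))
            for (source, target) in sorted(interpretation[action])
            if source == ws[-1]]


# Hypothetical background environment: the agent confuses worlds p and q.
AGENT = {(u, v) for block in [{"p", "q"}, {"r"}] for u in block for v in block}
INTERPRETATION = {"a": {("p", "p"), ("p", "r"), ("q", "q")}}

stay_p = (("p", "p"), ("a",))
stay_q = (("q", "q"), ("a",))
move_r = (("p", "r"), ("a",))
same = prs_indistinguishable(stay_p, stay_q, AGENT)
different = prs_indistinguishable(move_r, stay_q, AGENT)
print(same, different, prs_successors((("p",), ()), "a", INTERPRETATION))
```

The agent cannot tell the initial worlds p and q apart, but after one step it distinguishes the history ending in r from the one ending in q, which is the sense in which the PRS relation depends on the whole history rather than on a single background world.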

3 Well-structured Perfect Recall Systems

We are interested in well-structured environments. The essence of the infinite-state model checking technique for well-structured transition systems [1, 6] is that formulas can be checked on some representative sets of model states rather than on the whole state space. Let us recall the basic definitions from [19].

Definition 6. (Well-preordered transition system) Let D be a set. A well-preorder is a reflexive and transitive binary relation $\leq$ on D such that every infinite sequence $d_1, \ldots, d_i, \ldots$ of elements of D contains a pair of elements $d_m$ and $d_n$ with $m < n$ and $d_m \leq d_n$. Let $(D, \leq)$ be a well-preordered set with a well-preorder $\leq$. An ideal (synonym: cone) is an upward closed subset of D, i.e. a set $C \subseteq D$ such that $\forall d', d'' \in D: d' \leq d'' \wedge d' \in C \Rightarrow d'' \in C$. Every $d \in D$ generates a cone $(\uparrow d) \equiv \{e \in D \mid d \leq e\}$. For every subset $S \subseteq D$, a basis of S is a subset $B \subseteq S$ such that $\forall s \in S\ \exists b \in B: b \leq s$. A well-preordered transition system (WPTS) is a triple $(D, \leq, I)$ such that $(D, \leq)$ is a well-preordered set and $(D, I)$ is a Kripke frame.

¹ In Definition 4, for every set S let $S^*$ be the set of all finite sequences over S, and let the operation $^{\wedge}$ stand for the concatenation of finite words.

We would like to consider well-preordered transition systems in which the well-preorder and the interpretation are decidable and compatible. The standard decidability condition for the well-preorder is straightforward: the relation $\leq\ \subseteq D \times D$ is decidable.

Definition 7. (Ideal-based model) Let $(D, \leq, I)$ be a WPTS.
• Decidability (tractable past) condition: there exists a computable total function $BasPre: D \times Act \to 2^D$ such that for every $w \in D$ and every $a \in Act$, $BasPre(w, a)$ is a finite basis of $\{u \in D \mid \exists v \in D: (u, v) \in I(a) \wedge w \leq v\}$.
• Compatibility condition: the preorder $\leq$ is compatible with the interpretation $I(a)$ of every action symbol $a \in Act$, i.e. $\forall s_1, s_2, s'_1\ \exists s'_2: s_1 \xrightarrow{I(a)} s'_1 \wedge s_1 \leq s_2 \Rightarrow s_2 \xrightarrow{I(a)} s'_2 \wedge s'_1 \leq s'_2$.
An ideal-based model is a labeled transition system $\mathcal{I} = (D, \leq, I, V)$ in which the preorder $\leq$ is decidable, I meets the tractable past and compatibility conditions, and V interprets every propositional variable $p \in Prp$ by a cone.

In some sense, here we consider individual and group knowledge relations as 'knowledge' actions (or transitions), because they are associated with the corresponding modalities. We also introduce a very natural condition of backward compatibility of indistinguishability relations and actions, which reflects the fact that each action $a \in Act$ transforms the local state information of an agent in the same way.

Definition 8. (Compatibility of indistinguishability) For every action $a \in Act$, for every pair of worlds $(w_1, w'_1) \in I(a)$ and every world $w'_2$ such that $w'_1 \sim_{prs} w'_2$, there exists $w_2$ such that $(w_2, w'_2) \in I(a)$ and $w_1 \sim_{prs} w_2$.

It is obvious that if actions are compatible with individual indistinguishability, then they are compatible with group indistinguishability.

Definition 9. (Ideal-based environment) An ideal-based environment is a tuple $\mathcal{E} = (D, \leq, \sim, \approx, I, V)$, where $E = (D, \sim, \approx, I, V)$ is an environment in which I meets compatibility of indistinguishability and $\mathcal{I} = (D, \leq, \sim \cup \approx \cup I, V)$ is an ideal-based model.

Let further E be an ideal-based environment and $E_{PRS}$ be its PRS-environment. We define a binary relation (PRS-order) $\preceq$ on $D_{PRS(E)}$.

Definition 10. (PRS-order) For all traces of equal length $(ws, as)$ and $(ws', as')$ in $D_{PRS(E)}$, we write $(ws, as) \preceq (ws', as')$ iff $ws_{|ws|} \leq ws'_{|ws|}$.

Theorem 1. The binary relation $\preceq$ is a partial order on traces such that a PRS-environment $E_{PRS}$ provided with this partial order is an ideal-based environment.
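The notions of a cone, a finite basis and the PRS-order of Definition 10 can be made concrete by a small sketch in C. It assumes, purely for illustration, that worlds are vectors of natural numbers ordered componentwise (a well-preorder by Dickson's lemma); the paper does not fix any particular domain, and the names `DIM`, `leq`, `in_cone` and `prs_leq` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DIM 2   /* worlds are vectors in N^DIM; an assumption of this sketch */

/* Componentwise order on N^DIM. */
bool leq(const unsigned *d, const unsigned *e)
{
    for (size_t k = 0; k < DIM; k++)
        if (d[k] > e[k])
            return false;
    return true;
}

/* Membership in the cone generated by a finite basis stored row by row:
 * d belongs to the cone iff some basis element b satisfies b <= d. */
bool in_cone(const unsigned *basis, size_t n_basis, const unsigned *d)
{
    for (size_t b = 0; b < n_basis; b++)
        if (leq(basis + b * DIM, d))
            return true;
    return false;
}

/* PRS-order on two traces of equal length len, each stored as len consecutive
 * worlds: only the last worlds are compared, as in Definition 10. */
bool prs_leq(const unsigned *ws, const unsigned *ws2, size_t len)
{
    assert(len >= 1);
    return leq(ws + (len - 1) * DIM, ws2 + (len - 1) * DIM);
}
```

Representing a cone by its finite basis is what makes checks on infinite state sets effective: the membership test touches only the basis elements.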


Sketch of the proof. The proof is rather technical, so we present only its basic ideas. First, $\preceq$ is a partial order. It is also a well-preorder and a decidable relation, because the relation $\leq$ has the same properties. Second, $\preceq$ enjoys tractable past, because we can find a preimage of every trace for every 'action' transition and for every 'knowledge' and 'common knowledge' transition (defined by $I_{PRS(E)}(a)$, $\overset{i}{\sim}_{prs}$, and $\overset{G}{\approx}_{prs}$, respectively, for $a \in Act$, $i \in [1..n]$, and $G \subseteq [1..n]$ in Definition 4), since every trace has finite length and the background environment E has tractable past. Third, $\preceq$ is compatible with all 'action' transitions and all '(common) knowledge' transitions.

(1) For every action $a \in Act$ and for every pair of traces $(ws_1, as_1) \preceq (ws'_1, as'_1)$ there exists some next trace $(ws'_2, as'_2) = (ws'_1{}^{\wedge} w', as'_1{}^{\wedge} a)$ such that $(ws_1{}^{\wedge} w, as_1{}^{\wedge} a) \preceq (ws'_2, as'_2)$, by the definition of the PRS-order and compatibility of actions in the background environment E.

(2) For every 'knowledge' action $\overset{i}{\sim}_{prs}$ ($i \in [1..n]$) and for every pair of traces $(ws_1, as_1) \preceq (ws'_1, as'_1)$ with $(ws_1, as_1) \overset{i}{\sim}_{prs} (ws_2, as_2)$ there exists some trace $(ws'_2, as'_2) \overset{i}{\sim}_{prs} (ws'_1, as'_1)$ such that $(ws_2, as_2) \preceq (ws'_2, as'_2)$, because every trace has finite length and both compatibility of indistinguishability and compatibility of actions hold in the background environment E.

(3) Every 'common knowledge' action $\overset{G}{\approx}_{prs}$ ($G \subseteq [1..n]$) is compatible with $\preceq$ for the same reason as the 'knowledge' actions $\overset{i}{\sim}_{prs}$ ($i \in [1..n]$).

Fourth, $E_{PRS}$ is an ideal-based model. It is obvious that the valuation of every propositional variable forms a cone with a basis consisting of traces whose last state belongs to a basis of the corresponding cone valuation of the variable in E.

It is easy to prove that common knowledge modalities can be expressed by individual knowledge modalities and fixpoints using the following common knowledge equivalences [7]:
- $C_G \varphi \leftrightarrow \nu x.(\varphi \wedge (\bigwedge_{i \in G} K_i x))$;
- $J_G \varphi \leftrightarrow \mu x.(\varphi \vee (\bigvee_{i \in G} S_i x))$.

Let us recall results from [12].

Definition 11. (Disjunctive fragment of µ-Calculus) A context-free definition of disjunctive µ-Calculus formulas is the following: $\varphi ::= p \mid (\varphi \vee \varphi) \mid \langle a \rangle \varphi \mid \mu x.\varphi$.

The following theorem is proved in [12].

Theorem 2. The model checking problem is decidable for ideal-based models and disjunctive formulas of the propositional µ-Calculus. It is also decidable for disjunctive formulas of the intuitionistic modal logic with least fixpoints µFS in models with tractable past.

Definition 12. (Disjunctive fragment of µPLC$_n$) A context-free definition of disjunctive µPLC$_n$ formulas is the following: $\varphi ::= p \mid (\varphi \vee \varphi) \mid \langle a \rangle \varphi \mid S_i \varphi \mid J_G \varphi \mid \mu x.\varphi$.

Corollary 1 is a straightforward consequence of Theorem 1 of this paper, Theorem 2 from [12], and the common knowledge equivalences above.

Corollary 1. The model checking problem is decidable for an ideal-based PRS-environment and disjunctive formulas of µPLC$_n$.
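The least-fixpoint equivalence for $J_G$ can be tried out on a small finite background environment. The following C sketch makes several assumptions that are not taken from the paper: worlds are numbered from 0 and sets of worlds are bitmasks, every agent's indistinguishability is an equivalence given by a class table, $S_i$ is read as the dual of $K_i$ (some indistinguishable world satisfies the argument), and G consists of all agents listed in the table. The names `suppose` and `joint` are hypothetical. The sketch covers only the finite case and not the infinite PRS-environment treated by Corollary 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* S_i X for one agent: worlds that the agent cannot distinguish from some
 * world in X. cls_i[w] is the agent's equivalence class of world w. */
uint32_t suppose(const int *cls_i, size_t n_worlds, uint32_t x)
{
    assert(n_worlds <= 32);
    uint32_t out = 0;
    for (size_t w = 0; w < n_worlds; w++)
        for (size_t v = 0; v < n_worlds; v++)
            if (cls_i[w] == cls_i[v] && ((x >> v) & 1u))
                out |= (uint32_t)1 << w;
    return out;
}

/* J_G phi as the least fixpoint of X -> phi OR (S_1 X OR ... OR S_k X),
 * computed by iteration from the empty set. cls stores one row of n_worlds
 * classes per agent of the group G. */
uint32_t joint(const int *cls, size_t n_agents, size_t n_worlds, uint32_t phi)
{
    uint32_t x = 0, next = phi;   /* next is the image of the empty set */
    while (next != x) {
        x = next;
        next = phi;
        for (size_t i = 0; i < n_agents; i++)
            next |= suppose(cls + i * n_worlds, n_worlds, x);
    }
    return x;
}
```

Because the iterated operator is monotone and the set of worlds is finite, the loop terminates after at most as many rounds as there are worlds.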

4 Conclusion

In this paper we have shown that perfect recall synchronous environments over well-structured environments with compatibility of indistinguishability and a cone interpretation of propositional variables, provided with the PRS-order, form ideal-based environments. This fact makes model checking possible for a disjunctive fragment of the logic of common knowledge and fixpoints µPLC$_n$. In [18] a model checking method is suggested for an incomparable fragment of µPLC$_n$, namely Act-CTL-K$_n$, consisting of formulas with bounded knowledge depth. This method is advanced in [19] and [8], but the technique excludes common knowledge modalities.

Further research includes the following topics: (1) extending the disjunctive fragment of µPLC$_n$ to more interesting formulas with real knowledge; (2) combining a knowledge-based approach to data transmission protocols [10] and well-structured transition systems with unreliable communication [6]; (3) model checking experiments with real well-structured systems.

References

[1] Abdulla P.A., Čerāns K., Jonsson B., Tsay Y.-K. Algorithmic analysis of programs with well quasi-ordered domains. Information and Computation, v.160(1-2), 2000, p. 109-127.
[2] Bordini R.H., Fisher M., Visser W., Wooldridge M. Verifying Multi-agent Programs by Model Checking. Autonomous Agents and Multi-Agent Systems, v.12(2), 2006, p. 239-256.
[3] Cohen M., Lomuscio A. Non-elementary speed up for model checking synchronous perfect recall. Proc. of ECAI 2010, IOS Press, Amsterdam, 2010, p. 1077-1078.
[4] Emerson E.A. Temporal and Modal Logic. In: Handbook of Theoretical Computer Science, v.B, Elsevier and MIT Press, 1990, p. 995-1072.
[5] Fagin R., Halpern J.Y., Moses Y., Vardi M.Y. Reasoning about Knowledge. MIT Press, 1995.
[6] Finkel A., Schnoebelen Ph. Well-structured transition systems everywhere! Theor. Comp. Sci., v.256(1-2), 2001, p. 63-92.
[7] Garanina N.O., Kalinina N.A., Shilov N.V. Model checking knowledge, actions and fixpoints. Proc. of Concurrency, Specification and Programming Workshop CS&P'2004, Germany, 2004, Humboldt-Universität, Berlin, Informatik-Bericht Nr.170, v.2, p. 351-357.
[8] Garanina N.O. Exponential Improvement of Time Complexity of Model Checking for Multiagent Systems with Perfect Recall. Programming and Computer Software, v.38, n.6, 2012, p. 294-303.
[9] Halpern J.Y., van der Meyden R., Vardi M.Y. Complete Axiomatizations for Reasoning About Knowledge and Time. SIAM J. Comp., v.33(3), 2004, p. 674-703.
[10] Halpern J.Y., Zuck L.D. A little knowledge goes a long way: knowledge-based derivations and correctness proofs for a family of protocols. Journal of the ACM, v.39(3), 1992, p. 449-478.
[11] Huang X., van der Meyden R. The Complexity of Epistemic Model Checking: Clock Semantics and Branching Time. Proc. of 19th ECAI, Lisbon, Portugal, August 16-20, Frontiers in Artificial Intelligence and Applications, v.215, IOS Press, 2010, p. 549-554.
[12] Kouzmin E.V., Shilov N.V., Sokolov V.A. Model Checking µ-Calculus in Well-Structured Transition Systems. Proc. of 11th International Symposium on Temporal Representation and Reasoning (TIME'04), France, IEEE Press, p. 152-155.
[13] Kozen D. Results on the Propositional Mu-Calculus. Theoretical Computer Science, v.27, n.3, 1983, p. 333-354.
[14] Kozen D., Tiuryn J. Logics of Programs. In: Handbook of Theoretical Computer Science, v.B, Elsevier and MIT Press, 1990, p. 789-840.
[15] Kwiatkowska M.Z., Lomuscio A., Qu H. Parallel Model Checking for Temporal Epistemic Logic. Proc. of 19th ECAI, Lisbon, Portugal, August 16-20, Frontiers in Artificial Intelligence and Applications, v.215, IOS Press, 2010, p. 543-548.
[16] Lomuscio A., Penczek W., Qu H. Partial order reductions for model checking temporal epistemic logics over interleaved multi-agent systems. Proc. of 9th AAMAS, Toronto, Canada, May 10-14, 2010, v.1, IFAAMAS, 2010, p. 659-666.
[17] van der Meyden R., Shilov N.V. Model Checking Knowledge and Time in Systems with Perfect Recall. Lect. Notes Comput. Sci., v.1738, 1999, p. 432-445.
[18] Shilov N.V., Garanina N.O., Choe K.-M. Update and Abstraction in Model Checking of Knowledge and Branching Time. Fundamenta Informaticae, v.72, n.1-3, 2006, p. 347-361.
[19] Shilov N.V., Garanina N.O. Well-structured Model Checking of Multiagent Systems. Proc. of 6th International Conference on Perspectives of System Informatics, Novosibirsk, Russia, June 27-30, 2006. Lecture Notes in Computer Science, v.4378, 2006.


Automatic C Program Verification Based on Mixed Axiomatic Semantics∗

I. V. Maryasov¹, V. A. Nepomniaschy¹, A. V. Promsky¹ and D. A. Kondratyev²

¹ A. P. Ershov Institute of Informatics Systems, Novosibirsk, Russia
{ivm, vnep, promsky}@iis.nsk.su
² Novosibirsk State University, Novosibirsk, Russia
[email protected]

Abstract

The development of the C-light project has led to the application of new formalisms and implementation techniques which facilitate the verification process. The mixed axiomatic semantics offers a choice between simplified and full-strength deduction rules depending on program objects and their properties. The LLVM infrastructure helps greatly in writing the C-light program analyzer and translator. Semantic labels, introduced by the labeling technique proposed earlier, can now be safely kept in verification conditions during their proof. Two programs from well-known verification benchmarks illustrate the applicability of our prototype system.

Keywords: C-light, ACSL, LLVM, Simplify, mixed semantics, specification, verification

1 Introduction

C program verification is a topical problem today, because the C language is widely used in system programming. Let us mention two C program verification projects that are ideologically similar to ours.

First, a promising approach has been proposed within the framework of the INRIA project WHY [7]. In fact, WHY is a platform suitable for the verification of many imperative languages. An intermediate language of the same name, WHY, is defined, and input programs are translated into it. This translation is aimed at the generation of verification conditions (VCs) independent of theorem provers. The WHY platform serves as a base for the toolset Frama-C, which provides static analysis and deductive verification.

Second, the VCC project is being developed at Microsoft Research [5]. Programs are translated into logical formulas using the Boogie tool, which combines an intermediate language, Boogie PL, and a VC generator. VCs are validated by the SMT solver Z3. Boogie PL is not limited to supporting the C language; for example, it is used in the Spec# project. However, translation into a different language could be a disadvantage, since no correctness proof is presented (the same is true for the WHY project). At the moment, the VCC developers are focused on the verification of the Hyper-V hypervisor, so information about other case studies is insufficient.

The C-light language is a powerful subset of the C language. To verify C-light programs, the two-level approach [16, 15] and the mixed axiomatic semantics method [1] were suggested earlier. At the first stage, we translate the source C-light program into a C-kernel one. The C-kernel language [15] is a subset of C-light. At the second stage, the VCs are generated by applying the rules of the mixed axiomatic semantics.

∗ This research is partially supported by RFBR grant 11-01-00028.

Our two-level C program verification method has a theoretical justification: theorems on the correctness of the translation of C-light into C-kernel and on the soundness of the C-kernel axiomatic semantics are proven [11]. Another advantage of the method is the thorough operational and axiomatic formalization of the C-light language. On the one hand, this allows us to express fine properties (for example, memory sharing). On the other hand, this leads to cumbersome VCs. To overcome this problem, the mixed axiomatic semantics method [1, 11], which combines the two-level C program verification method with the C-kernel mixed axiomatic semantics, is used. The word "mixed" means that there can be several inference rules for the same program construct, which are applied unambiguously depending on its context. In many cases the use of specialized inference rules allows us to simplify VCs.

In [13] the design of the extendable multilanguage analysis and verification system SPECTRUM was described. Here we present the system which implements our methods in the context of C program verification. The following diagram shows the design of the C-light verification system.

Figure 1: The C-light verification system

At the proof stage, the automatic theorem prover Simplify [6] is used. Extra axioms can be provided by the user in case the prover fails to check whether a VC is true. If all verification conditions have been proven, then the program is partially correct. Otherwise, the user has to modify the program or its specification and repeat the verification process in the C-light verification system.

2 Translation from C-light into C-kernel

During the development of the project, a new approach was selected for the implementation of the translator from C-light into C-kernel. It was decided to use existing tools for parsing and for the construction of the internal representation of annotated C-light programs. The choice was made in favor of the C++ API provided by the Clang compiler and the LLVM virtual machine. This tool has the following advantages:

1. The API allows the use of object-oriented analysis and design in the development of the compiler. This simplifies the development and makes it easy to change the already implemented translator. The ability to make changes matters because we plan to extend the C-light language.

2. The C++ programming language allows us to provide code specifications for the class methods defined in the implementation of the compiler, similar to the ACSL specifications of annotated C-light programs. Therefore, it is possible to conduct a formal verification of the translator.

As an illustration of the compiler implementation, let us consider the classes of the Clang API whose objects are used, as well as the classes on which the implementation of the translation rules is based [14]. In the Clang API the internal representation of a program is called the AST. The API provides a large set of classes for the AST. The most suitable for the translation task are those that correspond to the implementation of the Visitor design pattern. In the classic book [8] the purpose of the Visitor pattern is described as follows: "it defines an operation applicable to each object from some structure". From the rather large number of classes responsible for the implementation of the Visitor pattern in the Clang API we used the following:

RecursiveASTVisitor is a class that implements a depth-first traversal of the whole AST, allowing us to visit each node. This class can be inherited from, and its methods can be overridden to handle the AST nodes.

SourceLocation is a class responsible for the locations of objects in the source code.

Note that the implementation of translation rules using the class SourceLocation can remember the parts of the code in which changes took place. So we are able to create a protocol that allows us to return from a C-kernel program to the statements of the C-light program. This protocol is important for the task of locating errors [17].

The Clang API also makes it easy to implement a translator from C-kernel into a prefix form. The prefix form is a linear record of the tree structure by which every program can be represented. The VC generator should pay minimal attention to the syntactic features of the programming language, so it needs a simplified text of the program. The prefix form easily meets these requirements. To implement the translator from C-kernel into the prefix form, it is sufficient to visit the whole AST, starting from the top-level declarations. In those AST nodes which represent function declarations it is possible, using the API, to get immediately the list of constructs that make up the body of the function.

For a given annotated C program, it is important to know whether the mixed axiomatic semantics can be used. This can be easily checked with the help of the Clang API. The verification process must know whether program variables are referenced through their addresses. During the AST traversal the classes of the Clang API provide the means to check whether a node represents the address-of operation.
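The idea of a prefix form as a linear record of a tree can be sketched in plain C without the Clang API. The node type, the parenthesized output syntax and the function names below are hypothetical choices made for this illustration; the concrete prefix format used by the C-light system is not specified in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A toy syntax tree node: a label and up to two children (NULL when absent). */
typedef struct Node {
    const char *label;
    const struct Node *left, *right;
} Node;

/* Appends s to buf at offset len, truncating if needed; returns the new length.
 * The buffer always stays NUL-terminated. */
static size_t append(char *buf, size_t cap, size_t len, const char *s)
{
    assert(cap > 0 && len < cap);
    size_t n = strlen(s);
    if (n > cap - len - 1)
        n = cap - len - 1;
    memcpy(buf + len, s, n);
    buf[len + n] = '\0';
    return len + n;
}

/* Writes the prefix form of t into buf starting at offset len.
 * Leaves are written as their label, inner nodes as "(label child child)". */
size_t to_prefix(const Node *t, char *buf, size_t cap, size_t len)
{
    if (t == NULL)
        return len;
    if (t->left == NULL && t->right == NULL)
        return append(buf, cap, len, t->label);
    len = append(buf, cap, len, "(");
    len = append(buf, cap, len, t->label);
    if (t->left != NULL) {
        len = append(buf, cap, len, " ");
        len = to_prefix(t->left, buf, cap, len);
    }
    if (t->right != NULL) {
        len = append(buf, cap, len, " ");
        len = to_prefix(t->right, buf, cap, len);
    }
    return append(buf, cap, len, ")");
}
```

For the tree of the statement y = length - 1 this traversal yields the text (= y (- length 1)), in which the operator always precedes its operands, so a consumer never has to deal with operator precedence.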

3 Generation of VCs

The VC generator is based on the mixed axiomatic semantics of C-kernel [1, 11]. It is implemented in C++. The input file of the generator is the output file of the translator from C-light to C-kernel, which contains an annotated C-kernel program in the prefix form. The program and its specification are stored in a tree structure, so it is easy to make the necessary substitutions according to the rules of the mixed axiomatic semantics. The rules are designed in such a way that:

1. We move from the beginning of the program to its end and eliminate the leftmost operator (at the top level) by applying the corresponding rule (forward tracing).

2. At each step we can apply one and only one rule (unambiguity of inference).

Also, the algorithm of invariant translation from C-light to C-kernel [1] was implemented.

The idea of the mixed axiomatic semantics consists in designing several variants of inference rules for the same program construct, which are applied depending on its context. For example, we have two rules for the assignment statement. The first one is for non-shared variables (i.e. their values are not accessed through pointers), the second is for shared ones (i.e. their values are accessed through pointers). In order to distinguish between such variables and to select the corresponding variant, the algorithm [2] for detecting shared and non-shared variables was implemented.

Consider the general rule for the simple assignment statement:

$$\frac{E \vdash \{\exists MD'\ P(MD \leftarrow MD') \wedge MD = upd(MD', addr(val(e, MD'), MD'), cast(val(e', MD'), type(e'), type(e)))\}\ A;\ \{Q\}}{E \vdash \{P\}\ e = e';\ A;\ \{Q\}}$$

Here $e'$ does not contain any function calls or cast operators. The notation $P(V \leftarrow V')$ denotes the replacement of all occurrences of the metavariable V in the formula P by $V'$. The variant of this rule for the case when x is a non-shared variable has the following form:

$$\frac{E \vdash \{\exists x'\ P(x \leftarrow x') \wedge x = cast(val(e'(x \leftarrow x'), MD), type(e'), type(x))\}\ A;\ \{Q\}}{E \vdash \{P\}\ x = e';\ A;\ \{Q\}}$$

At the last stage, all VCs are written to the output file in a form which satisfies the Simplify input format.
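As an illustration of the rule for non-shared variables, consider a worked instance in LaTeX. The statement, the precondition and the simplification step are assumptions chosen for this example and are not taken from the paper: the statement is x = x + 1; with a non-shared int variable x, the precondition P is x > 0, cast between identical types is taken to be the identity, and val is taken to evaluate the expression arithmetically.

```latex
% Instance of the non-shared assignment rule for the statement x = x + 1;
% with P: x > 0, so P(x <- x') is x' > 0 and e'(x <- x') is x' + 1.
\[
\frac{E \vdash \{\exists x'\,(x' > 0 \wedge x = cast(val(x' + 1, MD),\ int,\ int))\}\ A;\ \{Q\}}
     {E \vdash \{x > 0\}\ x = x + 1;\ A;\ \{Q\}}
\]
% Under the assumptions stated above the new precondition simplifies as follows:
\[
\exists x'\,(x' > 0 \wedge x = x' + 1) \iff x > 1
\quad \text{(over the integers, ignoring overflow)}
\]
```

The memory state MD does not have to be updated here, which is the source of the simpler verification conditions: the old value of x is captured by the bound variable x' alone.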

4 Examples

To demonstrate the verification process in our system, we have chosen two programs from the COST IC0701 verification competition [4] and from the 1st Verified Software Competition [9]. A specific feature of verification tool benchmarking is that tools are not compared solely by performance (measured in (milli)seconds). The question of whether a tool is able to verify specific classes of programs is of primary importance. As we will see, our tool is powerful enough to work with them.

4.1 Finding the Maximum in an Array

Given a non-empty integer array a, the function max() should return an index of the maximal element in a. Here we use the specifications proposed by team Dafny [10]. We also represent them in the form directly admissible by Simplify. However, in practice the ACSL annotations are translated into such syntax by the program from Section 2. The annotated C-kernel program has the form:

// (AND (NEQ a |@NULL|)(> length 0))
int max(int* a, int length)
{
  auto int x = 0;
  auto int y = length - 1;

  /* (AND (