Advances in Image and Graphics Technologies: 10th Chinese Conference, IGTA 2015, Beijing, China, June 19-20, 2015, Proceedings 3662477904, 978-3-662-47790-8, 978-3-662-47791-5, 3662477912

This book constitutes the refereed proceedings of the 10th Chinese Conference on Advances in Image and Graphics Technologies, IGTA 2015, held in Beijing, China, during June 19-20, 2015.


Table of contents :
Front Matter....Pages I-XII
Viewpoints Selection of 3D Object Recognition Based on Manifold Topological Multi-resolution Analysis Method....Pages 1-9
Fast Narrow-Baseline Stereo Matching Using CUDA Compatible GPUs....Pages 10-17
Semantic Description of Fish Abnormal Behavior Based on the Computer Vision....Pages 18-27
Infrared Face Recognition Based on Adaptive Dominant Pattern of Local Binary Pattern....Pages 28-36
Single Training Sample Face Recognition Based on Gabor and 2DPCA....Pages 37-44
Vibe Motion Target Detection Algorithm Based on Lab Color Space....Pages 45-54
Image Data Embedding with Large Payload Based on Reference-Matrix....Pages 55-62
A Novel Image Splicing Forensic Algorithm Based on Generalized DCT Coefficient-Pair Histogram....Pages 63-71
A Patch-Based Non-local Means Denoising Method Using Hierarchical Searching....Pages 72-79
Approach for License Plate Location Using Texture Direction and Edge Feature....Pages 80-86
Object-Based Binocular Data Reconstruction Using Consumer Camera....Pages 87-95
A Parallel SIFT Algorithm for Image Matching Under Intelligent Vehicle Environment....Pages 96-103
Object Tracking Based on Particle Filter with Data Association....Pages 104-112
A Cosegmentation Method for Aerial Insulator Images....Pages 113-122
A Study on H.264 Live Video Technology with Android System....Pages 123-132
Solar Wafers Counting Based on Image Texture Feature....Pages 133-141
Robust Image Feature Point Matching Based on Structural Distance....Pages 142-149
A Novel MRF-Based Image Segmentation Approach....Pages 150-157
Flexible Projector Calibration in the Structured Light 3D Measurement System....Pages 158-166
Computational Complexity Balance Between Encoder and Decoder for Video Coding....Pages 167-175
Application of Gravity Center Track in Gait Recognition Robust to Influencing Factors....Pages 176-189
Data Virtualization for Coupling Command and Control (C2) and Combat Simulation Systems....Pages 190-197
Ship Recognition Based on Active Learning and Composite Kernel SVM....Pages 198-207
Extraction of Plaque Characteristics in Iris Image....Pages 208-213
Research of Remote Sensing Image Compression Technology Based on Compressed Sensing....Pages 214-223
A Novel Camera-Based Drowning Detection Algorithm....Pages 224-233
The Extraction of Rutting Transverse Profiles’ Indicators Using 13-Point Based Lasers....Pages 234-242
A 3D Reconstruction Method for Gastroscopic Minimally Invasive Surgery....Pages 243-250
Learning Based Random Walks for Automatic Liver Segmentation in CT Image....Pages 251-259
Corner Detection Algorithm with Improved Harris....Pages 260-271
Fusion of Contour Feature and Edge Texture Information for Palmprint Recognition....Pages 272-281
Design of Interactive System for Digital Splash-color Painting based on the Mobile Platform....Pages 282-291
Micro-video Segmentation Based on Histogram and Local Optimal Solution Method....Pages 292-299
A Merging Model Reconstruction Method for Image-Guided Gastroscopic Biopsy....Pages 300-307
A Novel Terrain Rending Method for Visual Navigation....Pages 308-316
Decision Mechanisms for Interactive Character Animations in Virtual Environment....Pages 317-323
A Fast Optical Method to Estimate Melanin Distribution from Colour Images....Pages 324-332
An Adaptive Detection Algorithm for Small Targets in Digital Image....Pages 333-339
A Leaf Veins Visualization Modeling Method Based on Deformation....Pages 340-352
Research on Tracking Mobile Targets Based on Wireless Video Sensor Networks....Pages 353-365
Paper Currency Denomination Recognition Based on GA and SVM....Pages 366-374
Simulation Research for Outline of Plant Leaf....Pages 375-385
Nonlocal Mumford-Shah Model for Multiphase Texture Image Segmentation....Pages 386-395
Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images Using Landsat-8 OLI....Pages 396-407
Fast Image Blending Using Seeded Region Growing....Pages 408-415
Research on the Algorithm of Multidimensional Vector Fourier Transformation Matrix....Pages 416-422
A Small Infrared Target Detection Method Based on Coarse-to-Fine Segmentation and Confidence Analysis....Pages 423-432
Rapid Recognition of Cave Target Using AirBorne LiDAR Data....Pages 433-437
Robust Visual Tracking via Discriminative Structural Sparse Feature....Pages 438-446
Research on Images Correction Method of C-arm Based Surgical Navigation....Pages 447-454
Erratum to: A Novel Image Splicing Forensic Algorithm Based on Generalized DCT Coefficient-Pair Histogram....Pages E1-E1
Back Matter....Pages 455-456

Tieniu Tan · Qiuqi Ruan Shengjin Wang · Huimin Ma Kaichang Di (Eds.)

Communications in Computer and Information Science

Advances in Image and Graphics Technologies 10th Chinese Conference, IGTA 2015 Beijing, China, June 19–20, 2015 Proceedings

525

Communications in Computer and Information Science

525

Editorial Board
Simone Diniz Junqueira Barbosa, Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Rio de Janeiro, Brazil
Phoebe Chen, La Trobe University, Melbourne, Australia
Alfredo Cuzzocrea, ICAR-CNR and University of Calabria, Cosenza, Italy
Xiaoyong Du, Renmin University of China, Beijing, China
Joaquim Filipe, Polytechnic Institute of Setúbal, Setúbal, Portugal
Orhun Kara, TÜBİTAK BİLGEM and Middle East Technical University, Ankara, Turkey
Igor Kotenko, St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
Krishna M. Sivalingam, Indian Institute of Technology Madras, Chennai, India
Dominik Ślęzak, University of Warsaw and Infobright, Warsaw, Poland
Takashi Washio, Osaka University, Osaka, Japan
Xiaokang Yang, Shanghai Jiao Tong University, Shanghai, China

More information about this series at http://www.springer.com/series/7899

Tieniu Tan Qiuqi Ruan Shengjin Wang Huimin Ma Kaichang Di (Eds.) •



Advances in Image and Graphics Technologies 10th Chinese Conference, IGTA 2015 Beijing, China, June 19–20, 2015 Proceedings

Editors
Tieniu Tan, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Qiuqi Ruan, Beijing Jiaotong University, Beijing, China
Huimin Ma, Tsinghua University, Beijing, China
Kaichang Di, Chinese Academy of Sciences, Beijing, China
Shengjin Wang, Tsinghua University, Beijing, China

ISSN 1865-0929 ISSN 1865-0937 (electronic) Communications in Computer and Information Science ISBN 978-3-662-47790-8 ISBN 978-3-662-47791-5 (eBook) DOI 10.1007/978-3-662-47791-5 Library of Congress Control Number: 2015942483 Springer Heidelberg New York Dordrecht London © Springer-Verlag Berlin Heidelberg 2015 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. Printed on acid-free paper Springer-Verlag GmbH Berlin Heidelberg is part of Springer Science+Business Media (www.springer.com)

Preface

It was a pleasure and an honor to have organized the 10th Conference on Image and Graphics Technologies and Applications. The conference was held during June 19–20, 2015, in Beijing, China. The conference series is the premier forum for presenting research in image processing, graphics, and related topics. It provides a rich forum for sharing the latest advances in image processing technology, image analysis and understanding, computer vision and pattern recognition, big data mining, computer graphics and VR, and image technology applications, together with new ideas, approaches, techniques, applications, and evaluations. The conference is organized under the auspices of the Beijing Society of Image and Graphics, at Tsinghua University, Beijing, China. The conference program included keynotes, oral presentations, posters, demos, and exhibitions. We received 138 papers for review. Each of these was assessed by no fewer than two reviewers, with some papers being assessed by three reviewers; 50 submissions were selected for oral and poster presentation. We are grateful for the efforts of the people who helped make this conference a reality. We would like to thank the reviewers for completing the reviewing process in time. The local host, Tsinghua University, handled many of the local arrangements for the conference. The conference continues to provide a leading forum for cutting-edge research and case studies in image and graphics technology. We hope you enjoy reading the proceedings.

June 2015

Shengjin Wang

Organization

General Conference Chairs
Wang Sheng-jin, Tsinghua University
Ruan Qiu-qi, Beijing Jiaotong University

Executive and Coordination Committee
Tan Tie-niu, Institute of Automation, Chinese Academy of Sciences
Wang Run-sheng, China Aero Geophysical Survey and Remote Sensing Center for Land and Resources
Wang Guo-ping, Peking University
Chen Chao-wu, The First Research Institute of the Ministry of Public Security of P.R.C.
Zhou Ming-quan, Beijing Normal University
Jiang Zhi-guo, Beihang University

Program Committee Chair
Ma Hui-min, Tsinghua University

Organizing Committee Chair
Huang Kai-qi, Institute of Automation, Chinese Academy of Sciences

Organizing Committee
Liu Yue, Beijing Institute of Technology
Yang Lei, Communication University of China
Di Kaichang, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences

Program Committee
Zhou Li, Beijing University of Technology
Zhao Hui-jie, Beihang University
Zhao Dan-pei, Beihang University
Zhang Feng-jun, Institute of Software, Chinese Academy of Sciences
Zhan Bing-hong, Beijing Institute of Fashion Technology
Yuan Xiao-ru, Peking University
Yu Neng-hai, University of Science and Technology of China
Yang Lei, Communication University of China
Yang Cheng, Communication University of China
Yan Jun, Journal of Image and Graphics
Xu Zhi-yong, ImageInfo Co., Ltd
Xu Guang-you, Tsinghua University
Xia Shi-hong, Institute of Computing Technology, Chinese Academy of Sciences
Wu Zhong-ke, Beijing Normal University
Tao Lin-mi, Tsinghua University
Sun Feng-jie, North China Electric Power University
Di Kaichang, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences
Su Guang-da, Tsinghua University
Lv Xue-qiang, Beijing Information Science and Technology University
Lu Han-qing, Institute of Automation, Chinese Academy of Sciences
Liu Wen-ping, Beijing Forestry University
Liu Jian-bo, Communication University of China
Lin Hua, Tsinghua University
Liang Xiao-hui, Beihang University

Contents

Viewpoints Selection of 3D Object Recognition Based on Manifold Topological Multi-resolution Analysis Method . . . . . . . . . . . . . . . . . . . . . . Xiang Wang, Huimin Ma, and Jiayun Hou Fast Narrow-Baseline Stereo Matching Using CUDA Compatible GPUs . . . . Tao Chen, Yiguang Liu, Jie Li, and Pengfei Wu

1 10

Semantic Description of Fish Abnormal Behavior Based on the Computer Vision. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gang Xiao, Wei-kang Fan, Jia-fa Mao, Zhen-bo Cheng, and Hai-biao Hu

18

Infrared Face Recognition Based on Adaptive Dominant Pattern of Local Binary Pattern . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhihua Xie

28

Single Training Sample Face Recognition Based on Gabor and 2DPCA . . . . Jun Yang and Yanli Liu

37

Vibe Motion Target Detection Algorithm Based on Lab Color Space . . . . . . Zhiyong Peng, Faliang Chang, and Wenhui Dong

45

Image Data Embedding with Large Payload Based on Reference-Matrix. . . . Yan Xia, Zhaoxia Yin, and Liangmin Wang

55

A Novel Image Splicing Forensic Algorithm Based on Generalized DCT Coefficient-Pair Histogram. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fusheng Yang and Tiegang Gao

63

A Patch-Based Non-local Means Denoising Method Using Hierarchical Searching. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zhi Dou, Yubing Han, Weixing Sheng, and Xiaofeng Ma

72

Approach for License Plate Location Using Texture Direction and Edge Feature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fujian Feng and Lin Wang

80

Object-Based Binocular Data Reconstruction Using Consumer Camera . . . . . YunBo Rao, Bojiang Fang, and Xianshu Ding A Parallel SIFT Algorithm for Image Matching Under Intelligent Vehicle Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Hui-qi Liu, Yuan-yuan Li, and Tian-tian Li

87

96


Object Tracking Based on Particle Filter with Data Association . . . . . . . . . . Peng Li and Yanjiang Wang

104

A Cosegmentation Method for Aerial Insulator Images . . . . . . . . . . . . . . . . Yincheng Qi, Lei Xu, Zhenbing Zhao, and Yinping Cai

113

A Study on H.264 Live Video Technology with Android System . . . . . . . . . Jie Dong, Jitao Xin, and Peng Zhuang

123

Solar Wafers Counting Based on Image Texture Feature . . . . . . . . . . . . . . . Qian Zhang, Bo-quan Li, Zhi-quan Sun, Yu-jun Li, and Chang-yun Pan

133

Robust Image Feature Point Matching Based on Structural Distance. . . . . . . Maodi Hu, Yu Liu, and Yiqiang Fan

142

A Novel MRF-Based Image Segmentation Approach . . . . . . . . . . . . . . . . . Wei Liu, Feng Yu, and Chunyang Gao

150

Flexible Projector Calibration in the Structured Light 3D Measurement System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Haitao Wu, Biao Li, Jiancheng Zhang, Jie Yang, and Yanjun Fu

158

Computational Complexity Balance Between Encoder and Decoder for Video Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Shuting Cai and Zhuosheng Lin

167

Application of Gravity Center Track in Gait Recognition Robust to Influencing Factors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Chengyi Chen, Xin Chen, and Jiaming Xu

176

Data Virtualization for Coupling Command and Control (C2) and Combat Simulation Systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Xiangzhong Xu, Jiandong Yang, and Zaijiang Tang

190

Ship Recognition Based on Active Learning and Composite Kernel SVM . . . Bin Pan, Zhiguo Jiang, Junfeng Wu, Haopeng Zhang, and Penghao Luo

198

Extraction of Plaque Characteristics in Iris Image. . . . . . . . . . . . . . . . . . . . Jing Yu and Hong Tian

208

Research of Remote Sensing Image Compression Technology Based on Compressed Sensing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tong Yu and Shujun Deng A Novel Camera-Based Drowning Detection Algorithm . . . . . . . . . . . . . . . Chi Zhang, Xiaoguang Li, and Fei Lei

214 224


The Extraction of Rutting Transverse Profiles’ Indicators Using 13-Point Based Lasers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tian-tian Li, Xue Wang, and Hui-qi Liu

234

A 3D Reconstruction Method for Gastroscopic Minimally Invasive Surgery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Dong Yang, Yinhong Zhao, Jiquan Liu, Bin Wang, and Huilong Duan

243

Learning Based Random Walks for Automatic Liver Segmentation in CT Image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Pan Zhang, Jian Yang, Danni Ai, Zhijie Xie, and Yue Liu

251

Corner Detection Algorithm with Improved Harris . . . . . . . . . . . . . . . . . . . Li Wan, Zhenming Yu, and Qiuhui Yang

260

Fusion of Contour Feature and Edge Texture Information for Palmprint Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Gang Wang, Weibo Wei, Zhenkuan Pan, Danfeng Hong, and Mengqi Jia

272

Design of Interactive System for Digital Splash-Color Painting Based on the Mobile Platform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wang Yan, Yongjing Wang, and Ou George

282

Micro-video Segmentation Based on Histogram and Local Optimal Solution Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Bin Zhang and Yujie Liu

292

A Merging Model Reconstruction Method for Image-Guided Gastroscopic Biopsy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Juan He, Yinhong Zhao, Jiquan Liu, Bin Wang, and Huilong Duan

300

A Novel Terrain Rending Method for Visual Navigation . . . . . . . . . . . . . . . Liyun Hao, Jiao Ye, Wu Lingda, and Cao Rui

308

Decision Mechanisms for Interactive Character Animations in Virtual Environment. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Du Jun and Qiang Liang

317

A Fast Optical Method to Estimate Melanin Distribution from Colour Images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Tang Chaoying and Biao Wang

324

An Adaptive Detection Algorithm for Small Targets in Digital Image . . . . . Shumei Wang

333

A Leaf Veins Visualization Modeling Method Based on Deformation. . . . . . Duo-duo Qi, Ling Lu, and Li-chuan Hu

340


Research on Tracking Mobile Targets Based on Wireless Video Sensor Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Zixi Jia, Linzhuo Pang, Jiawen Wang, and Zhenfei Gong

353

Paper Currency Denomination Recognition Based on GA and SVM . . . . . . . Jian-Biao He, Hua-Min Zhang, Jun Liang, Ou Jin, and Xi Li

366

Simulation Research for Outline of Plant Leaf . . . . . . . . . . . . . . . . . . . . . . Qing Yang, Ling Lu, Li-min Luo, and Nan Zhou

375

Nonlocal Mumford-Shah Model for Multiphase Texture Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Wenqi Lu, Jinming Duan, Weibo Wei, Zhenkuan Pan, and Guodong Wang

386

Automated Cloud Detection Algorithm for Multi-Spectral High Spatial Resolution Images Using Landsat-8 OLI . . . . . . . . . . . . . . . . . . . . . . . . . . Yu Yang, Hong Zheng, and Hao Chen

396

Fast Image Blending Using Seeded Region Growing . . . . . . . . . . . . . . . . . Yili Zhao and Dan Xu

408

Research on the Algorithm of Multidimensional Vector Fourier Transformation Matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Yue Yang, Aijun Sang, Liyue Sun, Xiaoni Li, and Hexin Chen

416

A Small Infrared Target Detection Method Based on Coarse-to-Fine Segmentation and Confidence Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . Hang Li, Ze Wang, Tian Tian, and Hui Huang

423

Rapid Recognition of Cave Target Using AirBorne LiDAR Data . . . . . . . . . Guangjun Dong, Zhu Chaojie, Haifang Zhou, and Sheng Luo

433

Robust Visual Tracking via Discriminative Structural Sparse Feature . . . . . . Fenglei Wang, Jun Zhang, Qiang Guo, Pan Liu, and Dan Tu

438

Research on Images Correction Method of C-arm Based Surgical Navigation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Jianfa Zhang, Shaolong Kuang, Yu Yaping, Liling Sun, and Fengfeng Zhang

447

Erratum to: A Novel Image Splicing Forensic Algorithm Based on Generalized DCT Coefficient-Pair Histogram . . . . . . . . . . . . . . . . . . . Fusheng Yang and Tiegang Gao

E1

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

455

Viewpoints Selection of 3D Object Recognition Based on Manifold Topological Multi-resolution Analysis Method Xiang Wang, Huimin Ma(), and Jiayun Hou Department of Electronic Engineering, Tsinghua University, Beijing, China [email protected], [email protected]

Abstract. Viewpoints selection is a key part of 3D object recognition, since a 3D object can be represented by a set of 2D projections. In this paper, we discuss a new method to select viewpoints for 3D object recognition. Based on the manifold topological multi-resolution analysis method (MMA), manifold information which represents the intrinsic feature of 3D objects is used, so the selected viewpoints are more distinctive. We compare it with the "7 viewpoints method", which provides a simple and effective way to select viewpoints. Experiments demonstrate that the method based on MMA is effective and performs better than the "7 viewpoints method" in 3D object recognition. Keywords: 3D object recognition · Manifold topological multi-resolution analysis · Viewpoint · LLE · 2D projections

1

Introduction

3D object recognition is a hard task since an object looks different from different viewpoints [3, 6]. A real 3D object can be represented as a set of 2D projections, so it is very important to choose robust viewpoints. In [1], a simple and effective method has been proposed, which chooses 7 viewpoints based on a simple view set. They constructed seven view sets based on a graph matching method [2], containing 13 possible views for each model. These possible views include three side views, four corner views, and six edge views. Fig. 1 is an example which shows two directions of each view type of a cube. Fig. 2 shows a 3D airplane and some 2D projections from different views [1]. Based on the 13 possible views, they constructed seven "view sets" by combining different types of viewpoints. Table 1 shows the details of the combinations of the seven view sets [1]. Many experiments demonstrate that the combination of 3 side views and 4 corner views performs best. So they proposed to choose 3 side views and 4 corner views as viewpoints for 3D object recognition, which we call the "7 viewpoints method" in this paper. The manifold topological multi-resolution analysis method (MMA) [3] is proposed on the basis of locally linear embedding (LLE) [7, 8]. In this paper, we select viewpoints for 3D object recognition based on the manifold topological multi-resolution analysis method.
© Springer-Verlag Berlin Heidelberg 2015. T. Tan et al. (Eds.): IGTA 2015, CCIS 525, pp. 1–9, 2015. DOI: 10.1007/978-3-662-47791-5_1


We conducted some experiments and found that the method based on MMA is consistent with the "7 viewpoints method" when applied to a simple 3D object such as a cube. What's more, since the method based on MMA takes advantage of manifold information, when applied to other objects the viewpoint selection method based on MMA performs better than the "7 viewpoints method", which demonstrates that this method is effective. The remainder of this paper proceeds as follows. In Section 2, we introduce the viewpoint selection method based on MMA. In Section 3, we compare the method based on MMA with the "7 viewpoints method"; both the consistency and the differences are discussed. In Section 4, some recognition experiments are conducted to prove the advantage of the viewpoint selection method based on manifold topological multi-resolution analysis. The conclusion is given in Section 5.

Fig. 1. An example of three direction types. (From [1])

Fig. 2. A 3D airplane and some 2D projections from different view. (From [1])


Table 1. The detail of the seven view sets. (From [1])

Database name   3 Side   4 Corner   6 Edge   Total
S               √                            3
C                        √                   4
E                                 √          6
SC              √        √                   7
SE              √                 √          9
CE                       √        √          10
SCE             √        √        √          13

2

Viewpoints Selection Method

As in [3], we select viewpoints by using the manifold topological multi-resolution analysis method (MMA). A real 3D object can be represented as a set of 2D projections [4], so a common method for 3D object recognition is to choose a set of effective 2D projections. The key point lies in the method for choosing projections, namely, viewpoints. An example is shown in Fig. 3: Fig. 3a is a model of an airplane chosen from the PSB [5], and Fig. 3b shows a set of 2D projections of that airplane with equal sample intervals. MMA takes advantage of manifold information; the study of MMA in [3] indicates that, by selecting only the local extrema of a manifold, the original manifold can be represented by them using linear interpolation. So the local extrema can be selected as distinctive viewpoints. While the "7 viewpoints method" selects viewpoints in fixed directions, this method selects viewpoints using manifold information. Thus, the viewpoints selected by this method are more distinctive and perform better, as we can see in the experiments.

Fig. 3. (a) Viewpoint space of an airplane. (b) 2D projection images of the airplane sampled at equal intervals. (From [3])
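To make the selection step concrete, the following is a minimal sketch of the idea described above, not the authors' implementation: the 2D projections sampled on a closed ring of viewpoints are embedded with locally linear embedding (LLE), and the views at local extrema of the bottom embedding coordinates are kept. The raw-pixel features, the neighbour count, and the use of scikit-learn's LocallyLinearEmbedding are assumptions of this sketch.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding

def select_viewpoints(views, n_components=2, n_neighbors=8):
    """views: array of shape (n_views, h, w), 2D projections sampled on a closed
    ring of viewpoints. Returns indices of the views at local extrema of the
    bottom LLE coordinates (a stand-in for the MMA analysis in [3])."""
    X = views.reshape(len(views), -1).astype(np.float64)
    emb = LocallyLinearEmbedding(n_neighbors=n_neighbors,
                                 n_components=n_components).fit_transform(X)
    selected = set()
    n = len(views)
    for c in range(n_components):           # scan each eigenvector separately
        y = emb[:, c]
        for i in range(n):                   # ring topology: wrap around at the ends
            prev, nxt = y[(i - 1) % n], y[(i + 1) % n]
            if (y[i] > prev and y[i] > nxt) or (y[i] < prev and y[i] < nxt):
                selected.add(i)              # local maximum or minimum
    return sorted(selected)
```

With n_components=2 this mirrors the cube experiment in Section 3, where the local extrema of the bottom 2 eigenvectors yield 7 viewpoints.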


3

Comparison

In this section, we compare the viewpoint selection method based on MMA with the "7 viewpoints method" in [1]. First, we show that these two methods are consistent to some extent. In consideration of the way possible viewpoints are selected in the "7 viewpoints method" [1], we choose a cube as an example for brevity. The method of obtaining 2D projections is the same as in Fig. 3. Using the MMA method, we obtain the bottom 6 eigenvectors, which are shown in Fig. 4.


Fig. 4. a ~ f are the bottom 6 eigenvectors in order

When we choose 7 viewpoints by finding the local extrema in the bottom 2 eigenvectors, we find that these 7 viewpoints are almost the same as those in the "7 viewpoints method". This shows that the two viewpoint selection methods are consistent for a simple 3D object and also verifies the effectiveness of the method based on MMA. However, in most cases the viewpoints chosen by the "7 viewpoints method" and by MMA are not identical. Table 3 shows some examples. We find that both methods choose side views, but MMA selects the viewpoints more freely while the "7 viewpoints method" is fixed. Take the "Pig" as an example: in the "7 viewpoints method", the 3rd to 7th viewpoints are very similar, as they are all side views. However,


in our method they look very different, since different views are selected. Thus, the method based on MMA effectively takes advantage of information about the manifold structure, which is helpful for object recognition.

4

Recognition Experiments

The "7 viewpoints method" is simple and efficient, while the method based on MMA takes advantage of information about the manifold structure, which is helpful for object recognition. In this section, we show the effectiveness of MMA through some recognition experiments. First, we select distinctive viewpoints using the two methods respectively. Then we compare all viewpoints with the 3D objects and classify them to the nearest class, namely, the class with the minimum feature distance. Table 2 shows some results of the recognition test. These experiments demonstrate that the viewpoint selection method based on MMA performs better than the "7 viewpoints method". Fig. 5 shows the thumbnails of the 3D objects in the test.

Table 2. Some results of the recognition test

3D objects    "7 viewpoints method"    MMA
M100          11.42%                   13.66%
M101          5.79%                    11.19%
M102          7.25%                    10.80%
M110          7.25%                    7.41%
M111          13.66%                   17.28%
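The classification rule used above ("nearest class, minimum feature distance") can be written compactly; the sketch below is illustrative only, since the paper does not specify the feature vector or the distance, so a generic per-view feature and Euclidean distance are assumed.

```python
import numpy as np

def classify_nearest_class(query_views, class_templates):
    """query_views: (n_query, d) features of 2D projections of the test object.
    class_templates: dict mapping class name -> (n_views, d) features of the
    selected viewpoints for that class. Returns the class whose selected
    viewpoints are closest to the query views on average (assumed Euclidean)."""
    best_class, best_dist = None, np.inf
    for name, templates in class_templates.items():
        # distance from every query view to its nearest template view
        d = np.linalg.norm(query_views[:, None, :] - templates[None, :, :], axis=2)
        score = d.min(axis=1).mean()
        if score < best_dist:
            best_class, best_dist = name, score
    return best_class
```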

Fig. 5. Thumbnails of the 3D objects in the test

5

Conclusion

A 3D object can be represented by a set of 2D projections. Since objects look different from different views, it is very important to select distinctive viewpoints. The "7 viewpoints method" [1] and the method based on MMA [3] are both effective in selecting distinctive viewpoints. In this paper, we compared these two methods in 3D object recognition. Although the "7 viewpoints method" provides a simple and useful way to select distinctive viewpoints, the viewpoint selection method based on MMA performs better, since it takes advantage of manifold information. In conclusion, the viewpoint selection method based on MMA is effective in 3D object recognition.

Table 3. Examples of viewpoints selection. For each 3D object (Cuboid, Plane, Pig) the table shows the bottom 6 eigenvectors, the views chosen by the "7 viewpoints method", and the views chosen by MMA. (The table consists of images and is not reproduced here.)

Acknowledgements. This work was supported by the National Natural Science Foundation of China (No. 61171113).


References
1. Min, P.: A 3D model search engine. Diss. Princeton University (2004)
2. Cyr, C.M., Kimia, B.B.: 3D object recognition using shape similarity-based aspect graph. In: Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), vol. 1. IEEE (2001)
3. You, S., Ma, H.: Manifold topological multi-resolution analysis method. Pattern Recognition 44(8), 1629–1648 (2011)
4. Schiffenbauer, R.D.: A survey of aspect graphs (2001). http://citeseer.ist.psu.edu/schiffenbauer01survey.html
5. Shilane, P., et al.: The Princeton shape benchmark. In: Proceedings, Shape Modeling Applications. IEEE (2004)
6. Ma, H., Huang, T., Wang, Y.: Multi-resolution recognition of 3D objects based on visual resolution limits. Pattern Recognition Letters 31(3), 259–266 (2010)
7. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000)
8. Saul, L.K., Roweis, S.T.: Think globally, fit locally: unsupervised learning of low dimensional manifolds. The Journal of Machine Learning Research 4, 119–155 (2003)

Fast Narrow-Baseline Stereo Matching Using CUDA Compatible GPUs Tao Chen, Yiguang Liu(), Jie Li, and Pengfei Wu Vision and Image Processing Laboratory, College of Computer Science, Sichuan University, Chengdu 610065, People’s Republic of China [email protected]

Abstract. The Phase Correlation(PC) method demonstrates high robustness and accuracy for measuring the very subtle disparities from stereo image pairs, where the baseline (or the base-to-height ratio) is unconventionally narrowed. However, this method remains inherently computationally expensive. In this paper, an adaptive PC based stereo matching method is proposed, aiming to achieve higher speed and better stereo quality compared to the existing methods, while also preserving the quality of PC. Improvement was achieved both algorithmically and architecturally, via carefully dividing the computing tasks among multiprocessors of the GPUs under a novel global-pixel correlation framework. Experimental results on our hardware settings show that the method achieves as high as 64× and 24× speedup compared to single threaded and multi-threaded implementation running on a multi-core CPU system, respectively. Keywords: Stereo matching · Narrow baseline · Phase Correlation(PC) · CUDA

1

Introduction

Stereo vision is an attractive topic in the realm of computer vision, and stereo matching [3], which targets extracting disparity information from a pair of images, is the cornerstone of the entire task. Though wide-baseline stereo matching [6], [10] is commonly used because of its high estimation accuracy, its narrow-baseline counterpart is, in contrast, much more challenging due to narrow triangulation. It is nevertheless worth addressing because it can alleviate the occlusion problem (Fig. 1) while requiring a smaller disparity search range. To attain high accuracy, PC was integrated into narrow-baseline stereo matching [1], [8], [11]. The PC based method is more robust to illumination changes than matching based on simple correlation functions, and it is able to measure the very subtle disparities that result from a low base-to-height ratio (e.g., less than 0.1), thus potentially allowing applications such as digital elevation models (DEMs) to be derived from images that previously might not have been considered suitable for stereo vision. However, the PC based method remains inherently computationally expensive. With increasing image resolutions, the computational time may even become prohibitive.
© Springer-Verlag Berlin Heidelberg 2015. T. Tan et al. (Eds.): IGTA 2015, CCIS 525, pp. 10–17, 2015. DOI: 10.1007/978-3-662-47791-5_2


Fig. 1. Differences of occlusion zones. Occlusions in the left image of the stereo pair are signaled by horizontal lines, occlusions in the right image of the stereo pair are signaled by slanted lines. We see that in a wide-baseline system, the occlusion differences are much more critical than in a narrow-baseline system.

To attain fast speed, stereo on graphics processing units (GPUs) is an attractive trend. Specifically, CUDA (Compute Unified Device Architecture) is a modern GPU architecture designed for writing and running general-purpose applications on NVIDIA GPUs. Utilizing the horsepower of massively parallel processors, CUDA is effective for accelerating stereo algorithms by exploiting their potential parallelism. Several recent methods have reached fast speed on CUDA. Both Sarala [2] and Zhu [12] implemented a Normalized Cross Correlation (NCC) approach and obtained significant improvements in computational efficiency. Their approach fails to maintain the matching quality, however, if the baseline is unconventionally narrowed [8]. Kentaro [7] suggested performing parallel PC within a single image-block pair. It offers a great speed improvement, but only if the image-block size is sufficiently large. Unfortunately, this is not the case in stereo matching: in most applications, the size of each image-block is fairly small while the number of blocks is very large. Due to the limitation of memory bandwidth, this approach does not have a noticeable effect on the run-times. Similarly, the PC method accelerated for image fusion (e.g., Falk [9]) is also not suitable for stereo matching. In this paper, the PC method is re-examined, and a novel stereo matching framework based on CUDA, especially optimized for the narrow-baseline scenario, is proposed. Using both algorithmic and architectural means, we carefully divide the task among the multiprocessors of the GPU and exploit its texture memory. Furthermore, we compare our results against single- and multi-threaded CPU based implementations. Experimental results demonstrate the significant speedup of our approach. The remainder of this paper is organized as follows. Section 2 briefly introduces the PC algorithm for narrow-baseline stereo matching. Section 3 gives a detailed description of the proposed CUDA PC framework. The algorithm's performance is analyzed in detail in Section 4. Finally, we conclude in Section 5.

2

PC for Narrow-Baseline Stereo

Based on the well-known Fourier shift property [5], the PC method is developed to estimate the translation displacement between two images.


Fig. 2. (a) Stereo pair of the stone lion with B/H ratio = 0.001. (b) Disparity map after median filtering. (c) Textured reconstruction result.

Consider two images I1(x, y) and I2(x, y) that are offset by a simple translation (a, b) such that

I2(x, y) = I1(x − a, y − b) ,   (1)

their Fourier transforms (FTs) Î1 and Î2 are then related by the Fourier shift property, such that

Î2(u, v) = Î1(u, v) e^{−2πi(au + bv)} .   (2)

This can be re-written as

P(u, v) = Î1(u, v) Î2*(u, v) / |Î1(u, v) Î2*(u, v)| ,   (3)

where Î2* denotes the complex conjugate of Î2, and P(u, v) is referred to as the normalized cross power spectrum (NCPS) of the two signals. There are two possible ways of solving (3) for (a, b) [4]. One is to work in the Fourier domain directly: employing singular value decomposition (SVD) and a robust 2-D fitting algorithm, [8] solve for the phase difference within the Fourier domain and achieve sub-pixel disparity measurement. The second possible approach is to first transform the NCPS back into the spatial domain. It is then a simple matter to determine (a, b), since from (3) the result is δ(x − a, y − b), which is a Dirac delta function centered at (a, b). The sub-pixel translation can be estimated using an interpolation-based approach after determining the maximum peak of the correlation surface on the integer-accuracy grid [4]. Both the Fourier domain and the spatial domain methods have been reported to achieve up to 1/20th pixel accuracy. The latter is employed in our implementation due to its relative efficiency and simplicity. As expected, this method is capable of precisely and directly measuring the fractional disparities that result from unconventionally narrow baseline images, images which would otherwise not be considered suitable for conventional stereo processing. Fig. 2(a) shows the resulting stereo pair. Due to the very low B/H ratio, the images are geometrically very similar and thus enjoy high correspondence with little occlusion. This work focuses on tackling the challenge of fast stereo matching for the narrow-baseline scenario, as described in the next section.
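As a concrete illustration of Eqs. (1)-(3), the following NumPy sketch computes the NCPS, transforms it back to the spatial domain, and reads off the integer-accuracy peak. It is a CPU reference for the math only, not the CUDA implementation of Section 3, and the handling of the wrapped FFT indices is an assumption of the sketch.

```python
import numpy as np

def phase_correlation(img1, img2):
    """Integer-accuracy translation (a, b) with img2(x, y) = img1(x - a, y - b),
    following Eqs. (1)-(3); also returns the phase correlation surface."""
    F1 = np.fft.fft2(img1)
    F2 = np.fft.fft2(img2)
    cross = F1 * np.conj(F2)                        # Î1 · Î2*
    P = cross / np.maximum(np.abs(cross), 1e-12)    # NCPS, Eq. (3)
    pcs = np.real(np.fft.ifft2(P))                  # back to the spatial domain
    peak = np.unravel_index(np.argmax(np.abs(pcs)), pcs.shape)
    # With this convention the delta sits at (-a, -b) modulo the image size,
    # so unwrap the circular indices and flip the sign.
    shift = [-(p if p <= s // 2 else p - s) for p, s in zip(peak, pcs.shape)]
    b, a = shift                                    # rows correspond to y, columns to x
    return (a, b), pcs
```

For the narrow-baseline case the peak lies very close to the origin, which is why no disparity search region is needed later in the paper.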

3

CUDA PC

The CUDA environment exposes the single instruction multiple data (SIMD) architecture of the GPUs by enabling data-parallel computation for millions of threads. These threads are organized into a grid of thread-blocks. The highest performance is achieved when the threads avoid divergence and perform the same operation on their data elements. Overall, the prime computational challenge

Fig. 3. Workflow of the proposed algorithm

in disparity estimation lies in the PC calculation for a huge number of image-block pairs, aiming to find the translation displacement of each pixel. In this implementation, we use the CUFFT function of the CUDA library for 2D FTs. A global correlation is employed to co-register the right image to the left via a shift change, which allows a smaller image-block size for the pixel correlation; this helps to improve the estimation accuracy while reducing the computational cost. A parallel block-cut procedure is designed to make the PC calculation for each pixel independent of the other pixels, so that the translation displacement of each pixel can be computed in parallel. Fig. 3 shows the overall workflow of the proposed algorithm.

3.1

Global Correlation

For narrow-baseline images, it is worth noting that the disparities are likely to lie in a small range, which is less than the size of an image-block. Thus the process can be efficient, as no search region is ever required. The only question that remains is how to locate the corresponding block in the right image for each block in the left. We therefore employ a global correlation procedure to estimate the global translation relationship. Throughout, we assume that the images have been acquired with, or resampled close to, epipolar geometry.


Given an image resolution of W × H, the global correlation consists of four steps: FTs of the input images, NCPS calculation, inverse FTs, and global shift calculation. We use the highly optimized CUFFT function of CUDA for the FTs and inverse FTs. The NCPS calculation is parallelized with W × H threads. The threads are organized into a 2D grid and the thread-block size is 32 × 32. Each thread takes care of computing an independent coefficient at a given frequency component. Finally, the so-called phase correlation surface (PCS) is obtained as p(x, y). In this step, an integer-level estimation is sufficient, so the global shift parameters (xg, yg) are given by

(xg, yg) = argmax_(x,y) |p(x, y)| ,   (4)

which can be processed linearly on the CPU.

Pixel Correlation

The difference between glo obal correlation and pixel correlation is subtle, but esssentially different, especially at the parallel level. The novel block-cut proccedure plays a crucial role for improving the parallel levvel, which is also parallelized on o the GPU, as depicted in Fig. 4. A 2D grid with W × H threads is created for each image, and every B × B threads takes care of copying a B × B section centered at a pixell to an individual memory space, where B(pixel) means the side length of the image-bllock. Considering global shift (xg,yg), an image-block ccentered at (x,y) in the left imag ge is then paired with a image-block taken at (x + xg,y + yg) in the right. Also, we use texture memory for accessing images due to the fact that our copy procedure exhibit 2D locality, which provides a great scope for texture cachiing. Thus before block-cut, two o input images are mapped to the texture memory sppace previously. Then, Npar independent block b pairs are prepared for the following FTs, NCPS ccalculation, and inverse FTs prrocedures, with . Finally, we get Npar PCSs at a time. Unliike global correlation, the NCP is approximated as [4] (5) Where α < 1, the peak posittion of the function corresponds to the global displacem ment between the two images, an nd the α corresponds to the degree of correlation betweeen the two images. After locatiing the main peak at some coordinates

Fig. 4. block-cut procedure

Fast Narrow-Baseline Stereo Matching Using CUDA Compatible GPUs

15

(xm,ym) and two side-peaks at (xs,ym) and (xm,ys) where xs = (xm ± 1)%B and ys = (ym ± 1)%B, the sub-pixel offset (∆xp,∆yp) are decided by linear weighting such that (6) For a single PCS, the major computing task lies in searching for the main peak, which is a linear process and shows greater advantage when processed on CPU. In spite of this, we distribute the computing task to GPU. The reason is that a Npar times parallelism not only can offset the inferiority on GPU, but also performs far more efficiently than a sequential process on CPU. Afterwards, the disparity(dx,dy) is decided such that dx = xg + xm + ∆xp ,dy = yg + ym + ∆yp .

(7)

Repeat the above steps, we extract the disparity information for each pixel. In addition, small disparity outliers are filtered using a median filter, resulting in the disparity map shown in Fig. 2(b) and the absolute final reconstruction result in Fig. 2(c).

4

Experiments

The hardware environment is based on Intel CoreI5-3470 CPU @ 3.2 GHz and NVIDIA GTX770 graphics card. In order to evaluate the efficiency, the proposed method was compared against both the single-threaded and 4-threaded CPU implementation. We have used stereo pairs of different sizes with image-block size set to 16 × 16 and 32 × 32 respectively. The timing results are summarized in table 1. Our results demonstrate that the proposed method outperforms the CPU based implementation by a huge factor. For a stereo pair of size 1024×768 with image-block size set to 16 × 16, the new method takes 0.6 seconds, bringing an impressive 64× speedup with respect to the single-threaded CPU implementation. Even compared to the 4-threaded CPU implementation, our method can Table 1. Timing results for stereo matching CPU-1 thread(s) CPU-4 threads(s) 16 × 16 32 × 32 16 × 16 32 × 32 320 × 240 3.2 5.5 1.1 2.2 640 × 480 14.0 29.5 5.0 10.4 1024 × 768 38.6 78.3 14.9 29.5 1600 × 1200 91.0 193.0 30.7 75.5 2048 × 1536 150.9 345.1 58.4 121.1 3024 ×2016 332.7 756.8 126.4 276.5 Image Size

This method(s) 16 × 16 32 × 32 0.2 0.8 0.3 1.6 0.6 2.7 1.5 4.8 2.6 8.0 5.6 19.4

16

T. Chen et al.

Fig. 5. Plot of the speedups achieved compared to the two CPU implementations

perform 24 times faster. Fig. 5 shows a detailed plot of the speedups achieved compared to the two CPU implementations. Not surprisingly, the speedup ratio increases along with the increase in image size. Theoretically, it can be explained that a larger image size means more image-block pairs parallelized at a time, with the relationship of . Then the speedup ratio comes to fluctuate in a small scope, corresponding to the full capacity for our NVIDIA GTX770 GPU. It is also obvious that reducing the image-block size helps to increase the parallel number. Nevertheless, the interval [16,32] for the side length of image-block has been empirically proved to be suitable for the narrow-baseline scenario, with regard to matching accuracy. Here, we see that the excellent performance on efficiency of our method lies in a high-level parallelization, not in a common speed-accuracy tradeoff. Hence the quality of PC in the narrow-baseline scenario is well maintained. With high resolution stereo image pairs, our method has the ability to provide disparity information instantly with high accuracy.

5

Conclusion

In this paper, a novel stereo matching framework based on CUDA especially optimized for narrow-baseline scenario is proposed. We employed global correlation to improve the estimation accuracy while reducing the computational cost. Via a crucial block-cut procedure, we carefully divide the task among multiprocessors of the GPUs in a high parallel level. Texture memory is also used, providing a great scope for accessing images. Experimental results demonstrate that the proposed method outperforms the CPU based implementation by a huge factor, which is capable of instantly and precisely measuring the fractional disparities in narrow-baseline scenario. Acknowledgments. The authors thank the editors and anonymous reviewers for their insights. This work is supported by NSFC under grants 61173182 and 613111154, funding from Sichuan Province (2014HH0048, 2014HH0025) and the Science and Technology Innovation seedling project of Sichuan (2014-033, 2014034, 2015-046, 2015-095).

Fast Narrow-Baseline Stereo Matching Using CUDA Compatible GPUs

17

References 1. Arai, T., Iwasaki, A.: Fine image matching for narrow baseline stereovision. In: 2012 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 2336–2339. IEEE (2012) 2. Arunagiri, S., Jaloma, J.: Parallel gpgpu stereo matching with an energy-efficient cost function based on normalized cross correlation. In: IS&T/SPIE Electronic Imaging, International Society for Optics and Photonics, pp. 86550X–86550X (2013) 3. Brown, M.Z., Burschka, D., Hager, G.D.: Advances in computational stereo. IEEE Trans. Pattern Anal. Mach. Intell. 25(8), 993–1008 (2003) 4. Foroosh, H., Zerubia, J.B., Berthod, M.: Extension of phase correlation to subpixel registration. IEEE Transactions on Image Processing, 11(3), 188–200 (2002) 5. Kuglin, C.: The phase correlation image alignment method. In: Proc. Int. Conf. Cybernetics and Society, September 1975, pp. 163–165 (1975) 6. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vision Comput. 22(10), 761–767 (2004) 7. Matsuo, K., Hamada, T., Miyoshi, M., Shibata, Y., Oguri, K.: Accelerating phase correlation functions using gpu and fpga. In: NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2009, pp. 433–438. IEEE (2009) 8. Morgan, G.L.K., Liu, J.G., Yan, H.: Precise subpixel disparity measurement from very narrow baseline stereo. IEEE Transactions on Geoscience and Remote Sensing 48(9), 3424–3433 (2010) 9. Schubert, F., Mikolajczyk, K.: Benchmarking GPU-based phase correlation for homography-based registration of aerial imagery. In: Wilson, R., Hancock, E., Bors, A., Smith, W. (eds.) CAIP 2013, Part II. LNCS, vol. 8048, pp. 83–90. Springer, Heidelberg (2013) 10. Tola, E., Lepetit, V., Fua, P.: Daisy: An efficient dense descriptor applied to widebaseline stereo. IEEE Transactions on Pattern Analysis and Machine Intelligence 32(5), 815–830 (2010) 11. Yiguang, L., Chenhui, Z., Ronggang, H., Baofeng, D.: Rectification-free 3dimensional reconstruction method based on phase correlation for narrow baseline image pairs( ). Journal of University of Electronic Science and Technology of China 43(2), 262–267 (2014) 12. Zunshang, Z., Zhen, G., Shengyi, C., Xiaoliang, S., Yang, S.: Research on cudabased image parallel dense matching. In: Chinese Automation Congress (CAC), 2013, pp. 482–486. IEEE (2013)

高精度窄基线三维重建算法

勿需图像矫正的

Semantic Description of Fish Abnormal Behavior Based on the Computer Vision Gang Xiao, Wei-kang Fan(), Jia-fa Mao, Zhen-bo Cheng, and Hai-biao Hu Zhejiang University of Technology, Academy of Computer Science and Technology, Hangzhou 310023, Zhejiang, China {GangXiao530332017,Wei-kangFan530332017,Jia-faMao530332017, Zhen-boCheng530332017,Hai-biaoHu530332017}@qq.com

Abstract. Biological water quality monitoring is an emerging technology. The change of water quality can be quickly tested by using the sensitiveness of aquatic organisms to water environmental change. However, how to extract semantics of fish behavior from the video data is the key technical point of achieving water quality testing. On the base of quantifying fish behavioral characteristics, the essay puts forward the semantic descriptive model of fish behavior. By grouping the parameter of the amount and average height of multitarget fish movement and extracting the semantics of each group, the semantic descriptive network of fish behavior characteristics and water quality finally was set up. The experimental data show that the semantic descriptive network can better characterize the water quality in high temperatures. This provides the theoretical base for the applications of biological water quality tests by the behaviors of fish groups. Keywords: Biological water quality monitoring · Fish behavior · Semantic descriptive model

1

Introduction

The method of testing water quality by using fish has become a popular subject of study in recent years [1]. Changes in fish behavior can be easily observed when the living environment changes, and it has been verified that fish are sensitive to stress from their environment [2]. Therefore, changes in the water environment can be tested indirectly by observing whether fish behavior is normal or abnormal. In the traditional testing method, a single fish is usually used for the water quality test; for example, fish behavior is quantified by parameters obtained from the fish's swimming speed, acceleration, swimming trace, swing frequency, etc. [3-5]. The advantage of this monitoring method is that it is relatively simple to process. But the behavior of a single fish varies, since different individual fish differ in their resistance to changes in the water, and this leads to a decrease in forecast accuracy. So more and more research focuses on multi-fish behavior. It is convenient to use computer vision technology to record fish activities, but how to extract
© Springer-Verlag Berlin Heidelberg 2015. T. Tan et al. (Eds.): IGTA 2015, CCIS 525, pp. 18–27, 2015. DOI: 10.1007/978-3-662-47791-5_3


the eigenvalue from the records is still challenging. Lu et al. [6] designed fish behavior automatic monitoring software based on computer vision, with which changes in the body color and swimming speed of a fish group can be monitored in a timely manner to determine the unusual behavior of the group. Michael J. Barry [7] demonstrated a novel method for measuring the effects of toxicants on the behavior of groups of unmarked fish; using Ctrax, the fish's absolute swimming velocity, forward swimming velocity, rate of change in orientation, and distance to the wall were calculated in a small arena. However, the target fish are always moving in a three-dimensional space, so it is obviously a difficult job to track and mark the activity of every fish and then to calculate its speed or acceleration. Many new methods have been developed to solve this problem in the latest studies; among them, visible implant fluorescent or elastomeric tags [8] are often used to mark every target fish so that the target can be followed accurately during monitoring. However, these methods carry some potential risks; that is to say, the labeling substances may implicitly affect the living habits of the fish. The second way to solve the problem is to analyze the recorded video with a tracking algorithm to trace the targets [9-10]. Although a good algorithm can trace many fish targets at the same time, it requires that the density of the fish group not be too high: when the density exceeds a certain level, occlusion among target fish becomes so frequent that the tracking accuracy is greatly reduced. To avoid the influence of tracking accuracy, B. Sadoul et al. [11] proposed a tracking-free method, which indirectly characterizes parameters that are difficult to measure, such as speed and fish distribution, by calculating the overall activity amount and contours of the whole fish group. This method avoids the situation in which the calculation cannot proceed because targets are lost when tracking many fish; nevertheless, it has some limitations, as a more accurate value can be obtained only at moderate fish densities. To sum up, all the methods mentioned above place high requirements on the tracking algorithm. Characteristic quantities such as the target's speed, acceleration, and trajectory are calculated after tracking, and these quantities are then used to determine whether the behavior of the fish group is normal. Therefore, the tracking accuracy directly influences the result of the anomaly detection. This paper improves the activity-amount algorithm by adding the average height as a new characteristic parameter. By studying the semantics of these two characteristic values, combined with prior knowledge, we ultimately obtain a semantic description of the abnormal behavior of multi-target fish.

2

Materials and Method

2.1

Introduction to the Acquisition Platform

This paper quantifies two parameters, the activity amount and the average height of the multi-fish group, as the eigenvalues of fish behavior. In order to collect the height information of fish in the water, the paper presents the collection platform shown in Figure 1. The whole platform is sealed in a box in order to prevent interference from the


external environment. A plane mirror is placed on the left side of the tank, forming an angle of 45° between the horizontal plane and the mirror, so that the camera captures the fish both in the tank and in the mirror when it is shooting. On one hand, this design makes it easy to obtain the height of the fish in the water; on the other hand, the amount of fish activity obtained from the two viewing angles is more accurate, because the change of fish activity along the vertical direction of the camera cannot be extracted from the directly captured image alone.

Fig. 1. Video Acquisition Platform

2.2

Quantification of Activity Amount/Volume

Activity amount refers to a measure of the change in the state of the fish per unit time. The faster the fish move, the larger the activity amount is and the more obviously their positions change. B. Sadoul et al. [11] present a frame-difference method for extracting the activity amount: the remaining area is obtained by subtracting the gray values of the former and latter frame images of the fish video, and this remaining area divided by the time between the two frames is taken as the measure of fish group activity. This paper improves on that method. First, the two images are binarized. Then the binarized images are combined by an XOR operation to obtain the remaining area. Finally, the remaining area is divided by the time between the two images, and the resulting value is used as the activity amount of the fish group. The calculation process is shown in Figure 2, and the remaining area obtained by the XOR operation of the two frames is shown in Figure 3. The formula is as follows:


A = [ Σ_{j=1}^{m} area_j( f_t XOR f_{t−1} ) ] / Δt    (1)

where f_t refers to the binarized image of frame t; the function area_j is used to calculate the j-th remaining area after the XOR operation of the two images; j is the index of the remaining areas, j = 1...m.

Fig. 2. Original and binarization image of Frame i and Frame i+2.

Fig. 3. The result obtained from the frame difference
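As a rough illustration of Eq. (1), the following Python sketch (assuming OpenCV and NumPy; the function and parameter names are ours, not from the paper) binarizes two frames, XORs them and divides the remaining area by the time gap:

import cv2
import numpy as np

def activity_amount(frame_prev, frame_curr, dt, thresh=128):
    # binarize both frames (Figure 2), XOR them (Figure 3), divide the area by the time gap (Eq. (1))
    g0 = cv2.cvtColor(frame_prev, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame_curr, cv2.COLOR_BGR2GRAY)
    _, b0 = cv2.threshold(g0, thresh, 255, cv2.THRESH_BINARY)
    _, b1 = cv2.threshold(g1, thresh, 255, cv2.THRESH_BINARY)
    remaining = cv2.bitwise_xor(b0, b1)
    area = int(np.count_nonzero(remaining))   # total remaining area in pixels
    return area / dt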

2.3 Quantification of Average Height

Different fish stay in different water layers, and the heights at which they stay in the water differ as well. When their living environment changes, fish usually float, and the height at which they stay increases at that moment. Therefore the height information, as an eigenvalue of fish behavior, can well describe the status of the fish behavior. The paper obtains the fish height information by setting a plane mirror on the left side of the tank. When shooting from above, images can be attained from two angles. As Figure 4 shows, the image on the left is the one in the mirror and the one on the right is a bird's-eye view of the tank. The line of intersection of the left and right images is the tank bottom. The closer a fish in the image is to the left, the farther it is from the bottom and the greater its height. If the real height of the fish is to be calculated accurately, principles such as water refraction, plane mirror reflection and camera imaging must be taken into consideration.

Fig. 4. Experimental platform imaging model

As Figure 4 shows, according to the principle of camera imaging, the four points OPSQ in the space can be deduced to be coplanar, and furthermore, the plane crosses the Z-axis. Therefore, the general formula of this plane can be described as follows:

aX + bY = 0    (2)

After the coordinates of Q′ and S are substituted into Formula (2), the proportional relationship between the two points can be obtained. After simplification, the formula is as follows:

x_s = u_q (H − h) / f
y_s = v_q (H − h) / f    (3)

Among them, x_s, y_s, z_s are the coordinates of the intersection of PQ and the plane; u_q, v_q are the physical coordinates of Q in the image; f refers to the camera focal length; and H and h refer to the camera height.


According to the plane mirror imaging principle, the image in the mirror and the real object are symmetric. So in the coordinate system OXYZ, coordinate R and coordinate Q have the following relationship:

z_Q = ( u_r H f + u_r u_q H ) / ( u_r u_q + f² )    (4)

z_Q is the Z coordinate in the coordinate system OXYZ. The system uses the tank bottom as the plane XOY, so z_Q equals the height of an object in the water.
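A minimal sketch of Eq. (4), assuming the symbols above (u_r and u_q are physical image coordinates of the mirrored and direct views, H the camera height, f the focal length); the function name is ours:

def object_height(u_r, u_q, H, f):
    # Eq. (4): height z_Q of the object above the tank bottom (plane XOY)
    return (u_r * H * f + u_r * u_q * H) / (u_r * u_q + f ** 2)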

3 Semantic Description of Fish Abnormal Behavior

Generally speaking, information description in video can be divided into 3 levels [12]: (1) the low-level features, namely data such as pixel values, textures, etc.; (2) the temporal and spatial information of the physical object, that is to say, what appears in the image; (3) semantic information, which refers to what humans perceive when watching the video, namely what is happening in it. Currently, eigenvalues such as fish speed, acceleration, tail frequency and trajectory are more commonly used to describe fish abnormal behavior at the second level. But single-parameter descriptions sometimes cannot fully describe all abnormal situations and may have a larger error. In this paper, the semantic description of the group behaviors of fish shoals within the known range is obtained by combining the outputs of the activity amount and the average height at different thresholds. The specific process is shown in Figure 5.

(Figure 5 depicts the pipeline: video data → low-level information → feature information (activity amount, average height) → combinations of the feature values → semantic information.)

Fig. 5. Semantic Output Process of Fish Behavior

For thresholding, the activity amount and the average height are each divided into three levels: low, medium and high. According to the different combinations of the output values of the activity amount and the average height, we obtain the semantic network of fish behavior shown in Figure 6.


Fig. 6. Semantic network of the fish behavior: H = High, M = Middle, L = Low.

Each output node of the semantic description network represents the current behavior state of a fish. For instance, when the activity amount output is low while the average height output is high, the semantic information that "a fish swims near the surface of the water at a low speed" is obtained; but in fact, according to prior knowledge, a fish cannot normally swim close to the water surface at a low speed, so abnormal behavior can be determined at that moment (i.e. death or floating). The semantic network can be used to convert non-obvious low-level video characteristics into intuitive semantic information so that monitoring becomes convenient. In addition, the dual-parameter combination can effectively reduce false alarms.
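The network output can be pictured as a small lookup table; the sketch below is illustrative only, the entries follow the examples discussed in the text and in Table 1, and any combination not listed is left undetermined:

def semantic_label(activity, height):
    # levels: 'L', 'M', 'H'; keys are (activity amount level, average height level)
    table = {
        ('L', 'H'): 'dead or floating (abnormal)',   # example discussed in the text
        ('L', 'L'): 'normal',
        ('M', 'L'): 'frolic or abnormal',
        ('H', 'H'): 'feeding or abnormal',
    }
    return table.get((activity, height), 'undetermined')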

4 Analysis of Experiments

The paper studies the changes of fish behavior at different water temperatures by testing the target fish in water of different temperatures, observes and analyzes the output of the activity amount and the average height, and finally feeds the results into the semantic network. The red carps used for the experiment are 2~3 cm long and weigh 0.9±0.2 g; the domestication water temperature is 27℃ and the pH value is between 6.8 and 7.2. In order to dispel the chlorine in the water, a piece of "Yulebao", a kind of disinfection purifying agent, is added per 10 L of water, and 0.5 g of rhubarb powder per 100 L of water. Before the experiment, the red carps are normally fed for 2 to 3 days; then 4 fish are placed in the experimental tanks and tested in 27℃, 35℃ and 40℃ water separately, and 20 minutes of experimental video data are collected. The changes of multi-fish activity amount at different water temperatures in the first two minutes are recorded in Figure 7.


Fig. 7. Variation Curve of Activity Amount at Different Temperatures versus Time

From Figure 7, it is evident that the average fish activity amount increases as the water temperature rises, and the range of variation of the activity amount also increases. The fish act comparatively stably when the water temperature is 27℃, and most fish are observed in the video swimming slowly at that moment. When the temperature goes up to 35℃, the fish activity amount increases comparatively but the variation range is still steady; in the video, the swimming speed of all the fish is obviously higher. When the temperature reaches 40℃, there are dramatic increases and serious fluctuations in the activity amount. Fish activities are in disorder; fast swimming, rolling and floating can be observed in the video, and even some fish mortality appears. Consequently, the activity amount feature can accurately characterize the changes in fish behavior.

Fig. 8. Variation Curve of Average Height at Different Temperatures versus Time


Figure 8 shows the change curves of the fish average height at different temperatures versus time. The average height changes with the water temperature. When the water temperature is 27℃, the average height is low and the fish almost swim at the bottom. When the temperature is 35℃, the average height is evidently higher and some fish float. At 40℃, which is lethal to red carps, the average height fluctuates sharply and the fluctuation increases as time goes on; at this point some fish are clearly observed floating and dying in the video.

Table 1. Semantic Information Output at Different Temperatures

T2 \ T1   T=0s                      T=30s                     T=60s                   T=90s                      T=120s
40℃       (M,M)->Abnormal existed   (H,H)->Feed/Abnormal      (L,M)->Normal           (L,H)->Dead                (L,H)->Dead
35℃       (L,M)->Normal             (M,L)->Frolic/Abnormal    (H,L)->Abnormal         (M,M)->Abnormal existed    (M,M)->Abnormal existed
27℃       (M,L)->Frolic/Abnormal    (L,L)->Normal             (L,L)->Normal           (M,L)->Frolic/Abnormal     (L,L)->Normal

(Notes: T1 = time; T2 = temperature; H = High, M = Middle, L = Low. Rows correspond to temperatures and columns to points of time. The combination of the activity amount and the average height at a certain time is presented as (activity amount level, average height level); the entry after the arrow is the output of the semantic network.)

From the table, we can conclude that the semantics of fish behavior under different combinations can be well described by the semantic network, and the abnormal fish behavior is output accurately. Of course it cannot be denied that there are false positives in the normal cases; such problems require further study in the future.

5 Conclusion

In this paper, the information of the fish swimming speed and layer location has been converted into the two parameters of activity amount and average height, which are then used to describe fish behavior. The validity of these two features has been verified by water-temperature environmental experiments. A semantic description network is tentatively proposed based on the combinations of these two features at different threshold values. This network can describe the semantics of fish behavior in different temperature environments. Meanwhile, this method has some disadvantages, such as false positives when extracting the semantics of normal fish behavior. On one hand, more fish behavior semantics and multi-dimensional parameters are needed in the behavior description to reduce this error; on the other hand, this paper is limited to semantics in high-temperature applications, and how to make the semantic model universal is still a focus of further studies.


References 1. Xiao, G., Jin, Z., Chen, J., Gao, F.: Application Of Artificial Immune System And Machine Vision Anomaly Detection Of Water Quality[J]. INTERNATIONAL JOURNAL OF SOFTWARE ENGINEERING AND COMPUTING. 1(2), 47–51 (2009) 2. Atchison, G.J., Henry, M.G., Sandheinrich, M.B.: Effects of metals on fish behavior: a review[J]. Environmental Biology of Fishes. 18(1), 11–25 (1987) 3. Xiao, G., Zhang, W., Zhang, Y.L., Chen, J.-J., Huang, S.-S., Zhu, L.-M.: Online monitoring system of fish behavior[R]. In: 11th International Conference on Control, Automation and Systems. October 26–29, 2011. in KINTEX, Gyeonggi-do, Korea 4. Jiu-jun, C., Gang, X., Fang, Y.X., Fei, G., Bin, Z.H.: Fish Activity Model Based on Tail Swing Frequency [J]. Journal of Image and Graphics 14(10), 2178–2180 (2009) 5. Kristiansen, T.S., Ferno, A., Holm, J.C., Privitera, L., Bakke, S., Fosseidengen, J.E.: Swimming behaviour as an indicator of low growth rate and impaired welfare in Atlantic halibut (Hippoglossus hippoglossus L.) reared at three stocking densities[J]. Aquaculture 230, 137–151 (2004) 6. Huan-da, L., Ying, L., Liang-zhong, F.: Design and implementation of fish behavior automatic monitoring system based on computer vision[J]. Fisheries Research. 38(1), 19–23 (2011) 7. Barry, M.J.: Application of a novel open-source program for measuring the effects of toxicants on the swimming behavior of large groups of unmarked fish[J]. Chemosphere 86, 938–944 (2012) 8. Doupé, R.G., Partridge, G.J., Lymbery, A.J.: Visible implant fluorescent elastomer tags as pedigree markers for applied aquaculture: an evaluation using black bream Acanthopagrus butcheri[J]. Aquaculture Research. 34, 681–683 (2003) 9. Butail, S., Paley, D.A.: Three-dimensional reconstruction of the fast-start swimming kinematics of densely schooling fish[J]. J. R. Soc. Interface. 9, 77–88 (2012) 10. Catarina. Behavioural indicators of welfare in farmed fish[J]. Fish Physiol Biochem 38, 17–41 (2012) 11. Sadoul, B., EvounaMengues, P., Friggen, N.C., Prunet, P., Colson, V.: A new method for measuring group behaviours of fish shoals from recorded videos taken in near aquaculture conditions[J]. Aquaculture. 430, 179–187 (2014) 12. Wang, Yu., Li-Zhu, Z., Chun-Xiao, X.: Video Semantic Models and Their Evaluation Criteria[J]. Chinese Journal Of Computers. 3(30), 337–351 (2007)

Infrared Face Recognition Based on Adaptive Dominant Pattern of Local Binary Pattern Zhihua Xie() Key Lab of Optic-Electronic and Communication, Jiangxi Sciences and Technology Normal University, Nanchang, Jiangxi, China [email protected]

Abstract. Infrared face recognition, being light- independent, and not vulnerable to facial skin, expressions and posture, can avoid or limit the drawbacks of face recognition in visible light. Local binary pattern (LBP), as a classic local feature descriptor, is appreciated for infrared face feature representation. To extract compact and principle information from LBP features, infrared face recognition based on LBP adaptive dominant pattern is proposed in this paper. Firstly, LBP operator is applied to infrared face for texture information. Based on the statistical distribution, the variable dominant pattern is attained for different infrared faces. Finally, dissimilarity metrics between the adaptive dominant pattern features is defined for final recognition. The experimental results show the adaptive dominant patterns in infrared face image have a lower feature dimensionality, and the proposed infrared face recognition method outperforms the traditional methods based on LBP uniform and discriminant patterns. Keywords: Local binary pattern · Adaptive dominant pattern · Uniform pattern · Dimensionality reduction · Infrared face recognition

1 Introduction

Compared with traditional gray and color face imaging, infrared imaging can acquire the intrinsic temperature information of the skin, which is robust to the impacts of illumination conditions and disguises [1, 2]. Infrared face recognition has been an active research area in recent years. However, the challenges of infrared face recognition mainly come from the external environment temperature, low resolution and other factors [3, 4]. Based on the properties of infrared face images, exploring a robust feature extraction method is a key step in an infrared face recognition system. Many feature extraction methods have been proposed for infrared face recognition [1]. These methods are mainly divided into two categories: holistic extraction methods and local extraction methods [2]. Due to the low resolution of infrared face images, local feature extraction is more suitable for infrared face feature extraction, as it can capture more local discriminant information. In 2006, the method based on local binary patterns was applied to infrared face recognition by Li et al. [4], which achieved better performance than statistical methods such as PCA and LDA. The Local Binary Pattern (LBP) has received noticeable attention over the past few years [5-9]. LBP encodes the relative intensity





magnitude between each pixel and its neighboring pixels, which can describe the micro-patterns of the image such as flat areas, spots, lines and edges. The main advantage of this approach lies in the robustness of LBP to monotonic photometric changes and in its computational simplicity [7]. Since the impact of the external environment temperature on an infrared face image is almost a monotonic transform, LBP can improve the performance of infrared face recognition under different environment situations. However, the main drawback of LBP is that the dimension of the LBP pattern features is relatively high, which is not suitable for a real-time recognition system. To reduce the dimensionality of traditional LBP features, far infrared face recognition based on the LBP discriminant pattern was proposed by Xie et al. [10], which uses a fixed subset of all LBP patterns. With regard to infrared face recognition, an adaptive and fundamental feature extraction from all LBP patterns is still a big challenge [11, 12]. To address this issue, the adaptive dominant pattern is introduced to represent the principal information in different infrared face images. Based on this novel dominant pattern, we construct a dissimilarity metric to depict the distance between two adaptive dominant pattern features. This paper is organized as follows. Section 2 introduces feature extraction based on the LBP operator. Section 3 gives the adaptive dominant pattern algorithm. The distance metric of the adaptive features is constructed in Section 4. Section 5 describes the main framework of our proposed infrared face recognition method. Section 6 presents the experimental results, including the methods in the comparison study. Section 7 gives the conclusions of this paper.

2 Feature Extraction Based on LBP

Local binary patterns were introduced by Ojala et al. as a fine-scale texture descriptor [6]. Ever since then it has shown excellent performance in biometrics studies, in terms of speed and discriminative performance. In its simplest form, an LBP description of a pixel is created by thresholding the values of the 3×3 neighborhood of the pixel against the central pixel and interpreting the result as a binary number [8]. The basic idea of this approach is demonstrated in Fig. 1. The LBP code for a center pixel g_c can be represented as

LBP_{P,R}(g_c) = Σ_{i=0}^{P−1} 2^i · S(g_i − g_c)    (1)

S(g_i − g_c) = 1 if g_i − g_c ≥ 0;  0 if g_i − g_c < 0    (2)

where g_c is the gray value of the central pixel, g_i is the value of its neighbors, P is the total number of involved neighbors and R is the radius of the neighborhood. The parameters (P, R) can be (8,1), (8,2), (16,2), etc.
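For concreteness, a plain Python sketch of the basic (8, 1) LBP code of Eqs. (1)-(2) is given below; the neighbor ordering is an assumption of ours, only the relative-thresholding idea matters:

import numpy as np

def lbp_8_1(image):
    # basic LBP with P = 8, R = 1 (Eqs. (1)-(2)); border pixels are left as 0
    img = np.asarray(image, dtype=np.int32)
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    for yc in range(1, h - 1):
        for xc in range(1, w - 1):
            gc = img[yc, xc]
            code = 0
            for i, (dy, dx) in enumerate(offsets):
                if img[yc + dy, xc + dx] - gc >= 0:   # S(g_i - g_c)
                    code |= 1 << i                     # weight 2^i
            codes[yc, xc] = code
    return codes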


Fig. 1. Original Infrared Face Vs LBP Representation

For LBP-based infrared face feature extraction, the original infrared face is partitioned into non-overlapping regions [10]. In each region, the LBP pattern occurrence histogram is computed by:

H(r) = Σ_{x_c=2}^{N−1} Σ_{y_c=2}^{M−1} f( LBP_{P,R}(x_c, y_c), r )    (3)

f( LBP_{P,R}(x, y), r ) = 1 if LBP_{P,R}(x, y) = r;  0 otherwise    (4)

where N is the length of the region, M is the width of the region, r is the pattern label, and (x, y) is the coordinate of a pixel in the region. The dimensionality of the histogram from LBP is 2^P. All the histograms of all local regions are concatenated into one feature vector to build the final infrared face features. Figure 2 gives the occurrences of different LBP bins in the infrared face images in Figure 1.


Fig. 2. Histogram Occurrence and SD Distribution of different patterns

The distribution of LBP bins from those infrared face images focuses on several special bins. Therefore, it is interesting to discard those LBP bins with small percent of occurrences for infrared face recognition.

3 Adaptive Dominant Pattern Features

Although the uniform LBP can effectively capture the fundamental information of visible images, it is possibly problematic for infrared face representation [12, 13]. The reason is that the uniform LBP pattern features extracted from infrared face images do not necessarily account for the principal proportions. Furthermore, the uniform and discriminant LBP patterns use the same subset of bins for different parts of one infrared face or for infrared faces from different people, which cannot make full use of the statistical information in those objects [12]. To extract compact and fundamental features from an infrared face, the adaptive dominant pattern is defined as the most frequently occurring patterns. Therefore, for two infrared face images from different people, the dominant patterns can be of different lengths and different types. In this paper, around 80% of the total pattern occurrences can effectively capture the fundamental information for infrared face classification tasks [13]. Suppose the dominant pattern feature of local region i is the set R_i.

R_i = { [l_i1, H(l_i1)], [l_i2, H(l_i2)], ..., [l_in_i, H(l_in_i)] }    (5)

where n_i is the dimensionality of the dominant pattern feature from local region i, and l_ij is a pattern bin of LBP. To get the dominant pattern feature, the LBP histogram H(r) over all patterns is constructed and the histogram bins are sorted in descending order. Then we can set a percentage threshold T ∈ [0,1]; the variable length k of the dominant pattern feature set R_i should meet the following conditions:

k = arg min_k { [ Σ_{j=1}^{k} H(l_ij) ] / [ Σ_{i=0}^{2^P−1} H(i) ] ≥ T }    (6)

H(l_i(j−1)) ≥ H(l_ij) ≥ H(l_i(j+1))    (7)
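A small sketch of the selection rule of Eqs. (6)-(7), assuming a NumPy histogram of the 2^P pattern occurrences for one region (the function name is ours):

import numpy as np

def dominant_patterns(hist, T=0.8):
    # keep the most frequent bins until their cumulative share reaches T (Eqs. (6)-(7))
    hist = np.asarray(hist, dtype=np.float64)
    order = np.argsort(hist)[::-1]          # bins sorted by descending occurrence
    total = hist.sum()
    kept, acc = [], 0.0
    for label in order:
        kept.append((int(label), float(hist[label])))
        acc += hist[label] / total
        if acc >= T:
            break
    return kept                             # the set R_i of (pattern bin, occurrence) pairs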

4 The Distance Metrics of the Adaptive Pattern Features

The traditional distance metrics between two LBP histograms are based on the same length and pattern labels [8, 9]. However, the adaptive dominant pattern features are of different lengths and labels, so the classic distances are unsuitable for evaluating those dominant features. To cope with this problem, the distance metric for variable-length dominant patterns is constructed in this section. Assuming two infrared faces I^1 and I^2 are partitioned into N regions, their corresponding extracted features are:

V 1 = {R11 , R21 , , RN1 }

(8)

32

Z. Xie

V 2 = {R12 , R22 , , RN2 } 1

(9)

2

where R_i^1 and R_i^2 are the variable-length dominant pattern features from region i in the two infrared faces. The main idea of our distance metric is to expand the original adaptive features so that traditional histogram metrics for equal-length features can be used. Since the pattern bins in R_i^1 are not the same as the pattern bins in R_i^2, we define the pattern bin sets S_i^1 and S_i^2. The union of S_i^1 and S_i^2 is

S_i = S_i^1 ∪ S_i^2    (10)

To make the two adaptive features the same length, R_i^1 and R_i^2 are expanded to ER_i^1 and ER_i^2 over the union set S_i:

ER_i^1 = R_i^1 ∪ { (t, 0) | t ∉ S_i^1, t ∈ S_i^2 }    (11)

ER_i^2 = R_i^2 ∪ { (t, 0) | t ∉ S_i^2, t ∈ S_i^1 }    (12)

After this expansion, the pattern bins in ER_i^1 are the same as the bins in ER_i^2. Therefore, we can use traditional dissimilarity metrics to compute the distance between the two features. In this paper, the metric based on the chi-square statistic [9] is used to compute the distance between ER_i^1 and ER_i^2:

sim(ER_i^1, ER_i^2) = Σ_{bin=1}^{m} ( H^1(bin) − H^2(bin) )² / ( H^1(bin) + H^2(bin) )    (13)

where m is the number of pattern bins of the new expanded feature. Finally, the distance between R_i^1 and R_i^2 is

d(R_i^1, R_i^2) = sim(ER_i^1, ER_i^2)    (14)
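The expansion of Eqs. (10)-(12) and the comparison of Eqs. (13)-(14) can be sketched as follows, with each dominant-pattern feature represented as a list of (pattern bin, occurrence) pairs; this is an illustrative reading of the definitions, not the authors' code:

def region_distance(R1, R2):
    # R1, R2: lists of (pattern bin, occurrence) pairs of possibly different lengths
    h1, h2 = dict(R1), dict(R2)
    union_bins = set(h1) | set(h2)          # S_i = S_i^1 U S_i^2 (Eq. (10))
    d = 0.0
    for b in union_bins:                    # bins absent from one feature count as 0 (Eqs. (11)-(12))
        a, c = h1.get(b, 0.0), h2.get(b, 0.0)
        if a + c > 0:
            d += (a - c) ** 2 / (a + c)     # chi-square term of Eq. (13)
    return d                                # d(R_i^1, R_i^2) of Eq. (14)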

5 The Proposed Infrared Face Recognition

In this section, the detailed realization of our proposed infrared face recognition is introduced. The main steps of our method are listed as follows: Stage one: Infrared face detection and normalization [3]. After normalization, the value of the infrared face data ranges from 1 to 255 and the resolution of each infrared face image is the same. Stage two: LBP is applied on the normalized infrared face to get the LBP image. Stage three: Each LBP image is divided into non-overlapping regions. Stage four: Adaptive dominant pattern features are extracted for each region, as introduced in Section 3.


Stage five: The distances of the variable-length features defined in Section 4 are computed for all regions of the whole infrared face image.
Last stage: The final distance between two infrared face images is obtained by a weighted fusion method:

D(V^1, V^2) = Σ_{i=1}^{N} w_i d(R_i^1, R_i^2)    (15)

The nearest neighbor classifier based on this distance between training and test infrared faces is employed to obtain the final recognition result.

6 Experiment Results

To verify the effectiveness of our method and the others, all experiments are done on an infrared face database built by ourselves with a ThermoVision A40 infrared camera supplied by FLIR Systems Inc. [3, 10]. The training database comprises 400 thermal images of 40 individuals which were carefully collected under similar conditions on November 17, 2006: an air-conditioned environment with a temperature around 25.6–26.3℃. The test database can be divided into two categories: same-session data and time-lapse data. The same-session data comprises 400 thermal images of 40 individuals which were captured under the same conditions as the training database. Each person has 10 templates: 2 in frontal view, 2 in up view, 2 in down view, 2 in left view and 2 in right view. The time-lapse data comprises 165 thermal images of one individual which were collected on June 28, 2006. The original resolution of each image is 240×320. The resolution of the infrared face image becomes 80×60 after face detection and geometric normalization, as demonstrated in Figure 3.



Fig. 3. Infrared Face Databases


In our experiments, the parameter (P, R) of the LBP code is (8, 1). To show the contribution of our adaptive pattern labels, the weights of different blocks are set to the same value:

w_i equals 1/N. Five partitioning modes are used: 1 is non-partitioning, 2 is 2×2, 3 is 4×2, 4 is 2×4, and 5 is 4×4. To verify the effectiveness of the proposed adaptive feature extraction method for infrared face recognition, the two other LBP-based extraction algorithms used for comparison are the traditional LBP uniform pattern [5] and the LBP discriminant pattern [10].

(a) Same-session data

(b) Time-lapse data

Fig. 4. Recognition results with different partitioning modes


It can be seen from Figure 4 that the recognition rates based on the adaptive dominant pattern are higher than those based on the uniform pattern and the discriminant pattern for all partitioning modes. From Figure 4 one can also see that partitioning contributes to better performance for all LBP-based infrared face recognition methods. Especially for the time-lapse database, the performance improvement of the adaptive dominant patterns increases with the number of blocks, and it is superior to the methods based on the uniform pattern and the discriminant pattern. The main reason is that the adaptive dominant pattern can extract different principal features for different blocks of infrared faces, which makes full use of the statistical information for recognition. Therefore, the adaptive dominant pattern of LBP is effective for extracting texture information for infrared face recognition.

7 Conclusions

The conventional LBP-based features, as represented by the uniform and discriminant patterns, have fixed pattern labels for different parts of different people. Based on the statistical distribution of local textures, an adaptive pattern type algorithm of LBP is proposed for infrared face recognition. Compared with the traditional fixed patterns, the adaptive dominant pattern method can capture much more useful information with lower dimensionality in different infrared face images. Our experiments illustrate that the adaptive dominant pattern from LBP is effective in extracting fundamental information and can improve the robustness of infrared face recognition. Acknowledgements. This paper is supported by the National Nature Science Foundation of China (No. 61201456), the Science & Technology Project of Education Bureau of Jiangxi Province (No. GJJ14581) and the Nature Science Project of Jiangxi Science and Technology Normal University (2013QNBJRC005, 2013ZDPYJD04).

References 1. Ghiass, R.S., Arandjelovi, O., Bendada, A., Maldague, X.: Infrared face recognition: A comprehensive review of methodologies and databases. Pattern Recognition 47(9), 2807–2824 (2014) 2. Osia, N., Bourlai, T.: A spectral independent approach for physiological and geometric based face recognition in the visible, middle-wave and long-wave infrared bands. Image and Vision Computing 32(6), 847–859 (2014) 3. Wu, S.Q., Li, W.S., Xie, S.L.: Skin heat transfer model of facial thermograms and its application in face recognition. Pattern Recognition 41(8), 2718–2729 (2008) 4. Li, S.Z., Chu, R., Liao, S.: Illumination Invariant Face Recognition Using Near-Infrared Images. IEEE Trans on Pattern Analysis and Machine Intelligence 29(12), 627–639 (2007) 5. Ahonen, T., Hadid, A., Pietikäinen, M.: Face description with local binary patterns: application to face recognition. IEEE Trans Pattern Analysis and Machine Intelligence 18(12), 2037–2041 (2006)


6. Ojala, T., Pietikäinen, M.: Multi-resolution, Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Trans on Pattern Analysis and Machine Intelligence 24(7), 971–987 (2002) 7. Zhang, B., Shan, S., Gao, W.: Histogram of Gabor Phase Patterns (HGPP): A Novel Object Representation Approach for Face Recognition. IEEE Trans on Image Processing 16(1), 57–68 (2007) 8. Liao, S., Chung, A.C.S.: Face Recognition with salient local gradient orientation binary pattern. In: Proceedings of 2009 International Conference on Image Processing, ICIP 2009, pp. 3317–3320 (2009) 9. Tan, X., Triggs, B.: Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions. IEEE Transactions on Image Processing 19(6), 1635–1650 (2010) 10. Xie, Z., Wu, S., Zhijun, F.: Infrared face recognition using LBP and discrimination patterns. Journal of Image and Graphics 17(6), 707–711 (2012) 11. Nanni, L., Brahnam, S., Lumini, A.: A simple method for improving local binary patterns by considering non-uniform patterns. Pattern Recognition 45(10), 3844–3852 (2012) 12. Bianconi, F., Fernández, A.: On the Occurrence Probability of Local Binary Patterns: A Theoretical Study. Journal of Mathematical Imaging and Vision 40(3), 259–268 (2011) 13. Liao, S., Law, M., Chung, C.S.: Dominant local binary patterns for texture. IEEE Transactions on Image Processing 18(5), 1107–1118 (2009)

Single Training Sample Face Recognition Based on Gabor and 2DPCA Jun Yang1() and Yanli Liu2 1

College of Computer Science, Sichuan Normal University, Chengdu, China [email protected] 2 College of Mathematics and Software, Sichuan Normal University, Chengdu, China

Abstract. The single training sample face recognition problem is a challenge in the face recognition field, so extracting distinguishing features is an important step for improving the recognition rate when only one sample per person is available in the training set. The Gabor feature and the 2DPCA dimension reduction algorithm are both effective feature extraction methods and are applied in face recognition and pattern analysis. But the two methods cannot be combined directly because 2DPCA requires input data with a 2D structure. A feature extraction method based on Gabor and 2DPCA is proposed in this paper. It transforms a series of Gabor sub-images into one image with the help of an image splicing technique, and then 2DPCA can be employed. Experimental results on the ORL face dataset show that the proposed method is effective, with a higher correct rate than other similar algorithms for single-sample face recognition. Keywords: Face recognition · Single training sample · Gabor · 2DPCA

1 Introduction

Human faces are the most extensively studied object in image-based recognition. This is due to numerous important applications of face recognition technology, including criminal identification, credit card verification, automated video surveillance and intelligent human-computer interaction. During the last two decades, there has been a significant effort to develop recognition algorithms for frontal face images, with many encouraging results. A recent study showed that computer algorithms are capable of outperforming people on recognition of frontal face images [1] when a large and representative training data set is available. They may outperform humans because of the information they can exploit from the representative training images about the variability of individuals across changes in illumination. However, more samples usually mean more effort, time, and thus money. Unfortunately, many current face recognition techniques rely heavily on the large size and representativeness of the training sets, and most methods suffer degraded performance or fail to work if there is only one training sample per person available. This so-called "Single Sample per Person" (SSP) situation is common in face recognition. The one sample per person problem in face recognition is defined as follows. Given a stored


database of faces with only one image per person, the goal is to identify a person from the database later in time under any different and unpredictable expression, lighting, aging, etc. from the individual image [2]. Due to its technical challenge and wide range of applications, including law enforcement, driver license or passport card identification, and surveillance photo identification, the one sample problem has rapidly emerged as an active research field in the face recognition community. To solve the SSP problem, many methods have been developed recently. Because Principal Component Analysis (PCA) [3] had already become the baseline algorithm of face recognition and can be used with one training sample, some extensions of PCA were proposed to cope with the SSP problem for higher performance. A method named (PC)2A was presented in Ref [4]. It linearly combines the first-order projection images and the original images to form new images, enriching the information of the original images. Chen et al. proposed the E(PC)2A method [5], an enhanced version of (PC)2A, which obtains new combined images from first-order and second-order projection images. For reliably estimating the covariance matrix under the small sample size condition, two-dimensional PCA (2DPCA) was proposed by Yang et al. [6]. This method uses straightforward 2D image matrices rather than 1D vectors for covariance matrix estimation, and is thus claimed to be computationally cheaper and more suitable for the small sample size problem. Recently, some research found the Gabor feature to be efficient for face representation and recognition [7, 8]. In Ref [7], different Gabor filters are used to generate different representations of a face image, a classifier is assigned to each of them and to the original image, and an enhanced majority voting fusion method is then utilized to combine the classifiers to increase the efficiency of the face recognition system. In this paper, we propose an approach that extracts face features by Gabor wavelets and the 2DPCA algorithm. Ordinarily, the face Gabor feature obtained by combining the results of filtering the face image with different Gabor filters is a high-dimensional vector. The 2DPCA algorithm must extract features from data with a 2D structure, so the face Gabor feature cannot be processed by 2DPCA directly. Unlike the traditional Gabor feature extraction method, we directly combine the sub-images created by the different Gabor filters to get a big global Gabor image. This image has a 2D structure, so the 2DPCA algorithm can be employed to reduce the dimension and obtain a more representative feature.

2 Gabor Feature of an Image

The Gabor image representation is obtained by computing the convolution of the original image with several Gabor wavelets. These wavelets can be represented as a complex sinusoidal signal modulated by a Gaussian kernel function and defined by

ψ_{u,v}(z) = ( ||k_{u,v}||² / σ² ) exp( −||k_{u,v}||² ||z||² / (2σ²) ) [ exp( i (k_{u,v} · z) ) − exp( −σ²/2 ) ]    (1)

where u represents the filter's orientation and v represents the filter's scale. k_{u,v} and z are defined by

k_{u,v} = k_v (cos φ_u, sin φ_u)    (2)

z = (x, y)    (3)

For image representation, it is convenient to express φ_u = πu/8 and k_v = k_max / f^v, where k_max is the maximum frequency and f is the spacing factor between kernels in the frequency domain. The parameters in this paper are set to v = 5, u = 8, σ = 2π, k_max = π/2 and f = 2.

The Gabor representation of a face image is computed by convolving the face image with the Gabor filters. Let I(z) be the gray-level distribution of an image. The convolution of image I(z) and a Gabor kernel ψ_{u,v}(z) is defined as follows:

F_{u,v}(z) = I(z) ∗ ψ_{u,v}(z)    (4)

where ∗ denotes the convolution operator. The result is complex valued, and ordinarily only the complex modulus is used as the feature. Let F_{u,v} represent the vector obtained by concatenating each row of a Gabor representation and normalizing the result to zero mean and unit variance. Then the final Gabor feature of the image is defined by:

G = { F_{0,0}, ..., F_{0,7}, ..., F_{4,7} }    (5)

It can be seen that the conventional Gabor feature is a high-dimensional vector.
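The kernel of Eq. (1) and the filtering of Eq. (4) can be sketched as below (NumPy/SciPy assumed; default parameter values follow the text where stated, the kernel size and function names are our assumptions):

import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(u, v, size=16, sigma=2 * np.pi, kmax=np.pi / 2, f=2.0):
    # Gabor wavelet of Eq. (1) for orientation u (0..7) and scale v (0..4)
    k = kmax / (f ** v)
    phi = np.pi * u / 8.0
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    half = size // 2
    ys, xs = np.mgrid[-half:half, -half:half]
    sq = xs ** 2 + ys ** 2
    gauss = (k ** 2 / sigma ** 2) * np.exp(-(k ** 2) * sq / (2 * sigma ** 2))
    wave = np.exp(1j * (kx * xs + ky * ys)) - np.exp(-sigma ** 2 / 2.0)
    return gauss * wave

def gabor_magnitude(image, u, v):
    # convolution of Eq. (4); only the complex modulus is kept as the feature
    resp = fftconvolve(np.asarray(image, dtype=np.float64), gabor_kernel(u, v), mode='same')
    return np.abs(resp)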

3 2DPCA Dimension Reducing Algorithm

Consider the training set denoted as ber of training samples and ples onto a column vector

X = { X 1 , X 2 ,..., X N } , where N is the num-

X i ∈ R m×n is a face image. Projecting the training sam-

a will yield a row vector. The process can be denoted by: yi = X ia

(6)

The 2DPCA algorithm tries to find the optimal vector such that the scatter of samples in projected space reaches maximum. The aim can be denoted by:


J(y) = Σ_{i=1}^{N} (y_i − y_m)^T (y_i − y_m)
     = Σ_{i=1}^{N} a^T (X_i − X_m)^T (X_i − X_m) a
     = a^T [ Σ_{i=1}^{N} (X_i − X_m)^T (X_i − X_m) ] a
     = a^T S a    (7)

where X_m is the mean of the training samples and S denotes the 2D global scatter matrix of the training samples. Ordinarily, it is not enough to have only one optimal projection axis. 2DPCA usually needs to select a set of projection axes (a_1, a_2, ..., a_d) satisfying:

(a_1, a_2, ..., a_d) = arg max (a^T S a),  a_i^T a_j = 0,  i, j = 1, 2, ..., d and i ≠ j    (8)

It has been proved that the optimal orthonormal projection vectors are the eigenvectors of matrix S corresponding to the first d largest eigenvalues of S . The size of S matrix is only n×n. Hence, computing its eigenvectors is very efficient compared to 1D PCA algorithm.
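A compact sketch of this eigenvector computation and of the projection of Eq. (6) (NumPy assumed; the function names are ours):

import numpy as np

def projection_axes_2dpca(images, d):
    # eigenvectors of the n x n scatter matrix S for the d largest eigenvalues (Eqs. (7)-(8))
    X = np.asarray(images, dtype=np.float64)    # shape (N, m, n)
    Xm = X.mean(axis=0)
    S = np.zeros((X.shape[2], X.shape[2]))
    for Xi in X:
        D = Xi - Xm
        S += D.T @ D
    vals, vecs = np.linalg.eigh(S)              # eigenvalues in ascending order
    return vecs[:, ::-1][:, :d]                 # n x d matrix of projection axes

def project(image, axes):
    # y = X a for each axis (Eq. (6)); result is an m x d feature matrix
    return np.asarray(image, dtype=np.float64) @ axes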

4 The Proposed Method

For each face image of the training set, a set of Gabor filters is applied to create the corresponding Gabor images. Filters with five scales and eight orientations are employed, so the number of Gabor representation images is 40. To lower the computing burden, each Gabor representation image is down-sampled to 16×16. By image splicing, the 40 Gabor representation images are integrated to form a global Gabor image. It has a two-dimensional structure, so 2DPCA can be applied on it. Taking the global Gabor image of each original face image as a new training sample, the 2DPCA algorithm is implemented on the new training set to get the optimal orthonormal projection vector set. Then a representative subspace is obtained, and the dimension of the global Gabor image is reduced by projecting it onto the subspace. The feature extraction process is illustrated in Fig. 1 and the feature extraction algorithm is described as follows. Algorithm 1: the feature extraction algorithm in this paper. Input: a face image. Output: the dimension-reduced feature of the input image. 1. Filtering the original image with different filters corresponding to five scales and eight orientations to get the Gabor representation images.


2. Down-sampling each Gabor image to 16×16 resolution and normalizing it to zero mean and unit variance.
3. Splicing the forty Gabor representation images to form a big global image, with scale varying along rows and orientation varying along columns. It is denoted by:

G_2D = [ F_{0,0} ... F_{0,7} ; ... ; F_{4,0} ... F_{4,7} ]    (9)

4. Collecting the G_2D image of every face to form the new training set, and calculating the eigenvectors of the 2D global scatter matrix corresponding to its first d largest eigenvalues by Eq. 8.
5. Projecting the G_2D image onto the subspace spanned by the eigenvectors to get the ultimate feature, denoted as F_G2DPCA.

42

J. Yang and Y. Liu

In the recognition stage, we extract the F_G2DPCA feature for the test image, denoted as F_G2DPCA_Test. The distance between the test image and every gallery image is computed by Eq. 10:

D_i = || F_G2DPCA_i − F_G2DPCA_Test ||,  i = 1, 2, ..., N    (10)

where F_G2DPCA_i is the extracted feature of the i-th image in the gallery set. Then the nearest neighbor classifier is employed to classify the test image by Eq. 11:

label(Test) = label(j),  j = arg min_i D_i,  i = 1, 2, ..., N    (11)

where label(j) is the class label of image j.

5 Experiment Results

The experiment is done in the ORL standard face database. The ORL database contains samples from 40 individuals, each providing 10 different images. For some subjects, the images were taken at different times. The facial expressions (open or closed eyes, smiling or non-smiling) and occlusion (glasses or no glasses) also vary. The images were taken with a tolerance for tilting and rotation up to 20 degree. There is also some variation in the scale of up to 10%. All images are grayscale and normalized to a resolution of 112 ×92 pixels. In experiments, one image of every face is selected for training (such as the ith image of every face) and the other samples for testing. So there are ten experiments and 40 training samples and 360 testing samples in every experiment. Finally, the mean correct recognition rate of ten experiments set is given as the performance assess of the proposed method. While, the other five methods, which were proposed to solve the single training sample per person problem, are used for comparison. The first method is the 2DPCA approach based on gray images; the second is the method based on Gabor feature and PCA; the third is dividing an image into several non-overlap blocks [9]; the fourth is to extract feature by Gabor sub-image and 2DPCA and classify face by majority voting fusion method [7]; the fifth is getting virtual images by SVD reconstructing [10] and extracting feature by 2DFLDA. In all the six schemes, the nearest classifier is used to classify the input image. The correct rate vs. dimension curves is showed in Fig.2. It can be seen that the proposed method reaches optimal correct rate when two eigenvectors are used to span the 2DPCA subspace. In Fig.2, we also describe the correct rate vs. dimensions curve of gray image plus 2DPCA. From the compare of two curves, it can be seen that under most projected dimension, the correct rate of the Gabor plus 2DPCA approach (the method in this paper) surpasses that of gray plus 2DPCA approach.


Fig. 2. The correct rate vs. dimension curves of two methods

Table 1 lists the top recognition accuracy of the compared methods with optimal parameter settings in the experiments. It can be seen that the proposed method achieves higher recognition accuracy than the other methods.

Table 1. The optimal recognition correct rates of the different methods on the ORL face database

Method                               Correct rate (%)
Gray + 2DPCA                         72.94
Gabor + PCA                          73.92
Method in Ref [9]                    70.83
Method in Ref [7]                    74.40
Method in Ref [10]                   73.78
Gabor + 2DPCA (the proposed method)  77.81

6 Conclusion

In this paper, we investigate the integration of the image Gabor representation and the 2DPCA dimension reduction algorithm. By splicing Gabor sub-images, a synthetic global image can be obtained. With a training set composed of these synthetic images, the 2DPCA approach can be used to learn a subspace in which the samples are well separated and the noise in the image space is suppressed. We apply the method to face recognition with a single training sample; the experimental results show that our proposed method outperforms other algorithms designed to solve the single training sample problem. Furthermore, though we emphasize the single training sample problem in this paper, the method can obviously also be applied to face recognition with multiple samples per person in the training set. We will examine its effectiveness in that circumstance in future work.


Acknowledgements. This project was supported by the National Nature Science Foundation of China (No. 61373163), the Scientific Research Fund of Sichuan Provincial Education Department (No. 15ZA0039) and Project of Visual Computing and Virtual Reality Key Laboratory of Sichuan Province (No.PJ2012001).

References 1. O’Toole, A.J., Phillips, P.J., Jiang, F., Ayyad, J., Penard, N., Abdi, H.: Face recognition algorithms surpass humans matching faces over changes in illumination. IEEE Transactions on Pattern Analysis and Machine Intelligence 29, 1642–1646 (2007) 2. Tan, X., Chen, S., Zhou, Z.-H., Zhang, F.: Face recognition from a single image per person: A survey. Pattern Recognition 39, 1725–1745 (2006) 3. Turk, M., Pentland, A.: Eigenfaces for Recognition. Journal of Cognitive Neuroscience 3, 71–86 (1991) 4. Wu, J., Zhou, Z.: Face recognition with one training image per person. Pattern Recognition Letters 23(14), 711–1719 (2002) 5. Chen, S.C., Zhang, D.Q., Zhou, Z.H.: Enhanced (PC)2A for face recognition with One training image per person. Pattern Recognition Letters 25(10), 1173–1181 (2004) 6. Yang, J., Zhang, D., Frangi, A.F., Yang, J.-Y.: Two-Dimensional PCA: A New Approach to Appearance-Based Face Representation and Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 26, 131–137 (2004) 7. Ebrahimpour, R., Nazari, M., Azizi, M., et al.: Single training sample face recognition using fusion of classifiers. International Journal of Hybrid Information Technology 4(1), 25–32 (2011) 8. Tenllado, C., Gómez, J.I., Setoain, J., et al.: Improving face recognition by combination of natural and Gabor faces. Pattern Recognition Letters 31(11), 1453–1460 (2010) 9. Chen, S., Liu, J., Zhou, Z.: Making FLDA applicable to face recognition with one sample per person. Pattern Recognition 37(7), 1553–1555 (2004) 10. Gao, Q., Zhang, L., Zhang, D.: Face recognition using FLDA with single training image per-person. Applied Mathematics and Computation 205, 726–734 (2008)

Vibe Motion Target Detection Algorithm Based on Lab Color Space Zhiyong Peng, Faliang Chang(), and Wenhui Dong College of Control Science and Engineering, Shandong University, Jinan, China [email protected], {flchang,Dongwh}@sdu.edu.cn

Abstract. As Visual Background Extractor (Vibe) is sensitive to illumination change and object shadow and extracts movement area incompletely, this paper proposes a Vibe motion target detection algorithm based on Lab color space. For matching pixels with their background models, the algorithm improves CIE1976Lab color-difference formula by reducing the proportion of brightness difference. Experiments show that the improved color-difference formula is less sensitive to illumination change and object shadow. Then, the algorithm makes use of space consistency of pixels to correct pixel classification results, which improves the anti-noise ability and makes extracted movement area more complete. Experimental results demonstrate that the algorithm proposed has better detection effects both in indoor and outdoor environments. Keywords: Motion target detection · Visual background extractor (Vibe) · Lab color space · Color-difference formula

1 Introduction

In video content analyses and multimedia retrieval, extracting the movement area from a video sequence completely and exactly is a basic and critical task [1]. Nowadays, the frequently used motion target detection algorithms are optical flow, the frame difference method and background subtraction. Optical flow analyzes the optical flow of motion targets and obtains abundant motion information to extract the foreground, but it is computationally complex and has poor real-time performance [2]. The frame difference method exploits the difference between adjacent frames to extract the foreground; it is simple to compute, but the movement area extracted by the algorithm differs considerably from the actual one [3]. Background subtraction obtains the foreground by computing the difference between every frame and a background model [4]. This algorithm has high accuracy and robustness and is one of the most widely used motion target detection algorithms. The key problem of background subtraction is to establish and update the background model. Literature [5] proposed the GMM (Gaussian mixture model), which can obtain good results against complex non-static backgrounds, but is computationally complex and has many parameters. Literature [6] and [7] proposed the codebook method, which uses quantification and clustering technology to build a background model [8]. It has no parameters but is sensitive to illumination change and consumes a large amount of computer memory. Literature [9] proposed self-organizing through artificial neural networks, which


learns motion information by self-organizing. It is robust to illumination change but costs much in computation. Literature [10] and [11] proposed Vibe (Visual Background Extractor) based on probability and mathematical statistics. This algorithm makes use of random strategy to update background model and builds transmission mechanism of pixel information. It is simple, fast and has good extraction effects. Vibe algorithm is better than other motion target detection algorithms in working speed and extraction effect, but still has some problems. Firstly, it is sensitive to illumination change. When illumination change suddenly, a large number of background pixels are judged to be foreground points. Secondly, Vibe algorithm is unable to suppress object shadow. Object shadow is judged to be foreground, which makes accuracy low. Thirdly, Vibe algorithm extracts incomplete movement area, whose interior has some small holes. To solve these issues, we propose an improved Vibe algorithm based on Lab color space. Experimental results demonstrate that the algorithm solves these issues and improves the accuracy of motion detection.

2 Lab Color Space

The Vibe algorithm provides us with a framework for motion detection, but does not specify which color space should be chosen. At present, most of the literature chooses the gray feature, which is simple but sensitive to illumination change. For color images, color is one of the most widely used features in motion detection. Different color spaces reflect different information of the image, so motion detection algorithms based on different color spaces have different detection results [12]. Literature [12] analyzed the information changes of motion targets in different color spaces, and demonstrated that motion detection based on the Lab color space has better adaptability and robustness to complex backgrounds. Based on the above analysis, this paper proposes an improved Vibe algorithm based on the Lab color space. The Lab color space describes human vision by digital means and is a color system based on physiological features. The main highlight of the Lab color space is its perceptual uniformity; in other words, two colors that are close in visual perception are adjacent in the Lab color space [13]. The Lab color space consists of Channels L, a and b. Channel L describes the brightness of pixels; its value ranges from 0 to 100, representing pure black to pure white. Channel a describes the range from green through gray to red; Channel b describes the range from blue through gray to yellow. Both of their values range from -128 to 127. Nowadays, images from most image capture devices are based on the RGB color space, so images need to be transformed to the Lab color space. This paper uses the method provided by the Open Source Computer Vision Library (OpenCV). The transition formulas [14] are as follows:

[X]   [0.433953 0.376219 0.189828] [R/255]
[Y] = [0.212671 0.715160 0.072169] [G/255]    (1)
[Z]   [0.017758 0.109477 0.872765] [B/255]

L = 116 f(Y) − 16
a = 500 [ f(X) − f(Y) ]
b = 200 [ f(Y) − f(Z) ]    (2)

where
f(t) = t^(1/3)            if t > 0.008856
f(t) = 7.787 t + 16/116   if t ≤ 0.008856    (3)
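In practice the conversion can be delegated to OpenCV, which implements the formulas above; note that for 8-bit images OpenCV rescales L to [0, 255] and offsets a and b by 128, so the sketch below (function name ours) maps the result back to the ranges used here:

import cv2
import numpy as np

def bgr_to_lab(frame_bgr):
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    L = lab[..., 0] * (100.0 / 255.0)   # back to [0, 100]
    a = lab[..., 1] - 128.0             # back to [-128, 127]
    b = lab[..., 2] - 128.0
    return L, a, b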

3 Improved Vibe Algorithm Based on Lab Color Space

3.1 Background Model and Initialization

Background subtraction obtains the foreground by computing the difference between every frame and the background model. Therefore, a background model needs to be built first. This paper makes use of the method provided by the Vibe algorithm to build the background model. For every pixel, the Vibe algorithm builds a background sample set whose capacity is N (N is set to 20). Let us assume that v(x) is the value of the pixel located at x in the image in a given Euclidean color space and the background sample set of x is M(x) = {v1, v2, …, vN}. Then, compare v(x) with the value of every sample in M(x). If the distance between v(x) and the value of a sample is less than R, we consider that the sample matches x. We judge whether x is a background point by counting the number of samples matching with x. This classification process is illustrated in Fig. 1 [11].

Fig. 1. Classification of pixels

M ( x ) is initialized by the first frame in a video sequence. In the first frame, we select a pixel randomly from N ( x ) ( N ( x ) is the eight-neighborhood of a pixel x ,

see Fig. 2, N ( x) = {n1, n2, …, n8} ) and assign its value to vi ( vi is a sample in M ( x ) , i = 1, 2, …, N ).


Fig. 2. The eight-neighborhood of a pixel x

3.2 Choice of Pixel Distance

The gray feature is widely used to implement the Vibe algorithm in most of the literature. Let us denote by g(x_i) the grayscale value of a pixel x_i. The distance formula of two pixels in grayscale space is Formula (4):

D(x_i, x_j) = | g(x_i) − g(x_j) |    (4)

Formula (4) is simple but sensitive to illumination change. Color feature is widely used to detect motion targets and RGB space is a widely used color space. Let us denote by r ( xi ) , g ( xi ) and b( xi ) respectively the values of a pixel xi in Channel R, G and B. Euclidean distance formula of two pixels in RGB color space is Formula (5).

D(x_i, x_j) = [ (r(x_i) − r(x_j))² + (g(x_i) − g(x_j))² + (b(x_i) − b(x_j))² ]^(1/2)    (5)

In Formula (5), the contributions of Channels R, G and B are the same. But the RGB color space is not uniform. Moreover, all three channels contain brightness information of the pixels, with a strong correlation among them, which is not conducive to the detection and segmentation of motion targets [12]. Compared with the RGB color space, the Lab color space is uniform: two colors that are close in visual perception are adjacent in the Lab color space [13]. Furthermore, brightness and color are separated, which is beneficial for detecting motion targets. In the printing industry, the color-difference needs to be computed when assessing the quality and controlling the process of copied color. The color-difference in the Lab color space is widely used [15], and the frequently used color-difference formulas are the CIE1976Lab, the CMC(l:c)1984 and the CIEDE2000 color-difference formulas [15]. The CIEDE2000 color-difference formula has the best precision but the most complex computation. For simplicity, we choose the CIE1976Lab color-difference formula to compute the difference of two pixels (see Formula (6)).

ΔE(x_i, x_j) = [ (L(x_i) − L(x_j))² + (a(x_i) − a(x_j))² + (b(x_i) − b(x_j))² ]^(1/2)    (6)

When illumination changes suddenly, the brightness of background pixels changes greatly. If we choose the gray feature to detect motion targets, some background pixels will be judged to be foreground points. However, the hue and saturation of background pixels are almost unchanged. Thus, we can compute the color-difference in the Lab color space and set an appropriate threshold, and then the probability of judging background pixels to be foreground points is decreased.


Many motion detection algorithms are sensitive to object shadow, and the Vibe algorithm is no exception. The shadow of a moving object is darker than the actual background it covers, but its hue and saturation are almost unchanged. Thus, computing the color difference in Lab space also removes shadow to some extent. In Section 4, Figs. 3(d) and 4(d) show that the CIE1976Lab color-difference formula adapts to illumination change and removes shadow better; still, the detection result is not satisfactory. This paper therefore proposes a weighted color-difference formula that lowers the weight of the brightness difference, as shown in Formula (7).

ΔE(x_i, x_j) = \sqrt{α (L(x_i) − L(x_j))^2 + (a(x_i) − a(x_j))^2 + (b(x_i) − b(x_j))^2},  α < 1    (7)

We denote by M whether x_i and x_j match, and by T_E the matching threshold. The matching rule is given in Formula (8).

M = True   if ΔE(x_i, x_j) < T_E
M = False  if ΔE(x_i, x_j) ≥ T_E    (8)

In this paper, α and T_E are set to 0.25 and 10 respectively, which achieves good detection results, as shown in Figs. 3(e) and 4(e) in Section 4. We denote by N_M the number of samples in M(x) matching a pixel x. If N_M is at least T_M (a fixed parameter set to 2), x is classified as a background point; otherwise, x is classified as a foreground point. The decision is given in Formula (9).

F(x) = 0 (background)  if N_M ≥ T_M
F(x) = 1 (foreground)  if N_M < T_M    (9)
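A minimal sketch of Formulas (7)–(9), assuming the frame has already been converted to Lab; α, T_E and T_M follow the values quoted above, and the helper names are ours:

```python
import numpy as np

ALPHA, T_E, T_M = 0.25, 10.0, 2

def weighted_delta_e(lab_a, lab_b, alpha=ALPHA):
    """Color difference of Formula (7): the L (brightness) term is down-weighted."""
    dL, da, db = np.asarray(lab_a, float) - np.asarray(lab_b, float)
    return np.sqrt(alpha * dL**2 + da**2 + db**2)

def is_background(lab_pixel, lab_samples, t_e=T_E, t_m=T_M):
    """Formulas (8)-(9): count samples matching within T_E; background if the count reaches T_M."""
    n_matches = sum(weighted_delta_e(lab_pixel, s) < t_e for s in lab_samples)
    return n_matches >= t_m
```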

3.3 Modification of Pixel Classification

Influenced by noise, some pixels may be classified incorrectly: some background pixels may be judged to be foreground points and vice versa. This paper uses the spatial consistency of pixels to correct the classification results. For a pixel x, we count the number of its eight-neighborhood pixels whose classification result is the same as that of x. We denote this number by N_C and its threshold by T_N (T_N is set to 4). If N_C is less than T_N, we consider the classification result of x to be wrong and invert it. Let C denote whether the classification result of x is modified; the modification rule is given in Formula (10).

C = True   if N_C < T_N
C = False  if N_C ≥ T_N    (10)
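The correction of Formula (10) can be sketched with a 3×3 neighborhood count; a SciPy convolution is used here only for brevity, T_N = 4 as above, and pixels outside the image are treated as background:

```python
import numpy as np
from scipy.ndimage import convolve

def refine_mask(fg_mask, t_n=4):
    """Flip the label of any pixel supported by fewer than t_n of its 8 neighbors."""
    fg = fg_mask.astype(np.uint8)
    kernel = np.array([[1, 1, 1],
                       [1, 0, 1],
                       [1, 1, 1]], dtype=np.uint8)
    fg_neighbours = convolve(fg, kernel, mode='constant', cval=0)
    # Number of neighbors agreeing with the center label:
    agree = np.where(fg == 1, fg_neighbours, 8 - fg_neighbours)
    refined = fg.copy()
    refined[agree < t_n] = 1 - refined[agree < t_n]
    return refined.astype(bool)
```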


In Section 4, Figs. 3(f) and 4(f) show that this modification removes some isolated foreground points and improves the noise robustness of the algorithm. In addition, it makes the extracted movement area more complete and improves the accuracy of motion detection.

3.4 Updating Background Model

The background model must be updated because the background keeps changing, for example through illumination change and objects moving in or out of the scene. This paper uses the update method provided by the Vibe algorithm, a conservative update strategy combined with counting foreground points [16]. Under the conservative strategy, only pixels classified as background may update the background model. When a pixel is classified as a background point, it updates its own background model with probability 1/φ and updates the model of one of its eight-neighborhood pixels with the same probability. The Vibe algorithm initializes the background model from the first frame, which easily introduces ghost regions [16]; updating the models of neighboring pixels spreads background information and suppresses ghost regions. The background model of every pixel contains N samples, and the sample to be replaced is chosen at random, each with probability 1/N. The probability that a sample present in the model at time t is preserved after one update is (N − 1)/N, and the probability that it is still present at time t + dt is given by Formula (11).

p(t, t + dt) = ((N − 1)/N)^{dt} = e^{−ln(N/(N − 1)) · dt}    (11)

Formula (11) shows that whether a sample in the model is replaced is independent of time, so the random strategy is appropriate; moreover, the expected remaining lifespan of any sample decays exponentially. Counting foreground points keeps track of how many consecutive times a pixel has been classified as foreground. If this count exceeds the threshold T_C (T_C is set equal to the video frame rate), the pixel is re-labeled as a background point. Counting foreground points helps suppress ghost regions and allows the background model of pixels occluded by foreground objects to be updated.
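A hedged sketch of the conservative update rule described above (update probability 1/φ and propagation to a random 8-neighbor); the array shapes and function name are assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng()

def update_model(model, frame, bg_mask, phi=16):
    """model: (H, W, N, C) sample sets; frame: (H, W, C); bg_mask: (H, W) booleans."""
    H, W, N, _ = model.shape
    ys, xs = np.nonzero(bg_mask)
    for y, x in zip(ys, xs):
        if rng.integers(phi) == 0:                   # update own model with probability 1/phi
            model[y, x, rng.integers(N)] = frame[y, x]
        if rng.integers(phi) == 0:                   # propagate to one random 8-neighbor
            dy, dx = rng.integers(-1, 2, size=2)
            ny, nx = np.clip(y + dy, 0, H - 1), np.clip(x + dx, 0, W - 1)
            model[ny, nx, rng.integers(N)] = frame[y, x]
    return model
```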

4 Experimental Results

We tested our algorithm in indoor and outdoor environments. The parameters in the two experiments were the same as mentioned above: N = 20, α = 0.25, T_E = 10, T_M = 2, T_N = 4, φ = 16.



4.1 Experiment 1

This experiment used a 33-second video shot indoors with a resolution of 704 × 576. In the 497th frame, part of the light emitted by incandescent lamps is shaded by a moving person, which causes some background pixels to be judged as foreground points. Fig. 3 shows the detection results of the five Vibe algorithms for the 497th frame: Fig. 3(a) is the 497th frame, Figs. 3(b)–(e) are the detection results of Formulas (4)–(7) respectively, and Fig. 3(f) is the modified result of Fig. 3(e). Table 1 shows the detection performances of the five corresponding Vibe algorithms. In Table 1, TPR denotes the true positive rate (the percentage of actual foreground pixels correctly identified as foreground points), FPR denotes the false positive rate (the percentage of actual background pixels incorrectly identified as foreground points), and ACC denotes the accuracy (the percentage of pixels correctly classified).

Fig. 3. Detection results of five Vibe algorithms in Experiment 1: (a) original image, (b) Gray-Vibe, (c) RGB-Vibe, (d) Lab-Vibe, (e) α-Lab-Vibe, (f) α-Lab-Vibe (modified)

Table 1. Detection performances of five Vibe algorithms in Experiment 1 (%)

Vibe Algorithm            TPR      FPR     ACC
Gray-Vibe                 81.86    3.52    95.48
RGB-Vibe                  95.86    7.06    93.14
Lab-Vibe                  97.91    1.23    98.71
α-Lab-Vibe                96.98    0.54    99.29
α-Lab-Vibe (Modified)     97.09    0.45    99.38


The experimental results show that the Vibe algorithms based on the gray feature and on RGB color space classify a large number of background pixels as foreground points and identify a large amount of shadow area as foreground. In addition, the movement area extracted by the gray-feature Vibe algorithm contains many small holes. The Vibe algorithms based on Lab color space improve the detection results to a great extent. The weighted color-difference formula performs better than the CIE1976Lab formula: it differentiates background and foreground pixels more effectively, is more robust to illumination change, and removes a larger area of object shadow. After the modification of pixel classification, there are fewer isolated noise points, the extracted movement area is more complete, and the detection result is closer to the ground truth.

4.2 Experiment 2

This experiment used a walking video shot outdoors with a resolution of 768 × 576. In the 638th frame, there are six pedestrians. Fig. 4 shows the detection results of the five Vibe algorithms for the 638th frame: Fig. 4(a) is the 638th frame, Figs. 4(b)–(e) are the detection results of Formulas (4)–(7) respectively, and Fig. 4(f) is the modified result of Fig. 4(e). Table 2 shows the detection performances of the five corresponding Vibe algorithms; the meanings of TPR, FPR and ACC are the same as in Table 1.

Fig. 4. Detection results of five Vibe algorithms in Experiment 2: (a) original image, (b) Gray-Vibe, (c) RGB-Vibe, (d) Lab-Vibe, (e) α-Lab-Vibe, (f) α-Lab-Vibe (modified)


Table 2. Detection performances of five Vibe algorithms in Experiment 2 (%)

Vibe Algorithm            TPR      FPR     ACC
Gray-Vibe                 88.84    0.32    99.41
RGB-Vibe                  95.75    0.41    99.50
Lab-Vibe                  96.50    0.36    99.56
α-Lab-Vibe                95.86    0.22    99.68
α-Lab-Vibe (Modified)     97.23    0.20    99.74

The experimental results show that the movement area extracted by the Vibe algorithms based on the gray feature and on RGB color space is incomplete, and a large amount of object shadow is identified as foreground. The Vibe algorithms based on Lab color space improve the detection results; the weighted color-difference formula and the pixel classification modification suppress object shadow, fill holes in the extracted movement area, and remove targets of no interest (such as the swaying rope in Fig. 4(a)). It is clear that the algorithm proposed in this paper is superior.

5 Conclusion

This paper proposes a moving target detection algorithm based on Lab color space, built on the Vibe algorithm. The algorithm improves the CIE1976Lab color-difference formula by reducing the weight of the brightness difference, and the improved formula is used to match pixels against their background models. Experiments show that the formula enhances robustness to illumination change and suppresses object shadow effectively. The algorithm then uses the spatial consistency of pixels to correct the classification results, which improves noise robustness and makes the extracted movement area more complete. The experimental results demonstrate that the proposed algorithm performs better in both indoor and outdoor environments.

Acknowledgements. We would like to acknowledge the support from the China Natural Science Foundation Committee (No. 61273277), the Specialized Research Fund for the Doctoral Program of Higher Education (No. 20130131110038), and the Project Sponsored by the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry (No. 20101174).

References 1. Karasulu, B.: Review and Evaluation of Well-known Methods for Moving Object Detection and Tracking in Videos. J. Journal of Aeronautics and Space Technologies 4(4), 11–22 (2010)


2. Zhang, H.: Real-time Detection Method of Human Motion Based on Optical Flow (in Chinese). J. Transactions of Beijing Institute of Technology 28(9), 794–797 (2008) 3. Li, G., Qiu, S., Lin, L., Zeng, R.: New Moving Target Detection Method Based on Background Differencing and Coterminous Frames Differencing (in Chinese). J. Chinese Journal of Scientific Instrument 27(8), 961–964 (2006) 4. Mohamed, S.S., Tahir, N.M., Adnan, R.: Background modelling and background subtraction performance for object detection. In: 6th IEEE International Colloquium on Signal Processing and Its Applications (CSPA), pp. 1–6. IEEE Press, New York (2010) 5. Stauffer, C., Grimson, W.E.L.: Adaptive Background Mixture Models for Real-time Tracking. Computer Vision and Pattern Recognition (1999) 6. Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.: Background modeling and subtraction by codebook construction. In: IEEE International Conference on Image Processing, 2004, vol. 5, pp. 3061–3064. IEEE Press, New York (2004) 7. Kim, K., Chalidabhongse, T.H., Harwood, D., Davis, L.: Real-time Foregroundbackground Segmentation Using Codebook Model. J. Real-Time Imaging 11(3), 172–185 (2005) 8. Li, Y., Chen, W., Jiang, R.: The integration adjacent frame difference of improved ViBe for foreground object detection. In: 7th IEEE International Conference on Wireless Communications, Networking and Mobile Computing (WICOM), pp. 1–4. IEEE Press, New York (2011) 9. Maddalena, L., Petrosino, A.: A Self-organizing Approach to Background Subtraction for Visual Surveillance Applications. J. IEEE Transactions on Image Processing 17(7), 1168–1177 (2008) 10. Barnich, O., Van Droogenbroeck, M.: Vibe: a powerful random technique to estimate the background in video sequences. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2009, pp. 945–948. IEEE Press, New York (2009) 11. Barnich, O., Van Droogenbroeck, M.: Vibe: A Universal Background Subtraction Algorithm for Video Sequences. J. IEEE Transactions on Image Processing 20(6), 1709–1724 (2011) 12. Ding, Y., Qian, F., Fan, J., Jiang, H.: Study on Moving Object Detection Algorithm Based on Different Color Space (in Chinese). J. Journal of Changchun University of Science and Technology (Nature Science Edition). 35(4), 1–4 (2012) 13. Wang, K., Lu, C., Le, W., Wang, X.: Color Harmony System Based on Lab Perceptual Uniform Color Space (in Chinese). J. Journal of Northwestern Polytechnical University 22(6), 695–698 (2004) 14. Miscellaneous Image Transformations. http://www.opencv.org.cn/opencvdoc/2.3.2/html/ modules/imgproc/doc/miscellaneous_transformations.html#cvtcolor 15. Liu, H.: The Application of CIE Uniform Color Space and Its Color Difference Formula (in Chinese). J. Journal of Beijing Institute of Graphic Communication 11(3), 3–8 (2003) 16. Song, D., An, B.: Infrared Object Detection Based on Improved Vibe Algorithm (in Chinese). J. Microcomputer & Its Applications 33(13), 35–37 (2014)

Image Data Embedding with Large Payload Based on Reference-Matrix

Yan Xia, Zhaoxia Yin, and Liangmin Wang
Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, China
{xyan.ahu,wanglm.ahu}@qq.com, [email protected]

Abstract. Steganography with high payload is needed in real applications in the context of big data. Published steganographic methods have achieved significant results in embedding quality, but their embedding capacity is usually insufficient. In this paper, a new image data embedding method with large payload based on a reference matrix is proposed. The reference matrix is generated by a base-9 numeral system and guides the modification of cover pixel pairs. Compared with recent methods, the experimental results show that the proposed method not only maintains acceptable image quality but also offers a larger embedding capacity of up to 3.169 bpp.

Keywords: Information hiding · Digital image · Pixel matching · Embedding payload

1 Introduction

Recently, data embedding has become important due to the increase of secret communication. The carriers for data embedding are various, including plain text [1], still images [2], voice [3], video [4] and so on. Because of the rise of image transmission on the Internet, however, data embedding in images has become increasingly important [5]. A good steganographic method is measured by two indicators, embedding quality and payload, and the two usually have an antagonistic relationship. A large payload is particularly important in the context of big data. In 1989, Turner proposed the least significant bit replacement method (LSB) [6], which replaces the least significant bits to hide the secret message. LSB has been widely researched because it is easy to implement. Based on LSB, the LSB matching (LSBM) method [7] was proposed; it matches the LSB of the stego pixel with the message bit by randomly increasing or decreasing the pixel value by one. Both LSB and LSBM use one pixel as a unit to hide one bit of message. In 2006, the LSB matching revisited method (LSB-MR) [8] was proposed by Mielikainen; it uses a function to guide two pixels, as one embedding unit, to carry two bits of message. Based on LSB-MR, Zhang and Wang then proposed


an efficient steganographic embedding method (EMD) [9], in which n pixels carry one digit in base (2n + 1), so the payload is log2(2n + 1)/n bpp. In 2008, Chang et al. proposed a method (Sudoku-S) [10] based on EMD that expands the modification directions of the pixels; a base-9 digit is embedded into two pixels, so the maximum payload is log2(9)/2 bpp. Hong et al. then optimized Sudoku-S into a new method called Sudoku-SR [11], which improves the embedding quality but does not change the payload. In 2009, Chao et al. proposed an adaptive diamond encoding (DE) method [12] that embeds into each pixel pair a digit in base A, where A = 2k^2 + 2k + 1 and k ≥ 1. In 2013, Chen et al. proposed square matrix encoding (SME) [13] based on DE, which improves the embedding quality through a new reference matrix. In 2014, Liu et al. proposed a new search algorithm (Turtle) [14], in which a base-8 digit is embedded into two pixels using a unique turtle-shell reference table; the algorithm clearly improves embedding quality and won the best paper award at the international conference IIHMSP-2014, but its maximum payload is limited to 1.5 bpp. In this paper, a new image data embedding method with large payload based on a reference matrix is proposed. First, we generate a unique reference matrix composed of base-9 digits; the matrix guides two base-9 digits to be embedded into a pixel pair at the same time. Different from the recent methods, we explore the embedding directions in the two-dimensional space twice, which yields a larger capacity; the maximum payload is therefore up to log2(9) ≈ 3.169 bpp. The experimental results demonstrate that the method has a great advantage in embedding capacity compared with the other methods. The rest of this paper is organized as follows: Section 2 presents the proposed method, Section 3 gives the experimental results, and the concluding remarks are given in Section 4.

2 The Proposed Scheme

Here, we propose an improved image steganographic method that achieves a higher payload. A reference matrix generated by a base-9 numeral system function guides the modification of cover pixel pairs. Distinguishing it from published work, we explore the embedding directions in the two-dimensional space of the reference matrix twice and thus gain a larger capacity.

2.1 Definitions and Theorems

Definition 1. The notation M(x, y) means the number that appears in Column x and Row y of the matrix M, and M(x, y) can be calculated according to Eqs. (1)–(3).

F(x) = floor(x / 3)    (1)
R(x, y) = x mod y    (2)
M(x, y) = R(3F(x) + F(y) + 3R(x, 3) + R(y, 3), 9)    (3)

Fig. 1 shows the 256 × 256 reference matrix M.


Fig. 1. The 256 × 256 reference matrix M
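The reference matrix of Eqs. (1)–(3) can be generated directly; a minimal sketch (the function name is ours):

```python
import numpy as np

def build_reference_matrix(size=256):
    """M(x, y) = (3*floor(x/3) + floor(y/3) + 3*(x mod 3) + (y mod 3)) mod 9, as in Eqs. (1)-(3)."""
    x = np.arange(size).reshape(-1, 1)   # first index
    y = np.arange(size).reshape(1, -1)   # second index
    return (3 * (x // 3) + (y // 3) + 3 * (x % 3) + (y % 3)) % 9

M = build_reference_matrix()
B = M[::3, ::3][:85, :85]   # box values B(bx, by) = M(3bx, 3by), the 85x85 sub-matrix of Fig. 2
```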

Definition 2. Starting from the origin, we divide the 256 × 256 reference matrix M into 3 × 3 boxes (shown in Fig. 1). We introduce the notation B(bx, by) for the box containing M(x, y), where x ∈ {3bx, 3bx + 1, 3bx + 2} and y ∈ {3by, 3by + 1, 3by + 2}. We define the digit in the top left corner of each box as its value (shown in bold in Fig. 1), i.e. B(bx, by) = M(3bx, 3by). For instance, B(1, 1) = M(3, 3) = 4 and B(1, 3) = M(3, 9) = 6.

Theorem 1. For each coordinate pair (x, y), where x, y ∈ {0, 1, ..., 254}, the element M(x, y) belongs to one of the boxes B(bx, by), and bx, by can be calculated according to Eq. (4).

bx = F(x),  by = F(y)    (4)

Definition 3. For each box B(bx, by), we define a set of candidate elements CeB(bx, by) as in Eq. (5) and a set of candidate boxes CbB(bx, by) as in Eq. (6).

CeB(bx, by) = { M(3bx + i, 3by + j) | i, j ∈ {0, 1, 2} }    (5)

CbB(bx, by) = { B(bx + i, by + j) | i, j ∈ {−1, 0, 1} },  bx, by ∈ {1, 2, ..., 83},
where bx (respectively by) is first replaced by bx + 1 if bx = 0 and by bx − 1 if bx = 84.    (6)


For instance, CeB(1, 1) = {M(3, 3), M(3, 4), M(3, 5), M(4, 3), M(4, 4), M(4, 5), M(5, 3), M(5, 4), M(5, 5)}, as shown in Fig. 1, and CbB(3, 3) = {B(2, 2), B(2, 3), B(2, 4), B(3, 2), B(3, 3), B(3, 4), B(4, 2), B(4, 3), B(4, 4)}, as shown in Fig. 2.

Theorem 2. In CeB(bx, by) and in CbB(bx, by), where bx, by ∈ {0, 1, 2, ..., 84}, each of the base-9 digits appears exactly once.

Proof. Pigeonhole principle.

Fig. 2. The 85×85 sub-matrix B

2.2 Secret Data Embedding Phase

Given an H × W grayscale cover image, we first establish the cover pixel pairs with a pairing method. Any pairing method can be used, as long as it is applied in both the data embedding phase and the data extracting phase. For simplicity, after pairing the cover pixels, let I be the cover image, where I = {p1 p2 ... pH×W}. Given a binary secret bit stream S = b1 b2 ... bC, the maximum value of C is log2(9) × H × W. We convert S into secret digits D = d1 d2 ... dN in the base-9 numeral system, where N is the total number of converted digits and equals H × W. The main idea of the proposed method is to embed each pair of secret digits (di, di+1) into a cover pixel pair (pi, pi+1) at a time, according to the pre-computed 256 × 256 reference matrix M shown in Fig. 1. We locate the pixel pair (pi, pi+1) on the reference matrix M at Row pi and Column pi+1 in order to embed the secret digits (di, di+1). According to Theorem 1, the located element M(pi, pi+1) belongs to one of the boxes, say B(bi, bj), where bi and bj can be calculated according to Eq. (4).


Next we choose the replacement element M(p'i, p'i+1) from one of the boxes B(b'i, b'j) of matrix M, with B(b'i, b'j) ∈ CbB(bi, bj) and M(p'i, p'i+1) ∈ CeB(b'i, b'j), according to Eq. (7).

di = B[b'i, b'j],  di+1 = M[p'i, p'i+1]    (7)

Example 1: Secret data embedding We take the secret base-9 numeral system digits (di, di+1) = (4, 5) and cover pixel pair (pi, pi+1) = ( 2, 3) as an example to show the embedding method. Firstly, We locate pixel pair (2, 3) onto the reference matrix M at Row 2 and Column 3, M(2,3) Î B(bi, bj) and (bi, bj)= (0, 1) can be calculated according to Eq. (4). Since B[0, 1]= 1 ≠ di, traverse CbB(0, 1) in the sub-matrix B and get the candidate boxe B(1, 1), B[1, 1]=4= di. Ulteriorly, traverse CeB(1,1) in the matrix M and obtain M[3, 4] =5= di+1. Then, cover pixel pair (pi, pi+1) =(2, 3) can be modified to (p'i, p'i+1)=(3, 4) with slight distortion and the two secret digits (di, di+1) = (4, 5) are embedded successfully. However, when pi = 255 or pi+1 = 255, element M(pi, pi+1)does not belong to one of the boxes. To deal with this problem, we change the saturated pixels by 1 and embed the secret digits again. As an example, consider a pixel pair (255, 7) and the secret digit pair (2, 0). In this case, the saturated pixel is decreased by 1 and the cover pixel pair (255, 7) is changed to (254, 7), which belongs to box B(84, 2). Then, the same embedding method can be used to imply the secret digits. Although saturation pixels may require more modifications, it does not affect the overall performance since saturated pixels of a natural image are rare. 2.3
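A hedged sketch of this embedding step (the indexing convention and function name are ours, and saturated pixels are assumed to have been pre-shifted as described above); running it on Example 1 reproduces the stego pair (3, 4):

```python
import numpy as np

# Reference matrix and box values, built as in the sketch after Fig. 1
x = np.arange(256).reshape(-1, 1); y = np.arange(256).reshape(1, -1)
M = (3 * (x // 3) + (y // 3) + 3 * (x % 3) + (y % 3)) % 9
B = M[::3, ::3][:85, :85]

def embed_pair(p, d):
    """Embed two base-9 digits d = (d_i, d_i+1) into a cover pixel pair p = (p_i, p_i+1)."""
    bx, by = p[0] // 3, p[1] // 3
    bx, by = min(max(bx, 1), 83), min(max(by, 1), 83)   # boundary adjustment of Eq. (6)
    for cx in (bx - 1, bx, bx + 1):                      # candidate boxes CbB (Theorem 2: one carries d_i)
        for cy in (by - 1, by, by + 1):
            if B[cx, cy] == d[0]:
                for px in range(3 * cx, 3 * cx + 3):     # candidate elements CeB (one carries d_i+1)
                    for py in range(3 * cy, 3 * cy + 3):
                        if M[px, py] == d[1]:
                            return px, py

print(embed_pair((2, 3), (4, 5)))   # Example 1: expected stego pair (3, 4)
```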

2.3 Secret Data Extracting Phase

At first, the pixels of the received stego image are paired using the same pairing method as in the embedding phase, shared in advance as a key. The rightful recipient then reconstructs the reference matrix M from the functions in Definition 1, which can be sent to the recipient as another key. After that, each stego pixel pair (p'i, p'i+1) is located on the reference matrix M at Row p'i and Column p'i+1 to recover the corresponding secret digits. According to Theorem 1, M(p'i, p'i+1) belongs to B(b'i, b'j), where b'i, b'j are calculated according to Eq. (4), and the embedded secret digits are recovered according to Eq. (7).

Example 2: Secret data extracting. To extract the secret digits (di, di+1) from the stego pixel pair (3, 4) of Example 1, we locate the pair at the 3rd row and 4th column of the reference matrix M. The located element M(3, 4) belongs to box B(1, 1). The secret digits di = B[1, 1] = 4 and di+1 = M[3, 4] = 5 are thus extracted exactly and simply.
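The extraction side is correspondingly simple; a minimal sketch under the same indexing assumption as the embedding sketch above:

```python
import numpy as np

# Rebuild the reference matrix (same construction as after Fig. 1)
x = np.arange(256).reshape(-1, 1); y = np.arange(256).reshape(1, -1)
M = (3 * (x // 3) + (y // 3) + 3 * (x % 3) + (y % 3)) % 9
B = M[::3, ::3][:85, :85]

def extract_pair(p_stego):
    """Recover the two base-9 digits from a stego pixel pair, per Eqs. (4) and (7)."""
    d_i = int(B[p_stego[0] // 3, p_stego[1] // 3])
    d_next = int(M[p_stego[0], p_stego[1]])
    return d_i, d_next

print(extract_pair((3, 4)))   # Example 2: the stego pair (3, 4) yields (4, 5)
```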

3 Experimental Results

Table 1 shows the maximum embedding payload and the corresponding PSNR of the different methods. The experimental results show that the average visual quality of the stego images produced by the proposed method is near 40 dB, which is lower than that of the other methods. In exchange for this concession, the proposed method achieves an embedding rate of 3.169 bpp, which is about twice the maximum payload of the other methods.

Table 1. The maximum embedding payload and the corresponding PSNR of different methods

Furthermore, a visual quality comparison of the stego images of Goldhill generated by various matrix-based methods is shown in Fig. 3. Finally, in order to make the efficiency improvement more intuitive, Fig. 4 shows the difference images of Lena, constructed by taking the absolute difference between the cover images and the stego images. In Fig. 4, (a), (b) and (c) are the difference images generated by EMD, Sudoku-SR and the proposed method, respectively. All altered pixels are marked in white and unaltered pixels in black. Image (c) is darker than (a) and (b), which indicates that the proposed method alters fewer pixels under the same payload and therefore achieves a higher embedding efficiency.


Fig. 3. Different stego images generated by different methods with payload = 1 bpp

Fig. 4. Difference images of Lena, payload = 1.585 bpp


4 Conclusions

A new image data embedding method with large payload based on a reference matrix has been proposed. In this method, two secret digits in the base-9 numeral system are embedded into a cover pixel pair at a time, so the maximum embedding capacity reaches log2(9) ≈ 3.169 bpp. The experimental results demonstrate this embedding capacity together with acceptable image quality and good embedding efficiency.

Acknowledgments. This research work is supported by the National Natural Science Foundation of China under Grant No. 61472001, the Scientific Research Foundation for Doctors of Anhui University under Grant No. J10113190069, and the Research Training Foundation for Students of Anhui University under Grants No. J18520152 and No. J18520162.

References 1. Johnson, N.F., Jajodia, S.: Exploring Steganography-seeing the Unseen. IEEE Computer 31(2), 26–34 (1998) 2. Hong, W., Chen, T.S.: A Novel Data Embedding Method Using Adaptive Pixel Pair Matching. IEEE Transactions on Information Forensics and Security 7(1), 176–184 (2012) 3. Tian, H., Liu, J., Li, S.: Improving Security of Quantization-Index-Modulation Steganography in Low Bit-rate Speech Streams. Multimedia Systems 20(2), 143–154 (2014) 4. Cetin, O., Ozcerit, T.: A New Steganography Algorithm based on Color Histograms for Data Embedding into Raw Video. Computers & Security 28(1), 670–682 (2009) 5. Fridrich, J.: Steganography in Digital Media: Principles, Algorithms, and Applications. Cambridge University Press, Cambridge (2009) 6. Turner, L.F.: Digital Data Security System. PatentIPN, WO89/08915 (1989) 7. Ker, A.D.: Improved detection of LSB steganography in grayscale images. In: Fridrich, J. (ed.) IH 2004. LNCS, vol. 3200, pp. 97–115. Springer, Heidelberg (2004) 8. Mielikainen, J.: LSB Matching Revisited. IEEE Signal Processing Letters 13(5), 285–287 (2006) 9. Zhang, X., Wang, S.: Efficient Steganographic Embedding by Exploiting Modification Direction. IEEE Communications Letters 10(11), 781–783 (2006) 10. Chang, C.C., Chou, Y.C., Kieu, T.D.: An information hiding scheme using Sudoku. In: Proceedings of the Third International Conference on Innovative Computing, Information and Control, pp. 17–21(2008) 11. Hong, W., Chen, T.S., Shiu, C.W.: A minimal euclidean distance searching technique for Sudoku steganography. In: Proceedings of International Symposium on Information Science and Engineering, vol. 1, pp. 515–518 (2008) 12. Chao, R.M., Wu, H.C., Lee, C.C.: A Novel Image Data Hiding Scheme with Diamond Encoding. EURASIP Journal on Information Security, 1–9 (2009) 13. Chen, J., Shiu, C.W., Wu, M.C.: An Improvement of Diamond Encoding Using Characteristic Value Positioning and Modulus Function. Journal of Systems and Software 86(5), 1377–1383 (2013) 14. Chang, C.C., Liu, Y., Nguyen, T.: A novel turtle shell based scheme for data hiding. In: The Tenth International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIHMSP-2014), Kitakyushu, Japan, pp. 89–93 (2014)

A Novel Image Splicing Forensic Algorithm Based on Generalized DCT Coefficient-Pair Histogram

Fusheng Yang and Tiegang Gao
College of Software, Nankai University, Tianjin, China (300071)
[email protected]

Abstract. A novel image forensic method based on the generalized coefficient-pair histogram in the DCT domain is proposed. In the proposed method, the image is first transformed by DCT; the differential DCT coefficient matrices in the horizontal and vertical directions are then computed, and the coefficient-pair histogram of each differential matrix is counted within a given threshold. Finally, a support vector machine (SVM) is trained on the feature vectors of authentic and tampered images to classify authentic and spliced images. The experimental results show that the proposed approach not only has low computational complexity but also outperforms the state-of-the-art methods in detection rate on the same test databases.

Keywords: Coefficient-pair histogram · Generalized differential DCT coefficients · Image splicing detection

1 Introduction

In recent decades, with the rapid development of digital imaging devices and image-editing software, it has become easy to distribute digital images over the Internet and to counterfeit them with photo-editing tools. It is therefore sometimes very difficult to distinguish a fake image from a genuine one. This phenomenon has drawn attention to information security, because the use of forged images may cause serious problems in governmental or legal contexts. Developing effective techniques to distinguish forged images from genuine ones has thus become a major research topic, and several feasible schemes have been proposed recently. At present, the technologies for checking the authenticity of digital images can be divided into two classes, referred to as intrusive (active) and non-intrusive (passive) methods. In an active method, particular data is embedded into the digital image; when the image needs to be checked, this data is extracted from the suspicious image and compared with the original, and the comparison reveals whether the image has been altered. However, because huge numbers of images exist in real life, this method has limited applicability. Unlike active detection, passive methods can validate the authenticity of digital images without any prior knowledge, and have therefore attracted more and more attention. Among the various methods of image forgery, image splicing is one of the most common ways of creating an image that is imperceptible to the human visual system. With the help of


image splicing technology, an image can be created by combining sections of different images, and many approaches for image splicing forgery detection have been proposed [1-8]. For example, Ng et al. investigated bicoherence features for blind image splicing detection; by estimating the bicoherence features of the authentic counterpart and incorporating features that characterize the variance of the feature performance, they obtained a splicing detection accuracy of about 70% [1]. In [2], the Hilbert-Huang transform (HHT) is used to generate highly non-linear and non-stationary image features, which are combined with moments of characteristic functions of a wavelet decomposition to distinguish spliced images from authentic ones; a detection accuracy of 80.15% is reported. In the follow-up work [3], image features from moments of wavelet characteristic functions and 2-D phase congruency are used for splicing detection, and an accuracy of 82% is reported on the Columbia Image Splicing Detection Evaluation Dataset. Later, the discontinuity of pixel correlation and coherency caused by splicing was modeled in terms of the image run-length representation and sharp image characteristics; statistical features extracted from the run-length representation and from image edge statistics yield a detection rate as high as 84.36% [4]. He et al. improved this scheme by computing the edge gradient matrix of the image and calculating an approximate run length along the edge gradient direction, applied to the prediction-error image and to images reconstructed from the DWT [5]. In [6], Shi et al. proposed a natural image model that includes moment features of the characteristic functions of wavelet sub-bands and Markov transition probability features of difference 2-D arrays; the method achieves a splicing detection rate of 91.87% on the Columbia dataset. They also applied multi-size block discrete cosine transform to the test image and extracted features from the Cr channel, a chroma channel of the YCbCr color space, reaching a tampering detection accuracy of 97.9% [7]. With the development of splicing detection technology, attention has turned to color image splicing, and more and more schemes are evaluated on the CASIA Tampered Image Detection Evaluation Database, which contains spliced images in various formats and has become a public test database. For example, expanded Markov features generated from the transition probability matrices in the DCT domain, together with features in the DWT domain that characterize the dependency of wavelet coefficients across positions, scales and orientations, achieve a detection rate of 89.76% on CASIA v2.0 [8]. In [9], a multi-scale Weber local descriptor (WLD) based splicing detection method is proposed, in which image features are extracted from the differential excitation and gradient orientation of a chrominance component; the detection rate reaches 94.19% on CASIA v1.0 and 96.61% on CASIA v2.0. In [10], a splicing detection scheme based on the Local Binary Pattern (LBP) and the Discrete Cosine Transform (DCT) is proposed: the LBP is first calculated and then transformed into the frequency domain with a 2-D DCT, and standard deviations are used as image features; the detection accuracy is up to 97% on CASIA v1.0 and 97.5% on CASIA v2.0, which is the best accuracy so far for color image splicing detection. Obviously, for all image forgery detection schemes, the selection of image features plays an important role in the detection rate; at the same time, if the computational


complexity of an algorithm is high, its computation becomes time-consuming. In this paper, a novel image splicing detection algorithm based on the coefficient-pair histogram in the frequency domain is proposed; the scheme has a low-dimensional feature vector and outperforms the state-of-the-art methods in detection rate on CASIA v1.0 and CASIA v2.0. The rest of the paper is organized as follows: Section 2 introduces the basic method in detail, Section 3 discusses the experimental results with some analysis and comparisons, and Section 4 gives the conclusions.

2 The Proposed Scheme

In this section, some preliminaries are given first, and then the proposed scheme is described in detail.

2.1 Pixel-pair Histogram and Its Form

The pixel-pair histogram is a matrix in which the entry at location (i, j) is the number of times that a pixel pair with intensities i and j occurs; it is given in the following form.

p(i, j) = Σ_k δ(F_k = i, F_{k+1} = j)    (1)

where p(i, j) is this count, F is the image flattened into a 1-D vector, and δ(·) is 1 only when the condition F_k = i, F_{k+1} = j is satisfied. Traditionally the pixel-pair histogram is defined in the image spatial domain, and it has been used in steganography and steganalysis applications [11]; recently it has also been used in image forensics [12]. A typical example of the pixel-pair histogram in the spatial domain is depicted in Fig. 1.
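A minimal sketch of Formula (1), assuming an 8-bit image and the column ordering shown in Fig. 1 (the function name is ours):

```python
import numpy as np

def pixel_pair_histogram(img):
    """Count, for every value pair (i, j), how often j directly follows i when the
    image is flattened into a 1-D vector in column order (Formula (1))."""
    v = np.asarray(img).flatten(order='F').astype(np.int64)   # column ordering, as in Fig. 1
    hist = np.zeros((256, 256), dtype=np.int64)
    np.add.at(hist, (v[:-1], v[1:]), 1)
    return hist
```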


Fig. 1. Example of the pixel-pair histogram of a typical image using column ordering: (a) typical image, (b) 1-D vector, (c) pixel-pair histogram

2.2 Generalized Coefficient-pair Histogram in DCT Domain

The Discrete Cosine Transform (DCT) is an orthogonal transform that is fast to compute and easy to implement. It is defined as follows.

C(u, v) = a(u) a(v) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} f(x, y) cos[(2x + 1)uπ / (2M)] cos[(2y + 1)vπ / (2N)]    (2)

In the above equation, C(u, v) are the DCT coefficients, and

a(u) = \sqrt{1/M} for u = 0, a(u) = \sqrt{2/M} for u = 1, 2, ..., M − 1;
a(v) = \sqrt{1/N} for v = 0, a(v) = \sqrt{2/N} for v = 1, 2, ..., N − 1.

When the DCT coefficients of an image have been generated, they are denoted by the matrix D(x, y), whose dimensions equal the width and height of the image. The generalized coefficient-pair histogram in the DCT domain is then computed in the following steps:

(1) The differential matrices of D(x, y) in the horizontal and vertical directions are calculated by

D_h(x, y) = D(x, y) − D(x + n, y)    (3)
D_v(x, y) = D(x, y) − D(x, y + n)    (4)

where x and y index the positions of the DCT coefficients, D_h(x, y) is the horizontal differential matrix, D_v(x, y) is the vertical differential matrix, and n is the step length of the differential matrix.

(2) The coefficient-pair histograms of the two differential matrices are calculated according to the definition in Section 2.1. First, the differential matrices D_h(x, y) and D_v(x, y) are each converted into a row vector; the coefficient-pair histogram matrix is then calculated according to Formula (1). In effect, the pixel value is replaced by the coefficient value in the DCT domain, and the pixel-pair histogram becomes the coefficient-pair histogram. Different step lengths n result in different coefficient-pair histograms. To reduce the computational complexity, for a given positive threshold T only the coefficient pairs within the interval [−T, T] are counted; two coefficient-pair histogram matrices are thus obtained, with a total dimension of 2 × (2T + 1)^2.
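A hedged sketch of this feature extraction (the paper's experiments use Matlab; here SciPy's 2-D DCT is used, rounding the differential coefficients to integers and the flattening order are our assumptions, and the step lengths and T follow the values given in Section 3):

```python
import numpy as np
from scipy.fft import dctn

def coeff_pair_features(img, steps=(1, 8), T=6):
    """Generalized coefficient-pair histogram features (a sketch of steps (1)-(2)).
    Output length: len(steps) * 2 * (2T+1)**2  (676 for steps=(1, 8), T=6)."""
    D = dctn(np.asarray(img, dtype=np.float64), norm='ortho')   # 2-D DCT, Formula (2)
    feats = []
    for n in steps:
        diffs = (D[:-n, :] - D[n:, :],     # differential matrix along one axis, Formula (3)
                 D[:, :-n] - D[:, n:])     # differential matrix along the other axis, Formula (4)
        for d in diffs:
            v = np.rint(d).astype(np.int64).ravel()
            keep = (np.abs(v[:-1]) <= T) & (np.abs(v[1:]) <= T)   # restrict to [-T, T]
            hist = np.zeros((2 * T + 1, 2 * T + 1), dtype=np.int64)
            np.add.at(hist, (v[:-1][keep] + T, v[1:][keep] + T), 1)
            feats.append(hist.ravel())
    return np.concatenate(feats)
```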


Fig. 2. General flow diagram of the proposed scheme

2.3 Classifier Training

To distinguish authentic from spliced images, efficient features must be combined with a pattern classifier. The classifier training proceeds as follows. First, the coefficient-pair histogram matrices of authentic and spliced images are used as features and are trained with LIBSVM using a Radial Basis Function (RBF) kernel. The features of a set of test images are then used to verify the effectiveness of the classifier. The overall diagram of the scheme is depicted in Fig. 2.
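A sketch of this training step using scikit-learn's SVC, which wraps LIBSVM (the paper itself uses LIBSVM from Matlab; the file names, hyperparameter values and the 5/6 split are placeholders mirroring the setup described in Section 3):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# X: feature vectors (e.g. 676-D coefficient-pair histograms), y: +1 authentic / -1 spliced
X, y = np.load('features.npy'), np.load('labels.npy')          # placeholder file names
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/6, stratify=y)

clf = SVC(kernel='rbf', C=1.0, gamma='scale')                   # RBF kernel, as in the paper
clf.fit(X_tr, y_tr)
print('test accuracy: %.4f' % clf.score(X_te, y_te))
```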

3 Experimental Results and Analysis

In this section, the experimental conditions and results are presented, together with a comparison of the effectiveness of the proposed scheme against existing algorithms.

3.1 Database for Test

CASIA v1.0 is a splicing detection evaluation database. It contains 800 authentic and 921 spliced color images of size 384 × 256 pixels in JPEG format. The authentic images are divided into several categories (scene, animal, architecture, character, plant, article, nature and texture) according to their content. Compared with CASIA v1.0, CASIA v2.0 is larger and contains more realistic and challenging fake images produced by post-processing of the tampered regions. It contains 7491 authentic and 5123 tampered color images of different sizes, ranging from 240 × 160 to 900 × 600 pixels. Unlike CASIA v1.0, CASIA v2.0 contains uncompressed images as well as JPEG images with different Q factors. More and more image splicing detection algorithms are now evaluated on the CASIA database [14].

3.2 Experimental Method

In our experiments, LIBSVM with an RBF kernel is used. All authentic images are labeled as +1 and all spliced images as −1. The software is implemented on the Windows platform with Matlab R2013b. To evaluate performance, the True Positive Rate (TPR), True Negative Rate (TNR), Accuracy and Precision are used to verify the effectiveness of the classifier. They are defined as follows.

TPR = TP / (TP + FN)    (5)
TNR = TN / (TN + FP)    (6)
Accuracy = (TP + TN) / (TP + FN + FP + TN)    (7)
Precision = TP / (TP + FP)    (8)

where TP (True Positive) is the number of authentic images classified correctly; FN (False Negative) is the number of authentic images classified incorrectly; TN (True Negative) is the number of spliced images classified correctly; and FP (False Positive) is the number of spliced images classified incorrectly. For both CASIA v1.0 and v2.0, an independent 10-fold cross-validation test is implemented; in each test, 5/6 of the authentic and 5/6 of the spliced images are randomly selected for training, and the remaining images are used to test the performance of the classifier.

3.3 The Detection Performance of the Proposed Algorithm

In the experiments, the step length n is set to 1 and 8, and the total feature dimension is 4 × (2T + 1)^2; with T set to 6, the feature dimension is 676. For CASIA v1.0, 667 authentic images and 767 spliced images are used for training, and the remaining 133 authentic and 154 spliced images are used for testing. Averaged over the ten classifiers, the TPR is 99.40% for authentic images, the TNR is 99.09% for spliced images, the accuracy is 99.24%, and the precision is 98.95%. For CASIA v2.0, to further verify the effectiveness of the proposed algorithm, all the images in the database are randomly divided into five groups of equal size; four groups are used to train the SVM and the remaining group is used for testing. The test is repeated for 10 runs, and the average accuracy reaches 97.56%.

3.4 The Comparison of Performance with Other Approaches

To evaluate the proposed method comprehensively, comparisons between the proposed scheme and some existing approaches are given on the same test databases. Table 1 compares performance on CASIA v1.0, and Table 2 compares results with recently proposed splicing detection algorithms on CASIA v2.0. For CASIA v1.0, the ROC comparison is shown in Fig. 3; the ROC curves show that the proposed scheme achieves a higher true positive rate than all other schemes at every false positive rate.

Table 1. The comparison of performance between the proposed algorithm and other algorithms on CASIA v1.0

Feature vector     Dimensionality    Accuracy (%)
Algorithm [7]      266               97.90
Algorithm [9]      770               94.52
Algorithm [10]     --                97.00
The proposed       676               99.24

Fig. 3. The comparison of ROC among some existing schemes on CASIA v1.0

Table 2. The comparison of performance between the proposed algorithm and other algorithms on CASIA v2.0

Feature vector     Dimensionality    Accuracy (%)
Algorithm [8]      100               89.76
Algorithm [9]      770               96.61
Algorithm [10]     --                97.50
The proposed       676               97.56

For CASIA v2.0, the ROC comparison is shown in Fig. 4; the ROC curves show that the proposed scheme achieves a higher true positive rate than all other schemes at every false positive rate.


Fig. 4. The comparison of ROC among some existing schemes on CASIA v2.0

As can be seen from the tables and the ROC curves, the proposed scheme performs better on CASIA v1.0 than on CASIA v2.0. Improving the scheme to obtain an equally high detection rate on CASIA v2.0 is left for future work.

4 Conclusions

A novel image forensic method based on the generalized coefficient-pair histogram in the DCT domain has been proposed. Different from the traditional pixel-pair histogram in the spatial domain, the generalized coefficient-pair histogram is computed on differential matrices in the DCT domain. Coefficient-pair histograms are obtained from the differential DCT coefficients in two directions, and a support vector machine (SVM) trained on the feature vectors of authentic and tampered images is used to classify authentic and spliced images. The experimental results show that the proposed approach not only has low computational complexity but also outperforms the state-of-the-art methods in detection rate on the same test databases.

Acknowledgments. The authors would like to thank the support of the Key Program of the Natural Science Fund of Tianjin (Grant #11JCZDJC16000).

References 1. Ng, T.-T., Chang, S.-F., Sun, Q.: Blind detection of photomontage using higher order statistics. In: IEEE ISCAS, Vancouver, Canada, pp. 688–691 (2004) 2. Fu, D., Shi, Y.Q., Su, W.: Detection of image splicing based on Hilbert-Huang transform and moments of characteristic functions with wavelet decomposition. In: Shi, Y.Q., Jeon, B. (eds.) IWDW 2006. LNCS, vol. 4283, pp. 177–187. Springer, Heidelberg (2006)


3. Chen, W., Shi, Y.Q., Su, W.: Image splicing detection using 2-d phase congruency and statistical moments of characteristic function. In: Imaging: Security, Steganography, and Watermarking of Multimedia Contents (2007). 6505R 4. Dong, J., Wang, W., Tan, T., Shi, Y.Q.: Run-length and edge statistics based approach for image splicing detection. In: Kim, H.-J., Katzenbeisser, S., Ho, A.T. (eds.) IWDW 2008. LNCS, vol. 5450, pp. 76–87. Springer, Heidelberg (2009) 5. He, Z., Sun, W., Wei, L., Hongtao, L.: Digital image splicing detection based on approximate run length. Pattern Recognition Letters 32(12), 1591–1597 (2011) 6. Shi, Y.Q., Chen, C., Chen, W.: A natural image model approaches to splicing detection. In: Proceedings of the 9th Workshop on Multimedia & Security, pp. 51–62 (2007) 7. Sutthiwan, P., Shi, Y.Q., Dong, J., et al.: New developments in color image tampering detection. In: Proceedings of 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 3064–3067 (2010) 8. He, Z., lu, W., Sun, W., Huang, J.: Digital image splicing detection based on Markov features in DCT and DWT domain. Pattern Recognition 45, 4292–4299 (2012) 9. Saleh, S.Q., Hussain, M., Muhammad, G., Bebis, G.: Evaluation of image forgery detection using multi-scale weber local descriptors. In: Bebis, G., Boyle, R., Parvin, B., Koracin, D., Li, B., Porikli, F., Zordan, V., Klosowski, J., Coquillart, S., Luo, X., Chen, M., Gotz, D. (eds.) ISVC 2013, Part II. LNCS, vol. 8034, pp. 416–424. Springer, Heidelberg (2013) 10. Alahmadi1, A.A., Hussain1, M., Aboalsamh, H.: Splicing image forgery detection based on DCT and Local Binary Pattern. In: Proceedings of IEEE Global Conference on Signal and Information Processing, pp. 253–256 (2013) 11. Qian-lan, D.: The blind detection of information hiding in color image. In: Proceedings of Second Int. conf. Computer Engineering and Technology, vol. 7, pp. 346–348 (2010) 12. Shabanifard, M., Shayesteh, M.G., Akhaee, M.A.: Forensic detection of image manipulation using the Zernike moments and pixel-pair histogram. IET Image Process 7(9), 817–828 (2013) 13. Chang, C.-C., Lin, C.-J.: LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2(3), 27 (2011) 14. Dong, J.: CASIA tampered image detection evaluation database (2011). http://forensics. idealtest.org

A Patch-Based Non-local Means Denoising Method Using Hierarchical Searching Zhi Dou(), Yubing Han, Weixing Sheng, and Xiaofeng Ma School of Electronic and Optical Engineering, Nanjing University of Science and Technology, No. 200 Xiaolingwei, Nanjing 210094, China [email protected]

Abstract. Non-local means (NLM) is a powerful denoising algorithm that preserves texture effectively. However, its computational complexity is so high that it is difficult to apply in real-time systems. In this paper, we propose a fast NLM denoising algorithm that produces comparable or better results with less computation time than traditional NLM methods. Experimental results are provided to demonstrate the superiority of the proposed method.

Keywords: Non-local means · Patch-based processing · Hierarchical searching

1 Introduction

Image and video denoising is one of the most important and challenging problems in the image processing community; it has been studied for many years and continues to attract researchers aiming at better restoration in the presence of noise. Many denoising algorithms have been proposed, such as median filtering, adaptive Wiener filtering and wavelet denoising [1,2]. For various reasons, however, these methods cannot protect texture very well, so a better denoising algorithm is required. In recent years, by taking into account the highly structured geometrical form of natural images, the non-local means (NLM) algorithm and several improved versions have been presented; NLM algorithms achieve results close to the state of the art. In the original NLM method, the restored gray value of each pixel is obtained as a weighted average of the gray values of all pixels in the image, with weights based on the similarity between the neighborhood of the current pixel and the neighborhoods of the other pixels [3-6]. To reduce the computational cost, modified NLM schemes compute the weighted average over a limited area around the current pixel instead of the entire image [7]; even so, they cannot be used extensively in real-time systems because of the remaining complexity. In this paper, a fast NLM denoising algorithm is proposed. Unlike previous methods that restore the image pixel by pixel, we process the image patch by patch. For each patch, we find a similar area by a fast scan (note that the similar area is generally discontinuous and geometrically irregular), and then restore the current patch by a weighted average of the patches in this similar area.

2 Proposed Method

In the proposed method, we first create a 3 × 3 two-dimensional low-pass filter with standard deviation σ = 0.8 and convolve the original image I with it to obtain a filtered image I'. We divide the images I and I' into a set of non-overlapping patches of the same size, N × N pixels (N is 5 in this paper). For each patch we use a two-step search to find patches of high similarity efficiently, whose weighted average yields a good denoising result. In the first searching step, as illustrated in Fig. 1, we characterize each patch of I' by its center pixel and the special 3 × 3 neighborhood of I'(i, j) given by:

P'(i, j) = [ I'(i−2, j−2)  I'(i−2, j)  I'(i−2, j+2);
             I'(i, j−2)    I'(i, j)    I'(i, j+2);
             I'(i+2, j−2)  I'(i+2, j)  I'(i+2, j+2) ]    (1)

For the current patch P'(i, j), we calculate the similarity between its special neighborhood and the special neighborhood of each pixel in a search window of size M × M (M equals 19 in this paper), formulated as:

w'_{(a,b)}(i, j) = || V(P'(i, j)) − V(P'(a, b)) ||^2_{2,α}    (2)

where V(P') is the three-channel (red, green and blue) value matrix of the patch P' in the convolved image I'. We find the L pixels (L equals 20 in this paper) whose special neighborhoods P'(a, b) are most similar to the given P'(i, j), and define the 3 × 3 neighborhoods of these pixels in I as the similar area for the second searching step, denoted by S(i, j).

Fig. 1. The first searching step in I' to find the similar area
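A rough sketch of this first searching step, assuming a single-channel convolved image I', the subsampled 3×3 descriptor of Formula (1), and a simplified (unweighted) distance in place of the Gaussian-weighted norm of Formula (2); border handling and the function name are our choices:

```python
import numpy as np

def first_step_candidates(I_blur, i, j, M_win=19, L=20):
    """Return the L centre pixels (a, b) inside the M_win x M_win window around (i, j)
    whose subsampled 3x3 descriptors (Formula (1)) are closest to that of (i, j)."""
    def descriptor(img, r, c):
        rows = np.clip([r - 2, r, r + 2], 0, img.shape[0] - 1)
        cols = np.clip([c - 2, c, c + 2], 0, img.shape[1] - 1)
        return img[np.ix_(rows, cols)].astype(np.float64)

    ref = descriptor(I_blur, i, j)
    half = M_win // 2
    cands = []
    for a in range(max(i - half, 0), min(i + half + 1, I_blur.shape[0])):
        for b in range(max(j - half, 0), min(j + half + 1, I_blur.shape[1])):
            d = np.sum((descriptor(I_blur, a, b) - ref) ** 2)   # simplified Formula (2)
            cands.append((d, a, b))
    cands.sort(key=lambda t: t[0])
    return [(a, b) for _, a, b in cands[:L]]
```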


In the second searching step, as shown in Fig. 2, for the original image I the patch to be restored is P(i, j), the 5 × 5 neighborhood of the center pixel corresponding to P'(i, j) in I'. Unlike traditional NLM methods, we calculate the weighted average of the current patch P(i, j) over the similar area S(i, j) by



NL(V)(P(i, j)) = Σ_{I(c,d) ∈ S(i,j)} w_{(i,j),(c,d)} V(P(c, d))    (3)

where NL(V)(P(i, j)) is the restored three-channel value matrix of the current patch P(i, j), V(P(c, d)) is the three-channel value matrix of the similar patch P(c, d) whose center pixel I(c, d) lies in S(i, j), and the similarity between P(i, j) and P(c, d) is defined as:

w_{(i,j),(c,d)} = (1 / Z(i, j)) exp( − || V(P(i, j)) − V(P(c, d)) ||^2_2 / h^2 )    (4)

where Z(i, j) is a normalizing factor calculated by

Z(i, j) = Σ_{I(a,b) ∈ S(i,j)} w_{(i,j),(a,b)}    (5)

Fig. 2. The second searching step in I to restore the current patch
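And a sketch of this second step (Formulas (3)–(5)): the 5×5 patches centred on the candidate pixels found in step one are averaged with exponential weights. The default value of h is a placeholder (the paper tunes it following [9]), border candidates are simply skipped, and the function name is ours:

```python
import numpy as np

def restore_patch(I, center, candidates, h=10.0, half=2):
    """Weighted average of Formulas (3)-(5) over the candidate centres found in step one."""
    ci, cj = center
    ref = I[ci - half:ci + half + 1, cj - half:cj + half + 1].astype(np.float64)
    acc, Z = np.zeros_like(ref), 0.0
    for (a, b) in candidates:
        patch = I[a - half:a + half + 1, b - half:b + half + 1].astype(np.float64)
        if patch.shape != ref.shape:        # skip candidates too close to the border
            continue
        w = np.exp(-np.sum((ref - patch) ** 2) / h ** 2)   # un-normalised weight, Formula (4)
        acc += w * patch
        Z += w
    return acc / Z if Z > 0 else ref        # Formula (3) with Z(i,j) of Formula (5)
```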

3 Experimental Results

To assess the performance of our algorithm, we use three images (Leaf, Feather and Carving) corrupted by additive white Gaussian noise with standard deviation σ = 0.03 to test three denoising algorithms: original NLM, K-SVD [8] and the proposed method. In all experiments, the original NLM uses a 13 × 13 neighborhood window for averaging and a 5 × 5 patch for the RGB value comparisons; for K-SVD, the block size is 8 and the redundancy factor is 4; the proposed method uses a 5 × 5 patch and a 19 × 19 neighborhood window in the first searching step, with 20 similar patches. The parameter h in the original NLM and in our method is set to its optimal value [9]. We compare the run times and PSNR values of the three methods on the three test images; the MATLAB programs were run on a PC (Intel i7-2720QM CPU @ 2.2 GHz, 4 GB SDRAM). As shown in Table 1, for each test image the cost of the proposed method is much lower than that of the other algorithms while its PSNR is the highest. The test and resultant images are illustrated in Figs. 3-5; the results of the proposed method are comparable to or better than those of the two other algorithms.

Table 1. Run time (seconds) and denoising results (PSNR in dB) of the noisy images Leaf, Feather and Carving, obtained with original NLM, K-SVD and the proposed algorithm

Image (noisy PSNR)        Method          Run time (s)    PSNR (dB)
Leaf (16.1667 dB)         Original NLM    657             23.7008
                          K-SVD           190             22.8732
                          Proposed        74              24.4902
Feather (15.9700 dB)      Original NLM    781             23.0358
                          K-SVD           238             21.1325
                          Proposed        72              24.8982
Carving (15.7373 dB)      Original NLM    707             21.5391
                          K-SVD           264             19.7055
                          Proposed        71              22.1710

Fig. 3(a) is a noisy image of irregular textures. As shown in Fig. 3(b), large textures in the result of the original NLM are well preserved; however, the small veins of the leaf are practically invisible. The resultant image of K-SVD is shown in Fig. 3(c); it is obviously blurry and loses some details. As can be seen in Fig. 3(d), the proposed method preserves all the textures well and produces a better result than the previous two methods. Fig. 4(a) is a noisy image of regular textures. As illustrated in Fig. 4(c), the resultant image of K-SVD is obviously blurry. For the original NLM and the proposed method, as shown in Fig. 4(b) and (d), the textures are both effectively preserved, and the results are better than that of K-SVD.


Fig. 5(a) is a noisy image consisting of various geometrical patterns. As shown in Fig. 5(b), large patterns in the result of the original NLM are clear, while small patterns become slightly blurry. The resultant image of K-SVD, shown in Fig. 5(c), loses a great deal of detail. Compared with the previous two methods, the proposed method preserves all the geometrical patterns well and produces a better result, as can be seen in Fig. 5(d).


Fig. 3. Image “Leaf”. (a) Original image, (b) result of original NLM, (c) result of K-SVD, (d) result of proposed method.


Fig. 4. Image “Feather”. (a) Original image, (b) result of original NLM, (c) result of K-SVD, (d) result of proposed method.


Fig. 5. Image “Carving”. (a) Original image, (b) result of original NLM, (c) result of K-SVD, (d) result of proposed method.

4 Conclusions

A fast non-local means color image denoising method was proposed in this paper. To significantly accelerate the algorithm, a patch-based restoration technique is employed; to obtain more patches of high similarity, we locate the similar area via a fast scan and then calculate the weighted average within that area. It was shown experimentally that the proposed method produces comparable or better results with a much lower computational cost.


References
1. Buades, A., Coll, B., Morel, J.M.: A review of image denoising algorithms, with a new one. SIAM Interdisc. J. 4(2), 490–530 (2005)
2. Gupta, S., Meenakshi: A review and comprehensive comparison of image denoising techniques. In: International Conference on Computing for Sustainable Global Development, pp. 972–976 (2014)
3. Buades, A., Coll, B., Morel, J.-M.: A non-local algorithm for image denoising. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 60–65 (2005)
4. Mohan, M.R.M., Sheeba, V.S.: A novel method of medical image denoising using bilateral and NLM filtering. In: International Conference on Advances in Computing and Communications, pp. 186–191 (2013)
5. Lu, L., Jin, W., Wang, X.: Non-Local Means Image Denoising With a Soft Threshold. IEEE Signal Processing Letters 22(7), 833–837 (2015)
6. Coupé, P., Manjón, J.V., Robles, M., Collins, D.L.: Adaptive multiresolution non-local means filter for three-dimensional magnetic resonance image denoising. IET Image Processing 6(5), 558–568 (2012)
7. Dixit, A.A., Phadke, A.C.: Image de-noising by non-local means algorithm. In: Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 275–277 (2013)
8. Elad, M., Aharon, M.: Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Trans. Image Processing 15(12), 3736–3745 (2006)
9. Van De Ville, D., Kocher, M.: SURE-Based Non-Local Means. IEEE Signal Processing Letters 16(11), 973–976 (2009)

Approach for License Plate Location Using Texture Direction and Edge Feature

Fujian Feng and Lin Wang

Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang, Huaxi 550025, China
[email protected]

Abstract. This paper presents a new method of license plate location under complex backgrounds. The texture direction map is obtained from the gradient direction field computed on the original image. The license plate candidate area is determined by interval-based judgment of edge information using the texture direction map and the binary edge image. Finally, the plate is accurately positioned using an improved region labeling method. The experimental results demonstrate the robustness and efficiency of our method.

Keywords: License plate location · Gradient direction · Edge feature · Texture direction

1 Introduction

License plate recognition is an important part of intelligent transportation systems for automatically obtaining vehicle information. Accurate positioning of the plate is very difficult because of uneven illumination, complex environments and the wide variety of vehicles. At present, license plate location methods have been widely researched [1-5]. Shuang Ma proposed a multi-decision mechanism that combines the texture features and the color features of the license plate to achieve license plate location [1]. In [2], license plate location is realized by combining mathematical morphology and color features. Feng Wang proposed a fuzzy-logic method in HSV space [3]. In [4], the license plate is positioned by retrieving a rectangular box from the edge image. The above methods are feasible only when the background is simple or only the car itself is processed; their range of application to road vehicle detection is therefore limited, and they are difficult to apply to real scenes. In this paper, a license plate location method is proposed based on edge features and texture orientation, exploiting the texture peculiar to license plates. A direction image is acquired using the gradient direction field, and the edge information is then judged in intervals to accurately determine the license plate region. This approach can solve the license plate location problem in more complex scenes.


The remaining part of this paper is organized as follows. Section 2 introduces the image pre-processing methods. In Section 3, the license plate location algorithm is presented. Section 4 shows some experimental results. Finally, conclusions are drawn in Section 5, where future work is also discussed.

2 Image Preprocessing

In Fig. 2, we can see that it is difficult to obtain feature information and detect the license plate area because of the weather, lighting and road stains. Therefore, image preprocessing becomes the critical step before subsequent operations in order to reduce the effects of noise.

2.1 Edge Detection

So far, there have been many mature edge detection algorithms, such as first-order differential operators and the second-order Laplacian operator. As shown in Fig. 1, we select the Sobel operator to detect image edges because of its noise immunity and simple calculation.

(a) horizontal operator:        (b) vertical operator:
[ −1  0  1 ]                    [ −1 −2 −1 ]
[ −2  0  2 ]                    [  0  0  0 ]
[ −1  0  1 ]                    [  1  2  1 ]

Fig. 1. Sobel operator: (a) horizontal operator, (b) vertical operator

From Fig. 3, we can see that the information of interest is not made prominent by edge detection alone, so the Shen operator is used for image enhancement.

Fig. 2. Road vehicle images of the real scene

Fig. 3. Detection results by Sobel operator

2.2 Shen-Based Image Enhancement Algorithm

From Fig. 3, we can see that noise is mainly concentrated in the high-frequency portion. Thus Shen low-pass filter [7] is used to smooth the edge image. Smoothing processing is as follows.


Given a signal I(x) (x = 0, 1, 2, …, N), I_L(x) is obtained with

I_L(0) = I(0)
I_L(x) = a·I(x) + (1 − a)·I_L(x − 1),   x = 1, 2, …, N        (1)

The output I_R(x) of the Shen low-pass filter will be

I_R(N) = I_L(N)
I_R(x) = a·I_L(x) + (1 − a)·I_R(x + 1),   x = N − 1, N − 2, …, 1, 0        (2)

where a is the smoothing parameter and 0 < a < 1.
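The following is a minimal MATLAB sketch of the recursive smoothing of equations (1) and (2) applied along one dimension; the function name, the row-by-row application to an image and the value a = 0.3 are our own illustration, not taken from the paper.

% Shen-style recursive low-pass filter along a 1-D signal, eqs. (1)-(2).
% I is a vector, a is the smoothing parameter (0 < a < 1).
function IR = shen_lowpass(I, a)
    N  = numel(I);
    IL = zeros(size(I));
    IL(1) = I(1);                         % I_L(0) = I(0)
    for x = 2:N                           % forward (causal) pass, eq. (1)
        IL(x) = a*I(x) + (1-a)*IL(x-1);
    end
    IR = zeros(size(I));
    IR(N) = IL(N);                        % I_R(N) = I_L(N)
    for x = N-1:-1:1                      % backward (anti-causal) pass, eq. (2)
        IR(x) = a*IL(x) + (1-a)*IR(x+1);
    end
end

% Example: smooth each row of a grayscale edge image E (values in [0,1]);
% the edge image name and a = 0.3 are assumed values.
% Es = zeros(size(E));
% for r = 1:size(E,1)
%     Es(r,:) = shen_lowpass(E(r,:), 0.3);
% end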

F_M(i,j) = { 1,  F(i,j) > T_M
             0,  F(i,j) ≤ T_M        (5)

F̄_M(i,j) = { 1,  F(i,j) > T̄_M
             0,  F(i,j) ≤ T̄_M        (6)

According to the results of the above calculations, BF_M is generated by equation (7), as shown in Figure 3(a). Also, according to T_N and T̄_N, F_N and F̄_N can be obtained by equation (8); in Figure 3(b) we can see the result of BF_N.

BFM = FM  FM

(7)

BFN = FN  FN

(8)

F_M and F_N are then combined with the OR operator in equation (9); the output is F_MN, as shown in Figure 3(c).

F_MN = F_M OR F_N        (9)

However, in Figure 3(c) only the two targets are shown and the other information of the image is lost, which increases the isolation of the targets and reduces the significance of their existence. To make the two targets stand out clearly in the whole image, the relationship between the target element values and the background element values must be analyzed. For the image used in this paper, the element values around the first target are mostly 1s (Figure 3(a)) and the element values around the second target are mostly 0s (Figure 3(b)); if a single logical operator were selected to merge them into the overall binary image, one of the targets would disappear. Therefore two logic operations are used, one to retain the 0s around the first target and one to retain the 1s around the second target. The approach is as follows.

Step 1: calculate the overall threshold T:

T = (1 / (pq)) Σ_{i=1}^{p} Σ_{j=1}^{q} F(i,j)        (10)

Step 2: binarize F to obtain BF, as shown in Figure 1(h):

BF(i,j) = { 1,  F(i,j) > T
            0,  F(i,j) ≤ T        (11)


Step 3: use the "AND" and "OR" operators to merge F_M and F_N into BF, as shown in Figure 3(d):

BF(i,j) = { BF(i,j) AND F_M(i,j),  (i,j) ∈ M
            BF(i,j) OR  F_N(i,j),  (i,j) ∈ N        (12)

Fig. 3. Target detection: (a) target 1 detection, (b) target 2 detection, (c) the two detections, (d) the final result
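As an illustration, here is a minimal MATLAB sketch of Steps 1-3 above (overall threshold, binarization, and the AND/OR merge of equation (12)); the variable names and the passing of the regions M and N as logical masks are our own assumptions.

% F: grayscale image (double); FM, FN: binary detections of targets 1 and 2;
% maskM, maskN: logical masks of the regions M and N around each target.
T  = mean(F(:));                     % Step 1: overall threshold, eq. (10)
BF = F > T;                          % Step 2: binarize F, eq. (11)
% Step 3: merge the two targets into the overall binary image, eq. (12)
BF(maskM) = BF(maskM) & FM(maskM);   % AND keeps the 0s around target 1
BF(maskN) = BF(maskN) | FN(maskN);   % OR keeps the 1s around target 2
imagesc(BF); colormap(gray);         % display the final binary result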

Comparing Figure 1(h) and Figure 3(d), the two small target objects are detected more clearly by this algorithm. If another target needs to be detected, it is selected again and the same method is used to extract its information. It should be noted that when multiple detected targets are merged into the overall binary image, we must take into account the relationship between each target and the surrounding elements and then choose the appropriate logical operations, so that the target still exists after integration.

5 Conclusions

In this paper, a detection algorithm for small targets based on the ROI is presented, and the ROIPOLY function is used to select the target. The selection range of this function is not limited to the normal image; its sole purpose is to extract information within a certain range according to actual needs. For the calculation of the threshold, a relatively simple and common method is to calculate the mean of the selected target; different targets have different thresholds, which is crucial for extracting information. When multiple targets are detected separately, the background values must be considered before the targets are incorporated into the overall image. On the other hand, since the binary image contains many 0s and 1s, it is convenient to select the appropriate logical operations. However, to select one more target, the single-target detection algorithm must be repeated, which needs to be improved in later work.


A Leaf Veins Visualization Modeling Method Based on Deformation

Duo-duo Qi, Ling Lu, and Li-chuan Hu

Department of Information Engineering, East China Institute of Technology, Taiyuan, China
[email protected]

Abstract. We propose a leaf vein visualization modeling method based on deformation. The deformation used for blade modeling, which is based on a deformed rectangle, is applied to the midrib and lateral veins, which are built from conical surfaces. In this way, different shapes of leaf veins are produced that are similar to realistic veins. The method is suitable for generating the midrib (principal vein) and the lateral veins (secondary veins) of a leaf, especially sleek lateral veins.

Keywords: Deformation · Leaf vein · Visualization

1 Introduction

Plant leaves and veins are complicated but governed by certain rules. They have various shapes and venations, and each shape and venation has its respective features, especially for the primary and secondary veins. Visualized modeling of veins is therefore a difficult task in computer graphics. Currently, compared with the geometric modeling of plant branches and leaves, relatively few scholars have studied plant veins. Runions [1] introduced a class of biologically-motivated algorithms for generating leaf venation patterns; their effective implementation relies on space subdivision (Voronoi diagrams) and time coherence between iteration steps, and depending on the specification details and parameters, the algorithms can simulate many types of venation patterns. Wen-Zhe Gu [2] proposed an algorithm for generating a leaf vein model: B-spline curves realize the blade profile, a fractal L-system grammar simulates the first and second veins, and a Voronoi diagram with mesh segmentation of the blade generates the tertiary veins. Wen-Zhe Gu [3] improved the vein modeling method: adding an L-system to the Runions modeling method generates a more realistic vein model at a faster rate. Yan-Feng Xue [4] describes a system in which the user interactively sketches the outline of the leaf, the main veins and the second-layer veins, while the third and deeper veins are generated by a particle system: the system finds the closest attraction node (veins assigned interactively) for the grid nodes of the leaf, generates a directed gravity graph, and randomly scatters particles within the outline of the leaf; the particles


move under the direction given by the gravity graph, and veins are generated by constantly merging with other particles. Nan Zhou [5] uses a deformation of the cone equation: by rotating and moving the deformed cone, the visualization modeling of the first and second veins is completed; this method simulates the leaf vein skeleton well. This paper involves computer graphics and botany, mainly realistic 3D graphics generation and the types of veins in botany, and particularly concerns a leaf vein visualization modeling method based on deformation. A shortcoming of current realistic 3D vein generation methods is that the 3D effect, mostly produced by giving a 3D curve a certain width, is not good; our method uses conic surfaces to produce the 3D effect. The main technical problem to be solved in this paper is to provide a deformation-based method for generating veins. This method uses the shape of the blade to control the shape of the veins; the modeling process is simple and the simulation effect is good.

2 The Type of Leaf Blade

The vein is associated with the shape of the blade. A typical leaf consists of a blade and a petiole connected to the stem. A simple leaf is a single leaf, and its boundary has three types: sleek, serrate and split; recognizable lobes are formed when the sawtooth of the blade is divided to a large extent. This paper discusses a single serrated blade. Ling Lu [6] refers to the pictures shown in Figure 1.


Fig. 1. The type of Simple leaf (a) sleek blade (b) saw blade (c) saw or divided blade (d) divided blade

The basic shape of a single blade is similar to an ellipse; the types of base and apex include acute, acuminate, circular, concave and so on, and the blade can be asymmetric and flexible, as shown in Figure 2. The idea of deformation is used to generate the geometry of the blade, and the parameters of a rectangular plane define the initial contour of the blade. Triangle waves of different amplitudes and frequencies are summed to bring out the appearance of the blade. For example, the parametric equation (1) of a ginkgo leaf is given below. Ling Lu [6] referred to several pictures, as shown in Figure 2.


Fig. 2. Leaf base and top: (a) the bottom is a tip and the top is a tip, (b) the bottom is a circle and the top is a tip, (c) the bottom is concave and the top is a circle, (d) the bottom is a tip and the top is concave

x = a_x·u + 1.1·b_y·u·sin(πv − π/2)
y = b_y·v + 0.05·v·b_y·sin(11uπ − 1.5) − 0.6·v·b_y·sin((u − 0.5)π)        (1)

(−0.5 ≤ u ≤ 0.5, 0 ≤ v ≤ 1)

where a_x is the width of the initial rectangle (in the horizontal direction) and b_y is the height of the initial rectangle (in the vertical direction). The unit of the variables is pixels (the same below).
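As an illustration, a minimal MATLAB sketch that evaluates equation (1) on a (u, v) grid and plots the resulting ginkgo-like blade outline; the sizes a_x, b_y and the grid resolution are assumed values.

% Evaluate the ginkgo-leaf parametric equation (1) on a (u,v) grid.
ax = 100; by = 220;                       % assumed rectangle size in pixels
[u, v] = meshgrid(-0.5:0.01:0.5, 0:0.01:1);
x = ax*u + 1.1*by.*u.*sin(pi*v - pi/2);
y = by*v + 0.05*v.*by.*sin(11*u*pi - 1.5) - 0.6*v.*by.*sin((u - 0.5)*pi);
plot(x(:), y(:), '.'); axis equal;        % point cloud of the deformed rectangle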

3 The Type of Leaf Vein

The leaf vein is the vascular bundle in the leaf; it is the structure for transport and support. Veins connect with the vascular tissue in the stem through the petiole, and vascular bundles are distributed to the parts of the blade through the petiole. The coarse, distinct vein located in the center of the blade is the first vein, also known as the midrib or principal vein. The thinner veins branching directly from the midrib are the secondary veins, also known as lateral veins. The still thinner veins on both sides of the secondary veins are the tertiary veins, also known as small or fine veins. The regular patterns that veins present on the blade are called venation, including feathery, parallel, arc and palmate venation [7], as shown in Figure 3.

Fig. 3. The types of leaf veins: (a) straight, (b) half straight, (c) arching, (d) straight-line arching


Figs. 3a-3d show feathery venation. Straight venation means that all secondary and tertiary veins reach the top of the margin (Fig. 3a). In half straight venation each secondary vein has two branches near its top inside the leaf margin: one branch reaches the margin and the other connects with the upper secondary vein (Fig. 3b). In arching venation the top of a secondary vein is not connected with the leaf margin, and in arching bow-line venation the tops connect with other secondary veins, showing a series of arcs (Fig. 3c). Straight-line arching venation consists of secondary veins that bend upwards (Fig. 3d).

4 Model Design of Veins

4.1 The Vein Generation Algorithm Based on Deformation

4.1.1 The Generation of Primary Veins

Generally the main vein is the midrib, which runs from the base of the blade to its apex. When the deformed parametric equation is used to simulate the blade geometry, the midrib is generated with a slender circular cone along the line of the rectangular plane with u = 0 and 0 ≤ v ≤ 1, as follows (R is the radius of the cone base):

x(u', v') = R(1.5 − v') cos(2πu')
y(u', v') = b_y v'                                  (2)
z(u', v') = R(1.5 − v') sin(2πu')

4.1.2 The Generation of Secondary Veins

Secondary veins originate from the primary vein and are thinner than it. Although the spacing and direction of the secondary veins differ among leaves, they still follow some rules and, compared with the tertiary veins, are relatively simple, so we can simulate them with a series of curves. In this paper, the density of the secondary veins is controlled by the density of their connections with the primary vein. The deformed parametric equation of the rectangle is then combined with the parametric equation of the cone to generate veins whose length changes as the size of the leaf changes. The equations are as follows:

x(u, v) = 2(1.5 − v) cos(2πu) + Δxh
y(u, v) = 220v + Δy                                 (3)
z(u, v) = 2(1.5 − v) sin(2πu)

4.2 An Instance of Vein Deformation

Adopting the method described above, a program was designed with Microsoft Visual C++ 6.0. The method is described through the example below; however, its scope is not limited to this example.


Step 1: Take a blade as an example. According to the size of the actual blade, the location and size of the rectangular plane are set, which determine the coefficients of the planar parametric equation: a_x is the width of the rectangle and b_y is its height; here a_x = 100 and b_y = 220.

Step 2: Assume the parametric equation of the rectangular plane is:

x_p(u, v) = 100u
y_p(u, v) = 220v                                    (4)
z_p(u, v) = 0

Step 3: Within the scope of the rectangle, generate the initial midrib with a slender cone whose base radius is 2:

x(u', v') = 2(1.5 − v') cos(2πu')
y(u', v') = 220v'                                   (5)
z(u', v') = 2(1.5 − v') sin(2πu')

More initial lateral veins are obtained from this cone by displacement and rotation.

Step 4: Take the deformation of this blade as an example, with the following functions:

(1) Deformation in the horizontal direction:

Δxh = 50u sin(3vπ/2.2) − 50uv + 3 sin(πv)           (6)

(2) Deformation in the vertical direction:

Δy = 80u(1 − v) sin(πv/1.2)                          (7)

(3) Deformation of the edge to generate the serration:

Δxe = 6u|sin(8πv)| + 3u|sin(4πv)|                    (8)

Therefore, the parametric equation of the blade geometry is:

x'_p(u, v) = 100u + Δxh + Δxe
y'_p(u, v) = 220v + Δy                               (9)
z'_p(u, v) = 0


Step 5: The parametric equation of the principal vein geometry is:

x'(u', v') = x(u', v') + Δxh
y'(u', v') = y(u', v') + Δy                          (10)
z'(u', v') = z(u', v')

where:

Δxh = x(u', v') sin(3πv/2.2)/2 − x(u', v') y(u', v')/440 + 3 sin(πy(u', v')/220)        (11)

Δy = 8x(u', v')(1 − y(u', v')/220) sin(πv/1.2)/10                                        (12)

In the same way, the geometry of the secondary veins undergoes the same deformation.

Step 6: The blade and leaf veins are divided into multiple small facets; for each facet we judge whether it is visible and, if so, calculate the light intensity and fill it with the corresponding color, generating a realistic leaf.

(1) The figures below describe this method in detail. According to the real leaf shown in Figure 4, a rectangular plane is determined, as shown in Figure 5:

x_p(u, v) = 100u
y_p(u, v) = 220v                                     (13)
z_p(u, v) = 0

Fig. 4. The real blade

Fig. 5. Rectangular plane

The principle for determining the size and location of the rectangle is to make the left, right and bottom boundary curves of the blade convenient to express with trigonometric functions and other simple functions of the left, right and upper sides of the rectangle (as shown in Figure 6).


Fig. 6. Determine the size of the blade

(2) Determining the horizontal deformation function of the rectangle. The right boundary curve of the blade, relative to the rectangle, is nearly two-thirds of a period of a sinusoidal function:

Δx0 = 50 sin(3πv/2.2)                                (14)

In order to make the central axis of the rectangle and the other side change accordingly, the sinusoidal amplitude is associated with the value of u, as follows:

Δx1 = uΔx0 = 50u sin(3πv/2.2)                        (15)

As shown in figure 7:

Fig. 7. Horizontal deformation for the first time

Since the apex type of this leaf is acuminate (it can also be another type, in which case another curve function should be used), we add the following function to make the leaf acuminate:

Δx2 = −50uv                                          (16)


As shown in figure 8:

Fig. 8. Horizontal deformation for the second time

(3) Determining the vertical deformation function of the rectangle. The bottom boundary curve of the blade, relative to the rectangle, is nearly half a period of a sinusoidal function:

Δy = 80u(1 − v) sin(πu/1.2)                          (17)

As shown in Figure 9:

Fig. 9. Vertical deformation

(4) Determining the deformation function of the leaf edge. Since the edge of this leaf is tooth-shaped (it can also be another type, in which case another curve function should be used), we add high-frequency, small-amplitude sine functions in the horizontal direction:

Δx3 = 3u|sin(4πv)| + 6u|sin(8πv)|                    (18)


As shown in figure 10:

Fig. 10. Horizontal deformation for the third time

In addition, not all blades are straight; many are curved, so a small deformation of one cycle of a sine function can be applied to the leaf:

Δx4 = 3 sin(πv)                                      (19)

As shown in figure 11.

Fig. 11. Horizontal deformation for the fourth time

(5) The deformed parametric equation of the rectangular plane is:

x'_p(u, v) = 100u + Δx1 + Δx2 + Δx3 + Δx4
y'_p(u, v) = 220v + Δy                               (20)
z'_p(u, v) = 0
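For illustration, a minimal MATLAB sketch that evaluates the blade deformation of formulas (14)-(20) on a grid of the rectangular plane; the grid resolution is an assumed value and the plot is only for visual checking.

% Deformed blade mesh from the rectangular plane, formulas (14)-(20).
[u, v] = meshgrid(-0.5:0.02:0.5, 0:0.02:1);
dx1 = 50*u.*sin(3*pi*v/2.2);                            % (15) axis/side bending
dx2 = -50*u.*v;                                         % (16) acuminate apex
dx3 = 3*u.*abs(sin(4*pi*v)) + 6*u.*abs(sin(8*pi*v));    % (18) serrated edge
dx4 = 3*sin(pi*v);                                      % (19) slight overall curvature
dy  = 80*u.*(1 - v).*sin(pi*u/1.2);                     % (17) vertical deformation
xb = 100*u + dx1 + dx2 + dx3 + dx4;                     % (20) deformed blade coordinates
yb = 220*v + dy;
mesh(xb, yb, zeros(size(xb))); view(2); axis equal;     % flat blade in the z = 0 plane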


(6) The generation of the initial veins. Long cones are used to generate the initial veins according to the veins on the blade:

x(u', v') = 2(1.5 − v') cos(2πu')
y(u', v') = 220v'                                    (21)
z(u', v') = 2(1.5 − v') sin(2πu')

As shown in figure 12:

Fig. 12. The initial veins

① With u' = 0 and v' running from 0 to 1, generate the midrib.
② On the midrib, generate lateral veins on both sides by rotating and translating the cone at different intervals.
③ Generate the branch veins by a recursive method at the tops of the lateral veins.

(7) Generation of the final veins. The horizontal deformation of the initial veins is

Δxh = Δx1 + Δx2 + Δx4   (with x(u', v')/a_x substituted for u and y(u', v')/b_y substituted for v),

and the vertical deformation is

Δy = 80u(1 − v) sin(πu/1.2)   (with x(u', v')/a_x substituted for u and y(u', v')/b_y substituted for v).

The parametric equation of the veins is:

x'(u', v') = x(u', v') + Δxh
y'(u', v') = y(u', v') + Δy                          (22)
z'(u', v') = z(u', v')

(8) The steps of the generation algorithm for the leaf and leaf veins are as follows:
① Set up the deformation functions of the edge and of the horizontal and vertical directions of the leaf;
② Vary the parameters of the rectangular plane, u from −0.5 to 0.5 and v from 0 to 1, to obtain the four vertex coordinates of each facet;
③ Deform the four vertex coordinates using formula (20);
④ Use formula (23) to calculate the light intensity of the facet;

I = I_e + I_d + I_s = I_a K_a + I_t (K_d cos θ + K_s cos^n α)        (23)


⑤ Use the polygon area filling method to calculate the coordinates within the polygon; the Z-buffer algorithm decides whether each projected point is visible, and if it is, the point is filled with the brightness of the facet and the color of the leaf, otherwise nothing is filled. Repeat steps ② to ⑤ until all the facets in the rectangle have been processed.
⑥ Vary the parameters u', v' of the conical surface from 0 to 1 to obtain the four vertex coordinates of each facet on the conical surface.
⑦ Use formula (22) to calculate the coordinates of the four vertices of the facet after deformation.
⑧ Use formula (24) to calculate the normal vector of the facet:

a = (y2 − y1)(z3 − z1) − (y3 − y1)(z2 − z1)
b = (z2 − z1)(x3 − x1) − (z3 − z1)(x2 − x1)          (24)
c = (x2 − x1)(y3 − y1) − (x3 − x1)(y2 − y1)

⑨ Determine whether the facet is visible. Let (Xp, Yp, Zp) be the projection direction, which is (0, 0, −1) for orthographic projection. The facet is invisible when aXp + bYp + cZp > 0 (go to the next facet) and visible when aXp + bYp + cZp < 0 (go to the next step).
⑩ Use formula (23) to calculate the light intensity of the facet.

⑪ Use the polygon area filling method to calculate the coordinates within the polygon; the Z-buffer algorithm decides whether each projected point is visible, and if it is, the point is filled with the brightness of the facet and the color of the leaf, otherwise nothing is filled. Repeat steps ⑦ to ⑪ until all the facets have been processed. The result is shown in Figure 13.

Fig. 13. Final simulation effect


The parameters in this paper are:
I_e — diffuse intensity of the ambient light
I_a — incident intensity of the environmental light
I_d — diffuse light intensity
I_t — incident light intensity emitted from a point source
K_d — diffuse reflection constant (0 ≤ K_d ≤ 1), depending on the material of the object surface
θ — the angle between the incident light and the surface normal
I_s — intensity of the specular reflected light
K_s — specular reflection constant (0 ≤ K_s ≤ 1)
α — the angle between the sight-line vector and the reflected-light vector
n — power simulating the spatial distribution of the reflected light; the smoother the surface, the larger n is
(x1, y1, z1), (x2, y2, z2), (x3, y3, z3) — the coordinates of three vertices of a facet
(a, b, c) — the outward normal vector of the facet
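A minimal MATLAB sketch of the facet visibility test and lighting calculation in formulas (23) and (24); the function name, the struct of parameters and the example values are our own assumptions for illustration.

% Visibility and light intensity of one triangular facet, formulas (23)-(24).
% P1, P2, P3 are 1x3 vertex coordinates; lightDir and viewDir are unit vectors.
function [visible, I] = facet_intensity(P1, P2, P3, lightDir, viewDir, p)
    x1=P1(1); y1=P1(2); z1=P1(3);
    x2=P2(1); y2=P2(2); z2=P2(3);
    x3=P3(1); y3=P3(2); z3=P3(3);
    % formula (24): outward normal vector of the facet
    a = (y2-y1)*(z3-z1) - (y3-y1)*(z2-z1);
    b = (z2-z1)*(x3-x1) - (z3-z1)*(x2-x1);
    c = (x2-x1)*(y3-y1) - (x3-x1)*(y2-y1);
    n = [a b c] / norm([a b c]);
    proj = [0 0 -1];                        % orthographic projection direction
    visible = dot(n, proj) < 0;             % a*Xp + b*Yp + c*Zp < 0  =>  visible
    if ~visible, I = 0; return; end
    % formula (23): I = Ia*Ka + It*(Kd*cos(theta) + Ks*cos(alpha)^n)
    cosTheta   = max(dot(n, lightDir), 0);
    reflectDir = 2*dot(n, lightDir)*n - lightDir;   % mirror reflection of the light vector
    cosAlpha   = max(dot(reflectDir, viewDir), 0);
    I = p.Ia*p.Ka + p.It*(p.Kd*cosTheta + p.Ks*cosAlpha^p.n);
end

% Example call with assumed parameter values:
% p = struct('Ia',0.2,'Ka',1,'It',1,'Kd',0.7,'Ks',0.3,'n',10);
% [vis, I] = facet_intensity([0 0 0],[1 0 0],[0 1 0],[0 0 1],[0 0 1], p);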

5 Conclusions

The method for generating plant leaf blades and veins presented in this paper is based on the deformed parametric equation of a rectangle, which is also applied to the parametric equation of the leaf veins. Different deformation functions can be used to simulate different types of leaf shape and vein shape. The simulation of the main vein of a serrated blade and of half-straight veins is based on the parameters of the blade, using the central axis of the rectangle before deformation, and the simulation process is very simple. The secondary veins are based on the conical parametric equations combined with the deformed blade function, fitting the corresponding transformation function, which makes the shape of the veins easy to control; the method works well for generating rather sleek secondary veins. The leaf and veins produced by this deformation-based vein visualization modeling are simple to calculate and simulate well. However, the effect of this deformation idea is limited to the first and secondary veins. Future work can concentrate on simulating tertiary veins with a certain thickness, especially to bring out the 3D visualization effect.

Acknowledgements. This work was funded by the Department of Education Project of Jiangxi Province of China (2013 GJJ13461).

References
[1] Runions, A., Fuhrer, M., Lane, B., et al.: Modeling and visualization of leaf venation patterns. In: Computer Graphics Proceedings, Annual Conference Series, pp. 702–711. ACM SIGGRAPH, Los Angeles (2005)


[2] Wen-Zhe, G., Jin, W.-B., Zhang, Z.-F.: Leaf venation generation method based on Voronoi diagram. Journal of Computer Applications 2(6), 309–312 (2009) (in Chinese)
[3] Wen-Zhe, G., Wen-Biao, J., Zhi-Feng, Z.: Improved method for modeling leaf venation patterns. Computer Engineering and Applications 46(21), 242–245 (2010) (in Chinese)
[4] Yan-Feng, X., Cai-Hong, W., Zhi-e, G., Li, Q., Liu, K.: Interactive Generating Veins Based on Particle System. Software Engineer (4) (2014)
[5] Zhou, N.: Visualization modeling of the 3D-plant leaves based on deformation. East China Institute of Technology, Nanchang (2013) (in Chinese)
[6] Ling, L., Yang, X., Wang, L.: Research on Visualization Model for Translucent Flower. Transactions of the Chinese Society for Agricultural Machinery 41(3), 173–176 (2010) (in Chinese)
[7] Qin, H.Z., Wu-Sheng, Z.: Identification Plant Materials Evidence, pp. 4–10. Southeast University Press (2007) (in Chinese)

Research on Tracking Mobile Targets Based on Wireless Video Sensor Networks

Zixi Jia, Linzhuo Pang, Jiawen Wang, and Zhenfei Gong

College of Information Science and Engineering, Northeastern University, Shenyang, China
[email protected]

Abstract. The motivation of this paper is to position and track moving targets in real time by means of wireless video sensor networks (WVSNs), selecting and associating information from multi-view videos to obtain the trajectory of the targets in the world coordinate system (WCS). The whole process includes camera calibration, target localization, trajectory tracking in the WCS, time synchronization calibration of multi-view videos and data fusion. The innovation of the paper is that, aimed at the time non-synchronization of multi-view videos during data association, we propose a new scheme of information selection and recognition. To validate the effectiveness of the scheme, the real trajectory and the estimated one are compared through MATLAB simulation, and the proposed scheme shows a satisfactory performance.

Keywords: Camera calibration · Target positioning · Tracking trajectory in WCS · Time synchronization calibration · Data fusion of multi-view videos

1 Introduction

Because of the limitations of traditional wireless sensor networks (WSNs), such as low precision and limited capabilities, WSNs have not been applied widely. However, WVSNs will completely change the current applications of WSNs: they can transmit and process images, videos, infrared imaging signals and other high-dimensional information, which makes WVSNs one of the most reliable and effective solutions to the bottleneck problems of WSNs [1,2]. At present, although domestic and foreign scholars have done a lot of research on WVSN technology, research on target localization and tracking is still insufficient. For example, the time non-synchronization phenomenon in multi-sensor data fusion has not been given a practical solution, which prevents us from obtaining the real-time locations of targets. To solve this problem, the authors carry out research on target localization and tracking based on WVSNs. After locating and tracking the moving target using an algorithm based on the Gaussian mixture model (GMM) and restoring its trajectory in the world coordinate system (WCS) from the video information, we study methods of data fusion for multi-view videos and propose a new data fusion algorithm based on synchronization calibration of multi-view videos. The core idea of this


algorithm is to synchronize the data from the multi-view videos and obtain the real-time world trajectory of the moving targets through data fusion. Finally, experimental results show that after data fusion the restored world trajectory has a high similarity to the real track. This technology has great significance in promoting the application of WVSNs in production and daily life. For example, it can be used not only for remote monitoring, but also for monitoring industrial processes and even the conditions of a mine, forecasting dangerous situations, which greatly frees the labor force.

2 Related Work

Many scholars and institutes have achieved a lot in WVSNs [3]. In the aspect of target positioning and tracking, Deyun Gao, Wanting Zhu and others proposed the target positioning algorithm HLS, combining CGLS and FGLS, turned the tracking problem into a series of positioning problems, and accordingly proposed the target tracking algorithm HTS based on HLS [4]; however, the realistic result may differ greatly because the state of motion may change a lot. In the aspect of merging data of multi-view videos, classic optimal Kalman filtering performs well when the geometrical information, dynamical model and statistical information are observed, but it is hard to meet practical needs: inaccurate models and inappropriate noise statistics can distort the result [5]. Adaptive Kalman filtering appeared to solve this problem. Escamilla-Ambrosio and others proposed a multi-sensor data fusion algorithm based on adaptive Kalman filters and fuzzy logic performance assessment [6]; the algorithm adjusts the values of Q and R using fuzzy logic so that they correspond better to the estimated covariance [7]. However, because of its lack of objectivity, subjective factors strongly affect the expression and processing of information. Choi J.N. and others proposed hybrid optimization of fuzzy inference systems using hierarchical fair competition-based parallel genetic algorithms and information granulation [8]. Tafti A.D. and others proposed an adaptive Kalman filtering and fuzzy track fusion approach for real-time applications [9]. Due to the complementarity of the various methods, organically integrating several algorithms can often get better results than using a single one. Although multi-sensor data fusion technology has been widely researched, there are still many problems in current data fusion research, such as low robustness, ambiguity of fusion and time non-synchronization. To solve the time non-synchronization, we propose a new data fusion algorithm: first it calibrates the synchronization of the multi-view video information, and then a fusion algorithm is applied to the data. Experiments verify that the data fusion algorithm can correctly restore the real-time position of the moving target in real space, so we can locate and track the moving target better.

3 Camera Calibration

The camera is the main tool for obtaining 3D original information in a machine vision system. In order to use the camera for 2D or 3D inspection, the parameters of the camera must be determined [10]. At present, a number of scholars have studied camera calibration techniques; for example, Tsai presented a very practical two-step calibration method. In this paper, we combine the Zhengyou Zhang calibration method with a manual calibration method and obtain a simpler camera calibration method. According to the Zhengyou Zhang calibration method, we know the relationships between the three coordinate systems, the image coordinate system (ICS), the world coordinate system (WCS) and the camera coordinate system (CCS), and the following equations are easily obtained:

X_w m11 + Y_w m12 + Z_w m13 + m14 − u X_w m31 − u Y_w m32 − u Z_w m33 = u m34
X_w m21 + Y_w m22 + Z_w m23 + m24 − v X_w m31 − v Y_w m32 − v Z_w m33 = v m34        (1)

where (u, v) is the coordinate of point P in the ICS and (X_w, Y_w, Z_w) is the coordinate of the same point P in the WCS. If there are n known points on the calibration board, with world coordinates (X_wi, Y_wi, Z_wi), i = 1, 2, …, n, and image coordinates (u_i, v_i), i = 1, 2, …, n, then we have 2n linear equations in the entries of the projection matrix M:

[ X_w1  Y_w1  Z_w1  1   0     0     0     0   −u_1·X_w1  −u_1·Y_w1  −u_1·Z_w1 ]   [ m11 ]   [ u_1·m34 ]
[ 0     0     0     0   X_w1  Y_w1  Z_w1  1   −v_1·X_w1  −v_1·Y_w1  −v_1·Z_w1 ]   [ m12 ]   [ v_1·m34 ]
[                          ···                                                 ] × [ ··· ] = [   ···   ]        (2)
[ X_wn  Y_wn  Z_wn  1   0     0     0     0   −u_n·X_wn  −u_n·Y_wn  −u_n·Z_wn ]   [ m32 ]   [ u_n·m34 ]
[ 0     0     0     0   X_wn  Y_wn  Z_wn  1   −v_n·X_wn  −v_n·Y_wn  −v_n·Z_wn ]   [ m33 ]   [ v_n·m34 ]

where the unknown vector is (m11, m12, m13, m14, m21, m22, m23, m24, m31, m32, m33)^T.

)is the coordinate of the i-th , , Wherein, is called projection matrix,( point in space,( , , 1) is the image coordinates of the i-th point, m is the i-th row jth column element in of projection matrix . Given the world coordinates and image coordinates are known, we can obtain matrix and all the internal and external parameters of the camera: Firstly, measure the calibration plate precisely and manually on its grid to obtain their world coordinates, and then use the image coordinates of these points to get matrix .Part of the matrix can be obtained through m=(KT K)−1 K T U ,then take m = 1[11] we can get the camera projection matrix .

4 Locating and Tracking Targets in Video

In this paper we use the Gaussian Mixture Model (GMM) to locate and track the moving target. In order to establish a simulation platform to verify this method, we use four wireless cameras, a 1 m × 1 m calibration plate, an elliptical orbit, a toy train and other equipment. Given the videos from the four cameras, we use the GMM to locate and track the target. The results show that the GMM can accurately locate and track the moving target; the tracking results are shown in Fig. 1. Fig. 1(a) shows each camera's visual range; Fig. 1(b) is a frame image of a single camera; Fig. 1(c) shows the moving target extracted from the image; Fig. 1(d) is the image after de-noising; Fig. 1(e) shows an image in the process of real-time locating and tracking; Fig. 1(f) is the trajectory of the center of the green box, which is almost the same as the real trajectory in image coordinates.


Fig. 1. The process of target localization and tracking
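As an illustration of the locate-and-track loop, here is a much-simplified MATLAB sketch that uses a running-average background model and blob extraction instead of the per-pixel Gaussian mixture model used in the paper; the video file name, the learning rate and the thresholds are assumed values.

% Simplified tracking sketch: running-average background model + largest blob.
vr    = VideoReader('train.avi');          % assumed video file name
alpha = 0.02;                              % background learning rate (assumed)
thr   = 0.1;                               % foreground threshold (assumed)
bg    = [];
track = [];                                % [frame u v] centroids of the target
k = 0;
while hasFrame(vr)
    frame = im2double(rgb2gray(readFrame(vr)));
    k = k + 1;
    if isempty(bg), bg = frame; continue; end
    fg = abs(frame - bg) > thr;            % foreground mask
    bg = (1 - alpha)*bg + alpha*frame;     % update the background model
    fg = bwareaopen(fg, 50);               % remove small noise blobs
    s  = regionprops(fg, 'Centroid', 'Area');
    if ~isempty(s)
        [~, idx] = max([s.Area]);          % keep the largest blob as the target
        track = [track; k s(idx).Centroid];
    end
end
plot(track(:,2), track(:,3), '-o');        % image-coordinate trajectory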

5 Restore the 3D World Coordinates

As equation (1) shows, for a single camera, given the image coordinates (u, v) of a point P and the internal and external parameters of the camera, we can only obtain the equations of the ray on which the point lies in the WCS and cannot determine its exact world coordinates. If point P lies in the space covered by the overlapping visual range of two cameras, let its image coordinates in the two cameras be (u_1, v_1) and (u_2, v_2).


For the first camera, equation (1) with its projection matrix and (u_1, v_1) gives two linear equations in (X_w, Y_w, Z_w); for the second camera, equation (1) with its projection matrix and (u_2, v_2) gives another two.

These are four equations in three unknowns; any three of them give the coordinate (X_w, Y_w, Z_w). In order to make maximum use of all the useful observations, we solve the equations by the least squares method and eventually obtain the world coordinates of point P in space. When the orbit of the target lies in a single plane, the above problem can be simplified to using the video information of a single camera to restore the 3D world coordinates of a point. When the trajectory of the target is not in a single plane, the parameters of any two cameras can be used to get the world coordinates of point P according to the above steps.
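A minimal MATLAB sketch of this triangulation step, stacking the two equations per camera and solving by least squares; the function name and the example call are our own.

% Triangulate the world coordinates of a point from two (or more) cameras.
% Ms is a cell array of 3x4 projection matrices, uv is k x 2 image coordinates.
function Pw = triangulate_point(Ms, uv)
    A = []; b = [];
    for k = 1:numel(Ms)
        M = Ms{k}; u = uv(k,1); v = uv(k,2);
        % two rows per camera from equation (1)
        A = [A; M(1,1:3) - u*M(3,1:3); M(2,1:3) - v*M(3,1:3)];
        b = [b; u*M(3,4) - M(1,4);     v*M(3,4) - M(2,4)];
    end
    Pw = A \ b;                      % least-squares solution for (Xw, Yw, Zw)
end

% Example (assumed matrices M1, M2 and image points):
% Pw = triangulate_point({M1, M2}, [u1 v1; u2 v2]);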

6 Multi-view Video Data Fusion

In practical applications, there are uncertain delays during data transmission, which make the multi-sensor system fail in time synchronization [12]. So after the above steps there may be two different image coordinates corresponding to the same spatial point P. In order to reduce this error, we need to determine whether the frames of the multi-view videos are synchronous and find the synchronous ones; this is called time calibration. Taking two-view video as an example, we apply the GMM-based tracking algorithm to obtain the trace of the moving target and get a collection of the moving target's image coordinates for each camera, represented by the matrix [u1, v1; u2, v2; ...]. One camera is selected as the reference camera: we assume that the video information obtained by this camera is correct and smooth, and that its target coordinates are credible. The other camera is defined as the second camera. During one motion cycle the position coordinate of the target is a single-valued function of time, and a camera's delay time is usually less than the cycle time. For two frames from two co-view videos, we could determine whether they are synchronous by calculating the distance between the two targets and checking whether it is within the allowable range; however, the sensors do not have the same viewpoint. If the train's image coordinates (u_2, v_2) in the second camera are mapped to the reference camera's image plane, giving a mapped coordinate (u_2', v_2'), we can compare (u_2', v_2') with (u_1, v_1) observed by the reference camera and check whether they are consistent in order to verify time synchronization.


There are two key steps to solving the above problems:
1. When two cameras have different viewpoints, how do we map the second camera's image plane to the reference camera's image plane?
2. After the mapping, how do we determine whether two frames are synchronous by comparing the image coordinates of the corresponding point?

6.1 The Mapping Relationships of Two Cameras

For two cameras C_1 and C_2, assume the relative position between camera C_1 and the WCS can be expressed by the matrices R_1 and t_1, and that between camera C_2 and the WCS by R_2 and t_2; assume the projection matrix of camera C_i is M_i, the 3×3 matrix on the left part of M_i is M_{i1}, and the 3×1 vector on the right part of M_i is m_i (i = 1, 2). According to Zhengyou Zhang's work on computer vision, the relationship between the coordinates of corresponding points of the two cameras can be expressed as follows [11]:

u_2^T [m]_* M_{21} M_{11}^{-1} u_1 = 0,   where   [m]_* is the antisymmetric matrix of   m = m_2 − M_{21} M_{11}^{-1} m_1        (3)

Given the image coordinates u_1 (a point on camera C_1's image plane) and the two cameras' parameters, we get a linear equation in the image coordinates of the corresponding point on camera C_2's image plane. This is called the epipolar constraint. According to Fig. 2, the corresponding point of p_1 on camera C_2's image plane also lies on the line connecting point P and the origin of the camera coordinate system, so we can get point p_1's corresponding point on camera C_2's image plane with the above equations, and the point is unique.

Fig. 2. The geometric relationship between the two cameras

According to the above analysis, given the image coordinates of a point on one camera's image plane, we can calculate the image coordinates of the corresponding point on another camera's image plane. Then we can map the image coordinates of several cameras with different viewpoints to a reference camera's image plane.
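A minimal MATLAB sketch of the mapping in equation (3), computing the matrix [m]_* M_21 M_11^{-1} and the epipolar line of a point; the example image point is an assumed value.

% Epipolar constraint of equation (3): map a point in camera 1 to its
% epipolar line in camera 2.  M1, M2 are the 3x4 projection matrices.
M11 = M1(:,1:3);  m1 = M1(:,4);
M21 = M2(:,1:3);  m2 = M2(:,4);
m   = m2 - M21*(M11\m1);                 % m = m2 - M21 * M11^(-1) * m1
mx  = [    0  -m(3)   m(2);
         m(3)     0  -m(1);
        -m(2)  m(1)      0 ];            % antisymmetric matrix [m]_*
F   = mx * M21 / M11;                    % so that u2' * F * u1 = 0
u1  = [120; 85; 1];                      % assumed homogeneous point in camera 1
l2  = F * u1;                            % epipolar line a*u + b*v + c = 0 in camera 2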

6.2 Synchronization Calibration

This part mainly optimizes the synchronization of the current two videos, including finding the synchronous frame and obtaining the accurate time correspondence of multiple cameras. Generally, time non-synchronization is caused by network transmission congestion, so the continuity is not greatly affected, only a little. Based on this consideration, we assume that the synchronization error is less than two seconds. Since there are 24 frames per second, 49 frames need to be tested. The key problem then becomes: how do we identify the synchronous frame for a given time step? Based on the above assumptions, to calibrate synchronization we only need to compare the mapped coordinates of the 49 frames of the second camera with the coordinates of the reference camera. We select the 48 frames before and after the current frame for the synchronization test: frame(i − 24), …, frame(i − 2), frame(i − 1), frame(i), frame(i + 1), frame(i + 2), …, frame(i + 24). From all these frames we get the real-time image coordinates of the target and obtain the trace by the least squares method. Next, we find the point (u', v') among them which is nearest to the trace and consider it as the image point of the other camera that synchronizes with the i-th frame of the reference camera. Finally, we replace the image coordinate of the current frame in the other camera with this point's coordinate (u', v'), as shown in Table 1.

Table 1. Image coordinates of the moving target

Frames (time):                       i−24, …, i−1, i, i+1, …, i+24
Reference camera image coordinates:  (u_{i−24}, v_{i−24}), …, (u_{i−1}, v_{i−1}), (u_i, v_i), (u_{i+1}, v_{i+1}), …, (u_{i+24}, v_{i+24})
Second camera image coordinates:     (u'_{i−24}, v'_{i−24}), …, (u'_{i−1}, v'_{i−1}), (u'_i, v'_i), (u'_{i+1}, v'_{i+1}), …, (u'_{i+24}, v'_{i+24})


After finding the synchronous frame of the other cameras at one time step, we record the moving target's coordinates from every camera; let them be A_0, A_1, … (A_0 is the image coordinate of the moving target in the reference camera). Because of the uncertainty of the time delay and the different characteristics of every camera, we need to select these points further in order to get the most probable position of the moving target at the given time step. The specific method is as follows:

1. Compute the sum of distances from each candidate point to all the other points:

   D_0 = (A_0 − A_1)² + (A_0 − A_2)² + …
   D_1 = (A_1 − A_0)² + (A_1 − A_2)² + …
   …

2. When the distribution of these image points is relatively concentrated, we find the most probable position by calculating the variance of D_0, D_1, …. When the variance is less than a certain threshold value, we find the point, within the inner region of all the determined image points, which has the minimum sum of distances to all the vertexes; this point is the most probable position of the target.

3. When the variance is larger than the threshold value, the point which has the maximum sum of distances to all the other points is removed, and the former method is applied to the remaining points to find the point with the minimum sum of distances to all the vertexes, which is regarded as the moving target. This idea is applied to all the frames during the recording, so we obtain the most probable position of the moving target while taking the time non-synchronization of multiple video sensors into consideration and integrating the visual information of multiple cameras; a minimal sketch of this selection step is given below. It is a new way to restore the trace in the WCS from videos.
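A minimal MATLAB sketch of this selection step; the function name and the variance threshold are assumptions, squared Euclidean distances are used for the distance sums, and for simplicity the result is chosen among the candidate points themselves rather than over the whole inner region.

% Select the most probable target position from the per-camera candidate
% points A (k x 2 image coordinates for one time step).
function best = most_probable_position(A, varThreshold)
    D = distance_sums(A);
    if var(D) > varThreshold            % step 3: distribution too scattered
        [~, worst] = max(D);            % drop the most isolated point
        A(worst, :) = [];
        D = distance_sums(A);
    end
    [~, idx] = min(D);                  % point with the minimum distance sum
    best = A(idx, :);
end

function D = distance_sums(A)
    n = size(A, 1);
    D = zeros(n, 1);
    for i = 1:n
        d = A - repmat(A(i,:), n, 1);   % differences to every other point
        D(i) = sum(sum(d.^2, 2));       % sum of squared distances (D_i)
    end
end

% Example (assumed data): candidates from three cameras at one time step
% best = most_probable_position([100 200; 103 198; 140 260], 1e4);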

7 Experiment Confirmation

7.1 Experimental Platform

We use a 1 m × 1 m non-glare calibration board, which minimizes the influence of light changes, three WiFi cameras, an approximately circular orbit and a toy train with a red ball attached whose color is distinct from the color of the calibration plate. The three cameras are started at the same time, and the recording duration is 21 s.

7.2 Locating and Tracking the Target

We use the algorithm referred to in this article, program it in MATLAB, and obtain the results of the moving target's localization and tracking. Fig. 3(a) shows the frame after extraction of the moving object; Fig. 3(b) is the image after de-noising; the green rectangle in Fig. 3(c) locates the target; Fig. 3(d) shows the track of the green box's center. In Fig. 3(d) we can see that the trace of the box is approximately the same as the trace of the moving target, which proves the feasibility of the GMM.


Fig. 3. Results of the moving target’s localization and tracking

7.3 Camera Calibration

1. We choose one point on the calibration board as the origin of the WCS, set the direction perpendicular to the board as the z-axis and the two side boundaries as the x-axis and y-axis. This defines the WCS, and we can get the world coordinates of every grid point. 2. The calibration board is fixed so that it is in every camera's view, and photos of the board are taken with the three cameras at the same time; we choose one camera as an example to introduce the calibration procedure. 3. From the photos taken by the selected camera we get the image coordinates of every chosen grid point on the board. In our experiment we choose 20 grid points. Given their image coordinates and the world coordinates measured in the first step, according to equation (2) we get the projection matrix of the camera, and the projection matrices of the other cameras are computed in the same way. The three projection matrices obtained are:

M1 = [  0.198053   0.338411  −0.201901  −0.006962
       −0.152255  −0.260159   0.155284   0.005354
       −0.000681   0.000554  −0.034545   1        ]

M2 = [  0.158617  −0.159932  −0.025036  −0.000863
       −0.355075   0.358015   0.056129   0.001935
        0.000426   0.000424  −0.034437   1        ]

M3 = [  0.187135  −0.236917  −0.009604  −0.000331
       −0.354552   0.448869   0.018267   0.000629
        0.000458   0.000350  −0.034285   1        ]

4. Then the world coordinates can be restored from the video information. From the former steps we already have the image coordinates (u, v) of the moving target. In general, at least two camera models are needed to restore (X_w, Y_w, Z_w) of one image point. In our experiment the target moves in the same horizontal plane, so the z coordinate is fixed; only two unknowns are left, and two equations are enough to solve them. That is to say, we can use the parameters of just one camera to get the target's world coordinates. Taking one camera as an example, we calculate the world coordinates of the space point in the WCS from its image coordinates. The video is 21 seconds long, so we have about 500 image coordinates (u_i, v_i) (i = 1, 2, …, 500) of the moving target, and we use equation (2) to figure out the world coordinates (X_w, Y_w, Z_w). Since the z coordinate is fixed, we only need to plot (X_w, Y_w). Fig. 4(a) is the restored world coordinate trajectory from camera 1; Fig. 4(b) is that from camera 2; Fig. 4(c) is that from camera 3; Fig. 4(d) shows the restored world coordinate trajectories of all three cameras.



Fig. 4. Restored trajectories of world coordinates of 3 cameras

Combining the data of cameras 1, 2 and 3 in one coordinate system, we can see that two of them differ only a little, but the other one differs a lot from the rest, so it is hard to confirm the exact spatial position of the moving target at every time step.

7.4 Multi-view Video Data Fusion

In order to merge the restored traces of the three cameras and get the exact spatial position of the moving target at every time step, we first do synchronization and then data fusion. The specific steps are as follows. We select camera 3 as the reference camera; the target's image coordinates of the two other cameras are all mapped to the reference camera's image plane; then we plot the image coordinates of the target on the reference camera and the mapped image coordinates of the other two cameras in the same coordinate system. Fig. 5(a) shows the image coordinates of the target from the three cameras; Fig. 5(b) shows the three world coordinate trajectories restored from the three cameras; Fig. 5(c) is the fusion of the world coordinate trajectories.

Fig. 5. Fusion of world coordinate trajectory


We match the data of camera 3 at every time step with that of the other two cameras. For every frame of camera 3, we find the synchronous frame among the 49 frames of camera 1 (and likewise for camera 2), and we regard the synchronous frame of the measured camera as the frame that is time-synchronized with the reference camera. Every frame is handled in the same way until the last one, giving a new frame sequence for both camera 1 and camera 2; thus the time-synchronization calibration is completed. Given all the parameters of cameras 1 and 2, we can restore all their image coordinates of the moving target to world coordinates and draw the three world-coordinate trajectories in one coordinate system, as in Fig. 5 (b). Next is the data fusion. Although the three world-coordinate trajectories of the target are now in the same coordinate system, we still cannot read off a single real-time target location. In order to merge the three trajectories into one, we determine the most probable position of the target in world coordinates from the three points available at every time step. Three points define a triangle, and among all points of the triangle the centroid minimizes the sum of squared distances to the three vertices, so we take the centroid of the triangle as the most probable position of the moving target. The final fused world-coordinate trajectory is shown in Fig. 5 (c).
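Once the traces are synchronized, the centroid fusion described above reduces to a per-frame average of the three restored positions. A small sketch, assuming the three traces are stored as equally long (x, y) arrays:

```python
import numpy as np

def fuse_trajectories(traj1, traj2, traj3):
    """Per-frame centroid of three time-synchronized world-coordinate traces,
    each given as an (N, 2) array of (x, y) positions."""
    return (np.asarray(traj1) + np.asarray(traj2) + np.asarray(traj3)) / 3.0
```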

7.5 Comparison

Finally, we compare the fused data with the real coordinates of the moving target and plot the deviation (in pixels, per frame) as Fig. 6 shows.

Fig. 6. The deviation between the fused data and the real data

From Fig.6 we can see that the fused data has a high degree of similarity to the real one, which demonstrates the feasibility of the data fusion method.

8

Conclusion

In this paper, we propose a series of intelligent algorithms that enable us to obtain the real-time world-coordinate trajectory of a moving target under WVSN conditions.


In addition, we conducted the experiments on the platform developed by us, and the experimental results demonstrate the feasibility of the algorithm. This technology is a valuable complement to traditional video surveillance; for example, it can be used for detection and monitoring in industrial processes, and it has very good prospects for development. Acknowledgement. Supported by the National Natural Science Foundation of China (No. 61273078, 61471110), China Postdoctoral Science Special Foundation (No. 2014T70263), China Postdoctoral Science Foundation (No. 2012M511164), Chinese Universities Scientific Foundation (N130404023, N110804004, N140404014), Liaoning Doctoral Startup Foundation (No. 20121004), and the Foundation of Liaoning Educational Commission (L2014090).



References 1. Hao, W., Lin, C., Ren, F.: QoS Architecture in WSNs. Computers 32(3), 432–440 (2009) 2. Phan, K.T., Vorobyov, S.A.: Network lifetime maximization with node admission in wireless multimedia sensor networks. IEEE Transactions on Vehicular Technology 58(7), 377–386 (2009) 3. Tezcan, N., Wang, W.: Self-orienting wireless multimedia sensor networks for occlusionfree viewpoints. Computer Networks 52(13), 2558–2567 (2008) 4. Gao, D., Zhu, W., Xu, X., Han-Chieh, C.: A Hybrid Localization and Tracking System in Camera Sensor Networks. International Journal of Communication Systems (2012) 5. Huang, M., Fan, S.C., Zheng, D., Xing, W.: Advances in multi-sensor data fusion technology. Transducer and MicrosystemTechnologies (2010). Doi:10.13873/j.100097872010.03.015 6. Smith, D., Singh, S.: Approaches to multi sensor data fusion in target tracking: A survey. IEEE Transactions on Knowledge and Data Engineering 18(2), 1696–1710 (2006) 7. Escamilla-Ambrosio, P.J., Mort, N.: Multi sensor data fusion architecture based on adaptive Kalman filters and fuzzy logic performance assessment. In: Proceedings of the Fifth International Conference on Information Fusion, Annapolis, pp. 1542–1549 (2002) 8. Choi, J.N., Oh, S.K., Pedrycz, W.: Identification of fuzzy relation models using hierarchical fair competition-based parallel genetic algorithms and information granulation. Applied Mathematical Modelling 33(6), 2791–2807 (2009) 9. Tafti, A.D., Sadati, N.: Novel adaptive Kalman filtering and fuzzy track fusion approach for real time applications. In: Proceedings of the 3rd IEEE Conference on Industrial Electronics and Application, pp. 120–125, Singapore (2008) 10. Shi, Q., Comaniciu, C., Wang, D., Tureli, U.: Cross-layer MAC design for location-aware wireless sensor networks. International Journal of Communication Systems 24(7), 872–888 (2010) 11. Dhurandher, S.K., Obaidat, M.S., Gupta, M.: Providing reliable and link stability-based geo casting model in underwater environment. International Journal of Communication Systems 25(3), 356–375 (2012) 12. Li, Z., Li, R., Wei, Y., Pei, T.: Survey of localization techniques in wireless sensor networks. Information Technology Journal 9(8), 754–1757 (2010) 13. Wang, Y., Shi, P., Li, K., Chen, Z.: An energy efficient medium access control protocol for target tracking based on dynamic convey tree collaboration in wireless sensor networks. International Journal of Communication Systems 25(9), 1139–1159 (2012)


14. Gao, D., Yang, O., Zhang, H., Chao, H.C.: Multi-view routing protocol with unavailable areas identification in wireless sensor networks. Wireless Personal Communications 60(3), 443–462 (2011) 15. Khan, A.R., Madani, S.A., Hayat, K., Khan, S.U.: Clustering-based power-controlled routing for mobile wireless sensor networks. International Journal of Communication Systems 25(4), 529–542 (2012) 16. Kil-Woong, J.: Meta-heuristic algorithms for path scheduling problem in wireless sensor networks. International Journal of Communication Systems 25(4), 427–446 (2012) 17. Alippi, C., Boracchi, G., Camplani, R., et al.: Detecting External Disturbances on the Camera Lens in Wireless Multimedia Sensor Networks. IEEE Transactions on Instrumentation and Measurement 59(11), 2982–2990 (2010) 18. Tezcan, N., Wang, W.: Self-orienting wireless multimedia sensor networks for occlusion-free viewpoints. Computer Networks 52(13), 2558–2567 (2008) 19. Yuanyuan, T., Yunhui, Z.: Qinling Tan studies on CCD camera calibration 20. Ma, S., Zhang, Z.: Computer Vision (2008) 21. Time synchronization in multi-platform and multi-sensor data fusion process. Fire Control and Command Control 32(11) (2007)

Paper Currency Denomination Recognition Based on GA and SVM

Jian-Biao He, Hua-Min Zhang, Jun Liang, Ou Jin, and Xi Li

Computer Science and Technology Department, College of Information Science and Engineering, Central South University, Changsha 410083, China
[email protected]

Abstract. SVM is a general learning method based on statistical learning theory that can be used as an effective means to handle small-sample, nonlinear and high-dimensional pattern recognition. This paper studies the learning algorithm of the support vector machine, extracts characteristic data of banknotes based on PCA according to the characteristics of the SVM, and applies the SVM to banknote denomination recognition by combining the SMO training algorithm with a one-versus-rest multi-class classification algorithm. In addition, a genetic algorithm is used to optimize parameters such as the penalty coefficient C of the soft-margin SVM and the width parameter of the Gaussian kernel function. The ultimate purpose is to recognize banknote denominations efficiently and accurately. The experimental results verify that this recognition method raises the recognition accuracy to 90% or more. Keywords: SVM · Banknote denomination recognition · GA · Gaussian kernel function · Penalty coefficient C

1

Introduction

The content of banknote recognition research includes banknote denomination recognition, orientation recognition, authenticity recognition, serial number recognition, etc. The main purpose of this article is to find a solution to banknote value recognition. Being a typical pattern recognition task, the process of denomination recognition is made up of four steps: information acquisition, preprocessing, feature extraction and selection, and classification decision [1]. At present, there are three dominant ways to recognize paper currency denominations: (1) denomination recognition based on template matching; (2) denomination recognition based on neural networks; (3) denomination recognition based on the support vector machine (SVM). The shortcoming of the template matching method is that the amount of internal storage and calculation is gigantic because of the large amount of data processing, and its anti-interference ability is poor. (Fund project: the National Natural Science Foundation of China under Grant No. 61272147.)


As for the second method, the neural network, a lot of training samples are needed to find the classification function, and the necessary number of characteristic quantities is excessive, which means complicated calculation is hard to avoid and over-fitting occurs very easily. Based on the structural risk minimization principle, the support vector machine (SVM), which has a stronger theoretical basis and better generalization, was brought up. In addition to using the theory of the support vector machine, this article uses a PCA-based method to obtain the banknote's features and performs parameter optimization with a genetic algorithm to improve recognition accuracy. The experimental results confirm that this method effectively avoids the "over-fitting" problem of the neural network approach, and the recognition accuracy is apparently higher than that of the BP, LVQ and PCA models. When the penalty coefficient C and the width parameter of the Gaussian kernel function are searched with the GA, the recognition accuracy of banknote recognition is above 90%.

2 Image Preprocessing

2.1

Edge Detection

The first step of image preprocessing is to convert the image to grayscale, and then perform edge detection. Edges are important foundations of image tasks such as graphical segmentation, texture feature extraction and shape feature extraction; they mainly exist between target and target, target and background, and region and region [2]. The edge detection of the paper currency image is to separate the four edges of the banknote accurately from the background in the original collected image, detect the edges with the Sobel operator, and measure the four straight edge lines by Hough transformation for paper currency tilt correction. The final image after edge detection is shown in Fig. 2.

Fig. 1. The original image

Fig. 2. The image after edge detection

2.2 Lean Correction

Because of influences such as the angle between the camera lens and the paper currency while taking photos, as well as the swing of the camera lens, the paper currency image is likely to be tilted relative to the horizontal plane. It is necessary to adjust and correct the slanted image of the banknote. Through edge detection, the paper outline, whose four sides are all straight lines, becomes apparent. Hough transformation is


used to detect the straight lines. After that, the image is rotated to bring the angle of the lines to the horizontal direction. The final image profile is shown in Fig. 4.
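For readers who want to reproduce this preprocessing chain, the steps above (grayscale conversion, Sobel edge detection, Hough line fitting, rotation) map onto standard OpenCV calls. The following is only a rough sketch; the thresholds and the sign convention of the deskew angle are illustrative assumptions, not the authors' parameters.

```python
import cv2
import numpy as np

def deskew_banknote(bgr):
    """Grayscale -> Sobel edge map -> Hough line fit of the note's border -> rotation."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    edges = cv2.convertScaleAbs(cv2.magnitude(gx, gy))
    _, edges = cv2.threshold(edges, 60, 255, cv2.THRESH_BINARY)
    # The strongest Hough line approximates the note's long edge.
    lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)
    theta = lines[0][0][1]                       # angle of the line normal
    angle_deg = (theta - np.pi / 2) * 180 / np.pi
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle_deg, 1.0)
    return cv2.warpAffine(bgr, M, (w, h))
```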

Fig. 3. Image after lean correction

Fig. 4. The profile of the image after lean correction

3 Paper Currency Recognition Based on SVM and GA

3.1 The Classification Model of SVM

The principle of SVM is to separate the two types of sample points in the plane correctly by a classification hyper-plane and to maximize the margin. The problem can be cast as a quadratic programming problem whose mathematical form is Eq. 1:

Minimize  Φ(ω, b) = (1/2)‖ω‖²    (1)

The constraint is as Eq. 2:

y_i(ωᵀx_i + b) − 1 ≥ 0,  i = 1, 2, …, n    (2)

The ω that satisfies the constraint condition is the normal vector of the optimal separating hyper-plane. The target function is a strictly convex quadratic form while the constraints are linear, so this is a strictly convex program. According to the theory of convex quadratic programming, the problem can be translated into its Wolfe dual problem as Eq. 3:

max over α:  W(α) = ∑_{i=1}^{n} α_i − (1/2) ∑_{i=1}^{n} ∑_{j=1}^{n} α_i α_j y_i y_j (x_i · x_j)    (3)

The constraint is as Eq. 4:

∑_{i=1}^{n} α_i y_i = 0,  α_i ≥ 0,  i = 1, 2, …, n    (4)

α_i is the Lagrange multiplier of sample point x_i. According to the Kuhn-Tucker condition, the Lagrange multipliers corresponding to inactive constraints are 0; only the samples whose multipliers satisfy


α_i > 0 work for classification. These are called support vectors. The classification rule is decided only by the minority of support vectors, which lie exactly on the margin of the hyper-plane, and has nothing to do with the other samples. However, many linearly inseparable situations occur in practice. One solution, proposed by Cortes and Vapnik, introduces non-negative slack variables ξ_i ≥ 0, i = 1, 2, …, n, into the constraints, which then become

y_i(ωᵀx_i + b) ≥ 1 − ξ_i,  i = 1, 2, …, n    (5)

A classification hyperplane that allows samples to be wrongly separated is called a linear soft-margin classification hyperplane. Because it allows faulty samples, the soft-margin hyperplane is the maximum-margin hyperplane once the wrongly separated samples are discounted. Meanwhile, the target function becomes

Minimize  Φ(ω, ξ) = (1/2)‖ω‖² + C ∑_{i=1}^{n} ξ_i    (6)

The penalty parameter C controls the penalty for wrongly separated samples. The dual problem of the linear soft-margin hyperplane has the same target function as the linearly separable case; the only difference is that the constraint becomes 0 ≤ α_i ≤ C. In SVM, the parameter C controls the degree of punishment for wrongly separated samples, and the choice of C directly affects the generalization ability of the SVM [3]. According to pattern recognition theory, a pattern that is linearly inseparable in a low-dimensional space may become linearly separable after a nonlinear map to a high-dimensional feature space; however, classifying or regressing directly in the high-dimensional space raises problems such as the form, parameters and dimension of the nonlinear mapping function, the biggest obstacle being the curse of dimensionality. Kernel function technology solves this problem effectively, avoids computing in the higher-dimensional space and greatly simplifies the problem [4]. Assume x, z ∈ X, X ⊂ Rⁿ, and let the nonlinear function Φ map the input X into the feature space F; K(x, z) is the kernel function,

K(x, z) = Φ(x) · Φ(z)    (7)

As can be seen from Eq. 7, the kernel function turns the inner product of the high-dimensional space into a kernel evaluation in the low-dimensional input space, skillfully solving the curse of dimensionality and laying the theoretical foundation for solving complicated classification or regression problems in high-dimensional feature spaces. According to the Hilbert-Schmidt theory, a kernel


function K(x, z) which meets the Mercer conditions can be used as the inner product operation here. At present, the commonly used kernel functions fall into three main categories: (1) the linear kernel function, (2) the polynomial kernel function, and (3) the Gaussian kernel function. The expression of the Gaussian kernel function is Eq. 8:

K(‖x − x_c‖) = exp( −‖x − x_c‖² / (2σ²) )    (8)

x_c is the center of the kernel function and σ is the width parameter, controlling its radial range of action. The training algorithm is the SMO (sequential minimal optimization) algorithm, which can be regarded as a special case of decomposition algorithms whose working set contains only one sample pair. Its advantage is that the quadratic programming subproblem has an analytical solution, which avoids the numerical instability and time consumption that arise when the subproblem contains many samples; at the same time it does not need large matrix storage, which makes it especially suitable for sparse samples [5]. The working set is not selected by the traditional steepest-descent rule but heuristically: two nested loops look for the samples to be optimized, another sample is selected in the inner loop, and one optimization step is completed; the procedure is repeated until all samples satisfy the optimality condition. The algorithm spends most of its time on checking the optimality condition, so finding the most reasonable, i.e. cheapest, optimality check is necessary. We adopt the one-versus-rest method (OVR SVMs): during training, each class of samples is in turn treated as one class and all remaining samples as the other class, so k categories of samples construct k SVMs. In classification, an unknown sample is assigned to the class whose classification function value is maximal.
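The SMO-trained, RBF-kernel, one-versus-rest classifier described above corresponds closely to off-the-shelf tooling. As a hedged illustration (not the authors' implementation): the data below are random stand-ins, and C and gamma merely echo the magnitudes reported later in Table 1, with gamma playing the role of 1/(2σ²).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier

# Stand-in data: rows are PCA feature vectors; labels encode the 4x4
# denomination/side/orientation classes mentioned in Sect. 4.
X_train = np.random.rand(160, 20)
y_train = np.random.randint(0, 16, size=160)

clf = OneVsRestClassifier(SVC(kernel="rbf", C=7.6e3, gamma=1.5e-6))
clf.fit(X_train, y_train)
labels = clf.predict(np.random.rand(80, 20))
```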

3.2 Feature Extraction and Selection Based on PCA

In banknote recognition, it is best to find the key subset of original features, reducing unnecessary feature computation and resource cost, rather than keeping a mapping of all original features. Principal Component Analysis (PCA) is one of the more commonly used feature-extraction algorithms. PCA maps the paper currency image to a feature space that gives a good characterization of the training images. The drawback of PCA banknote recognition is that all characteristics of the original space are mapped to the lower-dimensional feature space; it is a feature subset chosen for best description. This paper applies a new feature selection method based on PCA, which combines feature selection and feature extraction and carries out feature selection in the feature space, so that the selected features of the paper money concentrate on the most critical ones [6]. PCA transforms an input data vector x_i into a new vector s_i by

s_i = Uᵀ x_i    (9)


U is an orthogonal matrix; U_i, its i-th column, is the i-th eigenvector of the sample covariance matrix Σ, written as Eq. 10 and Eq. 11:

Σ = (1/M) ∑_{i=1}^{M} (x_i − u)(x_i − u)ᵀ    (10)

u = (1/M) ∑_{i=1}^{M} x_i    (11)

In equation (10), x_i is the i-th training sample image, u is the training sample mean vector, and M is the total number of training samples. Letting A = [Φ_1, Φ_2, Φ_3, …, Φ_M] with Φ_i = x_i − u, the covariance matrix can be expressed as Eq. 12:

Σ = A·Aᵀ    (12)

Obtaining the eigenvalues and orthonormal eigenvectors of the N × N matrix Σ directly (N is the feature dimension of a sample) requires too much computation. The nonzero eigenvalues of A·Aᵀ and their corresponding eigenvectors can instead be obtained from the nonzero eigenvalues and eigenvectors of the much smaller matrix Aᵀ·A. The nonzero eigenvalues of Σ are arranged in descending order, λ_1 ≥ λ_2 ≥ … ≥ λ_r, where r is the rank of Aᵀ·A (or A·Aᵀ). The eigenvector p_i corresponding to λ_i makes up the orthogonal matrix P = [p_1, p_2, …, p_r] ⊂ R^{N×r}. Taking the first k vectors of P gives the feature space U = [u_1, u_2, …, u_k] ⊂ R^{N×k}, and the projection of a picture x_i in the feature space is then calculated as s_i = Uᵀ x_i.
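A compact sketch of the PCA projection of Eqs. (9)-(12), including the small-matrix eigen-decomposition trick discussed above (our own illustrative implementation, with a hypothetical function name):

```python
import numpy as np

def pca_features(X, k):
    """Project banknote image vectors onto the top-k principal components.
    X is (M, N): one flattened image per row."""
    u = X.mean(axis=0)                    # Eq. (11): sample mean
    A = (X - u).T                         # columns are Phi_i = x_i - u, shape (N, M)
    # Eigen-decomposition of the small M x M matrix A^T A instead of the
    # large N x N covariance matrix.
    vals, vecs = np.linalg.eigh(A.T @ A)
    order = np.argsort(vals)[::-1][:k]    # descending eigenvalues, keep k
    U = A @ vecs[:, order]                # lift eigenvectors back to R^N
    U /= np.linalg.norm(U, axis=0)        # normalize columns
    S = (X - u) @ U                       # Eq. (9): s_i = U^T x_i, one row per image
    return S, U, u
```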

3.3 Optimization of Parameter C and σ

Relative to the choice of kernel function, determining the parameters of the kernel function is more important. After choosing the Gaussian kernel function, the width σ of the Gaussian radial basis function has to be determined; the performance of the classifier is very sensitive to σ [7]. Analyzing the performance of the Gaussian-kernel SVM for different σ, it is found that when σ → 0 all training sample points become support vectors and all of them can be classified correctly; however, "over-learning" then appears very easily, its


generalization ability is poor, and the false recognition rate on test samples is relatively high. When σ → ∞, the Gaussian-kernel SVM treats all samples as the same, and both the generalization ability and the correct recognition rate on the test samples drop to zero. In fact, when σ is much smaller than the average distance between training sample points, the behavior of σ → 0 is observed; on the contrary, when σ is much larger than the average distance between training sample points, the behavior of σ → ∞ is observed [8]. In this article, the genetic algorithm is used to search for the most appropriate kernel parameters. From the viewpoint of biological genetics, the genetic algorithm (GA) integrates the concept of survival of the fittest with random information exchange; the evolution of the population is realized by mechanisms such as natural selection, crossover and mutation. During optimization, the genetic algorithm generates multiple starting points in the solution space and searches from them simultaneously; it is a search algorithm guided by the fitness function that is able to seek a global optimum in a complex search space. To determine the width σ with the genetic algorithm and cross-validation, the population is initialized first, i.e. a set of initial values of σ is given. Then SVM training is performed for each σ, the classification accuracy is calculated, and the σ with the highest accuracy is chosen as the final width of the Gaussian kernel function [9]. In calculating the classification accuracy, K-fold cross-validation is adopted: the original data are divided into k groups (generally of equal size); each subset in turn serves as the validation set with the remaining k−1 subsets as the training set, producing k models whose accuracy on their validation sets is the performance indicator of K-fold cross-validation. The optimal penalty coefficient C can be determined at the same time. Figure 5 shows the fitness curve over evolution generations during parameter optimization with the genetic algorithm.
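The GA-plus-cross-validation search for C and σ can be sketched as follows. This is a toy illustration rather than the authors' genetic algorithm: the search ranges, population size and operators are assumptions, and gamma stands in for 1/(2σ²).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def ga_tune_svm(X, y, pop_size=20, generations=30, k=5, rng=np.random.default_rng(0)):
    """Toy genetic search over (log10 C, log10 gamma) with k-fold CV accuracy as fitness."""
    pop = rng.uniform([0.0, -8.0], [5.0, -3.0], size=(pop_size, 2))   # log10 ranges
    def fitness(ind):
        c, g = 10.0 ** ind
        return cross_val_score(SVC(kernel="rbf", C=c, gamma=g), X, y, cv=k).mean()
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]            # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(2) < 0.5, a, b)               # crossover
            child += rng.normal(0, 0.3, size=2)                       # mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = max(pop, key=fitness)
    return 10.0 ** best[0], 10.0 ** best[1]                           # (C, gamma)
```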

Fig. 5. The fitness graph


4


Test Results and Analysis

The following section trains and tests with the penalty coefficient C and the width parameter σ of the Gaussian kernel selected by the support vector machine (SVM) and the genetic algorithm, and then compares them with empirically chosen values of C and σ. The experimental hardware is a Pentium(R) Dual-Core PC, using libsvm under Matlab. The banknotes used in this article are the 2005 South African Rand with denominations 100, 50, 20 and 10, with 60 notes of each denomination. In denomination recognition each banknote has two sides (front and back) and every side has two directions, so there are four situations for each denomination, giving a total of 4 * 4 categories combining denomination with orientation and side.

Table 1. The result of the experiment using the genetic algorithm for parameter optimization

Algorithm | Train set | C           | σ           | Test set | Accuracy rate
GA        | 160       | 7.6452e+003 | 1.5259e-006 | 80       | 91.25%
GA        | 160       | 5.1147e+003 | 1.9418e-006 | 80       | 87.5%
GA        | 140       | 7.2236e+003 | 1.8610e-006 | 60       | 93.3%
GA        | 140       | 8.5582e+003 | 1.5368e-006 | 60       | 91.67%

Table 2. The result of the experiment using the PSO algorithm for parameter optimization

Algorithm | Train set | C           | σ           | Test set | Accuracy rate
PSO       | 160       | 8.5024e+003 | 1.8011e-006 | 80       | 60%
PSO       | 160       | 5.5305e+003 | 4.8576e-006 | 80       | 62.5%
PSO       | 140       | 10000       | 1.0000e-006 | 60       | 60%
PSO       | 140       | 10000       | 4.3125e-006 | 60       | 63.33%

From Table 1 and Table 2 it can be seen that the accuracy rate is higher when the genetic algorithm is adopted for the optimization of C and σ. Because the genetic algorithm operates offline, it has no influence on the real-time response of the system even if the optimization itself is slow.

5

Conclusion and Prospects

This paper studied the banknote recognition based on SVM, especially the genetic algorithm that used to do parameter optimization of penalty coefficient C of SVM and σ(the width of the Gaussian kernel function). The adaptive genetic algorithm proposed in this paper provides a solution to SVM parameters selection. The experimental results confirm the feasibility and high efficiency of the scheme. From the simulation results, it is also apparent that the model adopting GA and optimized SVM is more accurate


compared with the empirical estimates SVM, and the former has stronger generalization ability. It has a certain value to application, which can be used in parameter optimization of other types of SVM. The direction of banknote recognition research mainly focus on speeding the velocity of multiclass classification and researching the training algorithm in the future. It is necessary to make a further step in optimization algorithm, so as to speed up the training, improve the ability of real-time processing, and raise recognition accuracy [10].

References 1. Zhang, E.H., Jiang, B., Duan, J.H., et al.: Research on paper currency recognition by neural networks. In: 2003 International Conference on Machine Learning and Cybernetics, vol. 4, pp. 2193–2197. IEEE (2003) 2. Ray, K.: Unsupervised edge detection and noise detection from a single image. Pattern Recognition 46(8), 2067–2077 (2013) 3. Onoda, R.G., Muller, T.K.: Soft Margins for AdaBoost. Machine Learning 42(3), 287–320 (2001) 4. Lin, C.-L., Qingyun, S.: Computer engineering, 11 (2003) 5. Liu, Y.-H., Chen, Y.-T.: Face recognition using total margin-based adaptive fuzzy support vector machines. IEEE Transactions on Neural Networks 18(1), 192 (2007) 6. Yu, C.-L.: Computer technology and development, 04 (2011) 7. Pei, Y.M.S.: The SVM parameters optimization research based on improved genetic algorithm. Computer Simulation 008, 150–152 (2010) 8. Lin, H.-T., Lin, C.-J., Weng, R.C.: A note on Platt’s probabilistic outputs for support vector machines. Machine Learning 68(3), 267–276 (2007) 9. Frontzek, T., Navjn, T, Eekmiller, R.: Predicting the nonlinear dynamics of biological neurons using support vector machines with different kernels. In: Proc of the International Joint Conference on Neural Networks, Washington DC 10. Mirza, R., Nanda, V.: Paper Currency Verification System Based on Characteristic Extraction Using Image Processing. International Journal of Engineering, 1

Simulation Research for Outline of Plant Leaf

Qing Yang, Ling Lu, Li-min Luo, and Nan Zhou

Department of Information Engineering, East China Institute of Technology, Nanchang, China
[email protected]

Abstract. We propose a parametric model method for the outline of a plant leaf which combines the shape features of the outline with a planar rectangle. The proposed method uses deformation functions to define the outline of the plant leaf, including the leaf apex, leaf base and leaf margin, and combines these deformation functions to form different outline shapes. We give some examples of representative plant leaf outlines. According to the results, the method achieves high speed and efficiency. Keywords: Outline of plant leaf · Simulation function · Planar rectangle · Leaf apex · Leaf base · Leaf margin

1

Introduction

The shapes of plant leaves have many types and great differences, but there are some regularities in these shapes. This paper gives some examples of representative plant leaf outlines. In the beginning, the modeling of real plants paid attention to the structure of the plant and the shape of its framework, and later to the details of the organs. In the simulation of leaves and petals, Lintermann and Deussen and Ijiri, Owada, Okabe et al. [1-2] produced plant leaf outlines that are determined interactively by the user, and most of the generated leaf margins are smooth; Ijiri, Yokoo et al. [3] used biological mechanisms to produce a flowering animation, showing the petal with a stretched triangular mesh and imitating the growing process with growing delta regions (the outline of the petal is input by the user); with the help of the parameterized 2Gmap L-system, Peyrat, Terraz, Merillou et al. [4] obtained different shapes of plant leaves through a certain grammar; based on NURBS surfaces, Xiaodong Liu et al. [5-6] obtained plant leaves; Peiyu Qin and Lei Wang et al. [7-8] produced petals and plant leaves with Bezier surfaces; Wenzhe Gu et al. [9] used B-spline curves to model the outline of plant leaves; Zhenjie Ma and Shenglian Lu et al. [10-11] obtained the outline of plant leaves with Bezier or B-spline curves and on that basis subdivided the outline to obtain the leaf surfaces. These methods based on free-form surfaces are not well suited to serrate leaf outlines. Xiaoyu Chi et al. [12] described a double-deck structure model to represent the mechanical structure of a plant leaf and used a double-deck mass-spring model to show the changes of its basic structure; Chang Liu et al. [13] developed the outline of a plant leaf from a leaf image through features extracted from the outline and obtained a fitting line; Jianwen Shu et al. [14] proposed the geometric modeling and visualization methods of


leaf margin by the fitting of broken line; Lichen Wang and Yuanyuan Cao et al.[15-16] use image-based leaf simulation method; Ying Tang et al.[17] proposed mass – spring model with 3D deformation to simulate the geometrical morphologic changes of the process of withering, aging leaves. The initial states of leaves are also modeled by image-based techniques. Quan et al [18] proposed image-based geometric modeling method which is helped by images with different angles as well as 3D point cloud and reconstructed the geometric modeling of a single leaf through division correcting data; Ling Lu et al. [19-20] proposed a geometric model of plant petal which is performed through deformation. In summary, four kinds of methods about simulation for outline of plant leaf surfaces have been proposed: (1)Get the surfaces margin through interaction; (2)Use free-form surface or curved surfaces to simulate the outline of plant leaf; (3)Get the outline through the real leaf image with image methods; (4)Based on deformation methods and startting from simple regulation surfaces, use deformation to get the outline of plant leaves. This paper used the last method.

2

Morphological Characteristics of the Plant Leaf’s Outline

Leaves can be divided into simple leaf and compound leaf. This paper focuses on the shape structure of simple leaf. Simple leaf is the only leaf in one petiole, and the outline of plant leaf refers to the shape of simple leaf. Different kinds of plants have different shapes. As a consequence, the shape of plant leaves can be used as one of the important features to distinguish different types. Although there will be big differences in size of the leaves of the same plant, for most of the same species there are little differences in the shape of leaves. Knowing accurately about the shape of plant leaf has a big significance to the realistic effects of visualization model for plant leaf. The basic shape of the plant leaf can be divided into four types: oblong, oval, ovoid and obovate which are shown in Fig.1:


Fig. 1. The basic shape of leaf (a) oblong (b) oval (c) ovoid (d) obovate

Various leaf shapes, with their particular characteristics of leaf base, leaf apex and leaf margin, basically fall into these four types.

2.1 The Types of Leaf Apex

Leaf apex that the other end of leaf connected with petiole is the top of leaf. There are many types of leaf apex in reality: acute type which is short and sharp(As shown in Fig.2a), attenuate type which is long and sharp( As shown in Fig.2b),


attenuate-accuminate type which is long and sharp progressively, like the leaf of pipal tree (As shown in Fig.2c),obtuse type which is blunt but pointless (As shown in Fig. 2d),circular type which is approximately rounded (As shown in Fig. 2e),concave type which is shallow(As shown in Fig. 2f), emarginate type which is shallow and concave (As shown in Fig.2g), mucronate type with sudden small point (As shown in Fig.2h), truncatus type with flat top(As shown in Fig.2i) etc.


Fig. 2. The basic shape of leaf apex (a)acute (b)attenuate (c) attenuate-accuminate (d) obtuse (e) circular(f) concave (g) emarginate (h) mucronate (i) truncatus

2.2

The types of Leaf Base

Leaf base refers to one end of leaf connected with petiole directly. There are also many types of leaf base: acute type which the angle is less than 90 degrees (As shown in Fig.3a), cuneate type which the angle is less than 90 degrees (As shown in Fig.3b), obtuse type which the angle is greater than 90 degrees(As shown in Fig.3c), circular type which is approximately rounded(As shown in Fig.3d), decurrent type with extension turning to a point(As shown in Fig.3e), cordate type that the leaf base on both sides are shrink slightly(As shown in Fig. 3f), sagittate type that two lobes of leaf base pointing down sharply (As shown in Fig.3g), auriculate type that the shape is liking a drooping ear(As shown in Fig. 3h), truncate type that the shape is liking a flat edge (As shown in Fig.3i) etc.


Fig. 3. The basic shape of leaf base (a) acute (b) cuneate (c) obtuse (d) circular (e) decurrent (f) cordate (g) sagittate (h) auriculate (i) truncate

2.3

The Types of Leaf Margin

Leaf margin which refers to the shape of the whole edge of leaf is affected by the development of mesophyll and the distribution of leaf veins as well as other factors and presented a variety of shapes. They can be divided into the following types basically: entire type (smooth and no unevenness edge, also known as entire, as shown in Fig.4a), repand type (shallow and large wavy, as shown in Fig.4b),crimped type (deep and large wavy, as shown in Fig.4c),serrate type (small irregularities which were notched, as shown in Fig.4d), split type that included lobed, hemi-cleft, sudden-cleft, deep-cleft types in various degrees, and they were produced by inhibiting of the leaf(as shown in Fig. 4e).



Fig. 4. The basic shape of leaf margin (a) entire (b) repand(c) crimped (d) serrate(e) split

3

Simulation for Outline of Plant Leaf

The simulation is separated into the following steps: (1) use deformation techniques to produce the geometrical shape of the leaf; (2) define the initial outline of the plant leaf with the parameters of a flat rectangle; (3) deform the boundary with different functions, which provides the corresponding shape of the plant leaf. The parametric equation [19-20] is as follows:

x(u, v) = a_x u + Δx(u, v)
y(u, v) = b_y v + Δy(u, v)        (−0.5 ≤ u ≤ 0.5, 0 ≤ v ≤ 1)    (1)
z(u, v) = c_z

where a_x and b_y are the lengths of the plane in the x and y directions, c_z is the (constant) intercept of the plane on the z-axis, and Δx, Δy are the boundary deformation functions in the x and y directions. The focus of this paper is how to determine Δx and Δy.

3.1 Simulation for Basic Form of Plant Leaf

(1) On the basis of the rectangle, the upper and lower portions are rounded (as shown in Fig. 5a):

Δy_1(u, v) = b_y v + v √( a_x²/4 − x²(u, v) )    (2)

(2) On the basis of the rectangle, the upper, lower, left and right sides are deformed with sine functions:

Δx_1(u, v) = 2uA_1 sin(πv),    Δy_1(u, v) = vA_2 sin((u + 0.5)π) + (1 − v)A_2 sin((u − 0.5)π)    (3)

where A_1 and A_2 control the degree of curvature of the oval.

(3) The ovoid and obovate shapes differ from the oval in that the y-direction deformation of the left and right sine functions is unsymmetric, i.e. the sine functions are given a certain phase displacement, and the upper and lower sine functions also differ (as shown in Fig. 5c, d):

Δx_1(u, v) = 2uA_1 sin(πv + d_1),    Δy_1(u, v) = vA_2 sin((u + 0.5)π) + (1 − v)A_3 sin((u − 0.5)π)    (4)


where: A1 controls the degree of bending of the left and right sides. d1 controls the degree of symmetry of upper and under sides. A2, A3 control the degree of curvature of the upper and under circle.
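To make Eqs. (1) and (3) concrete, the following sketch samples the rectangle boundary in (u, v) space and applies the sine deformations to produce an oval-like outline; the parameter values are illustrative, not taken from the paper.

```python
import numpy as np

def leaf_outline(ax=100.0, by=150.0, A1=20.0, A2=15.0, n=200):
    """Boundary of the rectangle of Eq. (1) deformed by the sine terms of Eq. (3)."""
    # walk the rectangle boundary in (u, v) parameter space
    u = np.concatenate([np.linspace(-0.5, 0.5, n), np.full(n, 0.5),
                        np.linspace(0.5, -0.5, n), np.full(n, -0.5)])
    v = np.concatenate([np.full(n, 0.0), np.linspace(0.0, 1.0, n),
                        np.full(n, 1.0), np.linspace(1.0, 0.0, n)])
    dx = 2 * u * A1 * np.sin(np.pi * v)
    dy = v * A2 * np.sin((u + 0.5) * np.pi) + (1 - v) * A2 * np.sin((u - 0.5) * np.pi)
    return ax * u + dx, by * v + dy     # x(u, v), y(u, v)
```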


Fig. 5. Simulation for the basic shape of leaf (a) simulation for oblong (b) simulation for oval (c) simulation for ovoid (d) simulation for obovate

Realistic plant leaves belong to the above four types, but there are cases in which the upper and lower or left and right sides are asymmetric, so concrete cases still need concrete analysis.

3.2

Simulation for Leaf Apex and Leaf Base

Analyzing the apex and base of the leaf, their shapes can be summarized by deformation equations or function representations, for example a linear equation (5) for |u| = 0 and Δy3 = 15|u³| sin(24πv) + v³ sin(27π(u + 0.5)), with which the outline in Fig. 9b can be generated.


Fig. 9. Simulation for ramie leaf margin (a) specimen image of ramie (b) simulation for outline of ramie

4.4

Alocasia

The shape of the alocasia leaf is oval with a repand leaf margin; the leaf apex is of attenuate-accuminate type and the leaf base of auriculate type (as shown in Fig. 10a). The outline of the leaf is designed as follows: 1) set a rectangle with height 150 and width 100 (as shown in Fig. 10b); 2) at the top of the rectangle, obtain the circular type by deformation with half a period of the sine function, and the attenuate-accuminate type by deformation with a Gaussian function; 3) on the left and right sides, use sine functions to obtain the repand type (as shown in Fig. 10c); 4) at the bottom of the rectangle, obtain the auriculate type of leaf base by deformation with half a period of the sine function (as shown in Fig. 10d). The final simulation result is shown in Fig. 10e.


Fig. 10. The modeling process of alocasia leaves

Fig. 11. The modeling process of maple leaves


4.5


Maple

Maple differs from other leaves because of its divided (lobed) leaves (as shown in Fig. 11a). The outline of the leaf is designed as follows: 1) simulate one divided leaf by assuming an initial rectangle for it with height 160 and width 80 (as shown in Fig. 11b); 2) use a sine function to obtain the initial obtuse type of leaf apex (as shown in Fig. 11c); 3) use two cycles of a reversed (negated) sine function with absolute value to obtain the effect of multiple serrations (as shown in Fig. 11d); 4) use an exponential function to project the sharp-pointed serration in the middle part (as shown in Fig. 11e); 5) use an exponential function along the gradient direction to project the serrations on both sides (as shown in Fig. 11f); 6) use half a period of the sine function and an exponential function to obtain the shape of the leaf base and the divided leaf in the middle part (as shown in Fig. 11g); 7) slightly modify the shape of the divided leaf in the middle of the leaf apex by rotating 40° about the z-axis, obtaining the divided leaf in the up-gradient direction (as shown in Fig. 11h); 8) obtain the divided leaf in the right-gradient direction through a horizontal mirror transformation of the divided leaf in the left-gradient direction (as shown in Fig. 11i); 9) the two divided leaves on the left and right sides are produced similarly to the divided leaves in the left-gradient and right-gradient directions (as shown in Fig. 11j); 10) the last two divided leaves are of oval type with attenuate-accuminate apex and base, one being the mirror transform of the other (as shown in Fig. 11k).

5

Conclusions

The model of the outline of plant leaves in this paper can simulate most types of leaf. Judging from the simulation results for five types of leaves, the method is rather preferable. With its high modeling speed, the method provides a reference for the simulation of whole leaf surfaces. Future work can concentrate on further subdividing and simulating the various serrate types of leaf margin. Acknowledgements. This work was funded by the Department of Education Project of Jiangxi Province of China (2013 GJJ13461).

References
[1] Lintermann, B., Deussen, O.: Interactive modeling of plants. IEEE Computer Graphics & Applications 19(1), 56–65 (1999)
[2] Ijiri, T., Owada, S., Okabe, M., et al.: Floral diagrams and inflorescences: interactive flower modeling using botanical structural constraints. In: Computer Graphics Proceedings, Annual Conference Series, ACM SIGGRAPH, Los Angeles, pp. 720–726 (2005)
[3] Ijiri, T., Yokoo, M., et al.: Surface-based Growth Simulation for Opening Flowers. Graphics Interface, 227–234 (2008)
[4] Peyrat, A., Terraz, O., Merillou, S., et al.: Generating vast varieties of realistic leaves with parametric 2Gmap L-systems. The Visual Computer 24(7/8/9), 807–816 (2008)
[5] Liu, X., Luo, Y., Guo, X., et al.: The Modeling of Maize Leaf During the Growing Process Based on NURBS. Computer Engineering and Applications 40(14), 201–204 (2004). (in Chinese)
[6] Liu, X., Cao, Y., Liu, G., et al.: The Modeling of Rice Leaf Based on NURBS. Microelectronics & Computer 21(9), 117–124 (2004). (in Chinese)
[7] Qin, P., Chen, C., Lv, Z., et al.: Simulation Model of Flower Using the Interaction of L-systems with Bezier Surfaces. Computer Engineering and Applications 16, 6–8 (2006). (in Chinese)
[8] Wang, L., Ling, L., Jiang, N.: A study of leaf modeling technology based on morphological features. Mathematical and Computer Modelling 54, 1107–1114 (2011). (in Chinese)
[9] Wenzhe, G., Jin, W., Zhang, Z.: Leaf venation generation method based on Voronoi diagram. Journal of Computer Applications 6, 309–312 (2010). (in Chinese)
[10] Ma, Z., Jiang, Y.: Chinar Leaf Simulation. Computer Simulation 26(2), 221–224 (2009). (in Chinese)
[11] Lu, S., Guo, X., Li, C.: Research on Techniques for Accurate Modeling and Rendering of 3D Plant Leaf. Journal of Image and Graphics 14(4), 731–737 (2009). (in Chinese)
[12] Chi, X., Sheng, B., Chen, Y., Enhua, W.: Physically Based Simulation of Weathering Plant Leaves. Chinese Journal of Computers 32(2), 221–230 (2009). (in Chinese)
[13] Liu, C., Dai, S., Guo, X., Shenglian, L.: Simulation and visualization of morphological feature in plant leaf margin. Computer Engineering and Design 30(6), 1435–1440 (2009). (in Chinese)
[14] Shu, J.: Study on Morphological Feature and Growth of Cucumber Leaf Margin. Research and Exploration in Laboratory 33(1), 56–88 (2014). (in Chinese)
[15] Wang, L., Huai, Y., Yang, G., Luo, D.: Research on Realism Leaves Modeling and Rendering in Virtual Botany. Computer Simulation 27(5), 204–208 (2011). (in Chinese)
[16] Cao, Y.: Plant Leaves Wilting Deformation Technology Research, pp. 1–64. Yanshan University, Institute of Information Science and Technology, Qinhuangdao (2013). (in Chinese)
[17] Tang, Y., Yang, K.: Research on Visualization of Deformation of Three-Dimensional Leaves. Computer Simulation 28(5), 250–253 (2011). (in Chinese)
[18] Quan, L., Tan, P., Zeng, G., et al.: Image-based plant modeling. ACM Transactions on Graphics 25(3), 599–604 (2006)
[19] Ling, L., Wang, L.: Visualization Modeling on Plant Petal Based on Plane Deformation. Transactions of the Chinese Society for Agricultural Machinery 39(9), 87–91 (2008). (in Chinese)
[20] Ling, L., Li, L.: Research on Simulation for Plant Flower Color. Journal of System Simulation 24(9), 1892–1895 (2012). (in Chinese)
[21] Qin, H.: Identification of Plant Materials Evidence. Southeast University Press (2007). (in Chinese)

Nonlocal Mumford-Shah Model for Multiphase Texture Image Segmentation

Wenqi Lu1, Jinming Duan2, Weibo Wei1, Zhenkuan Pan1, and Guodong Wang1

1

College of Information Engineering, Qingdao University, Qingdao 266000, People's Republic of China {ccluxiaoqi,allen_wgd}@163.com, [email protected], [email protected] 2 School of Computer Science, University of Nottingham, Nottingham NG7 2RD, UK [email protected]

Abstract. Image segmentation is to segment images into subdomains with same intensity, texture or color. Texture is one of the most important features to images. Because of the complexity of texture, segmentation of texture image is especially difficult and it seriously restricts the development of image processing. In this paper, a nonlocal Mumford-Shah (NLMS) model is proposed to segment multiphase texture images. This proposed model uses nonlocal operators that are capable of handling texture information in the image. In order to segment different patterns of texture simultaneously, multiple region partition strategy which uses n label functions to segment n+1 texture regions is adopted. Furthermore, to improve computational efficiency, the proposed model avoids directly computing the resulting nonlinear partial differential equation (PDE) by using Split Bregman algorithm. Numerical experiments are conducted to validate the performance of proposed model. Keywords: Mumford-Shah model · Nonlocal operators · Texture images · Multiphase segmentation · Split bregman algorithm

1

Introduction

Multiphase texture image segmentation is a complicated research topic in image processing and analysis [1, 2], computer vision, etc. It has a number of significant applications such as moving-object tracking in videos, resource classification in SAR images, and object identification in texture images. The Mumford-Shah [3] (MS) model is a fundamental variational energy functional for image segmentation. However, this classical model suffers from the inconsistent dimension problem and is very difficult to optimize. Chan and Vese [4] approximated the original MS model to segment piecewise constant images using the level set method [5, 11, 12]; this is the well-known Chan-Vese (CV) model. In order to segment non-homogeneous objects, the CV model was improved in [6], where the authors minimized the two-phase piecewise smooth MS energy. In terms of texture image segmentation, Sandberg and Chan [7] applied the Gabor transform, transforming the


original texture images into vector images and then implementing CV model on each layer of vector images. However, this method has two drawbacks: first, it is computational expensively; second, as some redundant and incorrect layers exist in vector images, it occasionally results in wrong segmentation results. In 2005, nonlocal means which could produce excellent results for texture preserving was first proposed in [8] and Gilboa and Osher [9] in 2008 defined a new series of nonlocal operators to make nonlocal models operate very similarly and conveniently. Bresson and Chan [10] combined nonlocal operators [10] and piecewise smooth Mumford-Shah in [6] to segment two-phase texture segmentation. Their model overcomes the drawbacks in [7] and therefore gives better results. However, their model cannot be directly employed to segment multiphase texture images and its convergent rate is very slow. In this paper, a nonlocal Mumford-Shah model is proposed for multiphase texture image segmentation to overcome the problems existing in the model. The proposed model combines nonlocal operators in [10] and multiple region partition strategy. The former is capable of handling texture information in the image, and the latter can segment n+1 texture regions by using only n label functions. Instead of solving nonlinear PDE [13], Split Bregman algorithm [12], which has been successfully applied to optimize L1-based variational models, is adopted to transform energy minimization problem of the proposed nonlocal Mumford model into two subproblems. These subproblems are then efficiently solved by Gauss-Seidel iteration and analytical soft thresholding equations. The new model is validated through experiments.

2

Two-Phase Nonlocal Mumford-Shah Segmentation and Nonlocal Operators

In this section, the two-phase NLMS model is first reviewed, followed by the nonlocal operators used in the model. For a scalar texture image f(x): Ω → R, x ∈ Ω, the convex NLMS model is stated as the following minimization problem:

min over u1, u2, φ ∈ [0, 1]:
E(u1, u2, φ) = α1 ∫_Ω ((u1 − f)² + λ1|∇_NL u1|²) φ dx + α2 ∫_Ω ((u2 − f)² + λ2|∇_NL u2|²)(1 − φ) dx + γ ∫_Ω |∇φ| dx    (1)

where u1 and u2 respectively represent gray values inside and outside of the closed contour denoted by label function φ . The first two terms on the right-hand side of the energy functional in (1) are fidelity data term, leading the contour to evolve to the boundaries of different patterns of texture. The third term is regularisation term that guarantees smoothness of closed contour in the evolution. α1 , λ1 , α 2 , λ2 and γ are the penalty parameters balancing these three energy terms. ∇ NLu is gradient operator defined in the nonlocal space. In order to solve (1), we need nonlocal operators. Based on nonlocal means [2] and concepts of graph gradient and divergence, Gilboa and Osher [9] systematically defined nonlocal gradient operator, divergence operator and Laplacian etc. Let u ( x) : Ω → R be the grayscale image defined on image space Ω , nonlocal similarity of the two pixels x : x ∈ Ω and y : y ∈ Ω in the image can be defined as follows


w(x, y) = exp( −( G_σ ∗ |u(x + ·) − u(y + ·)|² ) / h² )    (2)

where G_σ is the Gaussian kernel function, σ is the standard deviation of the Gaussian kernel, and h is a thresholding value controlling similarity. With this definition of nonlocal similarity, the nonlocal gradient vector operator at point x is given by

∇_NL u(x, y) ≔ (u(y) − u(x)) √w(x, y),  y ∈ Ω    (3)

The nonlocal gradient module value at point x reads

|∇_NL u|(x) = √( ∫_Ω (u(y) − u(x))² w(x, y) dy )    (4)

The nonlocal divergence at point x can be defined as

(∇_NL · p)(x) ≔ ∫_Ω (p(x, y) − p(y, x)) √w(x, y) dy    (5)

Then the nonlocal Laplacian at point x is

Δ_NL u(x) ≔ (1/2) ∇_NL · (∇_NL u)(x) = ∫_Ω (u(y) − u(x)) w(x, y) dy    (6)

The nonlocal Tikhonov regularisation term can be derived according to (4) as follows:

∫_Ω |∇_NL u|² dx = ∫_Ω ∫_Ω (u(y) − u(x))² w(x, y) dy dx    (7)
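For illustration, the patch-similarity weight of Eq. (2) can be computed directly as below. This is an unoptimized sketch (patch radius, σ and h are arbitrary choices here); practical nonlocal-means implementations restrict y to a search window and handle image borders.

```python
import numpy as np

def nonlocal_weight(u, x, y, half_patch=2, sigma=1.0, h=10.0):
    """w(x, y) of Eq. (2): Gaussian-weighted squared patch difference of image u
    between pixels x and y, passed through exp(-. / h^2)."""
    r = half_patch
    px = u[x[0]-r:x[0]+r+1, x[1]-r:x[1]+r+1].astype(float)
    py = u[y[0]-r:y[0]+r+1, y[1]-r:y[1]+r+1].astype(float)
    ax = np.arange(-r, r + 1)
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    g /= g.sum()                               # Gaussian kernel G_sigma over patch offsets
    d2 = (g * (px - py) ** 2).sum()            # weighted squared patch distance
    return np.exp(-d2 / h ** 2)
```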

Based on the above definition of nonlocal operators, (1) can be optimized. First we fix φ in order to obtain Euler-lagrange equations with respect to u1 and u2 φ ( u1 − f ) − λ1∇ NL ⋅ (φ∇ NLu1 ) = 0

(8)

(1 − φ )( u2 − f ) − λ2∇ NL ⋅ ( (1 − φ ) ∇ NLu2 ) = 0

(9)

In (8), ∇_NL · (φ∇_NL u_i) at point x can be derived from the definition of nonlocal divergence (5) as follows:

∇_NL · (φ∇_NL u1)(x) = ∫_Ω (φ(y) + φ(x))(u1(y) − u1(x)) w(x, y) dy    (10)

Thus, the detailed form of u1 reads

u1(x) = [ f(x)φ(x) + λ1 ∫_Ω (φ(y) + φ(x)) u1(y) w(x, y) dy ] / [ φ(x) + λ1 ∫_Ω (φ(y) + φ(x)) w(x, y) dy ]    (11)

u2 in (9) can be deduced in the same manner as u1.


Then, fixing u1 and u2, the following gradient descent flow is used to optimise the label function φ:

∂φ/∂t = γ ∇ · ( ∇φ / |∇φ| ) − Q(u1, u2)    (12)

where

Q(u1, u2) = α1( (u1 − f)² + λ1|∇_NL u1|² ) − α2( (u2 − f)² + λ2|∇_NL u2|² )    (13)

Finally, function φ is constrained into interval [0, 1] with following projection formula φ = Max ( Min (φ ,1) ,0 )

(14)

Model (1) can only segment two-phase texture images and the optimization method (i.e. gradient decent flow) used to solve label function φ results in nonlinear partial differential equation, which is difficult to discretize to solve. Therefore, it is a nontrivial problem proposing multiphase texture segmentation model and simultaneously developing effective and efficient algorithm for it in the next section.

3

Multiphase Nonlocal Mumford-Shah Segmentation and Its Split Bregman Algorithm

In this section, we propose nonlocal Mumford-Shah model and design its Split Bregman algorithm. To the best of our knowledge, one label function can only divide one image into two regions (foreground and background). In order to segment multiple regions, it is natural to consider using multiple label functions. In this paper, we adopt following region partition approach (i.e. Fig. 1) which can segment n+1 regions by using only n label functions.


Fig. 1. Segmenting five regions using four label functions


Note that each region in Fig. 1 has a characteristic function. They are listed as follows

Ω1: χ1(φ) = φ1(1 − φ0)
Ω2: χ2(φ) = φ2(1 − φ1)(1 − φ0)
Ω3: χ3(φ) = φ3(1 − φ2)(1 − φ1)(1 − φ0)
Ω4: χ4(φ) = φ4(1 − φ3)(1 − φ2)(1 − φ1)(1 − φ0)
Ω5: χ5(φ) = φ5(1 − φ4)(1 − φ3)(1 − φ2)(1 − φ1)(1 − φ0)

The general expression of the characteristic functions of Fig. 1 is given by

χ_i(φ) = φ_i ∏_{j=0}^{i−1} (1 − φ_j),  i = 1, 2, …, n    (15)

where φ0 ≡ 0 and φn ≡ 1. The scheme automatically satisfies ∑_{i=1}^{n} χ_i(x) = 1 (i.e., one pixel can only belong to one segmented region) and therefore avoids overlapping or missing segmentation results. After the characteristic functions (15) are defined, we propose the following nonlocal Mumford-Shah model for multiphase texture image segmentation:

min over u_i, φ_i ∈ [0, 1]:
E(u, φ) = ∑_{i=1}^{n+1} α_i ∫_Ω ((u_i − f)² + λ_i|∇_NL u_i|²) χ_i(φ) dx + ∑_{i=1}^{n} γ_i ∫_Ω |∇φ_i| dx    (16)
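Before turning to the optimization, the region-partition strategy of Eq. (15) is easy to make concrete: given the label maps φ_i, the characteristic maps χ_i follow from a running product. A small sketch (illustrative, not the authors' code):

```python
import numpy as np

def characteristic_maps(phis):
    """chi_i = phi_i * prod_{j<i}(1 - phi_j) for label maps phi_1..phi_n in [0, 1].
    The maps sum to 1 pixel-wise when the last phi is identically 1."""
    chis = []
    remainder = np.ones_like(phis[0])          # prod_{j<i} (1 - phi_j), with phi_0 = 0
    for phi in phis:
        chis.append(phi * remainder)
        remainder = remainder * (1.0 - phi)
    return chis

# e.g. three regions from two label maps plus the implicit final map of ones:
# chi1, chi2, chi3 = characteristic_maps([phi1, phi2, np.ones_like(phi1)])
```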

(16) is a multiple variables optimization problem and can be solved by using alternating optimization method. In detail, first fixing φi for ui , we have

( ui − f ) χi (φ ) − λi∇ NL ⋅ ( χi (φ ) ∇ NLui ) = 0

(17)

The detailed form of u_i can be written as

u_i(x) = [ f(x)χ_i(φ) + λ_i ∫_Ω ( χ_i(φ(y)) + χ_i(φ(x)) ) u_i(y) w(x, y) dy ] / [ χ_i(φ) + λ_i ∫_Ω ( χ_i(φ(y)) + χ_i(φ(x)) ) w(x, y) dy ],  i = 1, 2, …, n    (18)

Then, we fix u_i to solve for the function φ_i. Note that if the gradient descent flow were used directly, the resulting Euler equation would be a nonlinear partial differential equation like (12), whose discretization is more difficult. To make the numerical computation simple and efficient, the Split Bregman algorithm is used to solve for the variable φ_i. The minimization problem (16) with respect to φ_i can be transformed into the following minimization problem by introducing an auxiliary variable w_i and a Bregman iterative parameter b_i:

min over w_i, φ_i ∈ [0, 1]:
E(φ, w) = ∑_{i=1}^{n+1} α_i ∫_Ω ((u_i − f)² + λ_i|∇_NL u_i|²) χ_i(φ) dx + ∑_{i=1}^{n} γ_i ∫_Ω |w_i| dx + ∑_{i=1}^{n} (θ_i/2) ∫_Ω (w_i − ∇φ_i − b_i^{k+1})² dx    (19)


To minimize (19), we first fix the variable w_i to solve for φ_i:

∇ · ( w_i^k − ∇φ_i − b_i^{k+1} ) + ∑_{i=1}^{n+1} Q_i(u) ∂χ_i(φ)/∂φ_k = 0    (20)

where

∂χ_i(φ)/∂φ_k =  ∏_{j=0}^{i−1} (1 − φ_j)                 if k = i,
                −φ_i ∏_{j=0, j≠k}^{i−1} (1 − φ_j)        if k < i,
                0                                        if k > i.    (21)

(20) can be efficiently solved by one iteration of Gauss-Seidel. Then apply projection formula to variable φi φi = Max ( Min (φi ,1) ,0 )

(22)

After obtaining φ_i, the following analytical generalized soft-thresholding equation is used to solve for w_i:

w_i^{k+1} = Max( |∇φ_i^{k+1} + b_i^{k+1}| − γ_i/θ_i , 0 ) · (∇φ_i^{k+1} + b_i^{k+1}) / |∇φ_i^{k+1} + b_i^{k+1}|    (23)

with the convention 0/0 = 0. The flow chart of the Split Bregman algorithm for the proposed nonlocal multiphase Mumford-Shah model is presented below. The stopping criterion can be chosen empirically as ‖φ^{k+1} − φ^k‖ / ‖φ^k‖ ≤ ε, where ε is a small tolerance parameter.

Split Bregman algorithm for multiphase nonlocal Mumford-Shah segmentation 1. Initialization: ui0 = f , bi0 = 0 , wi0 = 0 , φi0 2. Repeat 3. Compute ui using (18); 4.

Compute φi using (20);

5.

Constrain φi into interval [0, 1] using (22);

6.

Compute wi using (23);

6. Compute Bregman parameter b using bik = bik +1 + ∇φi − wi 7. k=k+1; 8. Until stopping criterion is satisfied.

4

Segmentation Experiments

In this section, numerical experiments are conducted on synthetic and real texture images to validate the performance of the proposed model. All experiments are performed using Matlab 2010b on a Windows 7 platform with an Intel Core 2 Duo CPU at 2.33GHz and 2GB memory.

392 W. Lu et al.

Fig. 2 shows two-phase texture image segmentation. Different types synthetic texture images are set to validate the effectiveness of nonlocal Mumford-Shah model (1).

Fig. 2. Two-phase segmentation for synthetic texture images. First row: initialization used to give an initial guess for piecewise level set φ in model (1); Second row: final segmentation results; Third row: final piecewise level set φ . Fourth row: segmentation results by Chan-Vese model improved in [14].

In Fig. 2, the red contours in the first row are initialization for label function φ . In the second row of Fig. 2, segmentation results are showed by painting the contour φ = 0.5 . It is obvious that model (1) successfully segment these texture images regardless of the patterns of different texture. To better illustrate the final label function, we present the last row. One can see that these label functions are very close to binary images (0 and 1). In addition, it is very obvious that the classical Chan-Vese model [14] cannot obtain the correct results, as this method is based on gray value of objects in the image. Those pixels that have similar intensities will be regarded as the same phase. However, our proposed method uses the similarity (i.e. equation (2)) between two pixels, which are calculated based on two patches centered at the two pixels. The computed similarities will involve more information of surrounding pixels and thus can segment different texture into different phases.

Nonlocal Mumford-Shah Model for Multiphase Texture Image Segmentation

393

In Fig. 3, first three examples are respectively two zebras and a tiger which are often used for texture segmentation test. The texture on their bodies is very typical. There exist small or large gaps in different place. Meanwhile, the texture direction varies dramatically. The final example shows a challenging case (i.e. leopard). The underlying gray values of foreground and background are very similar. Though these images contain complicated texture, final segmentation results shown in the last column of Fig. 3 are very satisfactory. The second are third columns are the intermediate results with iteration 20, 40. Note that contours gradually move to the object of interest according to the different texture information. From the last row, one can see the Chan-Vese again failed in these cases. The proposed model thus outperforms the model in terms of performances.

Fig. 3. Two-phase texture images segmentation. First row: initialization; Second and third rows: intermediate results(iterations are 20 and 40 respectively); Fourth row: final segmentation results; Fifth row: segmentation results by Chan-Vese model [14].

In Fig. 4 and 5, our model with three-phase image segmentation (n=2 in model (19)) and four-phase image segmentation (n=3 in model (19)) will be tested respectively. It is obvious that segmentation results are still satisfactory. It is worth mentioning that region partition strategy cannot produce overlapping or missing segmentation results as it can automatically satisfy the unique partition condition (i.e., one pixel point can only belong to one segmented region). This can be confirmed in Fig. 4 and 5.

394 W. Lu et al.

Fig. 4. Three-phase texture images segmentation. First row: initialization; Second row: segmentation results; Third and fourth row: two label functions φ1 and φ2 in the proposed model (19).

Fig. 5. Four-phase texture images segmentation. First image: initialization; Second image: segmentation results; Third, fourth and fifth images: three label functions φ1 , φ2 and φ3 in the proposed model (19).

5

Conclusions

In this paper, we propose a nonlocal Mumford-Shah model for multiphase texture image segmentation. The proposed model combines nonlocal operators and multiple region partition strategy. Thus, it is capable of segmenting multiphase texture images. Moreover, Split Bregman algorithm is designed for the proposed model. Numerical experiments demonstrate the performance of the proposed model. Acknowledgements. The work has been partially supported by the National Natural Science Foundation of China (61170106).

References 1. Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations. In: AMS, vol. 147. Springer, Berlin (2002) 2. Chan, F.T., Shen, J.: Image processing and analysis: variational, PDE. Wavelet, and Stochastic Methods. SIAM (2005)

Nonlocal Mumford-Shah Model for Multiphase Texture Image Segmentation

395

3. Mumford, D., Shah, J.: Optimal approximations by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42(5), 577–685 (1989) 4. Chan, T.F., Vese, L.A.: Active Contours Without Edges. IEEE. T. Image. Process. 10(2), 266–277 (2001) 5. Osher, S., Sethian, J.A.: Fronts propagation with curvature dependent speed: algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 79(1), 12–49 (1988) 6. Vese, L.A., Chan, T.F.: A multiphase level set framework for image segmentation using the Mumford and Shah model. Int. J. Comput. Vision 50(3), 271–293 (2002) 7. Sandberg, B., Chan, T.F., Vese L.A.: A Level-set and Gabor Based Active Contour Algorithm for Segmenting Textured Images. UCLA CAM Report 02-39 (July 2002) 8. Buades, A., Coll, B., Morel, J.M.: A Review of Image Denoising Algorithms, with a New One. SIAM Multiscale Modeling Simulation 4(2), 490–530 (2005) 9. Gilboa, G., Osher, S.: Nonlocal Operators with Applications to Image Processing. SIAM Multiscale Modeling Simulation 7(3), 1005–1028 (2008) 10. Bresson, X., Chan, T.F.: Nonlocal Unsupervised Variational Image Segmentation Models. UCLA CAM Report 08-67 (October 2008) 11. Osher, S., Fedkiw, R.: Level Set Methods and Dynamic Implicit Surfaces. Springer, New York (2002) 12. Osher, S., Paragios, N.: Geometric Level Set Methods in Imaging, Vision, and Graphics. Springer, New York (2003) 13. Sapiro, G.: Geometric Partial Differential Equations and Image Processing. Cambridge University Press (2001) 14. Duan, J.M., Pan, Z.K.: Some fast projection methods based on Chan-Vese model for image segmentation. Eurasip. J. Image. Vide.. Process. (July 2014)

Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images Using Landsat-8 OLI Yu Yang(), Hong Zheng, and Hao Chen School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China [email protected]

Abstract. In this paper, an automatic and efficient cloud detection algorithm for multi-spectral high spatial resolution images is proposed. Based on the statistical properties and spectral properties on a large number of the imagery with cloud layers, a multispectral-based progressive optimal scheme for detecting clouds in Landsat-8 imagery is presented. First, a basic process which distinguishes the difference between cloud regions and non-cloud regions is constructed. Based on the spectral properties of cloud and the optimal threshold setting, we obtain a basic cloud detection result which separates the input imagery into the potential cloud pixels and non-cloud pixels. Then, the potential cloud regions and the cloud optimal map are used together to derive the potential cloud layer. An optimal process of probability for clouds over land and water is implemented with a combination of a normalized snow/ice inspection and spectral variability inspection. Finally, in order to obtain the accurate cloud regions from the potential cloud regions, a robust refinement process derived from a guided filter is constructed to guide us in removing non-cloud regions from the potential cloud regions. The boundaries of cloud regions and semitransparent cloud regions are further marked to achieve the final cloud detection results. The proposed algorithm is implemented on the Landsat-8 imagery and evaluated in visual comparison and quantitative evaluation, and the cloudcovered regions were effectively detected without manual intervention. Keywords: Cloud detection · Landsat-8 · High spatial resolution · OLI · Multispectral detection · Refinement process

1

Introduction

Remote sensing image is one of the most valuable data available for studying landcover change, environmental protection, resource exploration and object recognition [1]. It is, however, difficult to acquire an appropriate image for the desired purpose, because many remote sensing images are inevitably covered by cloud. The International Satellite Cloud Climatology Project (ISCCP) data set estimates that the global annual average cloud cover is approximately 40% [2]. The presence of clouds complicates the use of data in the optical domain from earth observation satellites, causing problems for many remote sensing activities, including inaccurate atmospheric correction and false detection of specified object. Cloud detection, therefore, is a fundamental © Springer-Verlag Berlin Heidelberg 2015 T. Tan et al. (Eds.): IGTA 2015, CCIS 525, pp. 396–407, 2015. DOI: 10.1007/978-3-662-47791-5_44

Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images 397

pre-processing for using remote sensing images for diverse applications. Due to the spectral variability of clouds and the Earth's surface, automated accurate separation of clouds from normally illuminated surface conditions is difficult. One common approach is to screen clouds manually. However, this approach is time consuming and will limit efforts to mine the archive to study the history of the Earth's surface. Over the years, there have been numerous studies related to cloud detection. Most of them are designed for moderate spatial resolution sensors such as Advanced Very High Resolution Radiometer (AVHRR) and Moderate Resolution Imaging Spectroradiometer (MODIS) [3] [4] [5] [12]. These sensors are usually equipped with water vapor absorption bands, thermal bands and shortwave infrared bands [6]. Screening of clouds in high spatial resolution sensors has been performed by the Automated Cloud Cover Assessment (ACCA) system [7]. By applying a number of spectral filters, and depending heavily on the thermal infrared band, ACCA generally works well for estimating the overall percentage of clouds in each scene, which was its original purpose. However, it does not provide sufficiently precise boundaries of clouds to be useful for automated analyses of time series of Landsat images [8] [9]. For high spatial resolution sensors, Landsat 8 is equipped with the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS). Although OLI has two additional bands-the coastal aerosol and cirrus bands, they have no analog in previous Landsat sensors which may be useful for cloud detection. Additionally, TIRS will provide two bands in the thermal IR that do not correlate well with the Landsat thermal band. In this article, we propose an automatic cloud detection process for multispectral high spatial resolution images [10] [11]. First, a basic process which distinguishes the difference between cloud regions and non-cloud regions is constructed. Based on the spectral properties of cloud and the optimal threshold setting, we obtain a basic cloud detection result which separates the input imagery into the potential cloud pixels and non-cloud pixels. Then, the potential cloud regions and the cloud optimal map are used together to derive the potential cloud layer. An optimal process of probability for clouds over land and water is implemented with a combination of a normalized snow/ice inspection and spectral variability inspection. Finally, in order to obtain the accurate cloud regions from the potential cloud regions, a robust refinement process derived from a guided filter is constructed to guide us in removing non-cloud regions from the potential cloud regions. The boundaries of cloud regions and semi-transparent cloud regions are further marked to achieve the final cloud detection results.

2

Study Data Used

2.1

Landsat-8

Landsat-8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) images consist of 11 spectral bands with a spatial resolution of 30 meters for Bands 1 to 7 and 9(Table 1). New band 1 (ultra-blue) is useful for coastal and aerosol studies. New band 9 is useful for cirrus cloud detection. The resolution for Band 8 (panchromatic) is 15 meters. Thermal bands 10 and 11 are useful in providing more accurate surface temperatures and are collected at 100 meters.

398

2.2

Y. Yang et al.

Study Data

A total of 95 Level 1 Terrain corrected (L1T) Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) images from 2012 to 2014 are used. The path is from 117 to 127 and the row is from 30 to 40 at Worldwide Reference System (WRS) All available Landsat OLI and TIRS images with cloud cover less than 70% and more than 10% are used. Table 1. Landsat OLI/TIRS spectral bands Bands Band 1 - Coastal aerosol Band 2 - Blue Band 3 - Green Band 4 - Red Band 5 - Near Infrared (NIR) Band 6 - SWIR 1 Band 7 - SWIR 2 Band 8 - Panchromatic Band 9 - Cirrus Band 10 - Thermal Infrared (TIRS) 1 Band 11 - Thermal Infrared (TIRS) 2

3

Wavelength (micrometers) 0.43 - 0.45 0.45 - 0.51 0.53 - 0.59 0.64 - 0.67 0.85 - 0.88 1.57 - 1.65 2.11 - 2.29 0.50 - 0.68 1.36 - 1.38 10.60 - 11.19 11.50 - 12.51

Resolution (meters) 30 30 30 30 30 30 30 15 30 100 100

Methodology

The proposed method mainly consists of three steps: basic process, optimal process and refinement process in Fig.1. First, in basic process, the input images of band 2(Blue), band 3(Green), band 4(Red) are transformed from RGB color model to HSI color model. With an optimal threshold in HSI color model, a coarse cloud detection result to highlight the potential cloud regions and non-cloud regions is obtained. Second, based on the physical and spectral properties of cloud, we construct an optimal map to remove redundant non-cloud regions from potential cloud regions in the coarse result. Third, a refinement process is used to refine the cloud detection and semi-transparent cloud regions are marked. 3.1

Basic Process

For Landsat-8 imagery, we first transform it from the RGB color model into the HSI color model which follows the human visual perception ability closely and separates the color components in terms of intensity, hue, and saturation. I=

1 ( R + G + B) 3

(1)

Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images 399

θ H = 360 − θ

if B ≤ G otherwise

  1   ( R − G ) + ( R − B )  2 θ = arccos  1   ( R − G )2 + ( R − B )( G − B )  2     

(2)

(3)

Fig. 1. Block diagram of the proposed cloud detection algorithm

Component I is intensity equivalent and component H is hue equivalent. The normalized H is expressed as follows:  θ  360 H =  360 − θ  360

if B ≤ G otherwise

(4)

400

Y. Yang et al.

We construct a basic map to highlight the difference between cloud regions and noncloud regions as follows: M basic =

(1 − ε ) B5 + ε I I

(5)

IH

where I I and I H refer to the intensity and hue which are normalized to [0, 1] to compute the basic map M basic ,and

B5 is Landsat 8-band 5 TOA reflectance which

is also normalized to [0, 1]. ε is a ratio factor and in this paper it is usually set to ε = 0.36 . Fig.2 (b) is the basic map results of the original image using M basic .

Fig. 2. Potential cloud detection results (a) Original image. (b) Basic map. (c) Cloud detection result on the optimal threshold

Based on the basic map, Otsu’s method [13] is used to segment the image produced from basic map into potential cloud regions and non-cloud regions, which is the potential cloud detection result. Otsu’s method [13] produces an optimal threshold which divides the original image into foreground and background and separates the two classes (foreground and background) so that their interclass variance is maximum value. When the variance is maximum value, it is the best state of separation and the best threshold is following formulation: Toptimal = max[θ1 (T ) × θ2 (T )( μ1 (T ) − μ2 (T )) 2 ]

Where

θ1 (T ) =  i = 0 pi T

μ2 (T ) =  i =T +1 i ⋅ pi 255

,

θ 2 (T ) =  i =T +1 pi 255

,

(6)

μ1 (T ) =  i =0 i ⋅ pi

and pi is the probability of the gray level

T

,

i . The variable

Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images 401

T is in the range of [0, 255]. The optimal threshold Toptimal is applied to basic map M basic , potential cloud detection result R potential is defined by: 1, R potential =  0,

θ1 (T ) ≥ Toptimal

(7)

otherwise

Where 1 refers to those pixels of potential cloud regions and 0 refers to those pixels of non-cloud regions. Due to the fitting problem of optimal threshold in Otsu’s method, the potential cloud detection result may suffer from excessive error detection in Fig.2(c). 3.2

Optimal Process

Optimal process consists of several spectral detections to identify the potential cloud pixels that may be cloudy and sometimes be non-cloudy and other pixels are considered to be completely non-cloud pixels. The first spectral detection is “Index Test” with NDSI index and NDVI index which is heritage from ACCA (the Automated Cloud-Cover Assessment) [14]. Based on the cloud property of “white” character in optical spectral bands, the NDSI and NDVI values are usually around zero. For certain cloud types, such as very thin clouds over highly vegetated area or icy clouds, the NDVI and NDSI values can be larger, but both of them cannot be higher than 0.75. ACCA also uses NDSI threshold of 0.8 to separate clouds from snow pixels. Therefore, we use NDSI and NDVI thresholds of 0.8 to separate potential cloud areas from some of the vegetated or snow covered areas. “Index Test” is defined as follows: Index Test = 0.25 ≤ NDSI ≤ 0.75 and NDVI ≤ 0.75

(8)

NDSI = (Band 3- Band 6) / (Band 3+ Band 6)

(9)

NDVI = (Band 5- Band 4) / (Band 5+ Band 4)

(10)

Where,

The second spectral detection is Haze Optimized Transformation Test (“HOT Test”) which was firstly proposed by Zhang [15] for Landsat data. Based on the idea that the visible bands for most underlying surfaces under non-cloud conditions are highly correlated, the spectral response to haze and thin cloud is different between the blue and red wavelengths. The “HOT Test” in Zhang [15] is established empirically from regression of the non-cloud pixels. “HOT Test” is retrieved for most of the Landsat images using TOA reflectance as inputs for regression and the results are especially helpful for distinguishing haze and thin cloud pixels from non-cloud pixels. Most of clouds like thin and thick clouds and thick aerosols will be identified by “HOT Test”. Note that some bright pixels like rocks, turbid water, or snow/ice underlying surface

402

Y. Yang et al.

may be included, due to the large TOA reflectance images in the visible bands. “Hot Test” is defined as follows: HOT Test=Band 2-0.5Band 4-0.08

(11)

Fig. 3. Coarse cloud detection results (a) Original image. (b) Basic map. (c) Optimal map. (d) Result of coarse cloud detection

The third spectral detection is “B5 Test” which is similar to the test in ACCA[17].The ratio of Band 5 and Band 6 which is larger than 1.0 is used to remove desert and bright rock from non-cloud pixels. Due to the property that the underlying surface of bright non-cloud pixels shows lower reflectance in Band 5 than in Band 3, whereas the reverse is true for cloud, the threshold is set to 2.16. We reduced this threshold of the ratio of Band 5 and Band 4 to 2.35 (sensitivity analysis of the global cloud reference dataset) to include all possible cloud pixels. “B5 Test” may also include other non-cloud pixels, and the main object of this test is separating most of bright rocks from cloud pixels. “B5 Test” is defined as follows: B5 Test=Band5/Band4 ≤ 2.35 and Band5/Band3 ≤ 2.16 and

(12)

Band5/Band6 ≥ 1.0

With the three aforementioned spectral tests, optimal map is constructed to filter the potential cloud detection results to coarse cloud results. Coarse cloud detection result is expressed as follows: Rcoarse = Rpotential  Index Test  HOT Test  B5 Test

(13)

If the potential cloud pixels are more than 99% of the image, the results will be used for the final cloud mask directly, as there are not enough non-cloud pixels for statistical analyses. If the potential cloud pixels are less than 99% of the image, the

Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images 403

potential cloud pixels will be sent to the refinement process. As this process overestimates cloud fraction, the optimal process tends to include all possible cloudy pixels. Fig.3.(c) is the binary mask of optimal process result in Fig.3.(a) and the cloud pixels are masked in red in Fig.3.(d). From Fig.3, we can see that there is a large amount of error detection in Fig.3.(b) while coarse cloud detection result in Fig. 3.(c) successfully avoids the serious error detection by using an optimal process. For some cloudy Landsat images, our final coarse cloud detection result will be very close to the ideal cloud detection result. However, the boundaries of cloud regions and semi-transparent cloud regions are not detected accurately showed in the contrast between Fig.4.(a) and Fig.4.(b). Thus, to remove possible non-cloud regions from our coarse cloud detection result and compensate the boundaries of cloud regions, we further incorporate a refinement process into the cloud detection system.

Fig. 4. Details of coarse cloud detection result (a) Original image. (b) Coarse cloud detection

3.3

Refinement Process

In order to achieve better results of cloud detection, we proposed a refinement process using a guided filter to capture the boundaries of cloud regions and semi-transparent cloud regions, which is used to guide the following cloud detection. The guided filter [16] is a nonlinear filter with the properties of edge-preserving and noise-reducing. The guided filter is an explicit image filter involving input image P , guidance image I , and output image f . The basic idea of the guided filtering is local linear mode

I and the filter output f , and f is supposed to a linear transform of I in a square window ωk centered at the pixel k . between the guidance

fi = α k I i + β k , ∀i ∈ ωk

(14)

404

where

Y. Yang et al.

αk

and

βk

ωk ωk . The two coefficients α k

are constant linear coefficients in

coordinate in the square window

1

αk =

ω



I denotes a pixel and β k are defined by

and

Ii Pi − μk Pk

i∈ωk

(15)

δ k2 + ε

β k = Pk − α k μk where

μk

and

δ k2

denote the mean and variance of

cloud detection result Rcoarse

ωk ,

I in ωk , respectively. ω

Pk is the mean of P in ωk .The coarse is used as the input image P , and the guidance image is

denotes the number of pixels in

the original image

(16)

and

I , and the output is the finer cloud detection result R finer which

can be obtained by applying a guided image filtering. In proposed implementation, we typically set the window radius to 50, and

ε = 10−6

compare our finer cloud detection result

for the guided filter. In Fig. 5, we

R finer with Rcoarse , and we notice that

R finer has repaired some hiatuses around the boundary of cloud regions compared with Rcoarse ; what is more,

Fig. 5. Comparison between Guided feathering over

4

R finer contains most semitransparent cloud pixels.

Rcoarse and R finer . (a)

Original image. (b)

Rcoarse . (c)

Rcoarse .(d) R finer .

Results and Discussion

In this section, the effectiveness of proposed cloud detection algorithm by both visual comparisons and quantitative evaluation is demonstrated. We implemented our cloud detection algorithm in MATLAB release 2014a on a computer of CPU2.5GH and Memory 4.00GB. Furthermore, we present a quantitative evaluation of the detection accuracy and runtime to prove the efficiency of our method.

Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images 405

4.1

Visual Comparisons

In Fig.6, we compare our method with Fmask method[18], which automatically detects cloud based on multi-spectral images. We observe that the method [18] cannot produce satisfactory results for complex Landsat 8 images while our method can achieve better results. Moreover, as the method [18] involves a complex optimal process, our method is more efficient than method [18].

Fig. 6. Detection result comparisons between method[18] and our method (a) Original image. (b) method[18] (c) our method.

4.2

Quantitative Evaluation

To quantitatively evaluate the efficiency of our cloud detection method, we use the accuracy rate to evaluate the accuracy of our method. The accuracy rate ( AR ) is defined by

406

Y. Yang et al.

AR =

TP − FCN − FNC TP

(17)

FCN denotes the number of cloud pixels identified as non-cloud pixels, FNC denotes the number of non-cloud pixels identified as cloud pixels, and TP

where

denotes the number of pixels in the input image. The evaluation result of cloud detection accuracy is shown in Table 2. Table 2. Evaluation result of cloud detection accuracy Cloud cover of images(%) 10-20 20-30 30-40 40-50 50-60 60-70

5

Number of images 18 22 15 12 13 15

Size of images 7831*7681 7831*7681 7831*7681 7831*7681 7831*7681 7831*7681

Accuracy of method[18] (%)

Accuracy of proposed method (%)

91.25 90.23 87.59 89.24 92.13 93.19

94.35 93.23 94.31 93.65 96.67 98.87

Conclusion

We have presented an automatic and efficient cloud detection algorithm for multispectral high spatial resolution images. First, a basic process which distinguishes the difference between cloud regions and non-cloud regions is constructed. Then, the potential cloud regions and the cloud optimal map are used together to derive the potential cloud layer. Finally, in order to obtain the accurate cloud regions from the potential cloud regions, a robust refinement process derived from a guided filter is constructed to guide us in removing non-cloud regions from the potential cloud regions. To demonstrate the effectiveness and efficiency of our method, we evaluate our method in visual comparisons and quantitative evaluation with Landsat-8 images. Experiment results have proved that our method can produce satisfactory results. The testing results over samples show the accuracy of the algorithm in research is over 93%.But it needs to point out that for thin clouds and cirrus clouds, the identification precision of the algorithm needs to be improved. In addition, because of limited number of cloud samples, the prior information got from samples is limited. Data accumulation should be strengthened and parameters should be adjusted to improve the detection accuracy. In the future, we will take more thermal and detail information into consideration to further improve the accuracy of our cloud detection method.

References 1. Pankiewicz, G.S.: Pattern recognition techniques for the identification of cloud and cloud systems. Meteorological Appl. 2(3), 257–271 (1995) 2. Zhang, Y., Rossow, W.B., Lacis, A.A., Oinas, V., Mishchenko, M.I.: Calculation of radiative fluxes from the surface to top of atmosphere based on ISCCP and other global data sets: Refinements of the radiative transfer model and the input data. Journal of Geophysical Research 109(19), 19105 (2004)

Automated Cloud Detection Algorithm for Multi-spectral High Spatial Resolution Images 407 3. Ackerman, S.A., Strabala, K.I., Menzel, W.P., Frey, R.A., Moeller, C.C., Gumley, L.E.: Discriminating clear sky from clouds with MODIS. Journal of Geophysical Research 103(24), 32141–32157 (1998) 4. Hall, D.K., Riggs, G.A., Salomonson.: Algorithm theoretical basis document (ATBD) for the MODIS snow and sea ice-mapping algorithms (2001) 5. Ackerman, S., Holz, R., Frey, R., Eloranta, E., Maddux, B., McGill, M.: Cloud detection with MODIS. Part II: Validation. J. Atmosp. Ocean. Technol. 25(7), 1073–1086 (2008) 6. Goodman, A.H., Henderson-Sellers, A.: Cloud detection and analysis: A review of recent progress. Atmos. Res. 21(3/4), 203–228 (1998) 7. Irish, R.: Landsat-7 automatic cloud cover assessment algorithms for multispectral, hyperspectral, and ultraspectral imagery. The International Society for Optical Engineering 4049, 348–355 (2000) 8. Hall, D.K., Riggs, G.A., Salomonson, V. V.: Development of methods for mapping global snow cover using moderate resolution imaging spectroradiometer data. Remote Sens. Environ. 54(2), 127–140 (1995) 9. Masek, J.G., Honzak, M., Goward, S.N., Liu, P., Pak, E.: Landsat-7 ETM+ as an observatory for land cover: Initial radiometric and geometric comparisons with Landsat-5 Thematic Mapper. Remote Sensing of Environment 78(1), 118–130 (2001) 10. Platnick, S., King, M.D., Ackerman, S.A., Menzel, W.P., Baum, B.A., Riédi, J.C., et al.: The MODIS cloud products: Algorithms and examples from Terra. IEEE Transactions on Geoscience and Remote Sensing 41(2), 459–473 (2003) 11. Tseng, D.C., Tseng, H.T., Chien, C.L.: Automatic cloud removal from multitemporal SPOT images. Applied Mathematics and Computation 205(2), 584–600 (2008) 12. Oreopoulos, L., Wilson, M.J., Várnai, T.: Implementation on Landsat data of a simple cloud-mask algorithm developed for MODIS land bands. IEEE Geoscience and Remote Sensing Letters 8(4), 597–601 (2011) 13. Otsu, N.: A threshold selection method from gray-level histograms. Automatica 11, 285–296, 23–27 (1975) 14. Salomonson, V.V., Appel, I.: Estimating fractional snow cover from MODIS using the normalized difference snowindex. Remote Sensing of Environment 89(3), 351–360 (2004) 15. Zhang, Y., Guindon, B., Cihlar, J.: An image transform to change characterize and compensate for spatial variability in thin cloud contamination of Landsat images. Remote Sensing of Environment 82(2–3), 173–187 (2002) 16. He, K., Sun, J., Tang, X.: Guided Image Filtering. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part I. LNCS, vol. 6311, pp. 1–14. Springer, Heidelberg (2010) 17. Irish, R., Barker, J.L., Goward, S.N., Arvidson, T.: Characterization of the Landsat-7 ETM+ Automated Cloud-Cover Assessment (ACCA) algorithm. Photogrammetric Engineering and Remote Sensing 72(10), 1179–1188 (2006) 18. Zhu, Z., Woodcock, C.E.: Object-based cloud and cloud shadow detection in Landsat imagery. Remote Sensing of Environment 118(15), 83–94 (2012)

Fast Image Blending Using Seeded Region Growing Yili Zhao1,2() and Dan Xu1 1

Department of Computer Science and Engineering, School of Information, Yunnan University, Kunming 650091, Yunnan, China [email protected], [email protected] 2 School of Computer and Information, Southwest Forestry University, Kunming 650224, Yunnan, China

Abstract. This paper presents a novel approach for combining and blending a set of aligned images into a composite mosaic with no visible seams and minimal texture distortion. To accelerate execution speed in building high resolution mosaics, the compositing seam is found efficiently via seeded region growing using a photometric criterion. A contribution of this paper is to use seeded region growing on image differences to find possible seams over areas of low photometric difference. This can result in significant reduction of blending time. The proposed method presents several advantages. The using of seeded region growing over image pairs guarantees the approximate optimal solution for each intersection region. The independence of such regions makes the algorithm suitable for parallel implementation. The using of priority queue leads to reduced memory requirements and a compact storage of the input data. Finally, it allows the efficient creation of large mosaics, without user intervention. We also evaluate the proposed blending method with pixel based graph cut and watershed based graph cut to illustrate the performance of the approach on image sequences with qualitative and quantitative comparison. Keywords: Image mosaic · Watershed segmentation · Graph cuts · Seeded region growing · Image blending

1

Introduction

Image blending is the final and essential step in producing high quality mosaics. Radiometric variations and registration errors in overlapping images commonly lead to geometric misalignments and photometric differences. The aim of image blending is to produce a visually plausible mosaic with two desirable properties: first, the mosaic should be as similar as possible to the input images, both geometrically and photometrically; second, the seam between the stitched images should be invisible. In this paper, we are interested in developing an image blending method capable of producing seamless 2D mosaics and preserving the appearance and clarity of object textures while dealing with misalignments resulting from image alignment process. A primary motivation for this work is the creation of high resolution mosaics by low-power embedded devices and smart phones. This application stresses the need to compute image mosaics as fast as possible and use less memory. We favor blending using © Springer-Verlag Berlin Heidelberg 2015 T. Tan et al. (Eds.): IGTA 2015, CCIS 525, pp. 408–415, 2015. DOI: 10.1007/978-3-662-47791-5_45

Fast Image Blending Using Seeded Region Growing

409

contributions from a single image for each pixel point, while minimizing the intensity dissimilarity along the boundary lines of overlapping images. Additionally, we are interested in obtaining and comparing fast methods that could be applied in near real time. The paper is organized as follows. Section 2 overviews relevant work on image blending and provides some background on two techniques that inspire this paper, the graph cut optimization and watershed transform. Section 3 introduces the seeded region growing approach for image blending in the case of two images, followed by the extension to multiple images. Section 4 illustrates the application on selected data sets, and presents relevant quantitative comparisons of our method against both pixellevel graph cut and watershed based graph cut approaches. Finally Section 5 provides a summary and conclusions.

2

Related Work and Background

There are two main approaches to image blending in the literature, assuming that the images have already been aligned [1]. Optimal seam algorithms search for a curve in the overlap region on which the differences between adjacent images are minimal. Transition smoothing methods minimize seam artifacts by smoothing the transition between the images. Transition smoothing methods, commonly referred to as feathering or alpha blending, take the locations of seams between images as a given and attempt to minimize the visibility of seams by smoothing. In feathering or alpha blending, the mosaic image is a weighted combination of the input images. The weighting coefficients (alpha mask) vary spatially as a function of the distance from the seam. A traditional approach is pyramid blending by Burt and Adelson [2], where the images are decomposed into a set of band-pass components, and collapsed using a weighted average over a transition zone which is inversely proportional in size to the band frequency. Pyramid blending has been extensively used for blending images in 2D panoramas [3] and 3-D models. Recent transition smoothing approaches include the multiresolution blending using wavelets and the gradient domain blending [4]. Gradient-domain blending reduces the inconsistencies due to illumination changes and variations in the photometric response of the cameras since dissimilarities in the gradients are invariant to the average image intensity. However, gradient domain blending requires recovering the blended image from image gradient description by solving a discrete Poisson equation. In contract, optimal seam finding methods aim to place the seam between two images where intensity differences in their area of overlap are minimal. The method proposed in this paper fits in this class, and is therefore related to previous work. Uyttendaele et al. [5] search for regions of difference among images using thresholds over the image difference. Each region of difference is assigned to just one image by computing the minimum weight vertex cover over a weighted graph representation. Efros and Freeman proposed image quilting [6] to perform texture synthesis. In image quilting, small blocks from the sample image are copied to the output image. The first block is copied at random, and then subsequent blocks are

410 Y. Zhao and D. Xu placed such that they partly overlap with previously placed blocks of pixels. Efros and Freeman use dynamic programming to choose the minimum cost path from one end of this overlap region to the other. That is, the chosen path is through those pixels where the old and new patch colors are similar. The path determines which patch contributes pixels at different locations in the overlap region. The disadvantage of seaming finding by dynamic programming is that it can only be applied to horizontal or vertical direction. Shai and Ariel proposed seam carving [7] to perform content aware image resizing. In their paper a seam is an optimal 8-connected path of pixels on a single image from top to bottom, or left to right, where optimality is defined by an image energy function. By repeatedly carving out or inserting seams in one direction one can change the aspect ratio of an image. By applying these operators in both directions one can retarget the image to a new size. The optimal seam is also found by using dynamic programming. Thence a best seam is horizontal or vertical only. That is, a vertical seam is an 8-connected path of pixels in the image from top to bottom, containing one, and only one, pixel in each row of the image. Vivek et al. first introduce graph cut for image and video texture synthesis [8]. Given two overlapping images A and B , graph cut is used to find the cut within the overlap region, which creates the best transition between these images. The overlap region is represented as directed graph, where each node represents a pixel position p in the overlap region, which is denoted A(p)and B(p)for the two images A and

B , respectively. Nodes are connected by edges representing connectivity be-tween pixels. Usually 4-connectivity is assumed, i.e., each node is connected to four neighboring nodes. Each edge is given a cost encoding the pixel differences between the two source images at that position. In the simplest case the cost corresponds to the color difference between the images A and B at the neighboring pixels p and q .

w

w = w(p,q, A,B)= A(p)- B(p)+ A(q)- B(q) where

(1)

⋅ is the L2 norm.

Agarwala et al. [6] describe an interactive, computer-assisted framework for combining parts of a set of photographs into a single composite picture. They use graph cuts to find the contribution regions among several images where each pixel is treated independently. Pixel labeling is performed in general terms, by minimizing over all images at the same time. To find a solution, an iterative alpha-expansion graph cut algorithm was used. The application of multi-label graph cuts requires a potentially large number of graph-cut iterations and it will grow with the number of labels. Nuno et al. [9] use watershed segmentation on image differences to find possible cuts over areas of low photometric difference. That allows for searching over a much smaller set of watershed segments, instead of over the entire set of pixels in the intersection region. Watershed transform seeks areas of low difference when creating

Fast Image Blending Using Seeded Region Growing

411

boundaries of each segmen nt, constraining the overall cutting lines to be a seque nce of watershed segment bou undaries results in significant reduction of search spaace. The best seam is found effficiently via graph cut [10], using a photometric criteriion.

3

Image Blending with Seeded Region Growing

Both graph cut based imaage blending and watershed based image blending ttake seam finding as an energy y optimization problem. The difference between them m is graph cut based method uses every pixel in the intersection region as per-nodee in the graph network and wattershed based method divides the regions of image inntersection into sets of disjoin nt segments then finds the labeling of the segments tthat minimizes intensity differences along the seam. The final labeling assignmennt is solved by max-flow/min-ccut algorithm. In contrast, our method takes seam findding as an image segmentation problem, and we use seed region growing algorithm m to find the seam. The seam finding approaach finds the labeling of the segments that minimizes inttensity differences along the seam. s By labeling we refer to the association of each ssegment to one of the imagess. By seam we refer to the combined path that separaates neighboring segments that have h different labels. 3.1

Segmentation of thee Intersection Region

In particular, for the simplee case of 2 images, we start by computing their intersecttion region’s intensity differencces as illustrated in Fig. 1. The main difficulty of seeeded region growing lies in the choice of the right seeds to grow. For seam finding prroblem, we will select region 1 as the first seed, and select region 2 as the second seeed. Like graph cut texture meth hod, we will set the boundary pixels between region 1 and region 3 as the first seed, and a boundary pixels between region 2 and region 3 as the second seed. These are con nstraint seeds, and they indicate pixels that we insist w will come from one particular im mage patch. The regions selection is illustrated as Fig. 2.

Fig. 1. Two original images used for the seeded region growing blending example (a and bb)

412 Y. Zhao and D. Xu

Fig. 2. Intersection region’s intensity differences between (a) and (b)

3.2

Seeded Region Growing Labeling

After the seeds have been set, the seeded region growing method for seam finding can be implemented in three steps: (1) First we go over the whole image and find all foreground pixels (pixels with value different from zero) that have a background neighbor (pixels with value zero). Then we put these pixels in a priority queue, where the priority value is the grey value of that pixel in the differences image. (2) In the second step, we take pixels out of the priority queue, highest priority first, and examine all its neighbors. Neighbor pixels that belong to backgrounds are set to the value of the pixel which is taken out of the priority queue, and are added to the priority queue. (3) When the priority queue is empty, all pixels in the image will have a value different from 0, and the boundary between pixels with first seed value and second seed value is the best seam line needs for image blending. The blending mask and blending result of Fig. 1 using seeded region growing are shown as Fig. 3 and Fig. 4. Image label can be obtained from blending mask directly.

Fig. 3. Image blending mask with seeded region growing

Fast Image Blending Using Seeded Region Growing

413

Fig. 4. Image blending result with seeded region growing

4

Experiments Analysis and Evaluation

The first data set is the two images in Fig. 1, and the blending mask and blending result are illustrated as Fig. 3 and Fig. 4. The proposed image blending method is compared with graph cut blending and watershed blending methods. The blending mask and blending result of graph cut are illustrated as Fig. 5 and Fig. 6, and the blending mask and blending result of watershed are illustrated as Fig. 7 and Fig. 8.

Fig. 5. Image blending mask with graph cut

Fig. 6. Image blending result with graph cut

414 Y. Zhao and D. Xu

Fig. 7. Imag ge blending mask with graph cut and watershed

Fig. 8. Imag ge blending result with graph cut and watershed

Fig. 9 illustrates the effeect of varying the image resolution on the total executtion times for both methods. Seeeded region growing is the fastest, following is watershed method, and graph cut meth hod is the slowest.

Fig. 9. Image blending performance comparision

Fast Image Blending Using Seeded Region Growing

5

415

Conclusion

This paper presented a seam finding approach for automated seamless image blending of aligned images to create a mosaic. A novel aspect is the use of seeded region growing segmentation in the context of image blending. Using seeded region growing led to a reduction of the search time for finding the seam lines between overlapped images while still ensuring that the seam of each overlapped region would be along low difference areas. A central idea in this paper is that seeded region growing segmentation greatly reduces the search time for finding the best seam lines, when compared to searching over all individual pixels in the intersection zone by graph cut optimization. To support this we presented quantitative comparisons of this method against pixel-level blending and watershed based blending. Our approach compares very favorably in terms of time complexity and memory consumption, without noticeable degradation of the seam quality. The use of seeded region growing over image pairs guarantees a very fast solution over each intersection region. Such memory efficiency associated with execution speed makes this technique to scale to large mosaics and suitable for low-power embedded systems and smart phones.

References 1. Levin, A., Zomet, A., Peleg, S., Weiss, Y.: Seamless Image Stitching in the Gradient Domain. In: Pajdla, T., Matas, J. (eds.) ECCV 2004. LNCS, vol. 3024, pp. 377–389. Springer, Heidelberg (2004) 2. Burt P., Adelson E.: A multiresolution spline with application to image mosaics. ACM Trans. Graph. 2, 217–236 (1983) 3. Brown M., Lowe D.: Recognising panoramas. In: ICC 2003: Proceedings of the Ninth IEEE International Conference on Computer Vision, pp. 1210–1218. IEEE Computer Society, Washington (2003) 4. Agarwala, A., Dontcheva, M., Agrawala, M., Drucker, S., Colburn, A., Curless, M., Salesin, D., Cohen, M.: Interactive digital photomontage. In: Proceedings of SIGGRAPH 2004. IEEE Press, New York (2004) 5. Uyttendaele, M., Eden, A., Szeliski, R.: Eliminating ghosting and exposure artifacts in image mosaics. In: Proceedings of the International Conference on Computer Vision and Pattern Recognition (CVPR 2001), pp. 509–516. IEEE Press, Hawaii (2001) 6. Efros, A., Freeman, W.: Image quilting for texture synthesis and transfer. In: Proceedings of SIGGRAPH 2001, pp. 341–346. IEEE Press, New York (2001) 7. Avidan S., Shamir A.: Seam carving for content-aware image resizing. ACM Transactions on Graphics, 1270–1278 (2007) 8. Kwatra, V., Schodl, A., Essa, I., Turk, G., Bobick, A.: Graphcut textures: image and video synthesis using graph cuts. In: ACM Trans. Graphics, Proceedings of the SIGGRAPH 2003, California (2003) 9. Gracias, N., Gleason, A., Negahdaripour, S.: Fast Image Blending using Watersheds and Graph Cuts. Image and Vision Computing 27(5), 597–607 (2009) 10. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization via graph cuts. In: Proc. IEEE Trans. Pattern Anal. Mach. Intell., vol 23 (11), pp. 1222–1239. IEEE press, New York(2004)

Research on the Algorithm of Multidimensional Vector Fourier Transformation Matrix Yue Yang1,2, Aijun Sang1(), Liyue Sun1, Xiaoni Li1, and Hexin Chen1 1

School of Communication Engineering, Jilin University, Changchun 130022, China [email protected] 2 Changchun Guanghua University, Changchun 130000, China

Abstract. The Fourier transform of the promotion is considered, which aims to develop a new model to solve complex data processing problems. This paper introduces the definition of multi-dimensional vector matrix. Based on the multidimensional vector matrix theory, the Fourier transformation is extended to multi-dimensional space, which includes the deduction of unitary orthogonal conjugate and energy concentration. Then the translation theory of twodimension Fourier transform is extended to multi-dimension space. Keywords: Multi-dimension vector matrix · Fourier transform · Orthogonal uni-tary transformation · Energy concentration · Translation theory

In today's high-tech rapid development of information age, orthogonal transformation plays a crucial role in many areas. Such as satellite laser ranging, multi-view video compression coding, artificial neural network to deal with the noise, etc., on the basis of orthogonal transformation as a core algorithm. Among them, the Fourier transform as a variety of orthogonal transformation algorithm of the classic algorithm, is widely used in recent years[1]. However, with a lot of information redundancy in the information age, people also constantly improve the demand of orthogonal transformation operations. To solve this problem, our laboratory proposed multi-dimensional vector matrix theory innovative, not only in the field of orthogonal transform algorithm has been an unprecedented breakthrough, but to break the limitations of the traditional twodimensional model, the concept is extended to multi-dimensional matrix[2] . Using this model, this project will transform algorithm for multi-dimensional orthogonal expansion Fourier transform, and then presents a multi-dimensional vector matrix discrete Fourier transform algorithm. This paper introduces the definition of multi-dimensional vector matrix; nuclear orthogonal matrix of multi-dimensional vector matrix discrete Fourier transform; the deduction and validation of nuclear orthogonal matrix and energy concentration; translation of the spectral properties of the multi-dimensional Fourier transform proved by the deduction and the simulation of three-dimensional spectrum translation. © Springer-Verlag Berlin Heidelberg 2015 T. Tan et al. (Eds.): IGTA 2015, CCIS 525, pp. 416–422, 2015. DOI: 10.1007/978-3-662-47791-5_46

Research on the Algorithm of Multidimensional Vector Fourier Transformation Matrix

1

Multi-dimensional Vector Matrix Model

1.1

The Definition of 2M-dimensional Vector Matrix

(

417

)

An array of numbers ai1i2 in two directions (one direction has M entries and the other direction has N entries) is called two-dimensional matrix, and the set of all such ma-

(

)

trices is represented as M M ×N .An array of numbers ai1i2in in n directions (each direction has I i entries, 1 ≤ i ≤ n . I i can be called the order in this direction) is called multi-dimensional matrix, and the set of all such matrices is denoted as M I ×I ×× I . n

1 2

If the dimensions of multi-dimensional matrix M K ×K ××K are separated into 1 2 r two sets and the matrix is denoted as M

M





I1 ×I m × J1 ×× J n

can be denoted as

I = ( I1, I 2,..., I m ) , J = ( J 1, J 2,..., J n ) .

M

IJ

M





, where m + n = r .

I and J are the vectors,

, where

can be called



I1 ×I m × J1 ×× J n

dimensional vector matrix according to the vector

2



I1 ×I m × J1 ×× J n

multi-

I and J [3][4].

Multidimensional Vector Matrix of Fourier Transform

The promotion of multidimensional vector matrix Fourier transform algorithm breaks through the bound of the traditional two-dimensional Fourier transform. When we process the multidimensional data, multidimensional vector matrix model can be used to solve the problem. For traditional two-dimensional Fourier transform matrix

F = WIJ fWIJ*T , the key is to find a two-dimensional matrix multiplication significance of kernel function square

,to make it meet the: W W IJ

IJ

*T

=E.

In the same way, for the multidimensional vector matrix Fourier transform, In the sense of multidimensional vector multiplication of multidimensional vector nuclear matrix

WIJ . I , J for the vector, I = (I1 , I 2 ,  , I m )

to make it meet the vector matrix is:

WIJ WIJ

*T

, J = (J , J , , J ) , 1

2

n

= E . So the Fourier transform for multidimensional

FIJ = WIJ fWIJ *T . The f is the multidimensional vector matrix, F

is the multidimensional vector transform coefficient matrix. Multidimensional vector Fourier transform nuclear matrix is as follows:

WIJ ={w v1v2×××vmu1u2×××uM }IJ = where:

1 e- j 2p v1u1 / N1 ××× e- j 2p vM uM / N M N1 N 2 ××× N M

vi = 0,1,, Ni - 1; ui = 0,1,, Ni - 1; i = 1, 2,, M and M , N i ∈ N * .

418

Y. Yang et al.

Because the unitary nuclear matrix of Fourier transform is plural, its conjugate orthogonal condition is: WIJ -1 = WIJ *T . Besides, transpose of

WIJ *T is the complex conjugate

WIJ .

The prove of the conjugate orthogonality for the Fourier transform nuclear matrix:

WIJWIJ *T = =

=

æ N1 -1 N2 -1 NM -1 1 ´ çç å å  å wu1u2 uM v1v2 vM wv1v2 vM s1 s2 sM N1 N 2 ××× N m è v1 = 0 v2 = 0 vM = 0

1 N1 N 2 ××× N m

N1 -1 N 2 -1

å å

v1 = 0

1 N1 N 2 ××× N m

v2 = 0

N m -1

××× å e- j 2p v1u1 / N1 ××× e- j 2p vmum / Nm ´ e j 2p v1s1 / N1 ××× e- j 2p vm sm / Nm vm = 0

N1 -1 N 2 -1

N m -1

××× å e- j 2p v1 ( u1 - s1 )/ N1 ××× e- j 2p vm ( um - sm )/ Nm

å å

v1 = 0

v2 = 0

vm = 0

Suppose: N1 -1 - j

åe

I1 =

2p v1 ( u1 - s1 ) N1

V1 = 0

N 2 -1 - j

åe

I2 =

2p v2 ( u 2 - s2 ) N2

V2 = 0

××× N m -1 - j

Im =

åe

2p vm ( um - sm ) Nm

Vm = 0

\WIJWIJ *T = Take

1 I1I 2 ××× I m N1 N 2 ××× N m

I1 as an example: I1 =

N1 -1 - j

åe

2p v1 ( u1 - s1 ) N1

= e0 + e

V1 = 0

I1 =

N1 −1 − j 2π v1 ( u − s ) 1 1 N1

e

V1 = 0

=e + e 0

−j

2π ( u1 − s1 ) N1

Respectively discuss when (1)

u1 = s1

ö ÷÷ ø

+e

−j

4π ( u1 − s1 ) N1

+ ⋅⋅⋅e

u1 = s1 and u1 ≠ s1 :

−j

2π ( N1 −1) ( u1 − s1 ) N1

Research on the Algorithm of Multidimensional Vector Fourier Transformation Matrix

-j

I1 = 1 + e

2p (0) N1

-j

+e

4p (0) N1

-j

+×××e

419

2p ( N1 -1) (0) N1

1 1 1 1 I1 = ´ N1 = 1××× Im = ´ Nm = 1 N1 N1 Nm Nm In a similar way:

I 2 = I 3 = ××× = I m = 1

u1 ¹ s1 Suppose u1 - s1 = k1 , so:

(2)

I1 =

1 - e - j 2 k1p -j

1- e

2 k1p N1

 0 < k1 < N1 , 1 - e \ I1 =

1 - e - j 2 k1p 1- e

In a similar way So

2k p -j 1 N1

=

-j

2 k1p N1

¹0

0 -j

1- e

2 k1p N1

=0

I 2 = I 3 = ××× = I m = 0

*T

WIJWIJ =E , and E is a multidimensional complex unit vector matrix.

The above means the transformation nuclear matrix of Multidimensional vector Fourier transform Satisfy the orthogonality of conjugate and can be used for Fourier Transform.

3

Translation of the Coefficients of the Multidimensional Frequency Domain Space

The conjugate symmetry of Fourier transform derived from the conjugate symmetry of the unitary transformation of nuclear. It can make the frequency spectrum of signal focus on the four corners of the matrix. When making a signal for two-dimensional Fourier transform, we used to transform the results of the original point to the center of the square of frequency domain in order to analyze the distribution of the twodimensional spectrum more clearly. This is the theory called the translation theory of Fourier transform.

420

3.1

Y. Yang et al.

The Prove of Fourier Transform Translation Theory of Three Dimensional Vector Matrix

If: f ( x , y , z ) « F (u , v, t ) , then

f ( x, y , z ) e - j 2p ( u0 x + v0 y +t0 z )/ N « F (u - u 0 , v - v0 , t - t0 ) We can conclude: N -1 N -1 N -1

F (u - u0 , v - v0 , t - t0 ) = å x= 0

N -1

=å x =0

N -1

N -1

y=0

z =0

åå

åå y =0

f ( x, y, z)e- j 2p [(u -u0 ) x+(v- v0 ) y +(t -t0 ) z ]/ N

z=0

f ( x, y, z )e - j 2p (ux +vy +tz )/ N e j 2p (u0 +v0 +t0 )/ N

= F ( x, y, z )e j 2p (u0 +v0 +t0 )/ N f ( x, y , z )e - j 2p ( u0 x +v0 y +t0 z )/ N « F (u - u0 , v - v0 , t - t0 )

u0 = v0 = t0 = N / 2 , then:

If

F (u - u0 , v - v0 , t - t0 ) = F ( x, y, z )e jp ( x + y + z )  e jp = -1 \ F (u - u0 , v - v0 , t - t0 ) = F ( x, y, z )(-1)( x + y + z ) So: f ( x, y , z )( -1)

3.2

x+ y+z

« F (u -

N N N ,v - ,t - ) 2 2 2

The Spectrum Translation of Three-Dimensional Vector Matrix Fourier Transform

We use three-dimensional random data \( f_{IJ} \) of size \( 8 \times 8 \times 8 \) to simulate the above properties. The multidimensional vector matrix Fourier transform is applied to \( f_{IJ} = f_{(i \times j) \times k} \) with the transform formula \( W_{II} \cdot f_{IJ} \cdot W_{JJ}^{T} = F_{IJ} \). The index vectors are \( I = (i, j) \) and \( J = (k) \); \( W_{II} \) is a four-dimensional eighth-order vector matrix serving as the unitary transformation matrix of the Discrete Fourier Transform, and \( W_{JJ}^{*T} \) is an ordinary two-dimensional eighth-order vector matrix serving as the unitary transformation matrix of the Discrete Fourier Transform. \( F_{IJ} \) is the three-dimensional coefficient matrix obtained after the transformation. The results of the spectrum analysis after the transformation are shown in Fig. 1 and Fig. 2.


Fig. 1. Spectrum translation of multidimensional data Fourier transform

Fig. 2. Spectrum distribution of the fifth plane in the k direction

Figure 1 shows that, after the spectrum translation of the three-dimensional Fourier transform, the energy is concentrated on the fifth plane in the k direction. Figure 2 shows that the energy is concentrated at the center of the fifth plane, which is the center of the three-dimensional vector matrix. The results show that the Fourier transform spectrum translation based on the multidimensional vector model is correct and effective.
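The same centring behaviour can be reproduced with an ordinary separable 3-D FFT. The snippet below is our own simplified stand-in for the simulation (it does not implement the vector-matrix product \( W_{II} \cdot f_{IJ} \cdot W_{JJ}^{T} \) itself) and confirms that multiplying by \( (-1)^{i+j+k} \) moves the spectral origin to the centre of the 8×8×8 cube.

```python
# Our own simplified stand-in for the 8x8x8 simulation, using NumPy's separable
# 3-D FFT rather than the paper's W_II . f_IJ . W_JJ^T vector-matrix product.
import numpy as np

N = 8
f = np.random.rand(N, N, N)                       # three-dimensional random data
i, j, k = np.meshgrid(np.arange(N), np.arange(N), np.arange(N), indexing="ij")

# Multiplying by (-1)^(i+j+k) before the transform translates the spectrum ...
F_centred = np.fft.fftn(f * (-1.0) ** (i + j + k))
# ... which for even N is exactly the fftshift-reordered plain spectrum.
print(np.allclose(F_centred, np.fft.fftshift(np.fft.fftn(f))))  # True
```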

4 Conclusion

In this paper, we extended the two-dimensional Fourier transform to a multidimensional Fourier transform. The theory was established from the conjugate orthogonality and the energy-concentration property of the kernel matrix of the unitary transformation. The Fourier transform spectrum-shift theorem was then generalized to the multidimensional field.


Acknowledgment. This work is supported by the following projects: (1) Project of National Science Foundation of China (Grant No. 61171078); (2) Project of National Science Foundation of Jilin Province (Grant No. 2013010145JC); (3) Project of International Cooperation and Exchange Foundation of Jilin Province, China (Grant No. 20130413053GH); (4) Project of International Cooperation and Exchange Foundation of Jilin Province, China (Grant No. 20140414013GH); (5) Project of Youth Science Foundation of Jilin Province (Grant No. 20130522164JH).


A Small Infrared Target Detection Method Based on Coarse-to-Fine Segmentation and Confidence Analysis

Hang Li1, Ze Wang1, Tian Tian2, and Hui Huang1

1 Research and Development Center, China Academy of Launch Vehicle Technology, Beijing 100076, People's Republic of China, [email protected]
2 China University of Geosciences, Wuhan 430074, People's Republic of China

Abstract. In this paper, a small target detection algorithm for infrared images is proposed. First, an infrared image is coarse-to-fine segmented automatically by self-adaptive histogram segmentation. After small abnormal regions in the segmented image are detected and defined as candidate targets, the abnormality-based confidence of each candidate target is calculated and sorted. Finally, the candidate target with the maximum confidence is weighed to be the real one. The experiments demonstrate that the proposed method is efficient and robust.

Keywords: Infrared small target detection · Coarse-to-fine segment · Confidence

1 Introduction

Small infrared target detection is a key technology in infrared image processing and pattern recognition. It is a hard problem because of the variability of the target's appearance under different atmospheric conditions, backgrounds, camera ego-motion, etc. [1]. By assuming that hot targets appear as bright spots in the scene, various methods have been proposed to solve this problem. Gu et al. [3] proposed a kernel-based nonparametric regression method for background prediction and clutter removal, further applied to target detection. Wang et al. [4] presented a highpass filter based on LS-SVM to detect small targets. Wang et al. [5] provided a real-time small target detection method based on the cubic facet model. Gao et al. [6] applied a self-similarity descriptor to model and detect targets. In general, traditional small target detection can be divided into two classes: one uses a grayscale-based threshold to segment the infrared image and detect the small target [2-4]; the other fuses multiple features to segment the target region in a fusion image [5-8]. Both of these methods rely on the 'abnormality' of the target region. The 'abnormality' means that, compared with the neighboring background regions, the target region has lower or higher intensity and its intensity distribution is more uniform. In the segmentation stage, existing methods always apply a single segmenting threshold or an adaptive threshold in a fixed window. However, considering the interference caused by factors such as noise and sea clutter, it is hard to identify the segmentation boundary by using a one-time segmentation. Thus the segmenting methods mentioned above cannot effectively detect the small target under a complex background.

To solve this problem, a small target detection approach is proposed here, using principle of maximum entropy (POME) based self-adaptive segmentation and confidence analysis. As shown in Fig. 1, the approach has three stages. First, a self-adaptive segmentation is applied to coarse-to-fine segment the infrared image. Then, candidate targets are searched for and detected in the segmented image. Finally, by calculating and sorting the confidence of each candidate target, the real target is weighed to be the one with the highest confidence. The experimental results validate the good performance of the proposed method for infrared small target detection.

Fig. 1. The procedure of the proposed method. (a) is an infrared image. It is coarsely segmented to (b), and (b) is then finely segmented to (c). Candidate targets are searched in (c) and marked in the infrared image (d). By calculating the confidences of the candidate targets, the one with the maximum confidence is weighed to be the real target, shown in (e) and (f).

2 POME Based Self-adaptive Segmentation

2.1 The Principle of Maximum Entropy (POME)

Entropy measures the uncertainty of random variables: the larger the entropy, the more uncertain the random variables are. In this sense, the principle of maximum entropy states that, when only part of the prior knowledge is available, the most reasonable inference about the unknown distribution is the most uncertain one that is consistent with that knowledge. It is the only choice, because any other choice would add constraints and assumptions that cannot be derived from the available information. The objective function based on POME is

\[
J = \max_{p \in P} H(Y \mid X) = \max_{p \in P} \left( - \sum_{(x,y)} p(x, y) \log p(y \mid x) \right) \tag{1}
\]

where \( H(Y \mid X) \) denotes the conditional entropy, \( p(x, y) \) is the joint probability density and \( p(y \mid x) \) is the conditional probability density.
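As a concrete illustration (ours, with an arbitrary made-up joint distribution), the conditional entropy of Eq. (1) can be evaluated directly from \( p(x, y) \):

```python
# Tiny illustration (ours, arbitrary joint distribution) of the conditional
# entropy H(Y|X) = -sum_{x,y} p(x,y) log p(y|x) appearing in Eq. (1).
import numpy as np

p_xy = np.array([[0.25, 0.25],    # rows index x, columns index y
                 [0.10, 0.40]])
p_x = p_xy.sum(axis=1, keepdims=True)
p_y_given_x = p_xy / p_x

H_y_given_x = -np.sum(p_xy * np.log2(p_y_given_x))
print(H_y_given_x)  # about 0.86 bits; a flatter p(y|x) would give a larger value
```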


2.2 Objective Function Based on POME

Assume that the infrared image \( I \) has \( n \) grayscales \( X = \{x_i\}_{i=0}^{n-1} \), where \( x_i \in [0, 255] \). The probability of grayscale \( x_i \) is \( p(x_i) \). The entropy of the infrared image \( I \) is

\[
H(X) = -\sum_{i=0}^{n-1} p(x_i) \log p(x_i), \qquad \sum_{i=0}^{n-1} p(x_i) = 1 \tag{2}
\]

Suppose there is a threshold \( t \) that defines the ROI regions as \( \Omega_{ROI} = \{x_i \mid x_i \ge t\} \) and the background as \( \Omega_{background} = \{x_i \mid x_i < t\} \). The entropies of the ROI regions and of the background are, respectively,

\[
H_{ROI} = -\sum_{x_i = t}^{255} p(x_i) \log p(x_i), \qquad
H_{background} = -\sum_{x_i = 0}^{t-1} p(x_i) \log p(x_i) \tag{3}
\]

where \( H(X) = H_{ROI} + H_{background} \). By using the threshold \( t \), \( I \) is segmented into the binary image \( \bar{I} \), whose entropy is \( H(\bar{X}) \), where \( \bar{X} = \{\bar{x}_0 = 0, \bar{x}_1 = 255\} \). Thus

\[
H(\bar{X}) = \bar{H}_{ROI} + \bar{H}_{background} \tag{4}
\]

\[
\bar{H}_{ROI} = -\left( \sum_{x_i = t}^{255} p(x_i) \right) \log \left( \sum_{x_i = t}^{255} p(x_i) \right), \qquad
\bar{H}_{background} = -\left( \sum_{x_i = 0}^{t-1} p(x_i) \right) \log \left( \sum_{x_i = 0}^{t-1} p(x_i) \right) \tag{5}
\]

Based on Eq. (1), the objective functions are

\[
J_1 = \max \left\{ H_{ROI} = H\!\left( \{x_i\}_{i=0}^{n-1} \,\middle|\, \{x_i \ge t\} \subseteq \Omega_{ROI} \right) \right\} \tag{6}
\]

\[
J_2 = \max \left\{ H_{background} = H\!\left( \{x_i\}_{i=0}^{n-1} \,\middle|\, \{x_i < t\} \subseteq \Omega_{background} \right) \right\} \tag{7}
\]

\[
p\!\left( x_i \,\middle|\, \{x_i < t\} \subseteq \Omega_{background} \right) =
\begin{cases}
\dfrac{p(x_i)}{\sum_{x_i < t} p(x_i)} & x_i < t \\[6pt]
0 & x_i \ge t
\end{cases} \tag{8}
\]

\[
p\!\left( x_i \,\middle|\, \{x_i \ge t\} \subseteq \Omega_{ROI} \right) =
\begin{cases}
\dfrac{p(x_i)}{\sum_{x_i \ge t} p(x_i)} & x_i \ge t \\[6pt]
0 & x_i < t
\end{cases} \tag{9}
\]

Thus, according to Eq. (8) and Eq. (9), the objective functions can be simplified to

\[
J_1 = \max \left\{ H_{ROI} - \bar{H}_{ROI} \right\} \tag{10}
\]

\[
J_2 = \max \left\{ H_{background} - \bar{H}_{background} \right\} \tag{11}
\]

In the traditional segmentation method based on POME, Eq. (10) and Eq. (11) are usually combined into

\[
J = \max \left\{ H_{ROI} - \bar{H}_{ROI} + H_{background} - \bar{H}_{background} \right\}
  = \max \left\{ H(x_i) - \bar{H}(x_i) \right\} \tag{12}
\]

This kind of method iterates over every grayscale to find the \( t \) that satisfies Eq. (12). However, the traversal algorithm wastes time and usually mistakenly segments part of the target regions into the background. To ensure that the target regions are segmented completely, a coarse-to-fine segmentation is used to get rid of most background regions first and then to obtain the ROI regions, which are composed of target regions and false alarms.
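For comparison, a minimal sketch of the traditional exhaustive POME thresholding characterised by Eq. (12) is given below; it follows the usual Kapur-style formulation (our reading, not the authors' code) and shows the per-threshold traversal that the coarse-to-fine scheme is designed to avoid.

```python
# Sketch (ours) of the traditional exhaustive POME thresholding of Eq. (12),
# in the usual Kapur-style form: try every grey level t and keep the one that
# maximises the ROI-entropy + background-entropy criterion.
import numpy as np

def pome_threshold(image, levels=256):
    hist, _ = np.histogram(image, bins=levels, range=(0, levels))
    p = hist / hist.sum()
    best_t, best_score = 1, -np.inf
    for t in range(1, levels):                 # exhaustive traversal over thresholds
        p_bg, p_roi = p[:t], p[t:]
        w_bg, w_roi = p_bg.sum(), p_roi.sum()
        if w_bg == 0 or w_roi == 0:
            continue
        q_bg = p_bg[p_bg > 0] / w_bg           # normalised background distribution
        q_roi = p_roi[p_roi > 0] / w_roi       # normalised ROI distribution
        score = -(q_bg * np.log(q_bg)).sum() - (q_roi * np.log(q_roi)).sum()
        if score > best_score:
            best_t, best_score = t, score
    return best_t

# Usage sketch: mask = ir_frame >= pome_threshold(ir_frame)
```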

\[
J_1 = \max \left\{ H(X) = H_{ROI} + H_{background} \right\} \tag{13}
\]

\[
J_2 = \operatorname*{find}_{t} \left\{ \min \left\{ H(\bar{X}) = \bar{H}_{ROI} + \bar{H}_{background} \right\} \ \text{and} \ \bar{H}_{ROI} > 0 \right\} \tag{14}
\]

Because the uniform distribution has maximum entropy, Eq. (13) is transformed into

\[
J_1 :\ f(x_i,\ x_i \in X) \rightarrow \left\{ y_j : p\!\left( y_j,\ Y = \{y_j\}_{j=0}^{n'-1} \right) = \frac{1}{n'} \right\} \tag{15}
\]

where \( Y = \{y_j\}_{j=0}^{n'-1} \) follows the uniform distribution and \( y_j \in [0, 255] \). Because the entropy function is concave, its minimum value 0 is reached at the boundary, i.e., when \( t = 0 \) or \( t = 255 \). Thus Eq. (14) is transformed into

\[
J_2 :\ t = y_{n'-2} \tag{16}
\]

2.3 The Coarse-to-Fine Segmentation

Considering that a single segmentation can hardly segment the targets completely, we segment twice to coarse-to-fine segment the infrared image into ROI regions and background. As shown in Fig. 2 (a2-b2), histogram equalization is first used to homogenize the intensity space and transform the histogram distribution into a uniform distribution. Then, by using a self-adaptive threshold \( thres_0 \), the infrared image shown in Fig. 2 b1 is coarsely segmented in order to get rid of most background regions, as shown in Fig. 2 c1. Histogram equalization is then applied to the segmented image, shown in Fig. 2 (c2-d2), to enhance its contrast. Finally, another self-adaptive threshold \( thres_1 \) is applied to segment the enhanced image. The candidate targets shown in Fig. 1 d are searched and detected in the segmented image shown in Fig. 2 e1.

Fig. 2. The flowchart of the segmentation based on non-linear histogram equalization, with the corresponding images at each step and their histograms. Images a1 to e1 show the input infrared image, the intermediate results, and the segmented image, while images a2 to d2 show the histograms corresponding to a1 to d1.

Assume that the infrared image has \( m \) grayscales defined by \( X = \{x_i\}_{i=0}^{m-1} \); its cumulative distribution function at grayscale \( x_i \) is defined as

\[
c(x_i) = \sum_{j=0}^{i} p(x_j) \tag{17}
\]

where \( p(x_i) \) denotes the probability of grayscale \( x_i \) in the histogram. It is clear that, after histogram equalization, the distribution of grayscales follows a uniform distribution. Thus, assuming that there are \( m_1 \) grayscales \( Y = \{y_j\}_{j=0}^{m_1-1} \) reserved after histogram equalization, based on Eq. (15) and Eq. (16) the histogram equalization function is defined as

\[
y_j = \left\lfloor \frac{(m-1)\, c(x_i)}{\sum_i p(x_i)} + 0.5 \right\rfloor \times \left\lfloor \frac{256}{m_1} \right\rfloor \tag{18}
\]


The self-adaptive threshold \( thres_0 \) is then defined as

\[
thres_0 = \left\{ y_t \ \middle|\ \sum_{j=0}^{t} p(y_j) \le 0.5 \sum_{j=0}^{m_1-1} p(y_j) \le \sum_{j=0}^{t+1} p(y_j) \right\} \tag{19}
\]

The threshold is only related to \( m_1 \), where \( m_1 \) can be determined by

\[
m_1 = \left\{ m_1 \ \middle|\ 1 \le m_1 \le m,\ \min \left| \sum_{j=1}^{t} y_j - 0.5 \right| \right\} \tag{20}
\]

Histogram equalization is then applied again. Assuming that there are \( m_2 \) grayscales \( Z = \{z_k\}_{k=0}^{m_2-1} \) reserved, the histogram equalization function is defined as

\[
z_k = \left\lfloor \frac{(m_1-1)\, c(y_j)}{\sum_j p(y_j)} + 0.5 \right\rfloor \times \left\lfloor \frac{256}{m_2} \right\rfloor \tag{21}
\]

where \( m_2 = m_1 / 2 \), and the self-adaptive threshold \( thres_1 \) is defined as

\[
thres_1 = \frac{z_{m_2-2} + z_{m_2-1}}{2} \tag{22}
\]

Histogram equalization adjusts contrast without affecting the global contrast by stretching the intensity range of the image and merging neighboring, low-probability gray levels. This property ensures the feasibility of our segmenting method: the high-intensity regions of the original image always map to high-intensity regions of the equalized image. However, it also leaves in the segmentation result some background pixels whose intensities are similar to those of the target region. To pick the real target out of these backgrounds, a selection method based on a confidence measure and an empirical size constraint is presented next.
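A rough sketch of Sect. 2.3 as we read it is given below. It is a simplification, not the authors' implementation: \( thres_0 \) is taken as the median of the equalised image (the cumulative-probability-0.5 condition of Eq. (19)), and \( thres_1 \) as the mean of the two highest grey levels reserved after the second equalisation (cf. Eq. (22)).

```python
# Rough sketch (ours) of the coarse-to-fine segmentation of Sect. 2.3:
# equalise, drop the darker half (thres0, cf. Eq. 19), equalise the surviving
# pixels again and keep only the brightest reserved levels (thres1, cf. Eq. 22).
import numpy as np

def equalize(values, levels=256):
    hist, _ = np.histogram(values, bins=levels, range=(0, levels))
    cdf = np.cumsum(hist) / max(hist.sum(), 1)
    lut = np.floor((levels - 1) * cdf + 0.5)            # equalisation look-up table
    return lut[np.clip(values.astype(int), 0, levels - 1)]

def coarse_to_fine_segment(img):
    eq1 = equalize(img)
    thres0 = np.median(eq1)               # cumulative probability ~0.5, cf. Eq. (19)
    coarse_mask = eq1 > thres0            # coarse cut removes most of the background

    eq2 = np.zeros_like(eq1)
    eq2[coarse_mask] = equalize(eq1[coarse_mask])       # re-equalise surviving pixels
    reserved = np.unique(eq2[coarse_mask])
    thres1 = reserved[-2:].mean()         # mean of the two highest levels, cf. Eq. (22)
    return eq2 >= thres1                  # candidate-target (ROI) mask

# Usage sketch: roi_mask = coarse_to_fine_segment(ir_frame.astype(float))
```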

3 Confidence Measure of Candidate Targets

By defining the union of the candidate target regions in the original infrared image as \( \Omega \) and the remaining regions as the global background \( \bar{\Omega} \), the confidence of a candidate target \( A_i \) takes the form

\[
C(A_i) = C_1(A_i) \times C_2(A_i) \times C_3(A_i) \tag{23}
\]

where \( C_1 \) is the size confidence

\[
C_1(A_i) =
\begin{cases}
0 & r_i > t_r \\
1 & r_i \le t_r
\end{cases} \tag{24}
\]

and \( t_r \) indicates the maximum allowed value of the radius \( r_i \). \( C_2 \) is the intensity confidence [9]

\[
C_2(A_i) = \frac{1}{1 + e^{-\lambda_1 \left( \mu_{A_i} - \mu_{\bar{\Omega}} \right)}} \tag{25}
\]

where \( \mu_{\bar{\Omega}} \) is the mean grey value of \( \bar{\Omega} \) and \( \mu_{A_i} \) is the mean grey value of \( A_i \). The higher the intensity of \( A_i \), the closer the value of \( C_2 \) is to 1; otherwise, the value of \( C_2 \) is close to 0. \( C_3 \) is the contrast confidence

\[
C_3(A_i) = \frac{1}{1 + e^{-\lambda_2 \left[ \left( \mu_{A_i} - \mu_{B_i} \right) - \left( \mu_{\Omega} - \mu_{\bar{\Omega}} \right) \right]}} \tag{26}
\]

where \( \mu_{\Omega} \) is the mean grey value of \( \Omega \) and \( \mu_{B_i} \) is the mean grey value of the neighbouring background \( B_i \) of \( A_i \). If the candidate target has a higher contrast than its neighboring background, the value of \( C_3 \) is close to 1; otherwise, the value is close to 0. \( C_2 \) and \( C_3 \) are sigmoid functions whose slopes are controlled, respectively, by \( \lambda_1 \) and \( \lambda_2 \), where \( \lambda_1, \lambda_2 \in [1, +\infty) \). To sum up, the candidate target \( A_i \) with the largest confidence is decided to be the real one.
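A compact sketch of the confidence score of Eqs. (23)-(26) follows. The region masks, the equivalent-radius estimate and the use of the plain local contrast \( \mu_{A_i} - \mu_{B_i} \) in \( C_3 \) are our own simplifications of the text, not the authors' exact formulation.

```python
# Compact sketch (ours) of the confidence score of Eqs. (23)-(26) for one
# candidate region A_i; mask handling and the simplified C3 are assumptions.
import numpy as np

def confidence(img, target_mask, neighbour_mask, global_bg_mask,
               t_r=7, lam1=5.0, lam2=5.0):
    # C1: size confidence -- reject regions whose equivalent radius exceeds t_r
    radius = np.sqrt(target_mask.sum() / np.pi)
    c1 = 1.0 if radius <= t_r else 0.0

    mu_a = img[target_mask].mean()        # candidate target A_i
    mu_b = img[neighbour_mask].mean()     # neighbouring background B_i
    mu_bg = img[global_bg_mask].mean()    # everything outside the candidate regions

    # C2: intensity confidence -- brighter candidates score closer to 1
    c2 = 1.0 / (1.0 + np.exp(-lam1 * (mu_a - mu_bg)))
    # C3: contrast confidence -- higher local contrast scores closer to 1
    c3 = 1.0 / (1.0 + np.exp(-lam2 * (mu_a - mu_b)))
    return c1 * c2 * c3

# The candidate with the largest confidence(...) value is taken as the real target.
```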

4 Experimental Work

Five typical infrared image sequences were selected to test the proposed method. Each sequence contains 100 images of size 150 × 150, and the experiments were performed with MATLAB R2011a on a 3.2 GHz personal computer. The slopes were set to λ1 = λ2 = 5 and the maximum radius to t_r = 7. First, five images chosen from the five infrared image sequences, respectively, were tested and the results are shown in Fig. 3. Figs. 3a-e indicate that the infrared images are under complex backgrounds such as roadside, sky, sky-sea, sea and heavy fog. It is clear that the real targets in these images are detected correctly. Then, the Receiver Operating Characteristic (ROC) curves of the target regions detected by the LSSVM method [1], the Facet method [2], the GST method [3] and our method were compared to assess the robustness of the detected target regions. The true positive rate is defined as the ratio of the number of detected pixels to the number of real target pixels, and the false positive rate is defined as the ratio of the number of false alarms to the total number of pixels in the infrared image [5].
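The two rates can be transcribed directly (our own helper, with hypothetical boolean mask inputs):

```python
# Direct transcription (ours, with hypothetical boolean mask inputs) of the two
# rates used for the ROC curves.
import numpy as np

def roc_point(detection_mask, truth_mask):
    tp = np.logical_and(detection_mask, truth_mask).sum()
    fp = np.logical_and(detection_mask, ~truth_mask).sum()
    tpr = tp / max(truth_mask.sum(), 1)     # detected target pixels / real target pixels
    fpr = fp / detection_mask.size          # false alarms / total pixels in the frame
    return tpr, fpr
```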

Fig. 3. Infrared images and result images under different backgrounds, where a-e label a couple of an infrared image and its result

Fig. 4. The ROC curves (true positive rate versus false positive rate) of the Facet, LSSVM, GST and our approaches for the five infrared images shown in Fig. 3 a-e

The ROC curves in Figs. 4a-e correspond successively to the targets shown in Figs. 3a-e. As shown in Fig. 4, the proposed method, represented by the green curves, performs much better than the other three methods.

In order to compare our method with the others objectively, a common metric [5] is applied to evaluate the accuracy of small target detection methods by assessing their background suppression, the signal-to-clutter ratio gain:

\[
\text{SCR Gain} = \frac{(S/C)_{out}}{(S/C)_{in}} \tag{27}
\]

where \( S \) is the signal amplitude and \( C \) is the clutter standard deviation in a single frame. The time consumption is also considered in the comparison. The experimental results, including SCR gain and run time, are listed in Table 1, which indicates that our method and the LSSVM method have the two best run times, while our method is more accurate than the LSSVM method. In addition, the methods with the best SCR gain are our method and the GST method, but the GST method is slower than ours. It is evident that our method maintains better performance under different backgrounds; thus it is a correct and efficient method for small target detection in infrared images.
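Eq. (27) is straightforward to compute; in the sketch below the "signal amplitude" is taken as the peak target intensity minus the mean clutter level, which is a common convention but is not spelt out in the text.

```python
# Eq. (27) written out (ours); the definition of the signal amplitude below is
# an assumption, not a detail given in the paper.
import numpy as np

def scr(frame, target_mask):
    clutter = frame[~target_mask]
    signal = frame[target_mask].max() - clutter.mean()
    return signal / clutter.std()

def scr_gain(frame_in, frame_out, target_mask):
    return scr(frame_out, target_mask) / scr(frame_in, target_mask)
```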


Table 1. Performance comparison of the four small target detection methods, where a, b, c, d and e denote the infrared images shown in Fig. 3

Method   Metric      a       b       c       d       e
Facet    SCR gain    5.437   6.906   11.43   8.975   1.781
         Time (s)    0.446   0.420   0.381   0.659   0.530
LSSVM    SCR gain    6.331   8.490   9.127   5.255   2.349
         Time (s)    0.527   0.233   0.250   0.603   0.222
GST      SCR gain    20.25   27.03   19.84   12.42   2.454
         Time (s)    0.416   0.431   0.612   0.757   0.457
Ours     SCR gain    28.64   21.76   14.66   43.90   6.881
         Time (s)    0.277   0.263   0.302   0.376   0.384

Table 2. The detection ratio (%) of the four target detection methods on the five infrared image sets, where a-e correspond to the infrared images a-e shown in Fig. 3

Method   a      b      c      d      e
LSSVM    82.1   80.5   80.8   60.3   52.3
GST      74.5   20.5   75.6   63.5   47.5
Facet    59.8   79.5   75.1   87.3   53.5
Ours     96.7   87.0   93.0   96.8   70.9

For further comparison, five infrared image sets were used to validate these methods. Table 2 lists the detection ratios, i.e., the proportions of real targets detected by the four methods in the five infrared image sequences. Table 2 shows clearly that the LSSVM method performs more stably than the GST method. The Facet method is as robust as our method across all the databases, but its detection ratio is lower. It is evident that our method achieves better detection ratios under different situations; thus it is a robust method for infrared small target detection.

5 Conclusion

Based on coarse-to-fine segmentation and confidence analysis, our approach can be used for infrared small target detection, and good performance was verified via experiments. The objective functions derived from POME ensure that the targets are segmented completely from the infrared image, and the abnormality-based confidence is able to distinguish real targets from false alarms. However, due to the assumption that targets are brighter than the neighboring background, our method cannot be used on infrared images whose targets are dark. The experimental results suggest that the presented method is efficient and robust.


References

1. Shaik, J., Iftekharuddin, K.M.: Detection and tracking of targets in infrared images using Bayesian techniques. Optics and Laser Tech. 41, 832-842 (2009)
2. Yang, L., Yang, J., Yang, K.: Adaptive detection for infrared small target under sea-sky complex background. Electron. Lett. 40(17), 1083-1085 (2004)
3. Gu, Y.F., Wang, C., Liu, B.X., Zhang, Y.: Kernel-based nonparametric regression method for clutter removal in infrared small-target detection applications. IEEE Geosci. Remote Sens. Lett. 7, 469-473 (2010)
4. Wang, P., Tian, J.W., Gao, C.Q.: Infrared small target detection using directional highpass filters based on LS-SVM. Electron. Lett. 45(3), 156-158 (2009)
5. Wang, G.D., Chen, C., Shen, X.B.: Facet-based infrared small target detection method. Electron. Lett. 41(22), 1244-1246 (2005)
6. Gao, C.D., Zhang, T.Q., Li, Q.: Small infrared target detection using sparse ring representation. IEEE Aerosp. Electron. Syst. Mag. 27(3), 21-30 (2012)
7. Qi, S.X., Ma, J., Tao, C.: A robust directional saliency-based method for infrared small-target detection under various complex backgrounds. IEEE Geosci. Remote Sens. Lett. 99, 1-5 (2012)
8. Wang, X., Lv, G.F., Xu, L.Z.: Infrared dim target detection based on visual attention. Infrared Physics and Technology 55(6), 513-521 (2012)
9. Yilmaz, K., Shafique, M.: Target tracking in airborne forward looking infrared imagery. Image and Vision Comput. J. 21(7), 623-635 (2003)
10. Parag, T.: Coupled label and intensity MRF models for IR target detection. In: CVPR Workshops, pp. 7-13 (2011)
11. Zhang, L., Wu, B., Nevatia, R.: Pedestrian detection in infrared images based on local shape features. In: CVPR Workshop OTCBVS, pp. 1-8 (2007)

Rapid Recognition of Cave Target Using AirBorne LiDAR Data

Guangjun Dong1, Zhu Chaojie2, Haifang Zhou3, and Sheng Luo1

1 Information Engineering University, Zhengzhou 450052, China, [email protected]
2 The Office Agency of Xizang Troops Stationed in Sichuan, Chengdu, China
3 National University of Defense Technology, Changsha, China

Abstract. Based on the special geometry of cave targets, a method for the rapid recognition of residential caves and natural caves from LiDAR point cloud data is proposed. By calculating the distance between each LiDAR point and the sensor position, a target can be quickly identified wherever a sudden change of distance occurs. In order to improve the accuracy of target recognition, a threshold on the number of sudden-change points and a region-wide threshold are introduced, and the recognized target contour points can be further classified as lying on the target front plane or on an inside plane.

Keywords: LiDAR · Multistorey building · Region growing

1 Introduction

Airborne LiDAR is a new means of earth observation that integrates 3D laser scanning, GPS and INS. It can quickly obtain high-precision three-dimensional coordinates of ground objects, and as an active earth-observation system it works in all weather and at any time, so it is widely used in surveying and mapping. For conventional applications of airborne LiDAR, such as topographic survey and DSM, DOM and DLG generation, the operation mode is relatively simple because a large body of operating experience from traditional photogrammetry can be referenced. But for the identification and measurement of cave targets (such as residential caves, natural caves and special military targets), the planning of the measurement task and the follow-up processing encounter new challenges due to their special geometrical features. Therefore, it is significant to study the point cloud data simulation of cave targets and target recognition using the simulation results, so that the ability of LiDAR systems to identify and measure cave targets can be assessed and technical indicators can be provided for subsequent measurement task planning.

2 Principle and Methods

2.1 The Imaging Mode of LiDAR System

At present, LiDAR systems mainly use two modes of earth observation: mechanical scanning and plane-array imaging. With mechanical scanning, single-point measurements can only be made in a specific direction; for example, scanning along the flight direction is achieved by the motion of the flight platform, while measurement perpendicular to the flight direction is achieved by a mechanical device. There are three representative scanning styles: constant-speed mirror scanning, oscillating mirror scanning and fibre-optic array scanning. Plane-array imaging is a radar imaging technology with good development prospects at this stage. Its principle is to modulate the gain of the laser echo returned by the image intensifier, obtain an intensity map on the CCD, and then calculate the three-dimensional distance information of the target from the intensity map. No matter which imaging modality a LiDAR system adopts, 3D point cloud data of the observed targets are obtained eventually. Taking oscillating mirror scanning (the most common method at present; other scanning methods are not discussed separately) and plane-array imaging as examples, the imaging principle is shown in Fig. 1.

Fig. 1. (a) Illustration of oscillating mirror scanning (b) Illustration of plane-array imaging

2.2 Definition of the Three-Dimensional Model for the Target

What distinguishes a cave target from general ground targets is the geometry of the cavity embedded inside it. Taking an ordinary residential cave as an example, Fig. 2(a) illustrates the structure. The model is uniquely determined in geometric space by four parameters, the width dW, the height dH, the depth dL and the dome height dh, together with the model location point P and the azimuth κ (the rotation angle of the model about the direction perpendicular to the ground).
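For concreteness, the model parameters listed above can be collected into a plain data structure; the field names below are our own and simply mirror the parameters in the text.

```python
# The cave model of Fig. 2 collected into a plain data structure (ours); the
# field names simply mirror the parameters listed in the text.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CaveModel:
    dW: float                      # opening width
    dH: float                      # opening height
    dL: float                      # depth of the cavity
    dh: float                      # dome height
    P: Tuple[float, float, float]  # model location point
    kappa: float                   # azimuth: rotation about the vertical axis
```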

2.3 Rapid Identification of Cave Targets Based on the Distance Sudden Change

Fig. 2. The definition of the 3D model for the target

What makes a cave target special compared with general ground targets (such as buildings, trees, etc.) is the geometry of the cavity embedded inside it (Fig. 3). Take single-line laser scanning of the target surface as an example, scanning target a and target b (an ordinary building and a residential cave) from left to right. There are three categories of laser points on target a, while target b produces laser points on the front wall, the bottom surface, the internal walls and other surfaces. The distances between the various types of laser points and the sensors S1, S2 are then calculated and analyzed; assume that the distances of the various types of laser points are d1, d2 and d3, respectively. For target a the two inequalities da1 > da2 and da2 > da3 hold, while for target b, db1