Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVII 3030585190, 9783030585198

The 30-volume set, comprising the LNCS books 12346 until 12375, constitutes the refereed proceedings of the 16th Europea


Language: English. Pages: 805 [845]. Year: 2020.



Table of contents :
Foreword
Preface
Organization
Contents – Part XVII
Class-Wise Dynamic Graph Convolution for Semantic Segmentation
1 Introduction
2 Related Work
3 Approach
3.1 Preliminaries
3.2 Overall Framework
3.3 Class-Wise Dynamic Graph Convolution Module
3.4 Loss Function
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Implementation Details
4.3 Ablation Study
4.4 Comparisons with State-of-the-Arts
5 Conclusions
References
Character-Preserving Coherent Story Visualization
1 Introduction
2 Related Work
2.1 GAN-based Text-to-Image Synthesis
2.2 Evaluation Metrics of Image Generation
3 Character-Preserving Coherent Story Visualization
3.1 Overview
3.2 Story and Context Encoder
3.3 Figure-Ground Aware Generation
3.4 Loss Function
3.5 Fréchet Story Distance
4 Experimental Results
4.1 Implementation Details
4.2 Dataset
4.3 Baselines
4.4 Qualitative Comparison
4.5 Quantitative Comparison
4.6 Architecture Search
4.7 FSD Analysis
5 Conclusions
References
GINet: Graph Interaction Network for Scene Parsing
1 Introduction
2 Related Work
3 Approach
3.1 Framework of Graph Interaction Network (GINet)
3.2 Graph Interaction Unit
3.3 Semantic Context Loss
4 Experiments
4.1 Datasets
4.2 Implementation Details
4.3 Experiments on Pascal-Context
4.4 Experiments on COCO Stuff
4.5 Experiments on ADE20K
5 Conclusion
References
Tensor Low-Rank Reconstruction for Semantic Segmentation
1 Introduction
2 Related Work
3 Methodology
3.1 Overview
3.2 Tensor Generation Module
3.3 Tensor Reconstruction Module
3.4 Global Pooling Module
3.5 Network Details
3.6 Relation to Previous Approaches
4 Experiments
4.1 Implementation Details
4.2 Results on Different Datasets
4.3 Ablation Study
4.4 Further Discussion
5 Conclusion
References
Attentive Normalization
1 Introduction
2 Related Work
3 The Proposed Attentive Normalization
3.1 Background on Feature Normalization
3.2 Background on Feature Attention
3.3 Attentive Normalization
4 Experiments
4.1 Ablation Study
4.2 Image Classification in ImageNet-1000
4.3 Object Detection and Segmentation in COCO
5 Conclusion
References
Count- and Similarity-Aware R-CNN for Pedestrian Detection
1 Introduction
2 Related Work
3 Baseline Two-Stage Detection Framework
4 Our Approach
4.1 Detection Branch
4.2 Count-and-Similarity Branch
4.3 Inference
5 Experiments
5.1 Datasets and Evaluation Metrics
5.2 Implementation Details
5.3 CityPersons Dataset
5.4 CrowdHuman Dataset
5.5 Results on Person Instance Segmentation
6 Conclusion
References
TRADI: Tracking Deep Neural Network Weight Distributions
1 Introduction
2 TRAcking of the Weight DIstribution (TRADI)
2.1 Notations and Hypotheses
2.2 TRAcking of the DIstribution (TRADI) of Weights of a DNN
2.3 Training the DNNs
2.4 TRADI Training Algorithm Overview
2.5 TRADI Uncertainty During Testing
3 Related Work
4 Experiments
4.1 Toy Experiments
4.2 Regression Experiments
4.3 Classification Experiments
4.4 Uncertainty Evaluation for Out-of-Distribution (OOD) Test Samples
5 Conclusion
References
Spatiotemporal Attacks for Embodied Agents
1 Introduction
2 Related Work
3 Adversarial Attacks for the Embodiment
3.1 Motivations
3.2 Problem Definition
4 Spatiotemporal Attack Framework
4.1 Temporal Attention Stimulus
4.2 Spatially Contextual Perturbations
4.3 Optimization Formulations
5 Experiments
5.1 Experimental Setting
5.2 Evaluation Metrics
5.3 Implementation Details
5.4 Attack via a Differentiable Renderer
5.5 Transfer Attack onto a Non-differentiable Renderer
5.6 Generalization Ability of the Attack
5.7 Improving Agent Robustness with Adversarial Training
5.8 Ablation Study
6 Conclusion
References
Caption-Supervised Face Recognition: Training a State-of-the-Art Face Model Without Manual Annotation
1 Introduction
2 Related Work
3 Methodology
4 Dataset
5 Experiments
5.1 Experiment Setting
5.2 Comparison to Fully Supervised Training
5.3 Comparison to SSL and MIL Methods
5.4 Ablation Study and Discussion
6 Conclusion
References
Unselfie: Translating Selfies to Neutral-Pose Portraits in the Wild
1 Introduction
2 Related Work
3 Our Method
3.1 Datasets
3.2 Nearest Pose Search
3.3 Coordinate-Based Inpainting
3.4 Composition
4 Experiments
4.1 Comparisons with Existing Methods
4.2 Ablation Study
4.3 Limitations
5 Conclusion
References
Design and Interpretation of Universal Adversarial Patches in Face Detection
1 Introduction
2 Related Work
3 Interpretation of Adversarial Patch as Face
3.1 Preliminaries on Face Detection
3.2 Design of Adversarial Patch
3.3 Generality
3.4 Interpretation of Adversarial Patch
4 Improved Optimization of Adversarial Patch
4.1 Evaluation Metric
4.2 Improved Optimization
4.3 Experimental Results
5 Conclusions
References
Few-Shot Object Detection and Viewpoint Estimation for Objects in the Wild
1 Introduction
2 Related Work
3 Approach
3.1 Few-Shot Learning Setup
3.2 Network Description
3.3 Learning Procedure
4 Experiments
4.1 Few-Shot Object Detection
4.2 Few-Shot Viewpoint Estimation
4.3 Evaluation of Joint Detection and Viewpoint Estimation
5 Conclusion
References
Weakly Supervised 3D Hand Pose Estimation via Biomechanical Constraints
1 Introduction
2 Related Work
3 Method
3.1 Biomechanical Constraints
3.2 Zroot Refinement
3.3 Final Loss
4 Implementation
5 Evaluation
5.1 Datasets
5.2 Evaluation Metric
5.3 Effect of Weak-Supervision
5.4 Ablation Study
5.5 Bootstrapping with Synthetic Data
5.6 Bootstrapping with Real Data
6 Conclusion
References
Dynamic Dual-Attentive Aggregation Learning for Visible-Infrared Person Re-identification
1 Introduction
2 Related Work
3 Proposed Method
3.1 Baseline Cross-modality Re-ID
3.2 Intra-modality Weighted-Part Aggregation
3.3 Cross-modality Graph Structured Attention
3.4 Dynamic Dual Aggregation Learning
4 Experimental Results
4.1 Experimental Settings
4.2 Ablation Study
4.3 Comparison with State-of-the-Art Methods
5 Conclusion
References
Contextual Heterogeneous Graph Network for Human-Object Interaction Detection
1 Introduction
2 Related Work
3 Approach
3.1 Preliminary
3.2 Pipeline
3.3 Contextual Learning
3.4 HOI Prediction
4 Experiments
4.1 Datasets and Metrics
4.2 Implementation Details
4.3 Ablation Studies
4.4 Performance and Comparison
5 Conclusions
References
Zero-Shot Image Super-Resolution with Depth Guided Internal Degradation Learning
1 Introduction
2 Related Work
3 Approach
3.1 Depth Guided Training Data Generation
3.2 Network Structure
3.3 Bi-cycle Training
4 Discussion
5 Experiment
5.1 Dataset and Training Setup
5.2 Comparison with the State of the Arts
5.3 Visual Comparison
5.4 Super-Resolving Image with Estimated Depth
5.5 Ablation Study
6 Conclusion
References
A Closest Point Proposal for MCMC-based Probabilistic Surface Registration
1 Introduction
2 Background
2.1 Gaussian Process Morphable Model (GPMM)
2.2 Analytical Posterior Model
3 Method
3.1 Approximating the Posterior Distribution
3.2 CP-proposal
4 Experiments
4.1 Convergence Comparison
4.2 Posterior Estimation of Missing Data
4.3 Registration Accuracy - ICP vs CPD vs CP-proposal
5 Conclusion
References
Interactive Video Object Segmentation Using Global and Local Transfer Modules
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 Network Architecture
3.2 Training Phase
3.3 Inference Phase
4 Experimental Results
4.1 Comparative Assessment
4.2 User Study
4.3 Ablation Studies
5 Conclusions
References
End-to-end Interpretable Learning of Non-blind Image Deblurring
1 Introduction
1.1 Related Work
1.2 Main Contributions
2 Proposed Method
2.1 A Convolutional HQS Algorithm
2.2 Convolutional PCR Iterations
2.3 An End-to-end Trainable CHQS Algorithm
3 Implementation and Results
3.1 Implementation Details
3.2 Experimental Validation of CPCR and CHQS
3.3 Uniform Deblurring
3.4 Non-uniform Motion Blur Removal
3.5 Deblurring with Approximated Blur Kernels
4 Conclusion
References
Employing Multi-estimations for Weakly-Supervised Semantic Segmentation
1 Introduction
2 Related Work
2.1 Semantic Segmentation
2.2 Weakly-Supervised Semantic Segmentation
2.3 Learning from Noisy Labels
3 Pilot Experiments
4 Approach
4.1 The Class Activation Map
4.2 Multi-type Seeds
4.3 Multi-scale Seeds
4.4 Multi-architecture Seeds
4.5 The Weighted Selective Training
5 Experiments
5.1 Dataset
5.2 Implementation Details
5.3 The Influence of Multiple Seeds
5.4 The Weighted Selective Training
5.5 Comparison with Related Works
6 Conclusions
References
Learning Noise-Aware Encoder-Decoder from Noisy Labels by Alternating Back-Propagation for Saliency Detection
1 Introduction
2 Related Work
3 Proposed Framework
3.1 Noise-Aware Encoder-Decoder Network
3.2 Maximum Likelihood via Alternating Back-Propagation
3.3 Comparison with Variational Inference
3.4 Network Architectural Design
4 Experiments
4.1 Experimental Setup
4.2 Comparison with the State-of-the-Art Methods
4.3 Ablation Study
4.4 Model Analysis
5 Conclusion
References
Rethinking Image Deraining via Rain Streaks and Vapors
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 SNet
3.2 VNet
3.3 ANet
3.4 Network Training
3.5 Visualizations
4 Experiments
4.1 Dataset Constructions
4.2 Ablation Studies
4.3 Evaluations with State-of-the-Art
5 Concluding Remarks
References
Finding Non-uniform Quantization Schemes Using Multi-task Gaussian Processes
1 Introduction
2 Related Work
3 Method
3.1 Constraining the Space
3.2 Exploring the Space
3.3 Sampling the Space
4 Experiments and Results
5 Conclusion
References
Is Sharing of Egocentric Video Giving Away Your Biometric Signature?
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Extracting Gait Signatures from Egocentric Videos
3.2 Recognizing Wearer from First Person Video
3.3 Extracting Gait from Sparse Optical Flow
3.4 Recognizing Wearer from Third Person Video
4 Datasets Used
5 Experiments and Results
5.1 Hyper-parameters and Ablation Study
5.2 Wearer Recognition in Egocentric Videos
5.3 Wearer Recognition in Third Person Videos
5.4 Model Interpretability
6 Conclusion and Future Work
References
Captioning Images Taken by People Who Are Blind
1 Introduction
2 Related Work
3 VizWiz-Captions
3.1 Dataset Creation
3.2 Dataset Analysis
4 Algorithm Benchmarking
5 Conclusions
References
Improving Semantic Segmentation via Decoupled Body and Edge Supervision
1 Introduction
2 Related Work
3 Method
3.1 Decoupled Segmentation Framework
3.2 Body Generation Module
3.3 Edge Preservation Module
3.4 Decoupled Body and Edge Supervision
3.5 Network Architecture
4 Experiment
4.1 Ablation Studies
4.2 Visual Analysis
4.3 Results on Other Datasets
5 Conclusions
References
Conditional Entropy Coding for Efficient Video Compression
1 Introduction
2 Background and Related Work
2.1 Deep Image Compression
2.2 Video Compression
2.3 Internal Learning
3 Entropy-Focused Video Compression
3.1 Single-Image Encoder/Decoder
3.2 Conditional Entropy Model for Video Encoding
3.3 Rate-distortion Loss Function
4 Internal Learning of the Frame Code
5 Experiments
5.1 Datasets, Metrics, and Video Codecs
5.2 Runtime and Rate-distortion on UVG
5.3 Rate-distortion on NorthAmerica
5.4 Varying Framerates on UVG and CDVL
5.5 Qualitative Results
6 Conclusion
References
Differentiable Feature Aggregation Search for Knowledge Distillation
1 Introduction
2 Related Work
3 Method
3.1 Feature Distillation
3.2 Differentiable Group-Wise Search
3.3 Time Complexity Analysis
3.4 Implementation Details
4 Experiments
4.1 CIFAR-100
4.2 CINIC-10
4.3 The Effectiveness of Differentiable Search
5 Conclusion
References
Attention Guided Anomaly Localization in Images
1 Introduction
2 Related Works
3 Proposed Approach: CAVGA
3.1 Unsupervised Approach: CAVGAu
3.2 Weakly Supervised Approach: CAVGAw
4 Experimental Setup
5 Experimental Results
6 Ablation Study
7 Conclusion
References
Self-supervised Video Representation Learning by Pace Prediction
1 Introduction
2 Related Work
3 Our Approach
3.1 Pace Prediction
3.2 Contrastive Learning
3.3 Network Architecture and Training
4 Experiments
4.1 Datasets and Implementation Details
4.2 Ablation Studies
4.3 Action Recognition
4.4 Video Retrieval
5 Conclusion
References
Full-Body Awareness from Partial Observations
1 Introduction
2 Related Work
3 Approach
3.1 Base Models
3.2 Iterative Adaptation to Partial Visibility
3.3 Implementation Details
4 Experiments
4.1 Datasets and Annotations
4.2 Experimental Setup
4.3 Results on VLOG
4.4 Generalization Evaluations
4.5 Additional Comparisons
5 Discussion
References
Reinforced Axial Refinement Network for Monocular 3D Object Detection
1 Introduction
2 Related Work
3 Approach
3.1 Baseline and the Curse of Sampling in 3D Space
3.2 Towards Higher Sampling Efficiency
3.3 Refining 3D Detection with Reinforcement Learning
3.4 Parameter-Aware Data Enhancement
3.5 Implementation Details
4 Experiments
4.1 Dataset and Evaluation
4.2 Comparison to the State-of-the-Arts
4.3 Diagnostic Studies
4.4 Computational Costs
5 Conclusions
References
Self-supervised Multi-task Procedure Learning from Instructional Videos
1 Introduction
1.1 Prior Work
1.2 Paper Contributions
2 Self-supervised Procedure Learning
2.1 Proposed Framework
2.2 Proposed Learning Method
3 Experiments
3.1 Experimental Setup
3.2 Experimental Results
4 Conclusions
References
CosyPose: Consistent Multi-view Multi-object 6D Pose Estimation
1 Introduction
2 Related Work
3 Multi-view Multi-object 6D Object Pose Estimation
3.1 Approach Overview
3.2 Stage 1: Object Candidate Generation
3.3 Stage 2: Object Candidate Matching
3.4 Stage 3: Scene Refinement
4 Results
4.1 Single-View Single-Object Experiments
4.2 Multi-view Experiments
5 Conclusion
References
In-Domain GAN Inversion for Real Image Editing
1 Introduction
1.1 Related Work
2 In-Domain GAN Inversion
2.1 Domain-Guided Encoder
2.2 Domain-Regularized Optimization
3 Experiments
3.1 Experimental Settings
3.2 Semantic Analysis of the Inverted Codes
3.3 Inversion Quality and Speed
3.4 Real Image Editing
3.5 Ablation Study
4 Discussion and Conclusion
References
Key Frame Proposal Network for Efficient Pose Estimation in Videos
1 Introduction
2 Related Work
3 Proposed Approach
3.1 Atomic Dynamics-Based Representation of Temporal Data
3.2 Key Frame Selection Unsupervised Loss
3.3 Human Pose Interpolation
3.4 Architecture, Training, and Inference
3.5 Online Key Frame Detection
4 Experiments
4.1 Data Preprocessing and Evaluation Metrics
4.2 Qualitative Examples
4.3 Ablation Studies
4.4 Comparison Against the State-of-Art
4.5 Robustness of Our Approach
5 Conclusion
References
Exchangeable Deep Neural Networks for Set-to-Set Matching and Learning
1 Introduction
2 Preliminaries: Set-to-Set Matching
2.1 Mappings of Exchangeability
3 Matching and Learning for Sets
3.1 Cross-Set Feature Transformation
3.2 Calculating Matching Score for Sets
3.3 Training for Pairs of Sets
4 Related Works
5 Experiments
5.1 Overall Architecture
5.2 Baselines for Comparisons
5.3 Training Settings
5.4 Fashion Set Matching
5.5 Group Re-identification
5.6 Ablation Study
6 Conclusion
References
Making Sense of CNNs: Interpreting Deep Representations and Their Invariances with INNs
1 Introduction
2 Background
3 Approach
3.1 Recovering the Invariances of Deep Models
3.2 Interpreting Representations and Their Invariances
4 Experiments
4.1 Comparison to Existing Methods
4.2 Understanding Models
4.3 Effects of Data Shifts on Models
4.4 Modifying Representations
5 Conclusion
References
Cross-Modal Weighting Network for RGB-D Salient Object Detection
1 Introduction
2 Related Work
3 Proposed Method
3.1 Network Overview and Motivation
3.2 Low- and Middle-Level Cross-Modal Weighting
3.3 High-Level Cross-Modal Weighting
3.4 Implementation Details
4 Experiments
4.1 Datasets and Evaluation Metrics
4.2 Comparison with State-of-the-Art Methods
4.3 Ablation Studies
5 Conclusion
References
Open-Set Adversarial Defense
1 Introduction
2 Related Work
3 Background
4 Proposed Method
5 Experimental Results
5.1 Datasets
5.2 Baseline Methods
5.3 Quantitative Results
5.4 Ablation Study
5.5 Qualitative Results
6 Conclusion
References
Deep Image Compression Using Decoder Side Information
1 Introduction
2 Related Work
3 Deep Distributed Source Coding for Images
3.1 Architecture
3.2 Using Side Information
4 Experiments
4.1 Implementation Details
4.2 Results
4.3 Ablation Study
5 Conclusions
References
Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation
1 Introduction
2 Related Work
2.1 Synthetic Content Creation
2.2 Graph Generation
3 Methodology
3.1 Representing Synthetic Scenes
3.2 Generative Model
3.3 Training
4 Experiments
4.1 Multi MNIST
4.2 Aerial 2D
4.3 3D Driving Scenes
5 Conclusion
References
A Generic Visualization Approach for Convolutional Neural Networks
1 Introduction
2 Related Work
3 Constrained Attention Filter (CAF)
3.1 Class-Oblivious Variant
3.2 Class-Specific Variant
4 Experiments
4.1 WSOL Using Classification Networks
4.2 WSOL Using Retrieval Networks
4.3 Recurrent Networks' Attention
4.4 Ablation Study
5 Conclusion
References
Interactive Annotation of 3D Object Geometry Using 2D Scribbles
1 Introduction
2 Related Work
3 Interactive 3D Annotation
3.1 Annotation Setup
3.2 Scribble Interaction Module
3.3 Point Interaction Module
4 Experiments
4.1 Experimental Settings
4.2 ShapeNet Annotation
4.3 Annotating Real Scans
4.4 Analysis
4.5 User Study
5 Conclusion
References
Hierarchical Kinematic Human Mesh Recovery
1 Introduction
2 Related Work
3 Approach
3.1 3D Body Representation
3.2 Hierarchical Kinematic Pose and Shape Estimation
3.3 Overall Learning Objective
3.4 In-the-Loop Optimization
4 Experiments and Results
5 Summary
References
Multi-loss Rebalancing Algorithm for Monocular Depth Estimation
1 Introduction
2 Related Work
3 Proposed Algorithm
3.1 Loss Function Space
3.2 Loss Rebalancing Algorithm
4 Experimental Results
4.1 Implementation Details
4.2 Datasets and Evaluation Metrics
4.3 Comparison with Conventional Algorithms
4.4 Ablation Studies
4.5 Different Backbone Networks
4.6 Time Complexity
5 Conclusions
References
Author Index
